Join our team as a Site Reliability Engineer - Infra at Taboola. This engineering position is crucial for maintaining the high reliability, scalability, and performance of our technology systems. It is ideal for candidates who are interested in remote work opportunities and passionate about advanced technology stacks.
Key Responsibilities
Ensure the reliability, availability, and performance of our infrastructure services.
Manage and maintain our Kubernetes infrastructure, including KubeVirt.
Design, implement, and maintain our monitoring and observability stack (Sensu Go, VictoriaMetrics, Prometheus, ELK).
Automate infrastructure provisioning, configuration, and deployment processes using Puppet and Ansible.
Manage and maintain core services such as DNS and networking.
Troubleshoot and resolve complex infrastructure issues promptly and efficiently.
Participate in on-call rotations and incident response.
Develop and maintain infrastructure-as-code (IaC).
Identify and implement proactive measures to prevent incidents and improve system reliability.
Collaborate with development teams to ensure smooth and reliable deployments.
Contribute to the design and implementation of new infrastructure solutions.
Drive improvements in system architecture, processes, and tools.
Mentor and coach other team members.
Required Qualifications
5+ years of experience in a Site Reliability Engineering, Systems Engineering, or similar role.
Deep understanding of Site Reliability Engineering principles and practices.
Extensive experience with Kubernetes, including deployment, management, and troubleshooting.
Strong experience with monitoring and observability tools such as Sensu Go, Zabbix, VictoriaMetrics, Prometheus, and ELK.
Proficiency in configuration management tools such as Puppet and Ansible.
Solid understanding of Linux internals and networking.
Experience managing and maintaining core services such as DNS and networking.
Strong programming skills in Python and/or Go.
Experience with both on-premises and cloud environments.
Experience with KubeVirt.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration skills.
Ability to work in a fast-paced, dynamic environment.
Ability to participate in on-call rotations, including weekends.
Preferred Qualifications
Experience with large-scale, distributed systems.
Experience with other cloud providers (e.g., AWS, Azure, GCP).
Contributions to open-source projects.
About Taboola
At Taboola, our team members are empowered to harness their potential while growing and learning alongside smart and talented colleagues in a supportive environment.
Benefits & Perks
As part of our commitment to our team's success and well-being, Taboola offers competitive benefits and perks, fostering a productive and enjoyable work environment.