Join Ververica GmbH as a Site Reliability Engineer (m/f/d), a pivotal engineering position that focuses on maintaining and enhancing the infrastructure of our cutting-edge Unified Streaming Data Platform. This role is ideal for a skilled software engineer interested in working with advanced technologies in a remote work environment.
Key Responsibilities
Build and maintain the infrastructure for Ververica’s Unified Streaming Data Platform across AWS, GCP, and Azure.
Design and manage Infrastructure as Code (IaC) using Terraform, ensuring modularity, reusability, and best practices.
Implement and enhance observability tooling such as Grafana and Prometheus, covering logging, tracing, metrics, dashboards, and alerts.
Ensure system reliability through SRE best practices, including defining SLIs, SLOs, and error budgets.
Improve infrastructure architecture and engineering efficiency through continuous evaluation and optimization.
Enhance CI/CD pipelines to automate development workflows.
Monitor for, identify, and resolve security vulnerabilities, including CVE updates and security enhancements.
Contribute to the successful development and launch of new products, features, and services.
Periodically participate in on-call rotations to manage incidents on live infrastructure that runs 24/7.
Maintain and update documentation, including architectural designs and changes.
Required Qualifications
Bachelor’s degree in Computer Science, Information Technology, or a related field.
Minimum 2 years of hands-on experience with Kubernetes clusters, Helm charts, controllers, and operators.
Proficiency in designing and maintaining Terraform code with best practices.
Strong knowledge of observability tools and practices, including metrics, logging, and alerting systems.
Experience implementing SRE principles such as SLIs, SLOs, and error budgets.
Solid understanding of Linux systems and networking in cloud environments.
Familiarity with distributed systems or streaming data platforms.
Knowledge of cloud-native security best practices.
About Ververica GmbH
Ververica GmbH, founded by the original creators of Apache Flink™, empowers businesses to unlock the full potential of real-time data processing and analytics. Our platform provides cutting-edge stream processing and event-driven applications, enabling companies worldwide to build scalable and reliable data-driven solutions.
Benefits & Perks
As a Site Reliability Engineer (m/f/d) at Ververica, you will enjoy the flexibility of remote work, a collaborative and innovative work environment, and the opportunity to work with a dynamic technology stack in a growing engineering field.