The Site Reliability Engineering team architects, builds, and maintains the rock-solid infrastructure that our applications rely on. We work hand-in-hand with development teams to ensure scalability, reliability, and efficiency. This collaboration empowers us to deliver exceptional experiences for our customers and helps developers focus on building great features.
What You Will Do
Deploy, automate, maintain, and manage various cloud-based and on-prem production systems.
Lead and manage projects to achieve the targets and deliverables set by stakeholders. This includes prioritizing the roadmap and writing task definitions.
Understand our architecture at a high level, and systematically document new and existing requirements so that projects are delivered smoothly and without miscommunication.
Work closely with the Information Security team to ensure that we adopt security best practices.
Ensure the availability, performance, scalability, and security of production systems.
Handle day-to-day operational tasks such as being on call and responding to alerts and incidents, and improve our processes so that the team's operations run smoothly.
What We Are Looking For
At least 4 years of engineering experience.
BS/MS degree in a relevant major (e.g., IT, Computer Science) or a proven track record in DevOps
Desire to move into the engineering management track
Experience with managing cloud servers (AWS, GCP)
Experience with cloud-native tools (Kubernetes, Docker, Nginx, OpenTelemetry)
Experience with databases and storage servers (MySQL, Postgres, and Redis)
Experience with Infrastructure as Code tools (Terraform, Pulumi)
Experience with managing on-prem physical servers is a big plus
Experience with managing network systems within the infrastructure is a big plus