Responsibilities:
Develop, scale, and manage our infrastructure using automation and configuration management tools.
Participate in incident response, diagnosis, and follow-up on system outages and alerts across our production environment.
Measure and optimize system performance.
Plan infrastructure capacity to support future growth.
Work closely with development teams to ensure that platforms are designed with "operability" in mind.
Qualifications:
Bachelor’s degree in Computer Science, Engineering, or a related field.
Experience with cloud services (AWS, Google Cloud Platform, Azure).
Strong experience with databases, network design and implementation, and patch management.
Knowledge of scripting languages such as Python, Perl, or Ruby.
Deep understanding of monitoring solutions across all layers of the web infrastructure stack.
A mindset geared toward building resilient, maintainable systems.
Strong teamwork skills, flexibility, and a willingness to take part in on-call duties.