Responsibilities:
Ensure high availability and performance of critical production systems.
Develop automation tools for efficient management and operation of infrastructure.
Work with development teams to design and implement scalable, resilient systems.
Participate in on-call rotations, and troubleshoot and resolve production issues.
Conduct post-mortem analyses to prevent incidents from recurring.
Qualifications:
Bachelor’s degree in Computer Science or Information Systems, or equivalent work experience.
Experience with cloud computing platforms (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes).
Proficiency in scripting languages (Python, Shell, Ruby) for automation.
Strong understanding of network fundamentals, security practices, and system administration.
Excellent problem-solving skills and the ability to work under pressure.