2024-06-25

Site Reliability Engineer (UK)

The Walt Disney Company

Entertainment Providers 👥 10001 employees 📍 Burbank, CA, US

The Walt Disney Company, together with its subsidiaries and affiliates, is a leading diversified international family entertainment and media enterprise that includes three core business segments: Di…

Description

At WALT Labs, we are committed to empowering businesses to leverage the transformative power of cloud technology, facilitating innovation and operational efficiency. Specializing in managed services across Google Cloud Platform (GCP) and Amazon Web Services (AWS), we seek a dedicated Site Reliability Engineer (SRE) who is passionate about technology, excels in problem-solving, and is dedicated to providing unparalleled customer service. You will become the subject-matter expert (SME) on the scale, resiliency, and uptime of our own environments and the customer environments we support.

Role Summary

As a critical member of our team, the SRE will provide technical support and expertise to our managed services clients. This role involves diagnosing and resolving complex issues across diverse cloud environments and technologies, ensuring high performance and reliability. The ideal candidate is a tech enthusiast, eager to expand their knowledge and skills daily, committed to problem-solving and delivering customer-focused solutions within defined Service Level Agreement (SLA) guidelines.

Key Responsibilities:

Ensure high availability and reliability of software systems and infrastructure. Build out SLOs and SLAs and continually improve system reliability.
Design, implement, and maintain monitoring and alerting systems to detect and address issues proactively, mainly using Datadog, GCP Cloud Monitoring, and PagerDuty/Incident.io.
Debug and troubleshoot production issues across various customer environments, technology stacks, and cloud providers, primarily focusing on GCP and AWS.
Participate in an on-call rotation to respond to and resolve production incidents, and conduct RCAs/postmortems to identify and address underlying issues.
Develop and maintain runbooks and playbooks for incident response and troubleshooting.
Proactively identify bottlenecks and areas for improvement, and optimize systems and application environments accordingly.
Conduct load testing and capacity planning to ensure systems can handle expected traffic and growth.
Develop and maintain IaC (Terraform) and configuration management (for example, Ansible and Helm).
Work closely with development teams to understand system architecture, identify potential reliability risks, and implement solutions.
Collaborate with operations teams to ensure smooth deployment and operation of software systems.
Master a broad range of technologies, including but not limited to VMs, container orchestration, networking, security, databases, data warehouses, serverless technologies, and storage solutions.
Proficiently deploy applications into Kubernetes using Helm, and manage Kubernetes administration and troubleshooting.
Provide direct support to clients during production outages, offering expert assistance to swiftly rectify issues, adhering to SLA expectations.
Diligently document solutions and processes, constantly seeking to improve knowledge, skills, and operational efficiency.

Requirements

3+ years of experience in an SRE role
A deep-rooted understanding of how important SLOs, SLIs, and KPIs are to the systems you support, with observability as your daily grounding point.
Extensive knowledge of all major services in GCP (Cloud Run, BigQuery, GKE, etc.)
In-depth knowledge of all major services in AWS
Experience in setting up and managing monitoring solutions like Datadog, Google Cloud Operations Suite, CloudWatch, Nagios, and Zabbix.
Familiarity with various CI/CD systems (Jenkins, Codefresh, GitLab CI, GitHub Actions, Argo CD).
Exceptional problem-solving capabilities, the ability to work under pressure, and strong critical thinking skills.
Ability to act as the voice and commander of incidents, both those managed internally and those that are customer-facing.
A passion for technology and an unquenchable thirst for learning new skills.
A customer-focused mindset, dedicated to delivering the highest level of service.

Benefits

Paid private medical insurance
PTO policy that increases with longevity (1.5 days every 2 years)
Professional development and advancement opportunities
Bonus incentives

Please let The Walt Disney Company know that you found this role at devopsprojectshq.com as a way to support us, so we can keep providing you with awesome DevOps jobs.
