2024-03-05

Senior Site Reliability Engineer

We're hiring engineers at multiple levels from Senior to Principal - fully remote within US & Canada!

Who we are

At Gretel, our mission is to build the world’s first developer platform for synthetic data. Our platform solves the data bottleneck problem for developers, data scientists, and AI/ML researchers across multiple modalities including tabular, time-series, relational, language and image. Gretel's APIs automatically fine-tune AI models to generate synthetic data on-demand while protecting privacy and maintaining the utility and accuracy of the original data.

As a Site Reliability Engineer (SRE) at Gretel you will ensure the safety, security, and reliability of our cloud infrastructure. This includes our compute infrastructure, container orchestration platform, deployment pipelines, and observability stack.

What you will do

Build and maintain Gretel's observability stack. Measure and monitor Gretel's availability, latency, and overall system health
Scale systems sustainably with automation and continuously improve and evolve systems
Manage and lead incident response, recovery, and blameless postmortems
Partner with software engineers to troubleshoot production issues
Build tools and frameworks that help Gretel engineers be more productive
Ship complex ML/AI models in partnership with Gretel's applied science and engineering teams

Minimum Qualifications

Experience with at least one cloud platform (we use AWS heavily)
Experience with Docker and Kubernetes
Ability to write software and tools in Python or Go
Experience with monitoring, alerting and operations
Experience operating highly available distributed systems in the cloud
Experience identifying, diagnosing, and responding to operational outages

Preferred Qualifications

Experience with infrastructure as code (Terraform, CloudFormation, etc.)
Experience with build systems such as Bazel
Experience shipping applications with complex dependencies (PyTorch, TensorFlow)
Software engineering skills beyond script writing (TDD, design patterns, etc.)
Experience with DevOps or CI/CD pipelines

Please let Gretel know that you found this role at devopsprojectshq.com as a way to support us,
so we can keep providing you with awesome DevOps jobs.
