Never miss a job!

2024-04-03

Staff Site Reliability Engineer (SRE)

About us

Character’s mission is to empower everyone with AGI. Our vision is to enable people with our technology so that they can use Character.AI any moment of any day.

Character.AI is one of the world’s leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character.AI is a full-stack AI company with a globally scaled direct-to-consumer platform. As of 2023, that platform ranked #2 in its space for user engagement. Character.AI is uniquely centered around people, letting users personalize their experience by interacting with AI “Characters.” The company achieved unicorn status in 2023 and was named Google Play’s AI App of the Year.

Noam co-invented the key tech powering LLMs and was recently named to TIME100’s Most Influential People in AI list. TIME called him “one of the most important and impactful people of the space’s past, present, and future.” Daniel created and led LaMDA, the breakthrough conversational tech project currently powering Bard.

To learn more, please visit beta.character.ai.

About the role

As the founding member of our DevOps/Site Reliability Engineering function here at Character, you’ll have the opportunity to support our infrastructure, which spans thousands of nodes and terabytes of data and serves millions of daily active users on our site. You’ll be responsible for ensuring our product’s reliability, scalability, and performance as we aggressively grow our user base toward a goal of 3 billion users. You’ll work closely with our development team to design and implement processes and systems that ensure the stability and availability of our service.

Specific Responsibilities:

Maintain production services and keep them operational.
Develop tools, instrumentation, and automation to monitor and optimize the performance and reliability of our service.
Develop, implement, and maintain automation tools and processes to prevent and mitigate service disruptions.
Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.
Establish and support SLAs and SLOs for our site.
Provide system monitoring and incident alerting.
Participate in on-call rotations to provide support for critical incidents and outages.
Develop plans for site reliability and disaster recovery.

Job Requirements:

5+ years of experience in a development-focused DevOps/SRE role within a technology organization operating at significant scale
Deep experience and proven success developing software tools and automation in Python and Golang wherever needed
Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform in support of a site or application running on large multi-node infrastructure with a growing user base
Experience working with multiple cloud computing platforms, such as GCP, is a must
Demonstrated ability to troubleshoot technical issues and challenges successfully and reliably across a range of platforms and systems
Experience with incident management and event postmortems

Desired Experience:

Familiarity with GPU clusters and/or HPC environments is preferred
Experience with monitoring and logging tools such as Prometheus and Grafana
Hands-on experience scaling a consumer product from early days into hypergrowth

Character is an equal opportunity employer and does not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. We value diversity and encourage applicants from a range of backgrounds to apply.

Please let Character.AI know that you found this role at devopsprojectshq.com as a way to support us, so we can keep providing you with awesome DevOps jobs.

Built and hosted in the EU 🇪🇺 we keep your data safe