2026-06-26

Senior MLOps Engineer

Infrastructure AWS Platform CI/CD Terraform GitLab CI Docker Cloud Monitoring Python

Job Overview

We're hiring a Senior MLOps Engineer to be the data team's owner of production ML operations. You'll build the pipelines that take models from prototype to production, own the low-latency serving API behind our Next Best Action (NBA) engine, and stand up the monitoring, alerting, and reliability layer that keeps NBA models — and the LLM agents that consume them — healthy in production. This is a builder's role at a builder's moment: NBA is going live, the production ML platform is being shaped now, and you'll define how Clutch ships and operates AI for years to come. When there isn't active MLOps work, you'll also contribute to data engineering and machine learning work across the team.

About the Team

The Data team today is five people: one data scientist, two data engineers, one data analyst, and one product manager. We're small, ambitious, and shipping fast — ML models heading to production, a serving API being built, and AI agents in active development. You'll be the senior MLOps voice inside the team and the operational bridge to HAL, the platform team that runs Clutch's agent runtime. Expect tight feedback loops, real autonomy, and a team that values pragmatism over purity.

Key Responsibilities

Within 3 months, you will:

Take ownership of the ML serving API that serves NBA recommendations, partnering with the data engineer who's been building it, and harden it for low-latency production traffic
Build the first repeatable deployment pipeline: model artifact → versioned, deployable, rollback-able production service, with infrastructure defined as code
Stand up the monitoring foundation: latency/error/drift dashboards, alerting, and audit/trace visibility across models and agents
Build a working relationship with HAL and become the data team's go-to on ML serving and reliability decisions

Within 6 months, you will:

Be the primary owner (with data engineer support) of the ML serving platform and deployment pipelines for NBA and our ML models
Have at least one production model and one production agent fully instrumented — versioning, monitoring, alerting, and multi-tenant gating in place
Define the data team's playbook for shipping a new ML model to production, end-to-end
Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows
Mentor the data engineers on MLOps patterns so they can confidently support and extend the systems you own

Within 9 months, you will:

Operate as the technical lead within the data team for NBA production ML operations — the person other teams come to when they want to understand how Clutch ships and runs ML reliably
Have measurably reduced cost and latency in production
Be shaping the data team's roadmap for the next generation of ML infrastructure, in partnership with the PM and data scientist
Help us decide which roles to hire next as the team scales

Required Qualifications

8+ years of experience in software, data, or ML engineering, with 4–5+ years running ML systems in production. You've taken models from prototype to production and owned what happens after deploy
Strong Python. Most of the work (serving API, pipelines, tooling, data pipelines) is in Python, and you're comfortable in production codebases, not just notebooks. Some TypeScript is involved for integration with our agent runtime; you don't need to be an expert, and comfort with a second language is enough
CI/CD & deployment discipline. You build training and deploy pipelines that take a model artifact to a versioned, deployable, rollback-able production service, with automated testing and reproducible builds. You've implemented CI/CD for ML and have built and maintained the pipelines yourself (GitHub Actions, Bamboo, GitLab CI, or similar)
Infrastructure as code. You manage cloud infrastructure (AWS Lambda, ECS) with Terraform or equivalent — no click-ops, everything reviewable and reproducible
Monitoring & observability discipline. You instrument serving systems for latency, error rates, drift, and cost; you read audit rows and distributed traces; you set up alerting so regressions are caught before users feel them. You treat monitoring as a first-class deliverable, not an afterthought
Reliability rigor. You design for failure: structured error handling, graceful degradation, rollback paths, and runbooks. You have a story about a production incident you handled and how you hardened the system afterward
Experience building and operating low-latency production APIs (FastAPI, BentoML, or equivalent), with opinions on serving, batching, and caching
Comfortable in AWS (Lambda especially), containers (Docker), and GitHub-based workflows
Security & governance. You ensure security and governance across systems: IAM, KMS, access policies, and Secrets Manager/SSM
DevOps / infrastructure knowledge, plus data manipulation and feature engineering
Solid understanding of ML concepts: models, pipelines, metrics, and supervised/unsupervised learning
Experience integrating AI/ML services with the company's other systems and optimizing them
You use AI tooling actively in your engineering workflow — not as a novelty, but as a default. You'll be expected to demonstrate this during the technical evaluation
Experience with Databricks and PySpark

Desired Qualifications

Production agent observability: reading audit rows, distributed traces, per-tool latency and error metrics
Cost and latency tradeoff intuition in production ML/agent systems; you have measurably reduced per-inference or per-conversation cost or P95 latency on a live system
Familiarity with an agent runtime framework (Vercel AI SDK, LangChain, LlamaIndex, or equivalent) from a serving/operations angle
Multi-tenant agent gating experience
Agentic AI operations experience: Agent Ops, LLM Ops
Prior SaaS and/or FinTech experience

Please let WithClutch know that you found this role at devopsprojectshq.com as a way to support us,
so we can keep providing you with awesome DevOps jobs.
