Join SCUBA, the pioneering Decentralized Collaborative Decision Intelligence platform, and become an instrumental part of a team trusted by global giants like Microsoft, McDonald's, Twitter, and Warner Bros. SCUBA offers unparalleled, real-time decision intelligence across billions of touchpoints, all while maintaining the utmost respect for privacy. Our roots are planted firmly in innovation, with a founding team of former Facebook executives and leadership that boasts veterans from Kantar, Sonos, and Splunk.
Who You Are
You're a seasoned Site Reliability Engineer with at least three years of experience in Infrastructure, DevOps, or SRE. You have a keen eye for deploying and maintaining robust, scalable systems, and a knack for diagnosing and swiftly resolving issues. Your expertise in observability tools means you understand the services you support inside and out. You're committed to a blameless culture when incidents happen, focusing on continuous improvement and learning. You excel at mentoring others, driven by your strong belief in the principles of Site Reliability Engineering.
What We Value
We're looking for a Site Reliability Engineer who thrives in cloud environments and communicates with clarity and impact. At SCUBA, you will:
Deploy, maintain, and troubleshoot large-scale, critical infrastructure in AWS and Azure.
Ensure the highest levels of availability, performance, and reliability for our customer deployments.
Follow a detailed incident management process to keep our business partners informed.
Uphold SLAs and SLOs, tracked through SLIs, with proactive monitoring, alerting, and scaling.
Securely manage customer data and deployments, upholding their confidentiality, integrity, and availability.
Must-Have Skills
Proficient in Linux OS administration, particularly Debian-based systems like Ubuntu.
Solid experience with cloud services (AWS or Azure).
Skilled in applying SLAs, SLIs, and SLOs for effective production environment monitoring using tools such as Prometheus, PagerDuty, and Grafana.
Expertise in configuration management and automation with Ansible and Terraform.
Experience in scripting with Python and Bash for automation of routine tasks.
Familiarity with blameless incident management processes and a commitment to continuous process improvement.
Knowledge of release engineering, including CI/CD and artifact packaging.
Demonstrable experience with programming languages such as C, C++, Java, Python, Go, or JavaScript.
Experience in vulnerability and system security management.
Why You'll Love Working Here
Be a part of a forward-thinking team that values innovation and collaboration.
Opportunity to work on high-impact projects with global influence.
Embrace continuous learning in a high-growth, dynamic environment.
Gain experience with cutting-edge technologies and practices.
Extra Awesome
Experience in a startup or high-growth tech environment.
Direct involvement in instrumenting metrics, logging, and bug fixes in production software.
Knowledge of Kubernetes and container orchestration.
Position Perks
Remote flexibility across the US & Canada.
Participation in an on-call rotation, contributing to our mission of delivering unparalleled service reliability.
At SCUBA, we're not just building systems; we're empowering real-time decisions for some of the world's most influential brands. If you're ready to dive deep and make a substantial impact, we'd love to hear from you.