The Site Reliability Engineer (SRE) role at GoodLeap is a pivotal engineering position that combines software engineering and systems engineering expertise to improve the reliability, scalability, and performance of our applications and services. This remote-eligible role focuses on designing and implementing automation, monitoring, and incident response processes that keep our services highly available and support our DevOps initiatives, improving the health of our production environments.
Key Responsibilities
Collaborate with engineering, DevOps, and product teams to understand system requirements, promote reliability best practices, and foster a culture of shared ownership.
Lead incident response efforts, conduct thorough root cause analysis, and implement continuous improvements post-incident.
Reduce manual work by developing and maintaining internal tools and automation pipelines.
Use Datadog to enhance system visibility, refine alerting strategies, and ensure comprehensive observability across services.
Create and maintain essential documentation such as runbooks, service readiness guides, and knowledge articles to uphold operational excellence.
Work alongside teams to support scaling efforts and optimize system performance through data-driven insights.
Required Qualifications
Deep understanding of the Software Development Lifecycle (SDLC), including aspects like source control, defect tracking, automated build systems, and production control processes.
Extensive knowledge of CI/CD and DevOps principles, tools, and integrations.
Proven hands-on experience with Amazon Web Services (AWS), including services such as DynamoDB, CloudFormation, CloudFront, S3, Route 53, and Lambda, as well as YAML configuration.
Expertise in containerization and serverless technologies.
Familiarity with infrastructure-as-code and container orchestration tools, particularly Terraform and Kubernetes.
Strong grasp of observability concepts, including tracing, structured logging, and metrics.
About GoodLeap
GoodLeap is a leading technology firm dedicated to delivering superior financing and software products for sustainable solutions. Our innovative, AI-powered platform supports over 1 million homeowners and thousands of professionals in adopting and deploying solar and home efficiency solutions. Since 2018, GoodLeap has facilitated over $30 billion in financing for sustainable technologies. We are also committed to global social impact through our nonprofit, GivePower, which provides essential water and clean electricity systems worldwide.
Benefits & Perks
Joining GoodLeap as a Site Reliability Engineer offers the opportunity to work in a dynamic, supportive environment with a competitive benefits package. You will be part of a culture that values innovation, efficiency, and sustainability, all while contributing to meaningful projects that make a real difference.