Gorilla Logic provides nearshore Agile teams to Fortune 500 and SMB companies, bringing unparalleled expertise in the delivery of full-stack web, mobile, and enterprise applications. Our highly collaborative Agile Gorillas are uniquely qualified to implement complex software initiatives. With offices in the United States, Costa Rica, Colombia and Mexico, Gorilla Logic helps clients gain competitive advantages to achieve results faster.
Senior Site Reliability Engineer (SRE)
Gorilla Logic is looking for a Senior Site Reliability Engineer (SRE) responsible for automation, instrumentation, and stability of our client's platforms to achieve operational health and performance. Our environment will require you to work effectively with your teammates, of course. But your real success will be measured by how well you couple critical thinking with self-motivation, enthusiasm, and determination.
Responsibilities
*Focus on platform monitoring, analytics, observability, dashboarding, and alerting
*Combine sysadmin and development skills to automate Platform infrastructure and operations
*Responsible for the core SRE tenets of availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning
*Focus on Platform infrastructure to optimize existing systems and eliminate work through automation that defines clear processes for a given operation/business solution
*Sustain scalable and highly reliable software systems for infrastructure and operations
*Apply a software engineering mindset to systems administration
*Partner with development teams to bring reliability into the development cycle
*Partner with engineering teams to identify and instrument SLAs and SLOs
*Must have the ability to work in a dynamic, fast-paced environment
*Strong communication skills to interact with Agile team members
*Good analytical thinking and problem-solving skills
Technical Requirements
*Bachelor's degree in Computer Science or related field (or equivalent experience)
*5+ years of experience in web application development or DevOps
*3+ years of experience as a site reliability engineer
*Primary programming language skill in Python
*2+ years of working in Azure, Azure DevOps (ADO), CI/CD, and Pipelines
*Experience with Dynatrace for monitoring, observability, and security
*Extensive experience with infrastructure monitoring and performance tools
*A proactive approach to spotting and solving complex problems and to identifying performance bottlenecks and areas for improvement
*Demonstrated track record of building and maintaining large-scale distributed systems
*Experience with TCP/IP networking protocols and security principles
*Proficient in scripting, coding, and deployment automation
Bonus Skills
*Experience with containerization and dynamic resource orchestration frameworks (Docker, Kubernetes)
*Linux background with both Debian and Ubuntu
*Familiar with Jenkins, Spinnaker, Artifactory, Terraform, Datadog and Sumo Logic
*Familiar with web technologies such as HTTP, TLS, REST, Nginx and HAProxy