Nisum is a leading global digital commerce firm headquartered in California, with services spanning digital strategy and transformation, insights and analytics, blockchain, business agility, and custom software development. Founded in 2000 with the customer-centric motto “Building Success Together®,” Nisum has grown to over 1,800 professionals across the United States, Chile, Colombia, India, Pakistan, and Canada. A preferred advisor to leading Fortune 500 brands, Nisum enables clients to achieve direct business growth by building the advanced technology they need to reach end customers in today’s world, with immersive and seamless experiences across digital and physical channels.
The Staff Site Reliability Engineer is a position of technical expertise, influence, and leadership in the technology realm. The position requires the individual to apply their expert knowledge to ensure best practices and well-engineered architecture across multiple teams. They will also be a key stakeholder and initiator of major changes to processes, engineering practices, and system administration. This position will be required to work in a space of solving critical issues and initiatives across multiple teams. It will require an extensive and deep understanding of cutting-edge practices and innovative approaches to problems. Staff Site Reliability Engineers are also tasked with establishing and maintaining a positive and productive culture based on the client Leadership Principles.
Essential Functions and Responsibilities:
- Identify and reduce redundant, unnecessary processes by asking the question, “Is this process improving our ability to reliably deliver software to our customers?”
- Help resolve bugs causing consistent reliability issues by working to identify the root cause and proposing short-term and long-term solutions to triage and resolve them.
- Advocate for reliable design patterns in our distributed systems (graceful failure in the event of absent dependencies, share-nothing architecture, loose coupling of services, etc.) and work to get reliability features prioritized in team backlogs.
- Identify reliability concerns before they become an outage via engagements with teams regarding major code revisions.
- Collaborate with developers, designers, testing, and product management to address issues with service reliability or performance.
- Work with Product to create and update KPIs for their products to allow business process level observability of throughput.
- Work with product teams to set SLOs, SLIs, and error budgets for their services.
- Participate in an on-call rotation and respond to major incidents.
- Perform other related duties as assigned.
Knowledge, Skills, and Abilities:
- Demonstrated ability to work with monitoring and alerting platforms: New Relic and Opsgenie.
- Practical knowledge of CI/CD technologies: GitHub, CodePipeline.
- Practical knowledge of programming languages: Python and Java.
- Practical knowledge of databases: Oracle, Postgres, DynamoDB.
- Demonstrated ability with chaos engineering technologies.
- Demonstrated ability to perform technical analysis in a discovery fashion, resulting in architecture artifacts such as a logical system deployment diagram, sequence diagram, state diagram, ERD, etc. These artifacts must be produced in UML.
- Practical knowledge of design patterns, the conditions that indicate their usage, and an ability to identify anti-patterns when presented with a diagram that contains the relevant information.
- Practical knowledge of various application architectures; must be thoroughly familiar with Martin Fowler’s Patterns of Enterprise Application Architecture.
- Can execute through soft skills as this position is one of leadership but has no direct reports.
Competencies:
Organizational or Student Impact:
- Recommends and implements changes in technical/business processes; identifies areas for improvement.
- Helps lead/coordinate extremely complex technical projects and programs and leads development and implementation of innovative solutions for specialized technical issues.
- Works proactively; identifies and helps prevent/solve problems that may cross disciplines.
- Fully understands and quantifies project risks with impact. Identifies, generates, and implements innovative solutions.
Problem Solving & Decision Making:
- This individual accomplishes goals and objectives independently.
- Builds and leads teams, influencing decisions and results.
- Uses discretion to fully scope, design, and implement solutions to complex technical problems.
- The individual provides regular technical advice and direction to technical teams and management.
- Models and helps set high standards for effective interactions with internal and external individuals.
Communication & Influence:
- Communicates with parties within and outside of their job function and typically has responsibilities for communicating with parties external to the organization.
- Works to influence others to accept and understand new concepts, practices, and approaches. Requires ability to communicate with executive leadership regarding matters of significant importance to the organization.
- This individual may conduct briefings with senior leaders within the technical function.
Leadership:
- Frequently responsible for providing guidance, coaching, and training to other employees across the Company within the area of expertise.
- Responsible for managing large, complex project initiatives or strategically important solutions to the organization, involving large cross-functional teams.
- May have direct reports but generally fewer than three.
Minimum Qualifications:
- The individual is acknowledged within the group as a subject matter expert.
- Typically requires a University Degree or equivalent experience.
- 9 years of prior relevant experience.
Department Specific Minimum Qualifications:
- Bachelor's degree in computer science, information technology, or a related field. Appropriate prior experience and depth may be substituted for this qualification.
- 9 years of experience in site reliability engineering, systems administration, or software development, including automating approaches and technologies in engineering.
- Experience in web-based applications and the integrations that enable those applications, using Java, Python, REST, JSON/YAML, XML, SQL, and other technologies, including experience integrating third-party products.
- Experience in monitoring and alerting platforms, especially New Relic and Opsgenie.
Preferred Qualifications:
- Technical experience in any of the following: Amazon Web Services (AWS), Jira, Agile/Scrum, Python, OAuth 2, IDM/OSSO, Ruby/Rails, PHP, Hibernate/Seam, J2EE, Tomcat, jQuery, JavaScript, NoSQL, Angular, New Relic, Opsgenie.
- Technical certifications.
- Strong experience with distance education and distance learning students is preferred.
What can we offer you?
- Belong to an international and multicultural company that supports diversity.
- Be part of international projects with a presence in North America, Pakistan, India, and Latin America.
- Work environment with extensive experience in remote and distributed work, using agile methodologies.
- Culture of constant learning and development in current technologies.
- Pleasant and collaborative environment, with a focus on teamwork.
- Access to learning platforms, Google Cloud certifications, Databricks, Tech Talks, etc.
- Take part in a range of internal and external initiatives and activities covering innovation, hackathons, technology, agility, talks, webinars, well-being, and culture, with the chance not only to participate but also to present.
- Since you live in Chile, you will also have access to several benefits related to our center :)
Nisum is an Equal Opportunity Employer and we are proud of our ongoing efforts to foster diversity and inclusion in the workplace.