Nivoda is seeking a highly skilled and experienced Site Reliability Engineering (SRE) Manager to join our dynamic team. As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services through both direct technical contribution and team building and management. This position is fully remote, allowing you to work from anywhere while collaborating with a talented team distributed globally.
Nivoda is the industry-leading B2B diamond and gemstone marketplace, connecting jewellery retailers to gemstone supplies and helping them save time and money whilst gaining access to a global diamond supply at the best prices, all with zero inventory risk.
With a team of over 300 dedicated employees around the world and a wealth of experience in the industry, Nivoda has developed an award-winning solution that enables jewellery businesses of any size, in any location, to buy and sell diamonds in the most profitable, efficient and hassle-free manner.
Over the course of the last six years, Nivoda has evolved into a global platform recognised for its innovation, customer service and ability to deliver a seamless, reliable and efficient experience.
Dynamic working environment in an extremely fast-growing company
Intellectually challenging work with a massive role in Nivoda’s success and scalability
Connect with peers globally in a truly distributed team with no central location
Collaborative and supportive work environment with very little hierarchy
Flexible working hours
Take full ownership of the production estate from both a technical and process perspective. Ensure the consistently smooth operation of live systems and drive all on-call support issues. Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
Create and maintain high-end monitoring and automation tooling. Drive automation initiatives to streamline operational workflows and improve efficiency. Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
Build a first-class SRE team. Through a combination of leading by example, coaching and mentoring, mould the team you would want to have around you. Provide leadership and guidance to the SRE team, fostering a culture of collaboration, innovation, and continuous improvement.
Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
Expertise in incident management, including incident response, resolution, and post-mortem analysis.
Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, the ELK stack, or Datadog.
Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
Demonstrated leadership capabilities, with a passion for mentoring and developing team members.
If you are a results-driven individual with a passion for reliability engineering and leadership, we encourage you to apply for this exciting opportunity to join our team as the SRE Manager.