Senior Platform Engineer
London - Hybrid, 2 days per week in the office
Full Time
The RVU London cloud infrastructure team
We are committed to Open Source software in order to build services that help millions of customers to save money and make confident decisions. As well as helping our customers, we also give back to the community by open sourcing interesting projects that we build that might benefit others. https://www.rvu.co.uk/open-source
Here at RVU we’re looking for more people to join our infrastructure platform team, known internally as ‘Airship’.
Our goal as a team is to enable our development teams to deliver services quickly, reliably and securely. We do this by running multiple Kubernetes EKS and Fargate clusters in AWS, creating common tooling to aid development tasks, and running shared services such as OpenSearch, Envoy, Vault and Prometheus, to name a few. The team has also recently expanded its scope to simplify data engineering in the organisation, applying to data pipelines the same techniques we used to ease creating web applications, and leveraging Argo Workflows.
Day to day tasks will include:
The ideal candidate will have some of the following skills:
Our team has been featured at a few conferences:
CNCF: https://www.youtube.com/watch?v=-v9tiGTH86Q
PlatformCon: https://www.youtube.com/watch?v=8YNXTI8E13s and https://platformcon.com/talks/from-cloud-cost-management-to-finops
We have also been featured at the London AWS Summit 2023 for our contribution to the EKS tooling community: https://surajincloud.com/announcing-kubectl-eks-plugin-v0-1-0
We also hosted the Terraform HashiCorp User Group meetup in London in April.
Examples of some projects we have worked on:
Our running services previously relied on long-lived credentials to access data, and these credentials were rarely, if ever, rotated. We wanted human and pod identity to be used to grant short-lived credentials based on policies. We used Vault to build a solution to this problem, creating tooling such as vault-creds/vault-webhook to make it as easy as possible for developers to use these credentials with their services. (Blog)
We have a lot of existing AWS resources whose access is limited using IAM. We initially used Kube2IAM but experienced race conditions that would hand credentials for a different role to pods. We started work on a replacement and have worked with the community to get it used in other places.
Some of our more important applications needed to survive a total cluster outage. This meant we needed a way to easily route traffic to an application spread across multiple clusters, so we created Yggdrasil, a tool that configures Envoy nodes to route our traffic between clusters based on Ingress resources. (Blog)
It tracks deployments as they roll out and posts useful status updates into Slack. It does this by watching the Kubernetes API for namespaces and deployments with the correct annotations. When a deployment rollout begins, and again when it completes, updates are posted to the Slack API. Any errors during the rollout are captured and included in the Slack message (see example below). This can be very useful for quickly debugging a failing deployment.
You can also check out our Medium page to see a number of blog posts on what we’ve been up to.
Our commitment to you
At RVU, we are dedicated to developing valuable, inclusive, and user-friendly products and services for all. To achieve this it’s essential that our teams reflect the diverse range of people in our community. We believe in being the change we wish to see in the world, by embracing our differences and holding ourselves accountable to being open and inclusive teammates and wider community members.
Benefits
We want to give you a great work environment; contribute back to both your personal and professional development; and give you great benefits to make your time at RVU even more enjoyable. Some of these benefits include: