Boson AI is looking for a Site Reliability Engineer to manage our cluster of GPU, CPU and storage servers, which we use to train and serve large AI models. You'll get to work on the latest NVIDIA GPUs and deal with tens of PB of storage. Your responsibilities include configuration, administration and maintenance of the system.
You will join a team responsible for Boson AI's datacenter and beyond. Ideally you should live in the Toronto region and be prepared to go to the datacenter when an on-call incident requires physical presence (this workload is shared with other team members).
$150,000 - $300,000 a year
Compensation is competitive and depends on the seniority of the role.
You should have worked in a related role before. A GitHub profile with publicly visible code is a plus, as are other artifacts that can be reviewed. If you are a fresh graduate, please let us know, as we might have similar roles for you, too.