Aleph Alpha Research’s mission is to deliver category-defining AI innovation that enables open, accessible, and trustworthy deployment of GenAI in industrial applications. Our organization develops foundational models and next-generation methods that make it easy and affordable for Aleph Alpha’s customers to increase productivity in finance, administration, R&D, logistics, and manufacturing processes.
We are hiring to grow our organization in Heidelberg, Germany, and are looking for well-rounded, experienced AI Software Engineers with a background in DevOps/MLOps.
As an AI Software Engineer in Aleph Alpha Research, you help the research teams take model and algorithm development to the next level. You own significant portions of the research infrastructure, including the pipelines related to data processing, our testing infrastructure, and engineering-heavy parts of our distributed training software. You also contribute your software engineering experience to research projects that have a significant influence on our ability to deliver novel category-defining AI capabilities.
As part of our Infrastructure and Platform Engineering Team, you maintain and develop cluster infrastructure, manage cloud infrastructure, build and manage DevOps and MLOps pipelines, implement SE best practices (CI/CD, monitoring, testing frameworks), and collaborate and co-develop with our Data Center Infrastructure Team as well as Product Teams on shared platform components or projects.
In our Data Engineering and Distributed Training Team, you engineer and optimize data processing pipelines, develop and maintain components of distributed training software, and build and maintain infrastructure and support for data-heavy tasks.
As part of our Research SE Teams, you work alongside Researchers and other SW Engineers on model and algorithm development, and collaborate on ablation studies, Proofs of Concept (POCs), and model optimizations. You create robust, maintainable codebases that support the efficient transition of new technologies and R&D artifacts from research to production, and co-own efforts to make parts of our code source-available to the broader research community.
Depending on your profile, you will contribute to one or more of the following areas:
Designing and continuously developing the research infrastructure, and establishing mechanisms that improve code quality, testing, and feature delivery
Supporting the development, training, and maintenance of deep learning models, in collaboration with the researchers as well as the SW/HW engineers at our distributed computation centers
Developing and optimizing lower-level code for data processing, tokenization, or research projects
Contributing your software-engineering expertise to research projects (this could be, for example, in areas such as agent interfaces or data generation)
Helping productionize AI research innovations in real-world applications
Engaging in our hiring process and mentoring engineers and researchers in software development best practices
Most of our training code is written in Python, with PyTorch being our main deep learning framework. Some of our lower-level code is written in Rust.
Basic Qualifications
3+ years of non-internship professional software development experience, with demonstrated ability to solve complex and novel problems independently using state-of-the-art scientific approaches.
2+ years of experience in the design or architecture (e.g., design patterns, reliability, scaling) of new and existing systems, contributing critical path code to impactful projects.
Proficiency in at least one major programming language and the ability to independently implement complex changes to foundational systems or algorithms.
Strong communication skills, with the ability to convey technical solutions to diverse audiences and to anticipate scientific or engineering limitations.
Bachelor’s degree in computer science, engineering, or a closely related field.
Ready to relocate to Heidelberg, Germany.
Preferred Qualifications
5+ years of experience across the full software development lifecycle, including coding standards, code reviews, source control management, build processes, testing, and operational excellence.
Demonstrated skill in integrating complex systems through cross-team collaboration, improving solution consistency and overall impact.
Proven experience designing and delivering high-performance, scalable systems into production environments, with contributions to research outputs or top-tier publications being a plus.
Familiarity with systems programming and low-level languages such as Rust, with a focus on performance and reliability.
Master’s degree in computer science or a related field.
We do not necessarily require prior experience in machine learning for this role, but we do value your eagerness to learn. If you have prior experience in ML, we will be particularly excited about:
Experience productizing AI research innovations into real-world applications, especially in areas such as large-scale data processing and distributed computation for foundational model training or inference.
Familiarity with popular NLP tools and frameworks such as PyTorch or HF Transformers, with knowledge of transformer architectures.
Ability to write clear proposals or publications, and demonstrated excellence in explaining research contributions to both technical and non-technical stakeholders.
Proven ability to apply advanced scientific methods to novel problems, resulting in impactful outputs such as publications or projects.
We believe embodying these values would make you a great fit in our team:
We own work end-to-end, from idea to production: You take responsibility for every stage of the process, ensuring that our work is complete, scalable, and of the highest quality.
We ship what matters: Your focus is on solving real problems for our customers and the research community. You prioritize delivering impactful solutions that bring value and make a difference.
We work transparently: You collaborate and share your results openly with the team, partners, customers, and the broader community by publishing results and insights, including blog posts, papers, checkpoints, and more.
We innovate by leveraging our intrinsic motivation and talents: We strive for technical depth, balance the ideas and interests of our team with our mission-backwards approach, and draw on the interdisciplinary, diverse perspectives within our teams.
Become part of an AI revolution!
30 days of paid vacation
Public transport subsidy
Fitness and wellness offerings (Wellhub)
Mental health platform (nilo.health)
Flexible working hours and hybrid working model