✨ About The Role
- The Senior Site Reliability Engineer will be responsible for measuring and maintaining the uptime of critical services for the development of autonomous vehicles.
- This role involves all phases of rolling out a service, from designing maintainable and fault-tolerant systems to deployment and continual improvement.
- The engineer will work with systems that handle large volumes of data and data-processing pipelines performing compute-intensive tasks on CPUs and GPUs.
- Experience with AWS architecture, including operating services such as RDS, ECS, and EKS, is a bonus.
- The role includes deploying and managing Kafka/MSK as a service and establishing CI/CD best practices.
⚡ Requirements
- The ideal candidate will have experience supporting production service infrastructure and using configuration management or infrastructure-as-code tools such as Ansible, Terraform, or Salt.
- Proficiency with microservice architecture and tooling around Kubernetes is essential for success in this role.
- A strong ability to extract and report useful performance or service metrics using tools like ELK, Prometheus, and Grafana is required.
- Familiarity with Linux is a must, regardless of distribution.
- The candidate should have programming experience in Python or C/C++ to effectively contribute to the team.
- A bachelor's degree in engineering, mathematics, or a related field, along with at least 2 years of relevant experience, is necessary for this position.