Site Reliability Engineer

Job description

As a Site Reliability Engineer, you will help us achieve our goals by continuously improving our SaaS offering’s features and robustness. You will participate in designing, developing, deploying, monitoring, supporting, documenting, and troubleshooting our SaaS solution.
This is an exciting opportunity to collaborate closely with the Cloud Operations team, the wider organization, and external vendors and customers.
This is a hybrid role based in our Cambridge or London office, so you will ideally be comfortable coming into the office once or twice a week. If you’re interested in the role but require more flexibility, please speak to us!
Key Responsibilities:
  • Deploying, maintaining, monitoring, and upgrading production deployments of our SaaS solutions
  • Building software and systems to manage platform infrastructure and applications
  • Continually evaluating and improving our technology and processes to increase quality, decrease costs, and improve time-to-market
  • Periodically testing the service's resilience by simulating predictable and unpredictable failures
  • Providing 2nd-line operational support for our SaaS customers
  • Gathering data and generating reports on the service performance
  • Developing and documenting internal processes
  • Working with engineering/data science to drive and develop new capabilities
  • Providing out-of-hours support for critical service issues as part of our on-call engineer rota
Preferred Skills/Experience:
While not all are essential, ideally you will have experience with the following:
  • Administering cloud infrastructure or developing cloud applications (preferably in AWS)
  • Configuration management, including Infrastructure as Code
  • Linux, shell scripting, and command-line tools
  • Programming in one or more high-level programming languages (e.g. Python)
  • Networking (e.g. DNS, routing, firewalls)
  • Source-control management (e.g. Git)
  • Continuous Integration / Continuous Deployment (CI/CD)
  • Monitoring, metrics, and alerting
  • Containerization (e.g. Docker)
  • Administering, developing applications for, or deploying applications to Kubernetes
  • Using or developing applications with service mesh (e.g. Istio)
  • Object-oriented programming and design
  • Operating production-grade services
  • Providing technical support
  • Building serverless or cloud-native applications
  • Writing technical documentation
  • Developing processes and procedures
  • Securing applications, services, and data (e.g. authentication, authorization, encryption, and TLS)
  • Any of the following tools: Terraform, SaltStack, MongoDB, Elasticsearch, Kafka, Prometheus, Grafana, HashiCorp Vault
We are looking for candidates who are passionate about technology, keen to keep learning, and excited to contribute to a dynamic team environment. If you have relevant skills and are looking for a challenging and rewarding role, we encourage you to apply.