- Reliability Expert
A Reliability Expert is an expert in the reliability of a service or set of features (from here on, just "service" for brevity).
Reliability Experts typically help develop the service (in which they may also be Specialists), with explicit attention to its reliability in production. Reliability is measured by the availability and performance of the service on GitLab.com, by its impact on the availability and performance of GitLab.com as a whole, and by customer feedback on the reliability of the service on their on-premises installations.
Reliability Experts:
- work within a team to develop the service.
- develop monitoring and alerting to measure, and act on improving, the availability and scalability of the service on GitLab.com.
- develop those aspects of the service's codebase and deployment that contribute to its reliability.
- take care of the infrastructure related to the service. An expert will mostly be able to build and maintain infrastructure that is specific to the service, but will work with the Production Team where infrastructure cannot be isolated for the service.
- radiate knowledge to the infrastructure team about the service, and radiate knowledge of the service's infrastructure and reliability to the rest of the development team.
- take part in on-call. On-call is not split out by the service that triggers the alert, because doing so would put too much of a burden on the individuals associated with each service. This means that Reliability Experts need to be familiar with GitLab.com's infrastructure and emergency response processes.