- Drive the Site Reliability Engineering (SRE) agenda to improve the availability, reliability, and performance of services
- Drive observability for our applications
- Drive, optimise and operate initiatives, for example the reduction of operational toil
- Work with application teams to set up SLIs, SLOs and error budgets for their applications
- Work with the enterprise team to deploy SRE enablers and initiatives
- Experience in one or more of the following: JavaScript, Java and Python
- Experience with APM systems such as ELK, Grafana, Prometheus, Dynatrace and AppDynamics
- Understanding of key SRE concepts such as toil, SLIs, SLOs, error budgets, MTTD and MTTR
- Strong interpersonal and communication skills, with the ability to build good working relationships with other technology teams through day-to-day support and project work