Job Responsibilities:
Demonstrates extensive abilities and/or a proven record of success in the following areas:
- Providing SRE support for multiple distributed software applications (client-facing – internal & external);
- Managing and continually improving platform infrastructure and applications for high reliability, resiliency, performance & quality, and faster time-to-market, taking a holistic view of system health into account;
- Gathering and analyzing metrics from both systems and applications for performance tuning and fault finding;
- Partnering with development teams to improve services through rigorous testing and release procedures meeting security, compliance & performance requirements;
- Participating in systems design, platform management, and capacity planning, ensuring that platforms are designed with "operability" in mind;
- Pursuing the discovery of system faults throughout the application lifecycle – before & after release;
- Defining, implementing, and being accountable for Velocity & Reliability (SLIs, SLOs, Error Budgets);
- Creating & supporting sustainable systems and se...