Job Description
- Design, implement, and maintain observability platforms (logging, metrics, tracing, alerting) to ensure system reliability and performance.
- Develop and optimize dashboards, visualizations, and reports to provide actionable insights to engineering and operations teams.
- Configure and manage monitoring tools (e.g., Prometheus, Grafana, New Relic, Datadog, Elastic, Splunk) for real-time visibility into applications and infrastructure.
- Define and track key Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure system health and performance.
- Collaborate with developers, SREs, and platform teams to instrument applications and services for observability (e.g., distributed tracing, structured logging).
- Establish and maintain automated alerting and incident response workflows to reduce MTTR (Mean Time to Recovery).
- Partner with teams to analyze telemetry data and identify root causes of performance bottlenecks, errors, or outages.
- Promote best practices and develop standards for observability across the organization.
- Automate observability infrastructure provisioning and configuration through Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible, Helm).
- Continuously evaluate and integrate new observability technologies to improve monitoring and diagnostics capabilities.
- Ensure observability solutions align with compliance, security, and data governance requirements.
- Provide technical guidance, training, and support to engineering teams on using observability tools effectively.
- Support post-incident reviews by providing observability data for root cause analysis and future prevention.
- Contribute to a culture of proactive monitoring, resilience, and operational excellence.
Qualifications
- Bachelor's Degree in Computer Science or related field.
- Minimum three (3) years of progressive relevant industry experience.
- Minimum three (3) years of experience with Observability and Orchestration (New Relic preferred).
- Minimum three (3) years of experience with Configuration Management and Automation tools (Ansible and Terraform preferred).
- Minimum three (3) years of experience with Monitoring and Telemetry tools.
- Harness and Gearset experience (preferred).
- Three (3) years of JavaScript experience (preferred).
- Three (3) years of CI/CD experience (preferred).
- Ability to interact professionally with a variety of institutions.
- Excellent written and verbal communication skills.
- Ability to work independently and within a team.
- Desire to grow knowledge and skill set through on-the-job training, formal classroom training, and independent research.
Additional Information
In support of the pay transparency laws enacted across the country, the expected salary range for this position is between $74,064.94 and $130,001.91. Actual pay will be adjusted based on job-related factors permitted by law, such as experience and training; geographic location; licensure and certifications; market factors; departmental budgets; and responsibility. Our Talent Acquisition Team will be happy to answer any questions you may have, and we look forward to learning more about your salary requirements. The position qualifies for the below benefits.
Adtalem offers a robust suite of benefits including:
- Health, dental, vision, life and disability insurance
- 401k Retirement Program + 6% employer match
- Participation in Adtalem’s Flexible Time Off (FTO) Policy
- 12 Paid Holidays
For more information related to our benefits, please visit: https://careers.adtalem.com/benefits.
You are also eligible to participate in an annual incentive program, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.
Equal Opportunity – Minority / Female / Disability / V / Gender Identity / Sexual Orientation