Talent.com
Principal Site Reliability Engineer

Black Rock Groups
Washington, DC, United States
Job type: Full-time

Job Description

The Principal Site Reliability Engineer will be a critical technical leader responsible for driving the operational excellence, resilience, and security of our core systems for a key Randstad client in the Washington, D.C. area. This senior role merges deep expertise in infrastructure automation (IaC), CI/CD architecture, and cloud security with the foundational principles of Site Reliability Engineering (SRE), including defining SLOs, managing error budgets, and leading incident response. You will mentor cross-functional teams, implement cost-efficient cloud practices, and build the foundational tools and platforms that enable our developers to deliver secure, highly available, and scalable services with velocity.

Responsibilities

  • Reliability Engineering & Operations: Define, implement, and maintain rigorous Service Level Objectives (SLOs) and Service Level Indicators (SLIs), establish effective error budgeting, and lead incident response, root cause analysis, and postmortem processes to ensure continuous service improvement.
  • Infrastructure Automation: Architect, implement, and manage secure, scalable, and repeatable cloud environments leveraging Infrastructure-as-Code (IaC) tools such as Terraform, Ansible, and CloudFormation.
  • CI/CD Optimization & Security: Design and optimize secure, high-performance CI/CD pipelines (e.g., GitHub Actions, Jenkins) incorporating advanced deployment techniques like automated rollback, canary, and blue/green strategies, and ensuring artifact validation.
  • Observability & Telemetry: Develop comprehensive observability solutions, including building robust dashboards, configuring alerts, implementing synthetic checks, and maintaining telemetry pipelines (metrics, logs, traces) to ensure deep visibility into system performance, availability, and cost.
  • Security & Compliance Enforcement: Integrate security tooling (SAST, DAST, SBOM, secrets scanning) directly into the deployment lifecycle and enforce security policies-as-code within deployment workflows to maintain strict compliance and a secure posture.
  • Cost & Capacity Management: Implement tooling and financial practices to proactively monitor cloud cost trends, perform right-sizing of infrastructure resources, and strategically plan capacity to ensure optimal cost-to-performance ratio and high availability.
  • Internal Platform Enablement: Design and build reusable internal tools, shared playbooks, and self-service platforms that significantly enhance developer productivity and enforce consistent, high-quality delivery standards across engineering teams.
  • Mentorship & Technical Leadership: Serve as a senior technical mentor and subject matter expert across platform, security, and engineering teams, establishing and promoting best practices in operational readiness, fault tolerance, and secure delivery.

Qualifications

  • Experience:
    • Bachelor's degree in Computer Science, Engineering, or a related technical discipline.
    • Minimum of 5 years of progressive experience in DevOps, Site Reliability Engineering (SRE), or Platform Engineering, with proven leadership experience in infrastructure reliability and automation.
    • 3+ years of direct, hands-on experience managing high-availability production environments with modern cloud-native security and observability tooling.
  • Technical Expertise:
    • Deep expertise in a major cloud platform (e.g., AWS, Azure, GCP), particularly in core services like Compute, Networking, Identity and Access Management (IAM), and monitoring.
    • Proficiency with Infrastructure-as-Code tools, specifically Terraform and CloudFormation, and container orchestration technologies like Kubernetes and Docker.
    • Strong working knowledge of Linux systems and shell scripting.
    • In-depth familiarity with observability stacks (e.g., Prometheus, Grafana, ELK, Datadog, CloudWatch).
    • Demonstrated experience designing, implementing, and managing CI/CD systems that incorporate security tollgates, rollback logic, and GitOps patterns.
  • Skills & Knowledge:
    • Strong scripting and programming skills in Python, Go, or Bash for automation and tooling development.
    • In-depth understanding of core SRE practices, including incident response, SLO/SLA management, chaos engineering, and capacity modeling.
    • Proven track record of creating shared tooling, documentation, and best practices that drive operational excellence and knowledge transfer across an organization.
This is a high-priority, proactive requisition.

Background Check: No

Drug Screen: No
