Randstad is seeking a skilled and proactive Site Reliability Engineer (SRE) to join our client in the Washington D.C. area, focusing on optimizing the availability, performance, and scalability of critical production services. The ideal candidate will bridge the gap between development and operations by applying software engineering principles to infrastructure and operational problems. This role requires a strong background in CI / CD pipeline development, infrastructure automation using Infrastructure-as-Code (IaC), incident response, and deep experience with cloud platforms, preferably AWS. The SRE will collaborate across engineering teams to drive automation, enhance observability, and ensure the continuous, secure delivery of high-quality software.
Responsibilities
- Deployment & Automation : Design, build, and maintain robust Continuous Integration / Continuous Delivery (CI / CD) pipelines utilizing tools such as GitHub Actions, Jenkins, or AWS CodePipeline.
- Infrastructure-as-Code (IaC) : Automate the provisioning and management of cloud infrastructure using IaC tools like Terraform, CloudFormation, or AWS CDK.
- Monitoring & Observability : Develop comprehensive monitoring dashboards, alerting rules, and logging configurations using platforms such as AppDynamics, CloudWatch, or Dynatrace to proactively ensure systems meet defined Service Level Objectives (SLOs).
- Incident Response & Remediation : Participate in a rotating on-call schedule, triage and resolve high-priority incidents, and conduct blameless postmortem reviews to identify and implement root cause remediations.
- Security & Compliance : Contribute to a DevSecOps culture by assisting with secrets management and integrating security scanning tools (e.g., AWS ECR, Checkmarx, Snyk) directly into CI / CD pipelines.
- Documentation & Knowledge Sharing : Create and maintain high-quality technical documentation, runbooks, and escalation procedures to ensure system readiness and operational efficiency.
- Cross-Functional Collaboration : Partner with application developers, infrastructure engineers, and security teams to successfully deploy and sustain production-grade services.
- Database Management : Apply knowledge of relational (MySQL, PostgreSQL) and NoSQL (MongoDB) databases to optimize database structures and contribute to data modeling efforts.
Qualifications
Required Experience & Technical Skills
- 2+ years of hands-on experience in a Site Reliability Engineering, DevOps, or Infrastructure support role.
- Proficiency with at least one major cloud platform (AWS experience is strongly preferred).
- Experience with building and managing CI / CD pipelines (e.g., Jenkins, GitHub Actions, AWS CodePipeline).
- Proficiency in automating infrastructure with an IaC tool (e.g., Terraform, CloudFormation).
- Strong working knowledge of Linux-based systems and shell scripting.
- Familiarity with version control systems, particularly Git.
- Understanding of core monitoring and alerting principles and experience with common observability tools.
- Basic understanding of core cloud services (e.g., AWS S3, EFS, Kinesis) and basic troubleshooting techniques.
- Willingness to participate in on-call rotations and take ownership of service reliability.
Education & Soft Skills
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related technical field, or equivalent hands-on professional experience.
- Strong desire to learn and continually grow expertise in automation, observability, and SRE best practices.
- Excellent problem-solving, analytical, and communication skills to work effectively with diverse teams.
This is a high-priority, proactive requisition.
Background Check : No
Drug Screen : No