Site Reliability Engineer
This company is looking for a Site Reliability Engineer to lead a team responsible for building, managing, maintaining, and scaling the centralized infrastructure services that support our mission-critical operations.
The company is located in Herndon, VA and will remain remote friendly, requiring a couple of days on site each month.
What You Will Be Doing:
- Oversee the design of software solutions that integrate Open Source, Commercial Off-The-Shelf (COTS), and custom-developed components.
- Deploy, configure, and manage services across production, QA, and development environments on platforms such as OpenStack and Docker.
- Build and manage infrastructure using Terraform.
- Develop deployment automation tools using Ansible.
- Create automation and configuration management solutions with SaltStack and Jenkins.
- Implement encryption solutions with HashiCorp Vault.
- Contribute to the development of a large-scale Software-Defined Network (SDN) using Guardicore.
- Document processes, procedures, configurations, and deployment plans.
- Collaborate with technical teams to implement systems and software.
- Occasionally provide operational support, including troubleshooting and problem resolution.
- Offer technical leadership in operational processes and change management, while mentoring less experienced engineers.
- Provide regular progress updates to management.
- Participate in a 24x7 on-call rotation.
Required Skills & Experience:
- Bachelor’s degree in Computer Science, a related technical field, or equivalent education and experience.
- 8+ years of experience in developing and managing mission-critical systems.
- In-depth knowledge of Linux configuration and administration.
- Proficiency in a high-level scripting language such as Python.
- Extensive experience with automation, including not only developing it but also understanding its purpose and the key areas where it applies.
- Strong grasp of infrastructure-as-code principles.
- Excellent written and verbal communication skills, with the ability to clearly explain complex issues.
- Solid understanding of network protocols and security practices.
- Experience building and optimizing monitoring and reporting solutions using tools like Grafana and Splunk.
- Familiarity with development tools such as GitHub, Jira, and Confluence.
Preferred Skills and Experience:
- Expertise in deployment automation using tools like Ansible.
- Hands-on experience with Jenkins in a continuous integration and delivery environment.
- Experience with Docker or Kubernetes in a production setting.
- Familiarity with OpenStack in production environments.
- Knowledge of HTTP proxies like Squid.
- Experience working with Red Hat Enterprise Linux and/or FreeBSD.
- Familiarity with CMDB and ITIL platforms such as ServiceNow.
- Experience with Red Hat Identity Manager and/or FreeIPA.
- Administration of Linux and Unix systems in large-scale environments.
- Experience with VMware in a production environment.
- Familiarity with Agile methodologies, including Kanban and/or Scrum.
- Experience in Registry Services, E-commerce, or ISP environments is a plus.
Applicants must be currently authorized to work in the United States on a full-time basis now and in the future.
This position doesn’t provide sponsorship.