Role: Site Reliability Engineer (Ex-Fidelity Experience)
Location: Remote
Position Type: Contract
Key Responsibilities
- Design, implement, and manage Kubernetes environments, covering deployment, configuration, monitoring, and troubleshooting
- Build and maintain scalable and reliable infrastructure using infrastructure as code principles
- Develop comprehensive monitoring solutions and implement alerting strategies
- Analyze system performance bottlenecks and implement improvements
- Implement and maintain CI/CD pipelines for seamless deployments
- Conduct incident response and root cause analysis, and implement preventive measures
- Create and enhance automation tools, leveraging AI/ML where applicable
- Collaborate with development teams to improve application reliability and performance
Required Qualifications
- 5-7 years of experience in SRE or DevOps roles
- Strong expertise with the Kubernetes ecosystem and container orchestration
- Deep understanding of Linux/Unix operating systems and performance analysis tools (NMON, etc.)
- Experience with log analysis, monitoring systems, and observability tools
- Proficiency in database administration and performance tuning (Oracle, SQL Server)
- Strong programming skills in at least one of: Python, Go, Java, or Node.js
- Experience developing automation tools and frameworks
- Proven track record of proactive problem identification and resolution
Preferred Qualifications
- Experience with AI/ML integration into operational workflows
- Cloud platform experience (AWS, GCP, Azure)
- Knowledge of service mesh technologies
- Experience with distributed systems architecture
- Familiarity with security best practices and compliance requirements
Personal Qualities
- Proactive mindset with strong analytical and problem-solving abilities
- Collaborative approach to working across development and operations teams
- Excellent communication skills and ability to explain complex technical concepts
- Self-motivated, with the ability to work independently and as part of a team
- Passion for continuous improvement and learning