Job Description:
- Kubernetes-certified professional or expert administrator of Kubernetes and Helm.
- A self-learner, self-driven, and able to operate with minimal supervision.
- Able to demonstrate expertise in at least one public cloud infrastructure (AWS / Azure / OCI).
- Proficient in APM (Application Performance Monitoring) tools such as Datadog APM, Dynatrace, and AppDynamics.
- Able to communicate effectively with business partners, management, and technical team members.
- Experienced SRE with a development or DevOps background who has worked on enterprise-scale applications.
- Proficient user of monitoring and alerting tools.
- Proactive in raising problems and identifying solutions.
- AWS SysOps Administrator Associate or AWS DevOps Engineer Professional certification (or the equivalent from another cloud service provider).
- Strong sense of customer service. Able to work in a highly collaborative team setting.
- Approaches work with a DevOps and continuous improvement mindset.
Qualifications:
- Bachelor's degree and a minimum of 5 years of experience in an enterprise-level DevOps role.
- 3 years of experience with AWS / Azure cloud and 2 years of Kubernetes administration.
- Expertise in Kubernetes administration / development, with hands-on experience using Helm.
- Strong knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks).
- Expertise is required in observability and monitoring tools like Dynatrace, Datadog, AppDynamics, Client, etc.
- A deep understanding of application performance monitoring (APM) and user monitoring is essential.
- Sound knowledge of ITSM processes, SLI / SLO / SLA management, incident resolution, and automation techniques.
- Strong IP networking fundamentals and experience with standard protocols, APIs, and message formats (e.g., TCP / IP, HTTP, SOAP, RESTful APIs, XML / JSON, JDBC, JMS / MQ).
- Knowledge of Infrastructure as Code (IaC) tools such as Ansible and AWS CloudFormation is preferred.
- Able to apply cloud compliance standards to application design to achieve reliability.
- Able to analyze application and server logs and interpret errors.
- Able to code in at least one programming or scripting language (e.g., Java, Python, Shell).
- Site reliability engineering experience with Java, Kubernetes, and database platforms (e.g., Postgres).
- The candidate should possess excellent written and verbal communication and collaboration skills.