Job Summary :
We are seeking a highly skilled and motivated AI / ML Engineer to join our IT Operations team in an AIOps role. The ideal candidate will leverage artificial intelligence and machine learning to enhance observability, automate incident detection and resolution, and optimize IT operations. This role is critical in driving proactive operations and reducing mean time to resolution (MTTR) through intelligent automation.
Key Responsibilities :
- Design, develop, and deploy AI / ML models to detect anomalies, predict incidents, and automate root cause analysis in IT systems.
- Integrate AIOps solutions with monitoring tools (e.g., Splunk, Datadog, Prometheus, AppDynamics).
- Collaborate with SREs, DevOps, and IT teams to understand operational pain points and identify automation opportunities.
- Build and maintain data pipelines to collect and process logs, metrics, and events from various sources.
- Implement NLP techniques for log analysis and intelligent alert correlation.
- Continuously evaluate model performance and retrain models as needed.
- Contribute to the development of dashboards and visualizations for operational insights.
Required Qualifications :
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
- 3+ years of experience in AI / ML engineering, preferably in IT operations or DevOps environments.
- Strong programming skills in Python.
- Experience with time-series analysis, anomaly detection, and predictive modeling.
- Familiarity with cloud platforms (AWS, Azure, or GCP) and containerized environments (Docker, Kubernetes).
- Knowledge of ITSM and ITIL processes is a plus.
Preferred Qualifications :
- Experience with AIOps platforms such as Dynatrace.
- Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
- Experience with agentic AI project planning and implementation.
- Experience manipulating text embeddings for enhanced language processing.
- Familiarity with popular JavaScript UI frameworks (React, Angular).
- Demonstrable knowledge of debugging code.