Open Telemetry (SME) Consultant

Brilliance Cyber Systems INC
Georgia, USA
Full-time

Job Title : Open Telemetry (SME) Consultant

Location : Remote

Duration : Long Term Contract

Job Description :

We are seeking an experienced monitoring tools and Open Telemetry Subject Matter Expert (SME) who will be responsible for designing, implementing and optimizing monitoring solutions and leveraging Open Telemetry to enhance observability within the Enterprise Command Center (ECC).

The SME should collaborate with the Incident Management team to troubleshoot and resolve incidents.

Key Job Functions

Lead the design and implementation of monitoring solutions using industry-standard tools such as Splunk.

Customize monitoring configurations to align with the organizational requirements.

Implement and integrate Open Telemetry across various applications and services for enhanced observability.

Optimize monitoring solutions for efficiency and accuracy, ensuring minimal impact on system performance.

Design and implement application and infrastructure performance monitoring in the AWS Cloud environment.

Create monitors and dashboards to monitor applications and infrastructure performance.

Perform deep statistical analysis using performance data to help identify capacity and performance bottlenecks.

Configure alerting mechanisms within monitoring tools to proactively identify and address potential issues.

Develop comprehensive documentation for monitoring tool configurations, Open Telemetry implementations and best practices.

Provide training to incident management teams on utilizing monitoring tools and interpreting Open Telemetry data effectively.

Set up monitoring dashboards for incident detection and alerting.

Perform end-to-end analysis of transactions in an observability environment.

Troubleshoot incidents and identify root cause quickly using wire data analytics, application performance management and event correlation monitoring tools.

Diagnose and resolve incidents by providing factual data from the various monitoring and instrumentation systems.

Job Requirements :

A good understanding of IT Cloud infrastructure, including AWS Cloud, middleware, database, storage, and/or network infrastructure.

Strong understanding of IT infrastructure, networking, security concepts and application architecture.

Hands-on experience with Open Telemetry instrumentation and telemetry data collection.

Proven experience as a Splunk SME with in-depth knowledge of Splunk architecture and components.

Excellent troubleshooting and problem-solving skills.

Strong documentation skills and attention to detail.

Proactive monitoring of hardware, software, and environmental alerts or malfunctions.

Analyze dashboards and monitoring tools to look for trends and patterns in application / infrastructure health and performance.

Monitor applications and infrastructure using tools like Splunk, Dynatrace, Catchpoint, Moogsoft, xMatters, SignalFx, SolarWinds, ExtraHop, etc.

Expert understanding of microservice-based applications deployed in the Cloud using Lambda, ECS Fargate, etc.

Proficiency in AWS services like IAM, Roles, Security groups, EC2, S3, Lambda, ALB, ECS etc.

Experience working with AWS tools like ELB, RDS, Redshift, DynamoDB, Aurora, Route53, Lambda, S3, Batch, CloudWatch, CloudTrail, WAF etc.

Hands-on experience with transaction-level monitoring using Dynatrace and Splunk.

Create Splunk search queries and dashboards.

Be the SME in helping recognize and onboard new data sources into Splunk and other tools, analyze the data for anomalies and trends, and build dashboards highlighting the key trends in the data.

Implement best-in-class engineering strategies to support a distributed, clustered Splunk environment consisting of Search Heads, Indexers, Forwarders, and the Splunk Enterprise Security (ES) app, spanning security, performance, engineering, and operational roles.

Use the open-source observability framework Open Telemetry to instrument, generate, collect, and export telemetry data such as traces, metrics, and logs to help analyze application performance and behavior.

Use distributed tracing in an end-to-end visibility environment that consists of microservices, containers, and serverless components such as Lambda.

Work closely with application teams and business stakeholders to perform troubleshooting and aid in incident triage.

Influence other technical teams on incident calls and articulate troubleshooting steps effectively.

Follow up on items that could negatively impact production operations, assist with postmortem related activities, and support various efforts related to operational improvements.

Strong relationship management skills and the ability to multitask and work well in a high-stress environment, both within teams and independently.

Preferred Qualifications

Familiarity with distributed tracing and logging solutions.

Knowledge of Cloud Platforms (AWS, Azure) and their integration with monitoring tools.

AWS Solution Architect Associate or higher certification.

Experience working in an incident management environment.

Triage incidents to resolution in a 24 / 7 / 365 environment: effectively guide incident triage calls from a technical perspective, share technical details obtained from monitoring tools and dashboards to aid troubleshooting, outline details of resolution activities, provide timely status updates to stakeholders, assist with postmortem-related activities, and support various efforts related to operational improvements.

Ability to report incident details and metrics to senior leadership.

Perform analysis of data, evaluating multiple application protocols including web, database, storage, and supporting infrastructure such as UNIX, DNS, LDAP, SSL, SMTP, and FTP.

Proficient in scripting: UNIX / Linux shell scripting and Python. Working knowledge of JavaScript, Perl, etc. for customizing monitoring configurations.

Certification in relevant monitoring tools or Open Telemetry is a plus.
