Job Description
Who we are:
At Focused, we move quickly to deliver quality software that achieves client outcomes and meets their customers' needs. We strategically partner with our clients to leverage our expertise in design and software, while our clients bring their own domain expertise. We work with a variety of clients from different industries, collaborating as we get new products to market, modernize legacy systems, or help teams learn the skills they need to be successful.
Our values:
- Listen first
  - We are experts in product practices but lifelong learners in the domain of our customers. We research, collaborate, and understand.
- Learn why
  - We ask questions and talk to users to understand problem spaces, objectives, and goals, which allows us to deeply invest and drive towards the outcomes of our clients.
- Love your craft
  - We love diving into a variety of domains and solving problems. We take pride in delivering value, in communicating progress, and in guiding our clients to success.
We are seeking an experienced Observability Consultant with deep expertise in OpenTelemetry and strong Platform Engineering capabilities to help organizations implement, optimize, and scale their observability infrastructure. This role requires a specialist who can design comprehensive telemetry strategies, implement distributed tracing solutions, establish robust monitoring practices, and work closely with clients throughout their observability journey.
Key Responsibilities:
OpenTelemetry & Observability
- Design and implement end-to-end OpenTelemetry solutions across diverse technology stacks
- Configure and deploy OpenTelemetry Collectors for efficient data collection, processing, sampling, and routing
- Establish telemetry pipelines for metrics, traces, and logs across microservices architectures
- Optimize collector configurations for performance, reliability, and cost-effectiveness
Platform Engineering & Infrastructure
- Augment existing infrastructure with integrated observability solutions
- Implement Infrastructure as Code (IaC) solutions using Terraform, Pulumi, CloudFormation, etc.
- Architect and manage Kubernetes clusters with comprehensive monitoring and logging
- Build CI/CD pipelines with embedded observability and automated testing
Site Reliability Engineering (SRE)
- Establish and maintain Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs)
- Implement error budgets, toil reduction strategies, and capacity planning
- Support incident response procedures and post-mortem processes
Cloud & DevOps Engineering
- Deploy and manage observability infrastructure across AWS, GCP, and Azure
- Establish security, compliance, and governance frameworks for telemetry data
- Automate Agent Evaluations in CI/CD pipelines and observability backends
Required Qualifications:
Core Observability & OpenTelemetry
- 3-5 years of experience in observability, monitoring, and distributed systems
- Deep hands-on experience with the OpenTelemetry ecosystem, including SDKs, APIs, and specifications
- Proficiency with OpenTelemetry Collector configuration, processors, exporters, and receivers
- Strong understanding of telemetry data models, semantic conventions, and instrumentation best practices
Platform Engineering & DevOps
- 5+ years of Platform Engineering or DevOps experience with a focus on site reliability, observability, and incident response
- Proficiency with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, CDK)
- Strong experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
Cloud & Infrastructure
- Hands-on experience with major cloud providers (AWS, GCP, Azure) and their observability services
- Experience with container technologies (Docker, Podman) and container registries
- Knowledge of networking, security, load balancing, and distributed systems concepts
Site Reliability Engineering
- Experience implementing SRE practices including error budgets and toil metrics
- Proficiency in incident management, on-call procedures, and post-mortem culture
- Experience with capacity planning, performance optimization, and scalability design
Programming & Automation
- Proficiency in multiple programming languages preferred (Go, Python, Java, Node.js, Rust)
- Strong scripting and automation skills (Bash, Python, PowerShell)
- Understanding of software engineering best practices and testing methodologies
Preferred Qualifications (Exceptional Candidates)
AI & Agentic Frameworks
- Understanding of Large Language Models (LLMs) and their application in DevOps
- Knowledge of vector databases, embeddings, and retrieval-augmented generation (RAG)
- Experience with AI/ML model deployment and monitoring in production environments
Leadership & Communication
- Strong technical writing and documentation skills
- Ability to present complex technical concepts to diverse stakeholders
- A passion for knowledge sharing
Key Competencies
- Systems thinking and ability to design holistic observability solutions
- Strong analytical and troubleshooting skills for complex distributed systems
- Curiosity about emerging technologies, particularly AI applications in operations
- Adaptability to rapidly evolving cloud-native and observability technologies
- Collaborative mindset with a focus on enabling developer productivity and system reliability
What Sets Exceptional Candidates Apart:
- Experience with Honeycomb
- Contributions to open-source observability or AI framework projects
- Track record of implementing platform engineering solutions that significantly improved developer experience
- Experience scaling observability infrastructure to handle high event volume
What to know before you apply:
- This role requires being in the Chicago office three days per week and up to 20% travel within the United States.
- Focused is unable to sponsor or take over sponsorship of the employment visa process at this time.
- The Chicago base salary range for this role is $130,000 - $170,000.