SRE (Site Reliability Engineer)

Carman Solutions Group

WA, United States

Full-time

Role - Sr. SRE (Site Reliability Engineer)

Location - Seattle, WA - must work from the office 3 days a week.

Job Type - Contract (12 months)

Core skills needed -

Azure Cloud and AKS: scalability, monitoring, deployment, checking logs, and ensuring node and pod health.

Databases include Cassandra, MongoDB, and PostgreSQL.

Databricks notebooks: many jobs run on Databricks, so the role requires enough Databricks experience to know how a notebook is created and run, to run queries against the database, to find discrepancies, and to perform fixes.

Microservices-based applications; responsible for deployment. The scripting language is Python.

Should have an understanding of Terraform.

Emphasis on logs and monitoring (Datadog and Splunk).

Summary of Experience

Requires 10-12 years of experience in the IT industry.
Requires 9+ years of software development and DevOps engineering.
Experience working in a cloud environment; Azure preferred.
Experience with Kubernetes; Azure Kubernetes Service (AKS) preferred.
Experience using Kafka, Event Hubs, NATS, or another message broker.
Experience with Cassandra, PostgreSQL, MongoDB, Elasticsearch, and Cosmos DB.
Experience with Azure DevOps, Jenkins, Python, Terraform, and Ansible.
Experience with Databricks.
Experience with Datadog, Splunk, or other logging and APM tools.
Experience working in a Linux environment.
In-depth understanding of Computer Science fundamentals in object-oriented design, data structures, algorithms, and problem solving
Experience building complex, scalable, high-performance software systems that have been successfully delivered to customers
Demonstrated knowledge of best practices for the design and implementation of large-scale systems as well as experience in taking such systems from design to production
Experience building and operating mission critical, highly available (24x7) systems
Ability to work well with a team in a fast-paced agile development environment.
Bachelor's degree in Computer Science or equivalent work experience.
Excellent communication, analytical and problem-solving skills
Extensive understanding of SDLC and Scrum methodologies.

Job Summary and Mission

We are seeking an experienced, self-motivated Senior Engineer who is technically strong, with a solid Linux background, deep knowledge of microservices, backend storage design, NoSQL databases, and distributed systems, and very good troubleshooting skills.

Typical activities include production monitoring, creating monitoring dashboards, setting up alerts, and triaging alerts, coupled with the ability to drive efforts and solution improvements effectively across various IT and business functions.

In this role, the engineer will be responsible for setting up monitoring dashboards and alerts, maintaining production systems, deploying code to production, monitoring alerts, resolving issues, and leading production troubleshooting calls.

The engineer will also work with Product Owners and other developers to implement highly scalable, reactive application platform solutions in cloud-based Linux environments.

Summary of Key Responsibilities

Responsibilities and essential job functions include but are not limited to the following:

Responsible for the health of production systems
Develop monitoring dashboards
Configure alerts and automate processes for system recovery
Monitor alerts and take proactive steps to resolve system issues
Troubleshoot production issues
Lead production troubleshooting calls
Responsible for patches and updates on production systems.
Design and build cutting-edge, multi-microservice solutions to support Starbucks's growth worldwide.
Work with cross-functional teams for on-going design efforts and systems support.
Automate password and certificate rotations on application and DB servers.
Help the CI/CD team roll out applications and infrastructure globally.
Collaborate with the development team and the developer leads of other Information Technology (IT) teams. Initiate process improvements for new and existing systems.
Coach and mentor other team members. Perform cross-training and facilitate information sharing among team members.
Participate in a production support rotation that includes pager responsibilities.
Accurately break down complex application designs into component deliverables and estimate design and development timelines.

General IT Skills:

Experience in application support, problem diagnosis, and resolution

Expert in interpretation of functional requirements

Development of technical design specifications for complex projects

Expert in industry standard development methodologies

Experience in middleware integration using tools like webMethods

Integrate application support efforts with concurrent, parallel application development efforts
