SRE (Site Reliability Engineer)

Carman Solutions Group
WA, United States
Full-time

Role - Sr. SRE (Site Reliability Engineer)

Location - Seattle, WA (must come to the office 3 days a week)

Job Type - Contract (12 months)

Core skills needed -

Azure Cloud and AKS: scalability, monitoring, deployment, checking logs, and ensuring node and pod health.

Databases include - Cassandra, MongoDB, PostgreSQL

Databricks Notebooks - There are a lot of jobs on Databricks. Candidates need Databricks experience: knowing how a notebook is created and run, running queries against the database, finding discrepancies, and performing fixes.

Microservices-based systems; responsible for deployment. The scripting language is Python.

Should have an understanding of Terraform.

Emphasis on logs and monitoring (Datadog and Splunk)

Summary of Experience

  • Requires 10-12 years of experience in the IT industry
  • Requires 9+ years of software development and DevOps engineering
  • Experience working with cloud environments; Azure preferred.
  • Experience with Kubernetes; Azure Kubernetes Service (AKS) preferred.
  • Experience using Kafka, Event Hub, NATS, or any other messaging broker.
  • Experience with Cassandra, PostgreSQL, MongoDB, Elasticsearch, Cosmos DB
  • Experience with Azure DevOps, Jenkins, Python, Terraform, and Ansible
  • Experience with Databricks
  • Experience with Datadog, Splunk, or other logging and APM tools.
  • Experience working in Linux environments.
  • In-depth understanding of Computer Science fundamentals in object-oriented design, data structures, algorithms, and problem solving
  • Experience building complex, scalable, high-performance software systems that have been successfully delivered to customers
  • Demonstrated knowledge of best practices for the design and implementation of large-scale systems as well as experience in taking such systems from design to production
  • Experience building and operating mission critical, highly available (24x7) systems
  • Ability to work well with a team in a fast-paced agile development environment.
  • Bachelor's degree in Computer Science or equivalent work experience.
  • Excellent communication, analytical, and problem-solving skills
  • Extensive understanding of SDLC and Scrum methodologies.

Job Summary and Mission

We are seeking an experienced, self-motivated Senior Engineer who is technically very strong, with a solid Linux background, deep knowledge of microservices, backend storage design, NoSQL databases, and distributed systems, and very good troubleshooting skills.

Typical activities include production monitoring, creating monitoring dashboards, setting up alerts, and triaging alerts, coupled with the ability to drive efforts and solution improvements effectively across various IT and business functions.

In this role, the person will be responsible for setting up monitoring dashboards and alerts, maintaining production systems, deploying code to production, monitoring alerts, resolving issues, and leading production troubleshooting calls.

The role also involves working with Product Owners and other developers to implement highly scalable, reactive application platform solutions in cloud-based Linux environments.

Summary of Key Responsibilities

Responsibilities and essential job functions include but are not limited to the following:

  • Responsible for the health of production systems
  • Develop monitoring dashboards
  • Configure alerts and automate processes for system recovery
  • Monitor alerts and take proactive steps to resolve system issues
  • Troubleshoot production issues
  • Lead production troubleshooting calls
  • Responsible for patches and updates on production systems.
  • Design and build cutting-edge, multi-microservice solutions to support Starbucks's growth worldwide.
  • Work with cross-functional teams on ongoing design efforts and systems support.
  • Automate password and certificate rotations on application and DB servers.
  • Help the CI/CD team roll out applications and infrastructure globally.
  • Collaborates with the development team and with developer leads from other Information Technology (IT) teams. Initiates process improvements for new and existing systems.
  • Coaches and mentors other team members. Performs cross-training and facilitates information sharing among team members.
  • Participates in a production support rotation that includes pager responsibilities.
  • Ability to accurately break down complex application designs into component deliverables and estimate design and development timelines

General IT Skills:

Experience in application support, problem diagnosis, and resolution

Expert in interpretation of functional requirements

Development of technical design specifications for complex projects

Expert in industry standard development methodologies

Experience in middleware integration using tools like webMethods

Integrate application support efforts with concurrent, parallel application development efforts
