Sr Site Reliability Engineer

IDeaS
Minneapolis, Minnesota, US
Full-time

We are seeking a Senior Site Reliability Engineer who will be at the forefront of establishing and driving best practices in system reliability, performance optimization, and observability.

With over five years of experience, you bring deep expertise in software development and infrastructure operations, particularly in building and maintaining scalable, data-intensive systems.

Your key focus will be on defining and implementing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure our solutions meet rigorous performance standards.

You will work closely with cross-functional teams to build observability frameworks that empower teams to monitor, diagnose, and improve system performance proactively.

Your leadership and persistence will be vital in identifying and resolving performance bottlenecks, ensuring long-term scalability and efficiency across our systems.

What you’ll be doing...

  • Collaborate with development and operations teams to design, implement, and maintain observability frameworks that provide deep insights into system performance, particularly for data and ML pipelines.
  • Lead the establishment of Service Level Objectives (SLOs) and Service Level Indicators (SLIs), ensuring they align with business goals and drive continuous performance improvements.
  • Partner with stakeholders to understand system performance requirements and translate them into actionable performance engineering strategies.
  • Proactively identify performance bottlenecks and collaborate with teams to implement solutions that enhance system scalability and reliability.
  • Design and execute performance regression test suites, focusing on data-intensive and ML workloads, to ensure continuous performance optimization.
  • Own the reliability and performance metrics of our systems, driving a culture of performance excellence and proactive issue resolution.
  • Collaborate with subject matter experts to gain a deep understanding of domain-specific performance challenges, particularly in data and ML pipelines.
  • Utilize tools like Datadog, Jira, and GitHub to monitor system performance, manage projects, and track issues, with a strong emphasis on performance-related metrics.
  • Define and monitor success metrics, ensuring our systems consistently meet or exceed performance and reliability targets.
  • Actively contribute to the continuous improvement of performance engineering practices across the team, fostering a culture of excellence in observability and system performance.
  • Perform other duties as assigned.

What you’ll bring to us

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Five years of experience in a site-reliability-focused role responsible for establishing reliability standards in a cloud-native environment.
  • Strong expertise in establishing SLOs / SLIs and building observability frameworks for complex systems.
  • Proficiency with cloud services, particularly AWS, and experience in designing scalable and reliable architectures.
  • Hands-on experience with performance monitoring and observability tools like Datadog.
  • Proficiency in version control systems like Git / GitHub and infrastructure as code tools like Terraform.
  • Strong interpersonal skills and excellent communication abilities, with a focus on driving performance improvements across teams.

Preferred:

  • Proficiency in Java programming and hands-on experience with REST, Spring and microservices development.
  • Proficiency in RDBMS schema design and index utilization.

We Support Who You Are

As a global company, we strive to create an inclusive environment where diverse perspectives spark innovation and meet the challenges of an evolving world.

Whether you’re launching a new career or expanding your current one, IDeaS is a company where you can balance great work with all other aspects of your life.

At IDeaS, we also aspire to live our values each day by being Accountable, Curious, Passionate and Authentic. And we continue our quest to build a more inclusive environment that attracts, represents and provides a place for diverse ideas, unique perspectives, and authentic voices.

Additional Information:

To qualify, applicants must be legally authorized to work in the United States, and should not require, now or in the future, sponsorship for employment visa status.

SAS is an equal opportunity / Affirmative Action employer. All qualified applicants are considered for employment without regard to race, color, religion, gender, sexual orientation, gender identity, age, national origin, disability status, protected veteran status or any other characteristic protected by law.

An equivalent combination of education, training, and relevant experience may be considered in place of the education requirement stated above.

Resumes may be considered in the order they are received.

IDeaS / SAS employees performing certain job functions may require access to technology or software subject to export or import regulations.

To comply with these regulations, IDeaS / SAS may obtain nationality or citizenship information from applicants for employment.

IDeaS / SAS collects this information solely for trade law compliance purposes and does not use it to discriminate unfairly in the hiring process.
