Site Reliability Engineer-FedRAMP, AWS (FULLY REMOTE) - 29122

Splunk Inc

Massachusetts, United States

$146.4K-$201.3K a year

Remote

Full-time

Job Description

Join us as we pursue our disruptive vision to make machine data accessible, usable and valuable to everyone.

We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers.

At Splunk, we’re committed to our work, customers, having fun and most significantly to each other’s success. Learn more about Splunk careers and how you can become a part of our journey!

Role :

Splunk's Cloud Services group is looking for a Site Reliability Engineer to help lead, design and build the next generation of our large-scale cloud offering.

You will be working on core services and applications that form the primitives for our current and future cloud service offerings.

Site Reliability Engineers in this role will be engaging with multiple service owners across the platform to teach and implement modern interpretations of SRE, observability, Chaos Engineering and DevOps.

This role is highly visible and impactful to the organization and will help shape Splunk's Engineering culture for years to come.

Your job, in a nutshell, is to make every team around you better... including your own!

This is a remote role available in all US states except AK, ND, and WY. You also have the option of an office desk in some locations if that's convenient and desirable for you!

You will :

Own Splunk Cloud in FedRAMP environments.
Work across the organization to deliver quality products that delight Splunk's passionate users.
Lead teams of tight-knit engineers who are building a state-of-the-art, cloud-based environment for massive-scale data processing.
Mentor and help new engineers to achieve more than they thought possible. You enjoy making other teams successful and are fulfilled through the success of others.

Qualifications :

You have experience or an interest in working with regulated computing environments such as FISMA and / or FedRAMP and are enthusiastic about doing it better.
This is a fully remote, US-based / work-from-home position. You must be a US Citizen working on US soil to be considered.
You have owned and operated Kubernetes clusters and their associated ecosystems. Kubernetes certifications, or an interest in obtaining them, are a plus, such as those from the Cloud Native Computing Foundation: Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or Certified Kubernetes Security Specialist (CKS).

You have experience deploying and operating services on the AWS cloud platform.
You enjoy building and running distributed systems at scale in production. You understand the challenges and trade-offs to be made when building and deploying systems to production.
You have a deep understanding of Linux systems (network stack, file system, OS services) and networking (L2 vs. L3, network architecture, VLANs, etc.).
Experience with at least one programming language, preferably Go (Golang) or Python. Knowledge of working with and automating Linux systems tasks using this language is required, including working with configuration files and system services.

Knowledge of common data structures and algorithms, as well as their performance characteristics, is required.

Knowledge of standard methodologies related to security, performance, and disaster recovery.
Highly skilled in identifying performance bottlenecks, detecting anomalous system behavior, and resolving the root cause of service issues.
You have assembled open source components into cohesive services.
You've demonstrated the skills to effectively work across teams and functions to influence design, operations and deployment of highly available software.
You are interested in working hard to make the users of Splunk's products happier every day.

Preferred skills :

Experience monitoring cloud environments with Splunk.
Experience with large distributed cloud service development, infrastructure, traffic management and architecture.
Experience with distributed architectures / systems with optimized and scalable software that operates on a large number of nodes.

30+ days ago
