Sr Site Reliability Engineer

Federal Reserve System

Seattle, WA

Full-time

Company

Federal Reserve Bank of San Francisco

We are the Federal Reserve Bank of San Francisco: public servants with a mission to advance the nation’s monetary, financial, and payment systems to build a stronger economy for all Americans.

We are a community-engaged bank, and are committed to understanding and serving the vibrant, expansive communities of the Twelfth District.

That means we seek and appreciate new perspectives. We respect people for what they do and for who they are. We build opportunities to learn and grow.

When you join the SF Fed, you become part of a diverse team united in its purpose to promote an economy that works for everyone.

While the SF Fed is a Reserve Bank, we’re not what you might expect. We’re unreserved here. That means we seek new and diverse perspectives.

We spark conversations and encourage debate. We build opportunity. We pursue careers that are true to ourselves. We are looking for people who want to help our country reach its full economic potential.

When you join the SF Fed, you join a team of people working together to promote an inclusive economy that works for everyone.

We empower our people to balance their life and work responsibilities. That’s why we offer a flexible hybrid work model that allows you to collaborate with office colleagues on some days, and work from home on others.

Essential Responsibilities

As a Sr. Site Reliability Engineer, you will be part of the Data & Analytics Services (DAS) Team and will get an opportunity to broadly apply your engineering skills across various technology solutions, as well as build your skills in other areas by being exposed to various aspects of product delivery from inception, through design, build, and deployment.

You will be working multi-functionally with Product Managers, Architects, Engineers, and Customer teams in a rapidly evolving environment.

You will develop Infrastructure as Code to launch server instances and to install and configure software, among other things.

You will provide technical leadership in the planning, design, and implementation of cloud-based infrastructure systems with both traditional and non-traditional infrastructures.

Responsible for improving and protecting the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of cloud-based software and systems.

Responsible for implementing, managing, and scaling distributed systems in a public, private, or hybrid cloud environment.

Help implement the automation strategy for cloud services, in coordination with architects and developers, to reduce toil and human error, drive scalability, and improve the reliability of the data platform.

Identify and respond effectively to service failures to preserve the service's conformance to its Service-Level Agreement and regularly update the application playbook to reduce the time necessary to mitigate an incident.

Work with development teams to establish Service-Level Objectives and key Service-Level Indicators.

Design, develop, engineer, and deploy Infrastructure-as-Code solutions related to platform scalability, isolation, latency, throughput, and efficiency.

Establish postmortem procedures and lead postmortem exercises.

Conduct Production Readiness Reviews to ensure services meet accepted standards of operational readiness before going live.

Set guiding principles and assist in the creation of Infrastructure-as-Code and automated solutions in coordination with developers.

Facilitate compliance, rehydrate infrastructure on schedule, keep up with upgraded versions of the services, and empower developers with self-service capabilities.

Handle incident response and on-call activities, stand up ad hoc environments, and help with proofs of concept (POCs) to facilitate changes to architecture with high confidence.

Manage system activities to an error budget.

Qualifications

Bachelor’s degree in Computer Science, Information Systems, Computer Engineering, Systems Analysis, or a related field, or equivalent work experience.

5+ years of relevant technical work experience in the field of software development using cloud service provider platforms

3 or more years of experience in using Terraform to manage AWS Programmable Infrastructures

Must have architected and implemented the Cloud Infrastructure Automation scripts to create and maintain various target environments like Dev, Stage, QA, Integration and Production in AWS environments

Must have experience managing infrastructure, including security roles and permissions; cloud networking assets such as VPCs, subnets, route tables, and access control lists; storage assets such as S3 buckets; creating Lambda functions and layers; and provisioning other AWS services such as Redshift and DynamoDB.

Experience with advanced features like S3 backends and State file locks in Terraform

Experience implementing Data and Advanced Analytics solutions, or related experience in the cloud, preferably in AWS

Experience in developing an end-to-end AWS-native platform for building data lakes (S3, Glue (Crawlers, ETL, Catalog), IAM, CodePipeline, CodeCommit, CloudTrail, CloudWatch, AWS Config, GuardDuty, Secrets Manager, KMS, EC2, Athena, and a data visualization tool such as Tableau running on EC2 or Amazon QuickSight)

Hands-on programming and scripting skills (Java, C++, C#, Python, Bash)

Working knowledge of Amazon Web Services

Experience in Continuous Integration, Continuous Delivery, and Continuous Deployment software tools to support, enhance and grow CI and CD capabilities

Understanding of security design for enterprise software systems. This includes but is not limited to:

Source code control systems (Subversion, Git variants)

Build systems (Jenkins, GitLab, CircleCI)

Static analysis tools (SonarQube, Fortify)

Containerization tools (Docker, Podman)

Orchestration and environment management tools (Puppet, Kubernetes, Ansible, Terraform)

Knowledge of high-availability, load-balancing and failover configurations across application, infrastructure, and platform

Experience with and/or working knowledge of the financial industry, government agencies, and Federal Reserve Bank Lines of Business (LoBs) applications

Practical experience and knowledge of Service-Oriented Architecture (SOA), Microservices, and API Management

Background in data security, governance and cybersecurity solutions.

Proven ability to write clear and concise communications: technical documents, design documents, specifications.

Thorough understanding of APIs, gateways, orchestrators, databases, networking, monitoring, configuration management and security best practices for a production environment

Must be a U.S. Citizen or a Green Card holder with the intent to become a U.S. Citizen


Full Time / Part Time

Full time

Regular / Temporary

Regular

Job Exempt (Yes / No)

Job Category

Information Technology

Work Shift

First (United States of America)
