
Sr Site Reliability Engineer

Federal Reserve System
San Francisco, CA
Full-time
Part-time

Company

Federal Reserve Bank of San Francisco

We are the Federal Reserve Bank of San Francisco: public servants with a mission to advance the nation’s monetary, financial, and payment systems to build a stronger economy for all Americans.

We are a community-engaged bank, and are committed to understanding and serving the vibrant, expansive communities of the Twelfth District.

That means we seek and appreciate new perspectives. We respect people for what they do and for who they are. We build opportunities to learn and grow.

When you join the SF Fed, you become part of a diverse team united in its purpose to promote an economy that works for everyone.

While the SF Fed is a Reserve Bank, we’re not what you might expect. We’re unreserved here. That means we seek new and diverse perspectives.

We spark conversations and encourage debate. We build opportunity. We pursue careers that are true to ourselves. We are looking for people who want to help our country reach its full economic potential.

When you join the SF Fed, you join a team of people working together to promote an inclusive economy that works for everyone.

We empower our people to balance their life and work responsibilities. That’s why we offer a flexible hybrid work model that allows you to collaborate with office colleagues on some days, and work from home on others.

Essential Responsibilities

As a Sr. Site Reliability Engineer, you will be part of the Data & Analytics Services (DAS) Team and will get an opportunity to broadly apply your engineering skills across various technology solutions, as well as build your skills in other areas by being exposed to various aspects of product delivery from inception, through design, build, and deployment.

You will be working multi-functionally with Product Managers, Architects, Engineers, and Customer teams in a rapidly evolving environment.

You will develop Infrastructure as Code to launch server instances and to install and configure software, among other tasks.

You will provide technical leadership in the planning, design, and implementation of cloud-based infrastructure systems with both traditional and non-traditional infrastructures.

Responsible for improving and protecting the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of cloud-based software and systems.

Responsible for implementing, managing, and scaling distributed systems in a public, private, or hybrid cloud environment.

Help implement the automation strategy for cloud services in coordination with architects and developers to reduce toil and human error, drive scalability, and improve the reliability of the data platform.

Identify and respond effectively to service failures to preserve the service's conformance to its Service-Level Agreement and regularly update the application playbook to reduce the time necessary to mitigate an incident.

Work with development teams to establish Service-Level Objectives and key Service-Level Indicators.

Design, develop, engineer, and deploy Infrastructure-as-Code solutions related to platform scalability, isolation, latency, throughput, and efficiency.

Establish postmortem procedures and lead postmortem exercises.

Conduct Production Readiness Reviews to ensure services meet accepted standards of operational readiness before going live.

Set guiding principles and assist in the creation of Infrastructure-as-Code and automated solutions in coordination with developers.

Facilitate compliance, rehydrate infrastructure on schedule, keep services up to date with upgraded versions, and empower developers with self-service capabilities.

Handle incident response and on-call activities, stand up ad hoc environments, and help with POCs to facilitate changes to architecture with high confidence.

Manage system activities to an error budget.

Qualifications

Bachelor’s degree in Computer Science, Information Systems, Computer Engineering, Systems Analysis, or a related field, or equivalent work experience.

5+ years of relevant technical work experience in the field of software development using cloud service provider platforms

3 or more years of experience in using Terraform to manage AWS Programmable Infrastructures

Must have architected and implemented cloud infrastructure automation scripts to create and maintain target environments such as Dev, Stage, QA, Integration, and Production in AWS

Must have experience managing infrastructure, including security roles and permissions; cloud networking assets such as VPCs, subnets, routing tables, and access control lists; storage assets such as S3 buckets; creating Lambda functions and layers; and provisioning other AWS services such as Redshift and DynamoDB.

Experience with advanced features like S3 backends and State file locks in Terraform

Experience implementing Data and Advanced Analytics solutions, or related experience in the cloud, preferably in AWS

Experience in developing an end-to-end AWS-native platform for building data lakes (S3, Glue (Crawlers, ETL, Catalog), IAM, CodePipeline, CodeCommit, CloudTrail, CloudWatch, AWS Config, GuardDuty, Secrets Manager, KMS, EC2, Athena, and a data visualization tool such as Tableau on EC2 or Amazon QuickSight)

Hands-on programming and scripting skills (Java, C++, C#, Python, Bash)

Working knowledge of Amazon Web Services

Experience in Continuous Integration, Continuous Delivery, and Continuous Deployment software tools to support, enhance and grow CI and CD capabilities

Understanding of security design for enterprise software systems. This includes but is not limited to:

Source code control systems (Subversion, Git variants)

Build systems (Jenkins, GitLab, CircleCI)

Static analysis tools (SonarQube, Fortify)

Containerization tools (Docker, Podman)

Orchestration and environment management tools (Puppet, Kubernetes, Ansible, Terraform)

Knowledge of high-availability, load-balancing and failover configurations across application, infrastructure, and platform

Experience with and/or working knowledge of the financial industry, government agencies, and Federal Reserve Bank Lines of Business (LoBs) applications

Practical experience and knowledge of Service Oriented Architecture (SOA), Microservices, and API Management

Background in data security, governance and cybersecurity solutions.

Proven ability to write clear and concise communications: technical documents, design documents, and specifications.

Thorough understanding of APIs, gateways, orchestrators, databases, networking, monitoring, configuration management and security best practices for a production environment

Must be a U.S. Citizen or a Green Card holder with the intent to become a U.S. Citizen


Full Time / Part Time

Full time

Regular / Temporary

Regular

Job Exempt (Yes / No)

Job Category

Information Technology

Work Shift

First (United States of America)
