
Senior Site Reliability Engineer (Cloud Networking)

1000 Kyndryl, Inc.
San Francisco, CA, USA
$75.5K-$143.5K a year
Full-time
Part-time

Who We Are

At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day.

So why work at Kyndryl? We are always moving forward, always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers, and our communities.

The Role

This Networking Team is responsible for the design, implementation, and operation of software-defined networking technology that is at the core of Skytap’s product.

Our SDN technology provides customers Layer 2 through Layer 7 networking features for traditional applications they’ve migrated from on-premises to Skytap Cloud.

This includes features such as network isolation, MAC and IP address management and translation, policy-based routing, and hybrid-cloud connectivity.

Customers can create identical clones of their virtual data centers without any L2 or L3 modifications and can connect them to one another and to on-prem resources.

This Networking team makes this magic possible at scale in data centers across the world.

We hire talented engineers who all work together to foster a great engineering environment. In your role as a Site Reliability Engineer, you’ll use your skills to help instrument our systems so they can be easily built, observed, monitored, tested, and deployed at scale, and ensure Skytap’s services perform well for enterprise customers.

One of your primary responsibilities will be to ensure the reliability, scalability, and security of our systems and services.

You will work closely with development, operations, and security teams to design, implement, and maintain automated solutions that enhance the stability and security of our infrastructure.

You will also participate in incident response activities, including incident triage, root cause analysis, and post-mortem reviews.

In order to be effective in this role as a Site Reliability Engineer, you’ll need to have proficiency with general DevOps and automation and knowledge of Linux / Unix-based operating systems.

Previous experience with networking technologies is a bonus but not required, and you will have opportunities for exposure and learning.

You can expect to spend half of your time writing code: usually provisioning and monitoring automation improvements, bug fixes, and internal technical improvements.

Your Responsibilities:

Design and add new monitoring, logging, alerting, and metrics to systems

Eventually, contribute to the team's on-call rotation

General system operations work

Improve configuration management systems and automation

Improve processes and documentation for service administration

Write design documentation for major service improvements

Develop, maintain, improve, and automate the build and testing pipeline

Ensure the software release process is operating smoothly and effectively

Incorporate new software into package management repos when needed

Implement security controls and hardening measures to mitigate risks and enhance the security posture of our systems.

As your domain knowledge increases over time, you can take on additional responsibilities such as assisting with field requests and customer-facing troubleshooting and diagnostics, including working with the Support team.

Your Future at Kyndryl

Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world.

This role is dynamic and influential, offering a wide range of professional and personal growth opportunities that you won’t find anywhere else.

Kyndryl currently does not require employees to be fully vaccinated against COVID-19. However, if you are hired to work at a client, customer, or partner location, you may be required to show proof of vaccination to align with their respective COVID-19 vaccination policies.

Those who believe they are eligible may apply for a medical or religious accommodation prior to the start of employment.

Who You Are

Your Skills & Expertise

3 years of experience with infrastructure and configuration management tools like Ansible, Puppet, and Terraform.

Core networking domain technology knowledge & experience such as TCP / IP, DNS, DHCP, TLS, and network virtualization.

Understand that your success is measured by the success of our service’s reliability and performance.

Experience with time series databases and data visualization tools such as the TICK Stack (Telegraf, InfluxDB, Chronograf, and Kapacitor).

Experience with logging, search, and visualization tools such as the Elastic Stack (Elasticsearch, Logstash, Kibana) and Grafana

General networking protocol stack knowledge.

Experience with container and orchestration tools such as Docker and Kubernetes.

Solid understanding of Linux / Unix-based operating systems and experience in debugging system and networking issues.

Knowledge of Linux kernel internals and tunables.

Understanding of service level objectives and service level agreements.

Have experience creating and scaling highly available distributed systems.

Intermediate programming experience with languages like Python and Bash and experience with source code control tools and platforms such as Git and GitHub.

Ability to dig into the details of projects or write scripts to uncover patterns from sources of data.

Ability to remain calm and effective in high-stress settings such as interpersonal conflicts, technical discussions, and production outages

Detail-oriented reader. You can read a spec and see the big picture as well as missing edge cases.

Other required skills include strong communication skills and a collaborative attitude and working style.

Bonus Skills:

Experience in high-performance networking architecture and operational troubleshooting of network issues is a bonus.

Experience with network packet generation and analysis tools such as Scapy, tcpdump, and Wireshark / TShark.

Experience with cloud platforms such as Azure and AWS.

Zabbix

MySQL

The compensation range for the position in the U.S. is $68,520 to $130,320 based on a full-time schedule. Your actual compensation may vary depending on your geography, job-related skills and experience.

For part time roles, the compensation will be adjusted appropriately. The pay or salary range will not be below any applicable state, city or local minimum wage requirement.

There is a different applicable compensation range for the following work locations:

California: $75,480 to $156,480

Colorado: $68,520 to $130,320

New York City: $82,320 to $156,480

Washington: $75,480 to $143,520

Washington DC: $75,480 to $143,520

This position will be eligible for Kyndryl’s discretionary annual bonus program, based on performance and subject to the terms of Kyndryl’s applicable plans.

You may also receive a comprehensive benefits package which includes medical and dental coverage, disability, retirement benefits, paid leave, and paid time off.

Note: If this is a sales commission eligible role, you will be eligible to participate in a sales commission plan in lieu of the annual discretionary bonus program.

Applications will be accepted on a rolling basis.

Being You

Diversity is a whole lot more than what we look like or where we come from; it’s how we think and who we are. We welcome people of all cultures, backgrounds, and experiences.

But we’re not doing it single-handedly. Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice.

This dedication to welcoming everyone into our company means that Kyndryl gives you and everyone next to you the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture.

That’s the Kyndryl Way.

What You Can Expect

With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate and to build new capabilities, new relationships, new processes, and new value.

Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter wherever you are in your life journey.

Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more.

Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations.

At Kyndryl, we invest heavily in you. We want you to succeed so that together we will all succeed.

5 days ago