
Director, Site Reliability Engineer

Ushur
Santa Clara, California, US
$180K-$225K a year
Full-time

Ushur is transforming the way enterprises communicate and engage with customers. Fueled by consumers' self-service demands, enterprises are modernizing customer engagement and experience models.

Ushur is fast becoming the platform of choice for Customer Experience Automation, enabling these enterprises to leapfrog their digital-native counterparts and deliver delightful customer and employee experiences.

With cutting-edge Conversational AI, Machine Learning and Intelligent Process Automation technologies, Ushur has enabled Fortune 100 enterprises, including some of the world's best-known brands in the healthcare, insurance, banking and financial services sectors, to automate their customer engagement.

Cloud-native, 100% no-code and purely workflow-driven, Ushur empowers citizen developers within business operations teams to build AI-powered, fully automated, omni-channel experiences that digitally transform customer journeys end-to-end.

About the Role

As the Director of Site Reliability Engineering, you will be responsible for building and managing a team of talented reliability engineers.

You will ensure our systems are robust, scalable, and optimized for performance, focusing on maintaining uptime, handling incident management, and driving automation in a fast-paced startup environment.

This is a hands-on leadership role requiring deep technical expertise, strategic thinking, and the ability to foster collaboration in a dynamic, fast-paced startup setting.

Are you the right applicant for this opportunity? Find out by reading through the role overview below.

Responsibilities

  • Leadership & Strategy: Lead a high-performing team of senior reliability engineers, providing technical direction and career development. Develop and execute strategies to improve system reliability, scalability, and performance.
  • System Architecture: Collaborate closely with engineering teams to design and implement reliable and scalable systems that meet business needs and ensure high availability.
  • Incident Management: Oversee incident response, post-mortems, and root cause analysis to ensure timely resolution and continuous improvement of reliability practices.
  • Automation & Monitoring: Drive the automation of manual tasks and the implementation of monitoring solutions to improve system reliability, efficiency, and incident response.
  • Continuous Improvement: Foster a culture of continuous improvement, encouraging experimentation, learning, and adaptation to ensure the reliability and performance of all systems.
  • Collaboration: Partner with cross-functional teams, including Product and Engineering, to align system design with business goals and growth initiatives.
  • Scalability & Performance: Ensure systems can scale effectively as we grow, maintaining performance and minimizing downtime in a rapidly changing startup environment.
  • Security & Compliance: Work with security and compliance teams to ensure system reliability aligns with industry standards and best practices.

Requirements

  • 10+ years of experience in Site Reliability Engineering or related roles, with at least 3 years in a leadership capacity.
  • Proven experience working in fast-paced startup environments, balancing the need for quick delivery with long-term scalability and reliability.
  • Expertise in cloud infrastructure, preferably AWS, GCP, or Azure.
  • Strong knowledge of automation tools and scripting languages (Python, Bash, etc.) and familiarity with CI/CD pipelines.
  • Experience with monitoring and incident management tools such as Prometheus, Datadog, PagerDuty, etc.
  • Excellent problem-solving skills, with a focus on proactive prevention and quick response.
  • Strong communication skills, capable of articulating complex technical issues to both technical and non-technical stakeholders.
  • Hands-on approach with a deep understanding of system architecture and operational excellence.
  • Ability to mentor and develop a growing team in a dynamic, high-growth environment.

The pay range for this position is $180,000-$225,000 a year plus bonus and equity. However, the base pay offered may vary depending on skills, experience, job-related knowledge, and location.

Benefits

Great Company Culture. We pride ourselves on having a values-based culture that is welcoming, intentional, and respectful.

Bring your whole self to work. We are focused on building a diverse culture, with innovative ideas where you and your ideas are valued.

We are a start-up and know that every person has a significant impact!

Rest and Relaxation. Unlimited PTO, wellness days, paid holidays, and more!

Health Benefits. Comprehensive health, dental, and vision. We offer a variety of plans to meet the needs of you and your loved ones.

We care about your future. Access to a 401(k) plan you can contribute to, plus generous stock options.

Keep learning. One of our core values is Growth Mindset: we believe in lifelong learning. Whether you are a former student or currently enrolled in higher education, we can help cover some of those expenses and support your ongoing development and career growth.

Flexible Work. In-office, work-from-home, or hybrid, depending on position and location. We seek to create an environment for all of our employees where they can thrive in both their professional and personal lives.

