SITE RELIABILITY ENGINEERING MANAGER, AI PLATFORM

Adobe Systems Incorporated
Seattle, WA, United States
$146.3K-$281.1K a year
Full-time

Our Company

Changing the world through digital experiences is what Adobe's all about. We give everyone, from emerging artists to global brands, everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.

We're on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity.

We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!

The Opportunity

We're looking for an outstanding, hands-on leader to drive reliability for Adobe's AI Inference Platform, Adobe Firefly. You will develop a team of Site Reliability Engineers who work closely with the Engineering teams on building, scaling, and securing the AI Platform.

This enables the Firefly product teams to easily manage and deploy Machine Learning capabilities used by Adobe client applications.

The Applied Research groups from Adobe Research and other App Teams in Adobe will deploy thousands of models onto this platform in a variety of lifecycle stages (early research, development, productization, optimization, etc.).

This platform will offer ML model serving at scale, with high cost efficiency, on a wide variety of hardware platforms across multiple clouds.

What You'll Do

  • Guide the technical vision and roadmap for AI Platform Inference infrastructure.
  • Grow and lead a team of dedicated Site Reliability Engineers.
  • Engage with the Firefly Engineering and Firefly App Integrations teams to understand their needs and goals and to drive the platform's reliability.
  • Identify and implement methodologies and solutions to increase reliability, scalability, security, and efficiency.
  • Ensure the highest uptime and Quality of Service (QoS) for Adobe's customers through operational excellence.
  • Define service level objectives (SLOs) and indicators (SLIs) to represent and measure service quality.
  • Support and maintain globally distributed, multi-cloud (public and/or private) environments.
  • Automate common, repeatable tasks at a large scale to streamline operational procedures.
  • Identify areas to improve service resiliency through techniques such as chaos engineering and performance/load testing.
  • Coordinate with other Adobe platform teams and service providers (primarily AWS) to innovate on Generative AI as a Service.
  • Ensure inference services improve GPU utilization, scale models independently, and optimize cost of goods sold (COGS).

What You'll Need to Succeed

  • A BS or MS degree in Computer Science, Electrical Engineering, a related field, or equivalent industry experience.
  • You have 3+ years of experience as an Engineering Manager.
  • You excel in undefined environments and get excited about finding pragmatic solutions to complex technical or organizational challenges.
  • You've worked with high-scale distributed systems used by tens or hundreds of millions of users.
  • You are passionate about coaching and developing engineers but love to dig into technical problems when the opportunity arises.
  • You keep up with the industry trends and grow your knowledge and skills to solve technical problems.
  • Experience in building and scaling distributed systems, as well as experience with containerization and orchestration technologies like Kubernetes.
  • Strong communication and collaboration skills - building strong relationships with internal customers and external partners.
  • Dedication to teamwork, self-organization, and continuous improvement.
  • A track record of leading high-performance teams to deliver results in the fast-paced, dynamic environment of AI infrastructure.
  • Production-level expertise with container orchestration engines (e.g. Kubernetes) and a demonstrated understanding of modern continuous development techniques and pipelines (IaC, CI/CD, ArgoCD, Git).
  • Fundamental programming skills, ideally practical experience in one (and preferably more) of the following languages: Python, Go, or Java.
  • An understanding of AI/ML, including ML frameworks, public cloud, and commercial AI/ML solutions; familiarity with PyTorch, SageMaker, Hugging Face, NVIDIA TensorRT, or OpenAI Triton is a plus.

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets.

The U.S. pay range for this position is $146,300 to $281,100 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

At Adobe, starting salaries for sales roles are expressed as total target compensation (TTC = base + commission), and short-term incentives are in the form of sales commission plans.

For non-sales roles, starting salaries are expressed as base salary, and short-term incentives are in the form of the Annual Incentive Plan (AIP).

In addition, certain roles may be eligible for long-term incentives in the form of a new hire equity award.

Adobe will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and fair chance ordinances.

Adobe is proud to be an Equal Employment Opportunity and affirmative action employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other applicable characteristics protected by law.

Adobe aims to make Adobe.com accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call (408) 536-3015.

Adobe values a free and open marketplace for all employees and has policies in place to ensure that we do not enter into illegal agreements with other companies to not recruit or hire each other's employees.

4 days ago