Manager, Reliability Engineering

The Trade Desk
San Jose, California, US
$110.6K-$138.3K a year
Full-time

Who We Are

At The Trade Desk, we recognize that a seamless customer experience is driven by operational excellence. In pursuit of constantly improving the reliability of our platform, we are establishing a global Reliability Operations team.

This team's core mission will be to vigilantly monitor The Trade Desk platform services, refine our incident response methodologies, and guarantee a robust and highly available customer experience.

If you're passionate about ensuring system reliability, process improvement, and making an essential customer impact, we invite you to play a critical role in this next evolution of our on-call experience.

What You'll Do

  • Define, manage, and measure incident response engineering practices
  • Liaise with engineering teams to ensure work discovered during incident response is prioritized
  • Participate in incident response engineering duties as necessary
  • Manage a global Reliability Operations team (3 to 6+ Reliability Operations engineers across NAMER, EMEA, APAC)
  • Periodically meet with reports across timezones

There may be periodic weekend coverage requirements.

Who We Are Looking For

  • Bachelor’s Degree from a four-year university or relevant substitute experience
  • 6+ years of relevant work experience in Technical and/or Application Support with strong knowledge of technical troubleshooting
  • 2-5 years of management experience with direct reports

The Reliability Operations Engineering Manager will either possess or be excited to learn a number of skills:

Management

  • Adaptive management style according to level and proficiency of engineering reports.
  • Ability to understand technical employee career paths and collaboratively develop career plans.
  • Scheduling a global team through holidays, sickness and vacation leaves, across timezones.

Technical Proficiency

  • Understanding of large-scale distributed system architectures (e.g., databases, web services, application services).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, Nagios).
  • Ability to author scripts to facilitate troubleshooting as well as configure alerts.
  • Proficiency in scripting languages (e.g., Python, Bash) is a plus.

Incident Management and Troubleshooting

  • Ability to prioritize and manage incidents based on severity, with a focus on customer impact.
  • Ability to remain calm under pressure and quickly diagnose issues.
  • Understanding of system logs, metrics, telemetry.

Communication Skills

  • Ability to take command and confidently direct engineering resources in ambiguous situations.
  • Ability to communicate effectively with stakeholders during an incident.
  • Ability to maintain and update troubleshooting guides (TSGs) and operational documentation.

NY, CO, CA, and WA residents only: In accordance with NY, CO, CA, and WA law, the range provided is The Trade Desk's reasonable estimate of the base compensation for this role.

The actual amount may differ based on non-discriminatory factors such as experience, knowledge, skills, abilities, and location.

All employees may be eligible to become The Trade Desk shareholders through eligibility for stock-based compensation grants, which are awarded to employees based on company and individual performance.

The Trade Desk also offers other compensation depending on the role, such as sales-based incentives and commissions. Expected benefits for this role include comprehensive healthcare (medical, dental, and vision) with premiums paid in full for employees and dependents; retirement benefits such as a 401(k) plan and company match; short- and long-term disability coverage; basic life insurance; well-being benefits; reimbursement for certain tuition expenses; parental leave; sick time of 1 hour per 30 hours worked; vacation time for full-time employees of up to 120 hours through the first year and 160 hours thereafter; and around 13 paid holidays per year.

Employees can also purchase The Trade Desk stock at a discount through The Trade Desk’s Employee Stock Purchase Plan.

Note: Interns are not eligible for variable incentive awards such as stock-based compensation, retirement plan, vacation, tuition reimbursement or parental leave.

At The Trade Desk, Base Salary is one part of our competitive total compensation and benefits package and is determined using a salary range.

The base salary range for this role is $110,600-$138,300 USD.

17 days ago