Engineering Manager Machine Learning Infrastructure

ByteDance
San Jose
Full-time

Responsibilities

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join Us

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life - a mission we aim towards achieving every day. To us, every challenge, no matter how ambiguous, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At ByteDance, we create together and grow together.

That's how we drive impact - for ourselves, our company, and the users we serve. Join us.

The mission of our AML team is to push forward the next-generation AI infrastructure and recommendation platform for ads ranking, search ranking, and live & ecom ranking in our company. We also drive substantial impact on the core businesses of the company. Currently, we are looking for an Engineering Manager - Machine Learning Infrastructure to join our team to support and advance that mission.

Responsibilities

  • Lead the team to design and implement distributed inference / training / scheduling / orchestration / storage / parameter server infrastructure for feeds, ads and search ranking models.
  • Oversee the development of monitoring and management tools to ensure the reliability and scalability of machine learning infra.
  • Manage the identification and prioritization of system inefficiencies and bottlenecks, leading efforts to enhance system performance.
  • Lead the team in creating tools to analyze bottlenecks and sources of instability, formulating and implementing effective solutions.
  • Collaborate with product teams, offering comprehensive solutions tailored to their specific requirements.

Job requirements

  • Experience in leading an engineering team.
  • Experience in developing and deploying large-scale machine learning systems.
  • Strong sense of responsibility; good at communication and teamwork.
  • Passionate about solving complex and challenging problems.

Qualifications

  • Experience contributing to an open-source machine learning framework (TensorFlow / JAX / PyTorch / TorchScript / MXNet / TensorRT).
  • Experience in big data frameworks (e.g., Spark / Hadoop / Flink); experience in resource management and task scheduling for large-scale distributed systems.
  • Participated in Parameter Server system optimization, or index structure optimization for search systems.
  • Strong background in one of the following fields: Hardware-Software Co-Design, High Performance Computing, ML Hardware Acceleration (e.g., GPU / RDMA) or ML for Systems.

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives.

Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life.

To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach.

We are passionate about this and hope you are too.

ByteDance Inc. is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws.

If you need assistance or a reasonable accommodation,
