Senior Software Engineer, Infrastructure

CentML Inc.
San Francisco, California, US
Full-time

About Us

We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential.

Our founding team is made up of experts in AI, compilers, and ML hardware and has led efforts at companies like Amazon, Google, Microsoft Research, Nvidia, Intel, Qualcomm, and IBM.

Our co-founder and CEO, Gennady Pekhimenko, is a world-renowned expert in ML systems who holds multiple academic and industry research awards from Google, Amazon, Facebook, and VMware.

Position Overview

We are seeking a highly motivated and skilled senior infrastructure engineer to join our team in a key role focused on designing, developing, and maintaining the CentML platform that offers a cost-effective infrastructure for serving and training large-scale machine learning models.

As an infrastructure engineer, you will be responsible for designing the deployment infrastructure for ML training and inference jobs on GPU clusters that span multiple cloud service providers, such as AWS, GCP, Azure, CoreWeave, and OCI.

You will also be responsible for leading a team of engineers and building a scalable, performant, and reliable platform that lets our customers seamlessly access and use our comprehensive suite of ML services.

Responsibilities

  • Design and lead the development of the deployment infrastructure of the CentML platform. The deployment infrastructure manages the hardware resources necessary to deploy the ML training and inference applications.
  • Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads to efficiently utilize the hardware resources in the GPU cluster.
  • Communicate with our product teams and define new features and goals for improving the CentML platform.

Qualifications

  • 4+ years of experience working with containerized deployment systems (e.g., Kubernetes, OpenShift, Terraform).
  • A big plus if you have contributed to Kubernetes and have expertise in container runtime technologies like Docker Engine, containerd, or CRI-O.
  • Experience deploying and managing cloud infrastructure on providers such as AWS, GCP, and Azure.
  • Past experience in building GPU clusters for large-scale ML training and inference is desirable.
  • Knowledge of GPU architecture and Nvidia GPU virtualization technologies is highly desirable.
  • Strong coding skills in languages like Python, Java, Go, and/or C/C++.

Benefits & Perks

  • An open and inclusive culture and work environment
  • Fully stocked kitchen at the office
  • Full health and dental benefits
  • Parental leave top-up for 6 months
  • Continuous education budget
  • Generous vacation - we're not saying unlimited, but if you need extra time to recharge, just ask

At CentML, we celebrate our differences and value cultivating an inclusive environment for all. We welcome applications of all kinds and are committed to providing an equal opportunity process.

11 days ago