Network and Server Performance Test Engineer

CEREBRAS SYSTEMS INC.
Sunnyvale, California, US
Full-time

Cerebras Systems has pioneered a groundbreaking chip and system that revolutionizes deep learning applications. Our system empowers ML researchers to achieve unprecedented speeds in training and inference workloads, propelling AI innovation to new horizons.

The Condor Galaxy 1 (CG-1), unveiled in a recent announcement, stands as a testament to Cerebras' commitment to pushing the boundaries of AI computing.

With 4 ExaFLOPS of processing power, 54 million cores, and a 64-node architecture, the CG-1 is the first of nine supercomputers to be built and operated through an exclusive partnership between Cerebras and G42.

This strategic collaboration aims to redefine the possibilities of AI by creating a network of interconnected supercomputers that will collectively deliver a mind-boggling 36 ExaFLOPS of AI compute power upon completion in 2024.

The Role

Evaluate and recommend Data Center equipment, including Switches, Routers, Servers, NICs, and Transceivers, for next-generation infrastructure, with a focus on performance and cost improvement.

Responsibilities

  • Identify experiments, tools, and methodology to test complex Data Center equipment, including Switches, Routers, Servers, NICs, and Transceivers, that push the frontier in hardware design and system integration.
  • Work with equipment vendors to evaluate the performance of newly introduced hardware and to resolve defects.
  • Design and set up test labs and test beds to exercise and evaluate vendor equipment.
  • Work with architects and software engineers to create test cases, write test scripts, execute tests, and document the results of evaluating solutions from different vendors.
  • Troubleshoot, isolate, and drive issues to resolution through partnerships with other teams and vendors.
  • Provide solutions for efficient networking design for AI infrastructure.
  • Design, install, configure, and maintain complex networks for AI infrastructure.
  • Build and optimize server system benchmarks based on a deep understanding of server system architecture and workload characterization.

Qualifications / Skills Required:

  • Master’s degree or higher in Electrical Engineering, Computer Engineering, Computer Science, or related majors.
  • 5+ years of experience in Software Development, Quality Assurance, or System Test of Switches and Routers at a Networking equipment vendor.
  • Understanding of RDMA congestion control mechanisms on InfiniBand and RoCE Networks.
  • Must have a deep understanding of networking protocols and features, including BGP, PFC, ECN, QoS, MLAG, ECMP, and VRF.
  • Experience with computer system architecture, especially on CPU SoC or Platform Architecture, Interconnect Fabric, and Memory sub-system.
  • Experience designing and implementing large switching and routing networks.
  • Strong technical abilities, problem-solving, design, coding, and debugging skills.
  • Expertise in Linux tools such as lspci, ping, traceroute, tcpdump, ifconfig, ip link, ip route, arp, /proc/net, /proc/sys/net, vmstat, netstat, ttcp, iperf, strace, memtest, fio, ozone, and iometer.
  • Must be proficient in Python.
  • Proficient in Networking Test Tools like IXIA and SmartBits.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry.

With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

  • Build a breakthrough AI platform beyond the constraints of the GPU.
  • Publish and open source their cutting-edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Work in a simple, non-corporate culture that respects individual beliefs.

Apply today and join the forefront of groundbreaking advancements in AI.

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer.

We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies.

We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.
