Platform Networking Systems Design Engineer - Data Center GPU

AMD
Austin, TX, US
Full-time

WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences: the building blocks for the data center, artificial intelligence, PCs, gaming and embedded.

Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges.

We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance

THE ROLE :

We are seeking an engineer who will thrive in a fast-paced work environment, using effective communication, problem-solving, and prioritization skills.

Individuals who are well organized, show great attention to detail, and employ critical thinking are well suited for our team.

The Datacenter Graphics and Accelerated Computing (DCGPU) organization is looking for an experienced network system-level debug engineer focused on Datacenter environments.

The individual will be part of a quality initiative that involves driving weekly production-level parts through specific validation, including stress testing, Technical Data Package verification (clocks, frequency, power), and BOM / EC verification in various network configurations.

The individual will need to drive any issues encountered to root-cause closure and communicate with the different IP layers for resolution.

THE PERSON :

This AMD (Advanced Micro Devices) team is looking for a senior-level person who can help guide the team, mentor upcoming developers, provide long-range strategy, and jump in to help resolve issues quickly.

You will be involved in all areas that impact the team, including performance, automation, and development. The right candidate will be informed on the latest trends and be prepared to give consultative direction to senior management.

The person should be experienced in debugging complex HW / FW issues and understand the flow of a GPU through the different layers of an SoC and system.

Communication is essential in working with the different owners of the code stack, as is the ability to drive issues via phone calls and chat messages.

KEY RESPONSIBILITIES :

  • A strong desire to learn new skills and understand new features as they are added
  • Proven track record of working within and across groups
  • Effective communication skills
  • Responsible for exploring opportunities to improve product
  • Work closely with other team members to understand design architecture and to propose solutions to improve and enhance products
  • Debug / triage engineer for a new quality initiative
  • Understanding of GPU / System level HW and SW flow
  • Provide leadership in driving issues / bugs to root cause
  • Communicate / document debug flows and methods
  • Embedded coding for hardware components and respective drivers for network components
  • Assist with network prototypes and in-depth testing to validate the design
  • Formulate and define platform level validation test plans based on product / customer needs
  • Troubleshoot and resolve platform network issues
  • Provide customer support regarding network architectural questions, product prerequisites, and product features
  • Interface with networking partners and software / hardware engineers
  • Work with software developers on network performance enhancement

KEY QUALIFICATIONS :

  • Exposure to systems architecture
  • Minimum 10 years' experience in System or SoC level debug and triage
  • Proven ability to drive resolution of critical problems within a lab or Datacenter
  • Relationships with external customers / partners and ability to help resolve problems in their Data Centers
  • Relationships with external customers / partners and ability to work through manufacturing issues / failures
  • Relationships with external customers / partners and ability to define requirements for manufacturing validation
  • 8+ years' working experience with network technologies including network selection and deployment in Datacenter environments
  • Experience with modern networking standards
  • Experience with mesh network routing protocols and switching protocols
  • Familiar with Ethernet and InfiniBand network designs and switch topologies
  • Linux Operating System as a development environment
  • Familiar with Ethernet and InfiniBand networking in Linux and Windows environments
  • Familiar with virtualization environments (KVM and Hyper-V)
  • RDMA network configuration, troubleshooting
  • Linux kernel networking expertise
  • System / Platform level debug tools.
  • Familiar with networking environments that run HPC / ML / DL workloads
  • Hands-on experience with lab equipment such as oscilloscopes, protocol analyzers, power supplies, and multimeters
  • Familiar with Platform / System bring-up and validation of GPU networks, intranode and internode (networking adapters, cables, switches)

IDEAL CANDIDATE :

  • Significant experience in SoC and / or System debug of complex network issues
  • Develop / Document debug capabilities on a given SOC and System
  • Go-to-person for debugging of issues for the Production level Platform validation
  • Collaborate with internal teams on root causing issues, finding optimum resolutions

ACADEMIC CREDENTIALS :

Bachelor's or Master's in Electrical Engineering, Computer Engineering, Computer Science, or a closely related field

LOCATION : Austin, TX

At AMD, your base pay is one part of your total rewards package. Your base pay will depend on where your skills, qualifications, experience, and location fit into the hiring range for the position.

You may be eligible for incentives based upon your role such as either an annual bonus or sales incentive. Many AMD employees have the opportunity to own shares of AMD stock, as well as a discount when purchasing AMD stock if voluntarily participating in AMD's Employee Stock Purchase Plan.

You'll also be eligible for competitive benefits described in more detail here.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and / or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.

We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
