Talent.com
Senior Site Reliability Engineer / HPC - Pre-IPO Tech Leader
Andiamo • New York, NY, US
Posted 1 day ago
Job type: Full-time

Job description

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) / High-Performance Computing (HPC) Engineer to design, build, and operate the large-scale infrastructure that powers a $2.5B pre-IPO technology company. Our systems run on massive distributed clusters, handling some of the most demanding workloads in cloud, AI, and data-driven computing.

In this role, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical platforms. You will optimize HPC workloads, streamline CI/CD pipelines for large-scale clusters, and enable research and product teams to deliver innovations with speed and confidence. This is a hands-on position with the opportunity to influence architecture, lead reliability initiatives, and solve some of the hardest problems in distributed systems and performance engineering.

What You'll Do

  • Design Reliable Infrastructure: Architect and maintain large-scale, distributed HPC and cloud-native systems with a focus on uptime, scalability, and resilience.
  • Optimize HPC Workloads: Tune scheduling, job orchestration, and performance for compute- and memory-intensive workloads (AI/ML, simulations, large-scale analytics).
  • Build Observability: Implement monitoring, logging, and alerting systems that provide full visibility into cluster and service health.
  • Automate Everything: Develop tooling and automation for provisioning, scaling, and recovery of critical systems.
  • Ensure Security & Compliance: Implement best practices for access control, encryption, and governance across HPC and cloud environments.
  • Collaborate Cross-Functionally: Work with engineering, research, and product teams to deliver reliable infrastructure for next-generation applications.
  • Incident Response: Lead troubleshooting, root cause analysis, and postmortems for high-severity incidents.

What We're Looking For

  • Professional Experience: 7+ years in SRE, infrastructure engineering, or HPC roles, with a proven track record of supporting large-scale distributed systems.
  • Technical Skills: Expertise in Linux systems, Python or Go, and infrastructure-as-code tools (Terraform, Ansible, or similar).
  • HPC Expertise: Strong knowledge of HPC job schedulers and workload managers (e.g., Slurm), cluster orchestrators (Kubernetes or Mesos), and parallel/distributed computing.
  • Cloud & Hybrid: Hands-on experience with AWS, GCP, or Azure in combination with on-premises HPC clusters.
  • Observability: Proficiency with monitoring and logging frameworks (Prometheus, Grafana, ELK, OpenTelemetry).
  • Resilience Engineering: Experience with chaos engineering, failure testing, and disaster recovery planning.
  • Collaboration: Strong communication skills and the ability to work with research scientists, engineers, and operations teams.
  • Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

This is an opportunity to join a pre-IPO technology leader valued at $2.5B at a time of rapid growth and innovation. As a Senior SRE/HPC Engineer, you will shape the infrastructure that powers next-generation AI, analytics, and large-scale computing.

