We are seeking an exceptional Senior Software Engineer to build the foundational infrastructure for our next-generation AI-powered transcriptome analysis platform. This role combines cutting-edge software engineering with the demands of processing petabyte-scale genomic data and orchestrating complex AI workflows. You will create the robust, scalable systems that enable our LLM and Agentic AI components to transform biological research from traditional pipelines to intelligent, autonomous discovery platforms.
Key Responsibilities
Platform Architecture & Development
Design and implement distributed systems for processing petabyte-scale genomic datasets
Build high-performance APIs supporting 10,000+ concurrent AI agent requests
Develop microservices architecture for modular AI component integration
Create real-time data streaming pipelines for continuous genomic analysis
Implement fault-tolerant systems that meet 99.99% uptime requirements
AI Infrastructure Engineering
Build scalable infrastructure for LLM deployment and inference optimization
Develop orchestration systems for multi-agent AI workflows
Create GPU/TPU cluster management for distributed AI processing
Implement caching strategies for billion-parameter model inference
Design model versioning and A/B testing frameworks
Data Engineering & Processing
Develop high-throughput pipelines for RNA-seq data processing
Implement efficient storage solutions for 20,000+ gene expression matrices
Create data validation and quality control frameworks
Build real-time monitoring for genomic data integrity
Design compression algorithms for efficient genomic data storage
Integration & Interoperability
Create unified APIs connecting LLMs, agents, and biological databases
Implement FHIR-compliant interfaces for clinical data integration
Build connectors for major genomic databases (GEO, TCGA, GTEx)
Develop webhook systems for laboratory instrument integration
Create SDKs for researchers and clinical users
Required Qualifications
Technical Expertise
BS/MS in Computer Science, Software Engineering, or related field
5+ years of software engineering experience with Python as primary language
Expert-level proficiency in Python async programming and frameworks (FastAPI, asyncio)
Strong experience with distributed systems (Kubernetes, Docker, microservices)
Proven track record with high-throughput data processing systems
Deep understanding of database systems (PostgreSQL, MongoDB, Redis)
Infrastructure & DevOps
Experience with cloud platforms (AWS, GCP, or Azure) at scale
Proficiency with infrastructure as code (Terraform, Pulumi)
Strong background in CI/CD pipelines and GitOps practices
Experience with observability tools (Prometheus, Grafana, ELK stack)
Knowledge of message queuing systems (Kafka, RabbitMQ, Celery)
AI/ML Engineering
Experience deploying and scaling ML models in production
Familiarity with ML frameworks (PyTorch, TensorFlow) from an engineering perspective
Understanding of GPU programming and optimization
Experience with model serving frameworks (TorchServe, TensorFlow Serving, Ray Serve)
Preferred Qualifications
Experience with bioinformatics tools and pipelines
Knowledge of genomic data formats (FASTQ, BAM, VCF)
Familiarity with scientific computing (NumPy, SciPy, Pandas)
Understanding of HIPAA compliance and healthcare data security
Experience with real-time systems and streaming architectures
Background in building developer platforms and APIs
Contributions to open-source projects
Key Performance Metrics
Achieve
Support 1M+ daily genomic analyses with linear scaling
Maintain 99.99% platform uptime with zero data loss
Reduce infrastructure costs by 40% through optimization
Enable 5x faster genomic pipeline execution
Successfully integrate 10+ external biological databases
Integration Responsibilities
Team Collaboration
Partner with LLM Engineers to optimize model serving infrastructure
Support Agentic AI Engineers with scalable agent execution platforms
Collaborate with Bioinformaticians on pipeline optimization
Work with Security teams on HIPAA-compliant implementations
Platform Leadership
Define engineering standards and best practices
Mentor junior engineers on distributed systems design
Lead architecture reviews and technical decision-making
Drive adoption of new technologies and methodologies
Technical Stack
Core Technologies
Languages: Python (primary), Go, Rust (performance-critical components)
Frameworks: FastAPI, Celery, Ray, Dask
Databases: PostgreSQL, MongoDB, Redis, InfluxDB
Infrastructure: Kubernetes, Docker, Terraform, ArgoCD
Monitoring: Prometheus, Grafana, OpenTelemetry
ML/AI: PyTorch, Ray Serve, MLflow, Weights & Biases
Domain-Specific Tools
Genomics: Nextflow, Snakemake, CWL
Data Formats: Apache Parquet, HDF5, Zarr
Compute: SLURM, AWS Batch, Google Cloud Life Sciences
What We Offer
Build infrastructure powering the future of precision medicine
Work with cutting-edge AI and genomics technologies
Collaborate with world-class engineers and scientists
Comprehensive benefits with equity participation
$5,000 annual learning and development budget
Top-tier hardware and development environment
Flexible remote work with quarterly team offsites
The Engineering Challenge
This role offers unique engineering challenges at the intersection of:
Scale: Processing petabytes of genomic data daily
Performance: Sub-second response times for complex biological queries
Reliability: Clinical-grade system reliability
Innovation: Enabling autonomous AI agents in biological discovery
Senior Software Engineer • Frisco, TX, US