Talent.com
Machine Learning Engineer, LLM Fine-Tuning

First Soft Solutions
San Jose, CA, United States
Job type: Full-time

We are actively hiring for a Machine Learning Engineer.

Location: San Jose, CA (Onsite)

Skills: LLM fine-tuning (Verilog / RTL applications); AWS (primary; Bedrock + SageMaker)

Own the technical roadmap for Verilog / RTL-focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.

  • Lead a hands-on team of applied scientists / engineers: set direction, unblock technically, review designs / code, and raise the bar on experimentation velocity and reliability.
  • Fine-tune and customize models using state-of-the-art techniques (LoRA / QLoRA, PEFT, instruction tuning, preference optimization / RLAIF) with robust HDL-specific evals: compile / lint / simulate-based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and "does it synthesize" checks.

  • Design privacy-first ML pipelines on AWS:
    • Training / customization and hosting using Amazon Bedrock (including Anthropic models) where appropriate; SageMaker (or EKS + KServe / Triton / DJL) for bespoke training needs.
    • Artifacts in S3 with KMS CMKs; isolated VPC subnets & PrivateLink (including Bedrock VPC endpoints), IAM least privilege, CloudTrail auditing, and Secrets Manager for credentials.
    • Enforce encryption in transit / at rest, data minimization, and no public egress for customer / RTL corpora.
  • Stand up dependable model serving: Bedrock model invocation where it fits, and / or low-latency self-hosted inference (vLLM / TensorRT-LLM), autoscaling, and canary / blue-green rollouts.
  • Build an evaluation culture: automatic regression suites that run HDL compilers / simulators, measure behavioral fidelity, and detect hallucinations / constraint violations; model cards and experiment tracking (MLflow / Weights & Biases).
  • Partner deeply with hardware design, CAD / EDA, Security, and Legal to source / prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
  • Drive productization: integrate LLMs with internal developer tools (IDEs / plug-ins, code review bots, CI), retrieval (RAG) over internal HDL repos / specs, and safe tool use / function calling.
  • Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure-by-default systems.
  • 10+ years total engineering experience with 5+ years in ML / AI or large-scale distributed systems; 3+ years working directly with transformers / LLMs.
  • Proven track record shipping LLM-powered features in production and leading ambiguous, cross-functional initiatives at Staff level.
  • Deep hands-on skill with PyTorch, Hugging Face Transformers / PEFT / TRL, distributed training (DeepSpeed / FSDP), quantization-aware fine-tuning (LoRA / QLoRA), and constrained / grammar-guided decoding.
  • AWS expertise to design and defend secure enterprise deployments, including:
    • Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints)
    • SageMaker (Training, Inference, Pipelines), S3, EC2 / EKS / ECR, VPC / Subnets / Security Groups, IAM, KMS, PrivateLink, CloudWatch / CloudTrail, Step Functions, Batch, Secrets Manager.
  • Strong software engineering fundamentals: testing, CI / CD, observability, performance tuning; Python a must (bonus for Go / Java / C++).
  • Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.
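
For candidates unfamiliar with the pass@k metric named among the HDL-specific evals above, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n completions are sampled per problem and c of them pass. The function name `pass_at_k` and the definition of "pass" (for example, the generated Verilog compiles and simulates correctly) are assumptions for illustration, not something this posting specifies.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total completions sampled for one problem
    c: how many of those n passed the check (hypothetical: compile + simulate)
    k: evaluation budget, with 1 <= k <= n
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up entirely of failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples per problem, 3 of which passed.
    for k in (1, 5, 10):
        print(k, pass_at_k(10, 3, k))
```

Averaging this value over all problems in a benchmark gives the suite-level pass@k; with k = 1 it reduces to the plain pass rate c / n.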