Member of Technical Staff

Placement Services USA, Inc. • San Francisco, CA, United States

Job type: Full-time

Job description

Lead development of systems and libraries for large-scale generation of domain-randomized synthetic data to train AI models. Define and maintain the technical vision for synthetic data systems supporting large language models (LLMs), overseeing multimodal data from generation or ingestion through processing and quality curation to training and evaluation. Develop high-performance, data-driven applications in Python requiring distributed computing, and manage high-performance Python pipelines for extensive text and multimodal synthetic datasets aligned with Cohere's engineering best practices. Lead research initiatives in multimodal synthetic data, employing advanced augmentation and domain randomization techniques to enhance model generalizability by writing 3D programming code in Unity with C# / .NET. Build end-to-end solutions for transforming, viewing, filtering, and annotating datasets, including custom front-end visualizations of data, and design user-friendly front-end interfaces that facilitate interactive dataset visualization, annotation, filtering, and diagnostic analysis for technical and non-technical users. Direct implementation of large-scale computational environments to support resource-intensive model training and real-time operations, and manage the necessary infrastructure for large-scale synthetic data generation using Google Cloud Platform.

Engage with clients to understand specific challenges, address model failures in their use cases, and design customized synthetic data strategies that improve LLM performance. Champion usability and performance, enabling other teams to rapidly utilize synthetic data techniques for their own use cases. Establish and enforce best practices for data security, storage efficiency, and cost-effective resource allocation within cloud infrastructures. Maintain ongoing feedback loops with stakeholders to continuously refine and enhance data pipelines and model effectiveness. Collaborate cross-functionally and handle data transformation, distribution, and annotation to improve the model in line with business and end-user objectives. Evaluate system performance regularly, applying strategic optimizations or updates to accommodate evolving synthetic data demands. May telecommute.

Requires a Bachelor's degree (or foreign educational equivalent) in Computer Science, Software Engineering, Computer Information Systems, or a closely related field.

Two (2) years of experience in the job offered or a related position. Experience must have included two (2) years in each of the following: leading development of systems and libraries for large-scale generation of domain-randomized synthetic data to train AI models; developing high-performance, data-driven applications in Python requiring distributed computing; 3D game programming in Unity with C# / .NET; working directly with clients to address model failures in their use cases by developing and implementing customized synthetic data generation strategies; building end-to-end solutions for transforming, viewing, filtering, and annotating datasets, including custom front-end visualizations of data; and managing the necessary infrastructure for large-scale synthetic data generation using Google Cloud Platform.

Please copy and paste your resume into the email body (do not send attachments; we cannot open them) and email it to candidates at (link removed) with reference #760034 in the subject line.

Thank you.
