Data Scientist Specialist

Kanak Elite Services Inc • McLean, VA, United States

IN-PERSON INTERVIEW

Local candidates only

Job Title: Data Scientist Specialist

Location: McLean, Virginia 22102 (fully onsite, 5 days a week)

Duration: 6+ month contract

Local candidates only; an in-person interview is required.

Supplier Vetting Questions:

Candidates must pass assessment testing on the must-have qualifications listed below to be considered.
What is RAG (Retrieval-Augmented Generation), and how have you implemented it?
What is semantic similarity search, and how can it be performed?
What is the formula for precision?

Must-Have Qualifications:

Hands-on experience in machine learning, with a transition into GenAI; experience with RAG, Python, Jupyter, and related software; experience using agents in workflows; and a strong understanding of data.

Required Qualifications:

MS or PhD in AI or Data Science.

10+ years of experience in AI/ML, including 3+ years in applied GenAI or LLM-based solutions.

Deep expertise in prompt engineering, fine-tuning, RAG, GraphRAG, vector databases (e.g., AWS Knowledge Base / Elastic), and multimodal models.

Proven experience with cloud-native AI development (AWS SageMaker, Bedrock, MLflow on EKS).

Strong programming skills in Python and ML libraries (Transformers, LangChain, etc.).

Deep understanding of GenAI system patterns, architectural best practices, and evaluation frameworks.

Demonstrated ability to work in cross-functional agile teams.

A GitHub code repository link is required for each candidate. Please vet all candidates thoroughly.

Preferred Qualifications:

Published contributions or patents in the AI, ML, or LLM domains.

Hands-on experience with enterprise AI governance and ethical deployment frameworks.

Familiarity with CI/CD practices for MLOps and scalable inference APIs.

Position Summary:

The client is seeking a highly experienced Principal GenAI Scientist with a strong focus on Generative AI (GenAI) to lead the design and development of cutting-edge AI agents, agentic workflows, and GenAI applications that solve complex business problems.

This role requires advanced proficiency in prompt engineering, large language models (LLMs), RAG, GraphRAG, MCP, A2A, multimodal AI, GenAI patterns, evaluation frameworks, guardrails, data curation, and AWS cloud deployments.

The selected candidate will serve as a hands-on GenAI (data) scientist and critical thought leader, working alongside full-stack developers, UX designers, product managers, and data engineers to shape and implement enterprise-grade GenAI solutions.

Key Responsibilities:

Architect and implement scalable AI agents, agentic workflows, and GenAI applications to address diverse and complex business use cases.

Develop, fine-tune, and optimize lightweight LLMs; lead the evaluation and adaptation of models such as Claude (Anthropic), Azure OpenAI models, and open-source alternatives.

Design and deploy Retrieval-Augmented Generation (RAG) and GraphRAG systems using vector databases and knowledge bases.

Curate enterprise data using connectors integrated with AWS Bedrock Knowledge Bases or Elastic.

Implement solutions leveraging MCP (Model Context Protocol) and A2A (Agent-to-Agent) communication.

Build and maintain Jupyter-based notebooks using platforms such as SageMaker and MLflow/Kubeflow on Kubernetes (EKS).

Collaborate with cross-functional teams of UI and microservice engineers, designers, and data engineers to build full-stack GenAI experiences.

Integrate GenAI solutions with enterprise platforms via API-based methods and standardized GenAI patterns.

Establish and enforce validation procedures using evaluation frameworks, bias mitigation, safety protocols, and guardrails for production-ready deployment.

Design and build robust ingestion pipelines that extract, chunk, enrich, and anonymize data from PDF, video, and audio sources for use in LLM-powered workflows, applying best practices such as semantic chunking and privacy controls.

Orchestrate multimodal pipelines using scalable frameworks (e.g., Apache Spark, PySpark) for automated ETL/ELT workflows appropriate for unstructured media.

Implement embedding pipelines: map media content to vector representations using embedding models, and integrate with vector stores (AWS Knowledge Base, Elastic, MongoDB Atlas) to support RAG architectures.

Preferred:

Experience building AI agents, MCP and A2A integrations, and GraphRAG systems, and deploying GenAI applications to production.

Feel free to reach me at gauravverma@kanakits.com.