Description :
Protingent Staffing has an exciting Remote Direct Hire opportunity.
Job Responsibilities :
- Spearhead the development of cutting-edge data products by adapting and extending Vision-Language Models (VLMs) and other multimodal foundation models. This includes applying advanced techniques like fine-tuning, RAG, in-context learning, continual pre-training, and knowledge distillation.
- Design and curate high-quality multimodal datasets crucial for training and evaluating multimodal foundation models. This includes developing innovative strategies for data curation, dataset creation, and synthetic data generation to optimize multimodal foundation models for long-tail event mining.
- Drive the in-depth analysis of multimodal foundation models' performance, generalization, and robustness in diverse real-world settings.
Job Qualifications :
- MS / PhD in computer science or related fields with a strong emphasis on multimodal foundation models.
- Strong publication record in premier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR) demonstrating significant contributions to the field of vision-language understanding or multimodal foundation models.
- Proficiency in Python and deep learning frameworks such as PyTorch, with a demonstrated ability to write clean, efficient, and maintainable code.
Preferred Job Qualifications :
- Experience in the application of Vision-Language Models (VLMs) or other multimodal foundation models to data mining in real-world settings.
- Experience in production deployment of Vision-Language Models (VLMs) or other multimodal foundation models for real-world applications (e.g., image / video captioning, open-vocabulary image / video searching).
- Experience with data from diverse sensor modalities (e.g., camera, lidar, radar).
- Experience in applied machine learning for autonomous driving.
Job Details :
Job Type : Direct Hire
Location : Remote
Salary Range : $175k – $234k / year