Talent.com
Senior Data Acquisition Engineer

People Data Labs, San Francisco, CA, US
30 days ago
Job type: Permanent

Job Description

Note for all engineering roles: with the rise of fake applicants and AI-enabled candidate fraud, we have built in additional measures throughout the process to identify such candidates and remove them.

About Us

People Data Labs (PDL) is the provider of people and company data. We do the heavy lifting of data collection and standardization so our customers can focus on building and scaling innovative, compliant data solutions. Our sole focus is on building the best data available by integrating thousands of compliantly sourced datasets into a single, developer-friendly source of truth. Leading companies across the world use PDL's workforce data to enrich recruiting platforms, power AI models, create custom audiences, and more.

We are looking for individuals who can balance extreme ownership with a "one-team, one-dream" mindset. Our customers are trying to solve complex problems, and we can only help them achieve their goals as a team. Our Data Engineering & Acquisition Team ensures our customers have standardized, high-quality data to build upon.

You will be crucial in accelerating our efforts to build standalone data products that enable data teams and independent developers to create innovative solutions at massive scale. In this role, you will be working with a team to continuously improve our existing datasets as well as pursuing new ones. If you are looking to be part of a team discovering the next frontier of data-as-a-service (DaaS) with a high level of autonomy and opportunity for direct contributions, this might be the role for you. We like our engineers to be thoughtful, quirky, and willing to fearlessly try new things. Failure is embraced at PDL as long as we continue to learn and grow from it.

What You Get to Do

  • Use and develop web crawling technologies to capture and catalog data on the internet
  • Support and improve our web crawling infrastructure
  • Structure, define, and model captured data, provide semantic data definitions, and automate data quality monitoring for the data that we crawl
  • Develop new techniques to increase speed, efficiency, scalability, and reliability of web crawls
  • Use big data processing platforms to build data pipelines, publish data, and ensure the reliable availability of the data that we crawl
  • Work with our data product and engineering team to design and implement new data products with captured data, and enhance and improve upon existing products

The Technical Chops You'll Need

  • 7+ years industry experience with clear examples of strategic technical problem solving and implementation
  • Strong software development architecture and fundamentals for backend applications
  • Solid understanding of the browser rendering pipeline and web application architecture (auth, cookies, HTTP request/response)
  • Solid programming experience: strong grasp of object-oriented design and experience building applications using asynchronous programming paradigms (e.g., async/await, event loops, or concurrency libraries)
  • Experience building crawlers
  • Proficient in Linux/Unix command line utilities, Linux system administration, architecture, and resource management
  • Experience evaluating data quality and maintaining consistently high data standards across new feature releases (e.g., consistency, accuracy, validity, completeness)

People Thrive Here Who Can

  • Thrive in a fast-paced environment and work independently
  • Work effectively remotely (proactively manage blockers, reach out and ask questions, and participate in team activities)
  • Communicate well in writing, both on Slack/chat and in documents
  • Write data design docs from experience (pipeline design, dataflow, schema design)
  • Scope and break down projects, and communicate progress and blockers effectively with your manager, team, and stakeholders

Some Nice To Haves

  • Degree in a quantitative discipline such as computer science, mathematics, statistics, or engineering
  • Experience as a Red Teamer
  • Experience working in data acquisition
  • Experience with network architecture and with debugging and inspecting network traffic (DNS, IPv4, proxies, application ports and interfaces; packet capture and analysis)
  • Experience with Apache Spark
  • Experience with SQL, including writing advanced queries (e.g., window functions, CTEs)
  • Experience with streaming data platforms (e.g., Kafka or other pub/sub; Spark Streaming or other stream processing)
  • Experience with cloud computing services (AWS (preferred), GCP, Azure, or similar)
  • Experience working in Databricks (including Delta Live Tables, data lakehouse patterns, etc.)
  • Knowledge of modern data design and storage patterns (e.g., incremental updating, partitioning and segmentation, rebuilds and backfills)
  • Experience with data warehousing (e.g., Databricks, Snowflake, Redshift, BigQuery, or similar)
  • Understanding of modern data storage formats and tools (e.g., Parquet, ORC, Avro, Delta Lake)

Our Benefits

  • Stock
  • Competitive Salaries
  • Unlimited paid time off
  • Medical, dental, & vision insurance
  • Health, fitness, and office stipends
  • The permanent ability to work wherever and however you want
  • Comp: $160K - $200K

People Data Labs does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity, or any other reason prohibited by law in provision of employment opportunities and benefits.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.

Personal Privacy Policy for California Residents

https://www.peopledatalabs.com/pdf/privacy-policy-and-notice.pdf
