Overview:
The Data Acquisition team within the Pre-training organization at OpenAI is responsible for all aspects of data collection to support our model training operations.
Our team manages web crawling and GPTBot services and works closely with Data Processing, Architecture, and Scaling teams.
We are looking for a skilled Senior Software Engineer to join our Data Acquisition team.
Responsibilities:
Own and lead engineering projects in data acquisition, including web crawling, data ingestion, and search.
Collaborate with other sub-teams, such as Data Processing, Architecture, and Scaling, to ensure smooth data flow and system operability.
Work closely with the legal team to handle any compliance or data privacy-related matters.
Develop and deploy highly scalable distributed systems capable of handling petabytes of data.
Architect and implement algorithms for data indexing and search capabilities.
Build and maintain backend services for data storage, including work with key-value databases and synchronization.
Deploy solutions in a Kubernetes Infrastructure-as-Code environment and perform routine system checks.
Conduct and analyze experiments on data to provide insights into system performance.
Qualifications:
BS / MS / PhD in Computer Science or a related field.
5+ years of industry experience in software development.
Experience with large web crawlers is a plus.
Strong expertise in large stateful distributed systems and data processing.
Proficiency in Kubernetes and Infrastructure-as-Code concepts.
Willingness and enthusiasm to try new approaches and technologies.
Ability to handle multiple tasks and adapt to changing priorities.
Strong communication skills, both written and verbal.