Senior Data Engineer (NLP, Machine Learning)
Location: New York, NY
Skills: NLP, Machine Learning
Leadership Responsibilities
- Lead data engineers within a center-of-excellence model through one-on-ones, ongoing professional development, and performance reviews.
- Partner with the Head of Data Strategy to establish a best-in-class data engineering competency and tooling within the organization.
- Contribute to the overall data strategy vision and execution via quarterly planning and executive committee reporting.
- Participate in and help improve the data engineering recruiting process, from resume reviews and phone screens to on-site interviews.
Data Engineer Responsibilities
- Develop, implement, and deploy custom data pipelines powering machine learning algorithms, insights generation, client benchmarking tools, business intelligence dashboards, reporting, and new data products.
- Find new ways to leverage large volumes of diverse datasets to drive revenue, both by developing new products with the Data Strategy team and by enhancing the delivery of existing products.
- Consume data from a variety of sources (relational DBs, APIs, NetApp and other cloud storage, FTP servers) and formats (Excel, CSV, XML, Parquet, unstructured).
- Build and maintain data pipelines, using modern ETL frameworks, that connect GC's databases and other sources with the data lake.
- Own the role of data steward for a variety of high-value datasets and implement innovative quality assurance practices.
- Establish and implement data security and privacy standards to ensure internal and regulatory compliance.
- Establish and implement metadata management standards and capabilities, including lineage mapping.
- Establish and maintain strong relationships with internal clients as an engineering representative for data strategy.
- Enforce strong development standards across the team through code reviews, unit testing, and monitoring.
- Perform basic data analysis in Jupyter notebooks to validate that data pipelines meet their requirements.
- Evangelize data strategy techniques and best practices throughout global strategic advisory.
- Keep up to date on the latest trends and innovations in data technology and how they apply to GC's business and data strategy.
Required Qualifications
- 5-8+ years of relevant experience as a data engineer or in a similar role.
- Master’s degree or PhD in data science, computer science, or a related quantitative field such as applied mathematics, statistics, engineering, or operations research.
- Extensive experience with Spark, Python, JSON, and SQL.
- Extensive experience integrating data from semi-structured and unstructured sources.
- Knowledge of various industry-leading SQL and NoSQL database systems.
- Experience working in an Agile environment to deliver team goals quickly and effectively.
- Strong interpersonal skills for building and maintaining internal relationships, working effectively as part of a team, and leading presentations and discussions.
- Strong analytical skills and intellectual curiosity as demonstrated through academic experience or work assignments.
- Ability to communicate technical concepts to a non-technical audience.
- Excellent written and verbal English skills for complex communications with GC colleagues at all levels of the organization.
- Ability to prioritize workload according to volume, urgency, and other factors, and to deliver required projects on time.
- Desire to take ownership of proposed and assigned tasks and to seek assistance when needed.
Preferred Qualifications
- Strong understanding of entity resolution, streaming technologies, and ELT / ETL frameworks.
- Ability to articulate the advantages of various cloud and on-premises deployment options.
- Experience with Master Data Management.
- Experience with web scraping and crowd sourcing technologies.
- Familiarity with modern data productivity platforms, such as Databricks, DataRobot, and Alteryx, and their alternatives.