Title: Data Engineer
Location: Remote
Duration: 3-Month Contract
This is a non-exempt position.
Project: Supplier Contract Ingestion & Data Pipeline for Negotiation AI
About the Project
We're launching a focused 3-month initiative to:
1. Bulk-ingest over 50,000 supplier contracts into SAP Ariba, with metadata extraction powered by OCR.
2. Design and implement the database architecture and data flows that will feed our Negotiation AI, including contract detail extraction and supplier spend analytics.
This work currently runs separately from the Negotiation AI MVP, but must be future-ready for seamless integration.
Role Overview
As our Data Engineer, you will own the end-to-end data pipelines. This includes designing scalable databases, developing ingestion workflows, collaborating with our internal Machine Learning Engineering team, and structuring supplier spend data. You'll work closely with the Full Stack Developer to co-design the database schema for the Negotiation AI and ensure future compatibility with the ingestion pipeline.
Key Deliverables
- Ingestion Pipeline: Build and deploy a robust ETL/ELT pipeline using Azure to ingest 50,000+ contracts.
- Metadata Extraction: Configure and run OCR workflows (e.g., OlmOCR or Azure Document Intelligence) to extract key contract fields such as dates, parties, and terms (see the sketch after this list).
- Scalable Database Schema: Design and implement a schema in Azure PostgreSQL to store contract metadata, OCR outputs, and supplier spend data. Collaborate with the Software Developer to design a future-ready schema for AI consumption.
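To make the metadata-extraction deliverable concrete, here is a minimal sketch of the kind of call involved, assuming the azure-ai-formrecognizer Python SDK and its prebuilt contract model; the endpoint, key, and field names are placeholders rather than our actual configuration.

```python
# Minimal sketch: extract contract metadata with Azure Document Intelligence.
# Assumes the azure-ai-formrecognizer SDK and the prebuilt contract model;
# endpoint and key are placeholders (store real secrets in Key Vault).
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

ENDPOINT = "https://<resource>.cognitiveservices.azure.com/"  # placeholder
KEY = "<key-from-key-vault>"                                  # placeholder

client = DocumentAnalysisClient(ENDPOINT, AzureKeyCredential(KEY))

def extract_contract_metadata(pdf_path: str) -> dict:
    """Analyze one contract and return a few key fields as plain text."""
    with open(pdf_path, "rb") as f:
        poller = client.begin_analyze_document("prebuilt-contract", document=f)
    doc = poller.result().documents[0]
    # Field names are assumptions to verify against the model's output schema.
    wanted = ("Title", "Parties", "ExecutionDate")
    return {name: doc.fields[name].content for name in wanted if name in doc.fields}
```

In practice the full analysis result would also be persisted (e.g., as JSONB) so fields can be re-derived later without re-running OCR.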
Required Skills & Experience
Data Engineering & ETL/ELT
- Experience with Azure PostgreSQL or similar relational databases
- Skilled in building scalable ETL/ELT pipelines (preferably using Azure)
- Proficient in Python for scripting and automation
OCR Collaboration
- Ability to work with internal Machine Learning Engineering teams to validate and structure extracted data
- Familiarity with OCR tools (e.g., Azure Document Intelligence, Tesseract) is a plus
SAP Ariba Integration
- Exposure to cXML, ARBCI, SOAP/REST protocols is a plus
- Comfortable with API authentication (OAuth, tokens) and enterprise-grade security (see the token sketch after this list)
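For candidates less familiar with Ariba's API security model, the sketch below shows the OAuth client-credentials flow its APIs generally use; the token URL is an assumption to confirm against the Ariba developer portal for our realm.

```python
# Hedged sketch of an OAuth client-credentials token fetch for Ariba APIs.
# The token URL is an assumed endpoint; verify it for the target realm.
import requests

TOKEN_URL = "https://api.ariba.com/v2/oauth/token"  # assumed; confirm for our realm

def get_ariba_token(client_id: str, client_secret: str) -> str:
    """Client-credentials grant; returns a bearer token for subsequent API calls."""
    resp = requests.post(
        TOKEN_URL,
        auth=(client_id, client_secret),          # HTTP Basic auth with app credentials
        data={"grant_type": "client_credentials"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```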
Agile Collaboration & Documentation
- Comfortable working in sprints and cross-functional teams
- Able to use GitHub Copilot to document practices for handover
Preferred Qualifications
- Experience with large-scale contract ingestion projects
- Familiarity with procurement systems and contract lifecycle management
- Background in integrating data pipelines with AI or analytics platforms
Why Join Us?
- Focused Scope with Future Impact: Lay the foundation for an AI-driven negotiation platform
- Cutting-Edge Tools: Work with SAP Ariba, OCR, Azure, and advanced analytics
- Collaborative Environment: Partner with Software Developers and AI specialists
For the interviews, to set expectations:
Agile Sprint Breakdown
Sprint 1 (Weeks 1-2): Database & OCR Foundations
- Design scalable schema in Azure PostgreSQL for contract metadata and spend data (a starting-point sketch follows this list)
- Configure Azure Data Factory, Blob Storage, and CI/CD for pipeline deployment
- Build proof-of-concept pipeline for ingesting 500 contracts with OCR-based metadata extraction
- Collaborate with OCR team to validate extracted fields (e.g., contract dates, parties, spend amounts)
- Begin schema design collaboration with Software Developer for Negotiation AI
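As a starting point only, the sketch below shows one plausible shape for the schema; the table and column names are assumptions to refine with the Software Developer, and the connection string is a placeholder.

```python
# Illustrative schema sketch for Azure PostgreSQL, not a final design.
# Table/column names are assumptions; the DSN is a placeholder.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS suppliers (
    supplier_id    BIGSERIAL PRIMARY KEY,
    name           TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS contracts (
    contract_id    BIGSERIAL PRIMARY KEY,
    supplier_id    BIGINT REFERENCES suppliers(supplier_id),
    title          TEXT,
    effective_date DATE,
    expiry_date    DATE,
    ocr_fields     JSONB,   -- raw OCR output, kept for re-processing
    source_uri     TEXT     -- Blob Storage path of the original document
);

CREATE TABLE IF NOT EXISTS spend_records (
    spend_id       BIGSERIAL PRIMARY KEY,
    supplier_id    BIGINT REFERENCES suppliers(supplier_id),
    contract_id    BIGINT REFERENCES contracts(contract_id),
    amount         NUMERIC(18, 2),
    currency       CHAR(3),
    period_start   DATE,
    period_end     DATE
);
"""

# Placeholder DSN; the transaction commits on successful exit of the block.
with psycopg2.connect("postgresql://user:pass@host:5432/contracts") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```

Keeping the raw OCR output in a JSONB column alongside the typed columns lets the Negotiation AI re-derive fields later without re-ingesting documents.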
Sprint 2 (Weeks 3-5): Scaling & Ariba Integration
- Scale pipeline to handle 50,000+ contracts with error-handling and retry logic (see the retry sketch after this list)
- Ingest supplier spend data and link it to contract records
- Build/refine SAP Ariba integration scripts (if applicable)
- Implement robust error-reporting and logging for Ariba API calls
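A minimal sketch of the retry-with-backoff behavior expected around Ariba API calls at this scale; the attempt count and delays are illustrative assumptions, not tuned values.

```python
# Sketch: exponential-backoff retries with logging around an API call.
# Attempt counts and delays are illustrative, not tuned values.
import logging
import time
import requests

log = logging.getLogger("ingestion")

def call_with_retries(url: str, token: str, max_attempts: int = 5) -> dict:
    """GET with exponential backoff; logs each failure for the error report."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(
                url,
                headers={"Authorization": f"Bearer {token}"},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as exc:
            log.warning("attempt %d/%d failed for %s: %s",
                        attempt, max_attempts, url, exc)
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # 2s, 4s, 8s, ... simple exponential backoff
```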
Sprint 3 (Weeks 6-8): AI Data Alignment & UAT
- Finalize data models for Negotiation AI (e.g., legal terms, renewal triggers, spend patterns)
- Collaborate with Software Developer to document real-time or scheduled data access
- Optimize database queries and indexing for performance (an indexing sketch follows this list)
- Conduct UAT with stakeholders to validate ingestion and data quality
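An example of the indexing work this sprint covers, building on the sketch schema above; the index choices are assumptions to validate with EXPLAIN ANALYZE against real query patterns.

```python
# Sketch: indexes supporting likely access paths (per-supplier lookups,
# date-range filters, JSONB queries). Validate choices with EXPLAIN ANALYZE.
import psycopg2

INDEXES = """
-- speed up per-supplier contract lookups and expiry/renewal filters
CREATE INDEX IF NOT EXISTS idx_contracts_supplier ON contracts (supplier_id);
CREATE INDEX IF NOT EXISTS idx_contracts_expiry   ON contracts (expiry_date);
-- GIN index so the AI layer can query raw OCR fields stored as JSONB
CREATE INDEX IF NOT EXISTS idx_contracts_ocr      ON contracts USING GIN (ocr_fields);
"""

with psycopg2.connect("postgresql://user:pass@host:5432/contracts") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(INDEXES)
```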
Sprint 4 (Weeks 9-12): Finalization & Handover
- Complete full-scale ingestion and verify metadata extraction accuracy
- Resolve final integration issues and address UAT feedback
- Document workflows, schemas, dependencies, and maintenance steps
- Deliver performance report (throughput, error rates, data quality metrics)