Xpanse was established to make home ownership accessible to a broader audience. Since our launch, we've been dedicated to simplifying the mortgage lending ecosystem by building innovative software solutions.
We view home ownership as a core component of the 'American Dream,' and our products play a key role in transforming that dream into reality.
We are seeking a skilled Data Engineer to join our team in building advanced data pipelines and infrastructure to support Generative AI (GenAI) applications.
You will design and implement scalable data systems that enable efficient processing and storage of large datasets, ensuring that our AI models have access to clean, well-structured, and high-quality data.
This is an opportunity to contribute to the backbone of AI-driven solutions, such as call summarization, data compliance, and predictive analytics.
Responsibilities:
- Data Pipeline Development: Design, build, and maintain robust, scalable data pipelines that support GenAI applications, including both real-time and batch data processing.
- Data Integration: Work closely with data scientists, machine learning engineers, and software developers to integrate structured and unstructured data from various sources, ensuring it is clean and ready for use in AI models.
- Cloud Infrastructure: Deploy and manage data storage solutions on AWS, Azure, GCP, and Snowflake, optimizing for scalability, performance, and cost-efficiency.
- Data Quality Management: Implement and monitor data quality checks to ensure accuracy, completeness, and consistency across the entire data pipeline.
- ETL Processes: Design and manage ETL (Extract, Transform, Load) processes that efficiently handle large datasets from diverse sources, transforming data into formats suitable for analysis.
- Collaboration: Work with cross-functional teams to understand business requirements and translate them into efficient data models and pipelines.
- Performance Optimization: Continuously monitor and optimize data storage, retrieval, and transformation processes to ensure high performance and low latency.
- Security & Compliance: Ensure data pipelines meet security and regulatory requirements, particularly around data privacy and compliance in financial services.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in data engineering or a related role, with a strong focus on building and managing data pipelines.
- Expertise in data pipeline orchestration tools such as Apache Airflow, AWS Glue, Dataflow, or similar technologies.
- Proficiency in working with ETL frameworks and building data transformation processes.
- Strong experience with SQL and NoSQL databases, and knowledge of modern data storage solutions like Snowflake, S3, Redshift, BigQuery, or similar.
- Strong experience working on cloud platforms (AWS, Azure, GCP) and familiarity with managing data infrastructure in a cloud-native environment.
- Knowledge of Python, Scala, or Java for data manipulation and automation tasks.
- Strong understanding of data security, governance, and compliance best practices.
- Experience working with large-scale data systems supporting AI/ML models, including vector databases and unstructured data management, preferred.
- Familiarity with MLOps and integrating data pipelines into machine learning workflows, preferred.
- Experience with real-time data processing tools like Kafka or Kinesis, preferred.
- Experience optimizing data architecture to support large datasets for AI/ML-driven applications, preferred.
- Snowflake expertise: Extensive experience building and optimizing data models, managing data workflows, and ensuring scalability within Snowflake, preferred.
- Strong problem-solving skills and the ability to work in a fast-paced, collaborative environment.