AWS Data Engineer with EMR clusters
Location: Piscataway, NJ
Duration: 12 months
Visa: Any visa independent.
Data Pipeline Development: Design and implement robust ETL processes to extract, transform, and load data from various sources into data lakes and warehouses.
AWS EMR Clusters: Configure, manage, and optimize Amazon EMR clusters for big data processing using Apache Spark, Hive, or Presto.
Container Orchestration: Use Kubernetes to deploy, scale, and manage containerized applications and services.
CI/CD Implementation: Develop and maintain CI/CD pipelines for automated deployment of data applications and services using tools like Jenkins, GitLab CI, or AWS CodePipeline.
SQL Development: Write complex SQL queries for data manipulation and retrieval, ensuring high performance and scalability.
Data Quality and Governance: Implement data quality checks, monitoring, and logging mechanisms to ensure data reliability and compliance.
Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
Documentation: Maintain comprehensive documentation of data architecture, processes, and workflows.
Qualifications:
Education: Bachelor's degree in Computer Science, Data Science, Information Technology, or a related field.
Experience: 3+ years of experience in data engineering or a related role, with a focus on AWS technologies.
AWS Expertise: Proficient in AWS services such as EMR, S3, RDS, Redshift, Lambda, and CloudFormation.
Kubernetes Knowledge: Experience with Kubernetes for container orchestration and microservices architecture.
CI/CD Tools: Familiarity with CI/CD tools and practices, including version control using Git.
SQL Proficiency: Strong knowledge of SQL, with experience in relational databases (e.g., MySQL, PostgreSQL) and data warehousing solutions.
Programming Skills: Proficiency in programming languages such as Python, Java, or Scala for data processing tasks.
Problem-Solving Skills: Ability to troubleshoot complex data issues and optimize performance.
Communication: Excellent communication skills with the ability to work collaboratively in a team environment.