Must-Have Technical / Functional Skills
Programming & Libraries: Expert-level proficiency in Python and its core data science libraries (Pandas, NumPy, Scikit-learn). Strong proficiency in SQL for complex data extraction and manipulation.
Machine Learning Frameworks: Hands-on experience with modern deep learning frameworks such as TensorFlow or PyTorch.
Statistical Modeling: Deep understanding of statistical concepts and a wide range of machine learning algorithms, with proven experience in time-series forecasting and anomaly detection (a brief illustrative sketch follows this list).
Big Data Technologies: Demonstrable experience working with large datasets using distributed computing frameworks, specifically Apache Spark.
Database Systems: Experience querying and working with data from multiple relational database systems (e.g., PostgreSQL, Oracle, MS SQL Server).
Cloud Platforms: Experience building and deploying data science solutions on a major cloud platform (AWS, GCP, or Azure). Familiarity with their native ML services (e.g., AWS SageMaker, Google Vertex AI) is a strong plus.
MLOps Tooling: Practical experience with MLOps principles and tools for model versioning, tracking, and deployment (e.g., MLflow, Docker).
Communication and Storytelling: Excellent verbal and written communication skills, with a proven ability to explain complex technical concepts to a non-technical audience through visual storytelling.
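To ground the forecasting and anomaly-detection requirement, here is a minimal sketch, not this team's actual pipeline: it pulls an hourly metric into Pandas via SQL and flags outliers with scikit-learn's IsolationForest. The connection string, table, and column names are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine
from sklearn.ensemble import IsolationForest

# Hypothetical DSN and table; swap in a real connection and query.
engine = create_engine("postgresql://user:password@db-host/analytics")
df = pd.read_sql(
    "SELECT event_hour, request_count FROM hourly_traffic ORDER BY event_hour",
    engine,
    parse_dates=["event_hour"],
)

# Simple feature set: the raw value plus its hour-over-hour change.
df["delta"] = df["request_count"].diff().fillna(0)

# IsolationForest labels roughly `contamination` share of rows as -1 (anomalous).
model = IsolationForest(contamination=0.01, random_state=42)
df["anomaly"] = model.fit_predict(df[["request_count", "delta"]])

print(df.loc[df["anomaly"] == -1, ["event_hour", "request_count"]])
```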
Roles & Responsibilities
Druid Data Modeling & Schema Design:
o Design and implement efficient data schemas, dimensions, and metrics within Apache Druid for various analytical use cases (e.g., clickstream, IoT, application monitoring).
o Determine optimal partitioning, indexing (bitmap indexes), and rollup strategies to ensure sub-second query performance and efficient storage.
Data Ingestion Pipeline Development:
o Develop and manage real-time data ingestion pipelines into Druid from streaming sources like Apache Kafka, Amazon Kinesis, or other message queues (see the ingestion sketch after this list).
o Implement batch data ingestion processes from data lakes (e.g., HDFS, Amazon S3, Azure Blob, Google Cloud Storage) or other databases.
o Ensure data quality, consistency, and exactly-once processing during ingestion.
Query Optimization & Performance Tuning:
o Write and optimize complex SQL queries (Druid SQL) for high-performance analytical workloads, including aggregations, filters, and time-series analysis (see the query sketch after this list).
o Analyze query plans and identify performance bottlenecks, implementing solutions such as segment optimization, query rewriting, or cluster configuration adjustments.
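As a hedged illustration of the ingestion responsibility, the sketch below submits a Kafka supervisor spec to Druid's Overlord API (POST /druid/indexer/v1/supervisor). The host, topic, datasource, dimensions, and metrics are illustrative assumptions, not this team's configuration.

```python
import requests

# All names here (host, topic, datasource, columns) are illustrative assumptions.
spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "clickstream",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user_id", "page", "country"]},
            # Rollup pre-aggregates at ingest time: fewer rows stored, faster queries.
            "granularitySpec": {
                "segmentGranularity": "HOUR",
                "queryGranularity": "MINUTE",
                "rollup": True,
            },
            "metricsSpec": [
                {"type": "count", "name": "events"},
                {"type": "longSum", "name": "bytes", "fieldName": "bytes"},
            ],
        },
        "ioConfig": {
            "topic": "clickstream-events",
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "useEarliestOffset": True,
        },
    },
}

# Druid's Overlord exposes the supervisor API for streaming ingestion.
resp = requests.post("http://overlord-host:8081/druid/indexer/v1/supervisor", json=spec)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": "clickstream"} once the supervisor is accepted
```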
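And for the query-tuning responsibility, a minimal sketch of a time-bucketed Druid SQL aggregation issued against the Broker's SQL endpoint (POST /druid/v2/sql); the host and datasource names are again assumptions.

```python
import requests

# Hourly event counts by country over the last day; datasource and host are assumptions.
query = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS event_hour,
  country,
  SUM(events) AS events
FROM clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1, 2
ORDER BY event_hour
"""

# The Broker serves Druid SQL over HTTP at /druid/v2/sql.
resp = requests.post("http://broker-host:8082/druid/v2/sql", json={"query": query})
resp.raise_for_status()
for row in resp.json():
    print(row)
```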
Salary Range: $100,000-$120,000 a year
#LI-DM1
Data Scientist • Tampa, FL, United States