About This Role
India is projected to face a shortage of more than 230,000 data science professionals by 2026, but the most acute gap is in data engineering rather than data science itself. Global Capability Centres (GCCs) can hire ML engineers, but the data pipelines that feed their models are often broken. This role is for a data engineer who combines modern data stack proficiency (dbt, Spark, Airflow, cloud-native warehouses) with the ability to design data architectures that support both analytical and ML workloads at enterprise scale.
What You Will Do
- Design and build scalable, production-grade data pipelines for both batch and real-time workloads using Apache Spark, Kafka, and cloud-native tools
- Own the ELT/ETL architecture using dbt, Airflow (or Prefect/Dagster), and cloud data warehouses (Snowflake, BigQuery, Databricks Delta Lake)
- Build and maintain feature stores for ML teams (Feast, Tecton, or custom-built), ensuring consistent feature availability between training and serving
- Implement data quality frameworks (Great Expectations, Monte Carlo, or Soda) with automated monitoring and alerting
- Design data lakehouse architectures on Delta Lake or Apache Iceberg, making use of time travel, ACID transactions, and schema evolution
- Build real-time streaming pipelines using Kafka, Kinesis, or Pub/Sub to support low-latency ML inference
- Implement data governance and lineage tracking using dbt docs, OpenLineage, Apache Atlas, or Collibra
- Collaborate closely with ML engineers on feature engineering, training data pipelines, and model monitoring data infrastructure
- Define and enforce data engineering best practices: testing, documentation, version control, and CI/CD for data pipelines
What You Need to Succeed
- 5+ years of data engineering experience with production pipeline ownership
- Strong proficiency in Python and SQL for data transformation and pipeline development
- Modern data stack expertise: dbt, Airflow/Prefect, and at least one cloud data warehouse (Snowflake, BigQuery, Databricks)
- Apache Spark experience for large-scale data processing
- Real-time streaming experience with Kafka or equivalent
- Cloud data platform experience on AWS (Glue, S3, Redshift), Azure (Data Factory, ADLS, Synapse), or GCP (Dataflow, BigQuery)
- Data quality and observability tooling experience
- Strong understanding of data modelling approaches such as dimensional modelling, Data Vault, or medallion architecture
What Will Give You an Edge
- Feature store implementation experience for ML pipelines
- Delta Lake or Apache Iceberg expertise
- Data governance and lineage tooling experience
- Experience with LLM data pipelines (document chunking, embedding generation, vector database ingestion)
- dbt or Databricks certifications
What Qfyre Offers
- Ownership of AI-critical data infrastructure: your pipelines power the models
- A modern tool stack with no legacy constraints: dbt, Spark, and cloud-native infrastructure from day one
- Remote-first flexibility with a strong community of engineering peers
- Competitive compensation reflecting acute market scarcity for this profile
Apply for Senior Data Engineer (Modern Data Stack and AI Pipelines)
Complete the form below. Our team reviews every application personally: no automated filtering and no keyword matching. We will be in touch within two business days.