Job Title: Data Engineer – BigQuery & Cloud Composer (GCP)
Location: Gurugram
About the Role
We’re looking for a hands-on Data Engineer with strong expertise in Google
BigQuery and Cloud Composer (Airflow on GCP) to build, orchestrate, and optimize
robust data pipelines. The candidate will work across file integrations
(XLSX/CSV/JSON), GCS-based ingestion/export, BigQuery SQL & Stored Procedures,
and Python development for Composer DAGs, including API integrations and custom
operators. The ideal candidate is equally comfortable designing scalable data
models and writing production-grade orchestration code with reliability,
lineage, and cost-efficiency in mind.
Key Responsibilities
Pipeline Design & Orchestration
o Design, develop, and manage Cloud Composer (Airflow) DAGs for batch
and near-real-time workloads with robust scheduling, retries, SLAs, and
dependency management.
o Build custom Python operators/sensors/hooks for APIs (REST/JSON),
file processing, and event-driven workflows.
BigQuery Engineering
o Develop ELT/ETL transformations using standard SQL, Stored
Procedures, User-Defined Functions (UDFs), and views/materialized
views.
o Implement performance best practices: partitioning, clustering, pruning,
query optimization, storage optimization, and cost controls.
File & GCS Integrations
o Ingest and process CSV/JSON/XLSX files from/to Google Cloud Storage
(GCS) with schema management, validations, and error handling.
o Implement data export/import pipelines between BigQuery, GCS, and
external systems; manage lifecycle policies and storage tiers.
API & Custom Code Development
o Build Python-based integrations for third-party and internal APIs,
covering authentication (OAuth2/API keys), pagination, throttling, and
incremental loads.
o Create reusable libraries/utilities for common data processing tasks
(parsing, normalization, deduplication, enrichment).
DevOps & Delivery
o Use CI/CD (Cloud Build/GitHub Actions/GitLab CI) for DAGs and SQL assets,
environment promotion, and versioning.
o Collaborate with Data Analysts/Scientists and Product teams; document
pipelines, runbooks, and data contracts.
Required Qualifications
3–7 years of experience in Data Engineering with GCP, focusing on BigQuery and
Cloud Composer (Airflow).
Strong SQL skills and experience with BigQuery Stored Procedures, UDFs, and
performance tuning (partitioning, clustering, join strategies).
Proficiency in Python for data processing (pandas, pyarrow, openpyxl/xlrd) and
Composer development (operators, sensors, hooks, XComs).
Hands-on with GCS: reading/writing large datasets, lifecycle configuration,
permissions (IAM), and storage cost optimization.
Experience integrating files (XLSX/CSV/JSON) and REST APIs (auth, pagination,
error/retry patterns).
Solid understanding of Airflow DAG design: idempotency, backfills, scheduling,
SLAs, retries, task isolation, and parallelism.
Knowledge of testing (unit/integration), data quality checks, and CI/CD for data
pipelines.
Nice-to-Have
Understanding of SAP S/4HANA, Ariba, Fieldglass, Siebel, and Kenan data.
Development experience with ETL tools (SAP Data Services/Striim).
Infra-as-code (Terraform) for Composer/BigQuery/GCS provisioning.
Performance/cost optimization at scale (slot management, reservations, storage
tiers).
Experience with SCD, CDC, and incremental processing patterns.
Experience with on-prem to GCP migrations and cross-region strategies.
Skill Matrix
Skill Area                        Required Expertise                                           Proficiency Level (1-5)
--------------------------------  -----------------------------------------------------------  -----------------------
BigQuery SQL & Stored Procedures  Advanced SQL, UDFs, optimization, partitioning & clustering  5
Cloud Composer (Airflow)          DAG design, operators, sensors, retries, custom hooks        5
Python Development                Custom operators, API integration, data parsing libraries    5
GCS (Google Cloud Storage)        File read/write, schema management, lifecycle policies       4
File Integrations                 XLSX/CSV/JSON handling, ingestion pipelines                  4
API Integration                   Auth, pagination, incremental loads                          4
CI/CD for Data Pipelines          Versioning, automation, promotions                           3
Apply through whichever channel suits you best.