Python Coder

Job Description

Hiring: Data Engineer – Python Expert (Freelance Role)
Employment Type: Contract / Freelance
Work Mode: Remote / Hybrid
Position: Data Engineer – Python Expert

Role Overview

The company is seeking a Senior Data Engineer to design and manage scalable data pipelines that support Large Language Model (LLM) development and AI training workflows.

This role focuses on:

Large-scale Data Engineering
ETL / ELT Pipeline Development
Data Quality & Processing
AI Training Data Preparation
Collaboration with ML & AI Teams

Key Responsibilities
Data Pipeline Architecture
Design and build scalable ETL / ELT pipelines in Python
Process terabyte-scale datasets efficiently
Automate data ingestion and transformation workflows
Data Quality & Processing
Implement data cleaning, deduplication, filtering, and normalization
Define and enforce data quality standards for AI training datasets
Data Transformation
Structure and optimize datasets for LLM training frameworks
Handle formats such as JSON, CSV, XML, and Parquet
Collaboration with AI Teams
Work closely with AI researchers and ML engineers
Support training data preparation, data metrics, and the model training lifecycle
Optimization & Reliability
Improve pipeline speed, reliability, and cost efficiency
Troubleshoot data-related issues during training runs
ML Support (Secondary)
Assist with training run monitoring, data pipeline debugging, and experiment support

Required Qualifications
Experience
8+ Years in:
Data Engineering
Backend Engineering
Data Processing
Python Expertise
Strong hands-on expertise with Python, Pandas, NumPy, Dask, and Polars
Data Engineering Skills
Large-scale Data Pipeline Development
Data Modeling
Software Engineering Best Practices
Git, CI/CD, and Testing
Data Formats
Experience handling JSON, CSV, XML, and Parquet
Soft Skills
Problem Solving
Attention to Detail
Communication & Collaboration Skills

Preferred Qualifications
LLM & AI Ecosystem
Experience with LLaMA, BERT, and GPT-family models
Big Data Technologies
Apache Spark
Ray
Hugging Face Ecosystem
Transformers
Datasets
Tokenizers
ML Frameworks
PyTorch
TensorFlow
Cloud Platforms
AWS
GCP
Azure

Preferred Candidate Profile

Suitable for candidates experienced in:

AI Data Engineering
LLM Training Pipelines
Big Data Systems
ML Infrastructure
Cloud-native Data Platforms

Why Join
Opportunity to work on cutting-edge AI & ML projects
Collaborative engineering environment
Flexible freelance engagement
Continuous learning opportunities
