Data Engineer (Python, Data Systems & AI Enablement)

KEY CONNECT RECRUITMENT PTE. LTD.

Job Title

Data Engineer (Python, Data Systems & AI Enablement)

Role Overview

We are seeking a Python-focused Data Engineer with strong hands-on coding skills in data-intensive systems. The role focuses on building scalable data pipelines, processing large datasets, and enabling AI/Generative AI applications through well-structured data infrastructure.

Key Responsibilities

Build and maintain scalable data pipelines using Python
Write production-grade Python code for data processing, transformation, and ETL workflows
Perform data cleaning, preprocessing, and feature preparation for analytics and AI use cases
Use data analysis and manipulation tools to handle large datasets efficiently
Develop reusable Python modules for data ingestion and pipeline automation
Perform exploratory data analysis (EDA) to understand data patterns and quality issues
Optimize data workflows for performance, scalability, and reliability
Support data requirements for AI/ML and Generative AI systems
Build data services and APIs to support downstream AI applications
Ensure data quality, consistency, and observability across pipelines

Required Python & Data Libraries (Hands-on Experience Mandatory)

Candidates must have strong practical experience with:

pandas — data manipulation, transformation, and analysis
NumPy — numerical operations and array-based processing
Matplotlib — data visualization and reporting
scikit-learn — basic ML workflows and model evaluation
PyTorch — deep learning and AI model experimentation

AI / Generative AI Enablement

Prepare and structure datasets for ML and LLM-based systems
Support integration of AI models into data pipelines and applications
Enable workflows for Generative AI use cases (RAG systems, agent workflows)
Familiarity with LLM providers and model families such as OpenAI, Anthropic, LLaMA, and Mistral
Exposure to AI orchestration frameworks such as LangChain, AutoGen, and CrewAI

Core Requirements

Strong hands-on Python coding expertise focused on data systems (critical requirement)
Ability to write clean, efficient, production-grade Python code
Strong understanding of data structures, ETL pipelines, and data workflows
Experience working with large-scale structured and unstructured data
Strong SQL skills for data extraction and manipulation
Understanding of data modeling and analytics workflows
Ability to support end-to-end data-to-AI pipelines

Preferred / Good to Have

Experience with big data or distributed processing systems
Understanding of vector databases and embedding-based retrieval systems
Experience building APIs or services for data/AI systems
Familiarity with cloud platforms (AWS, Azure, GCP)
Exposure to production monitoring and data observability tools

What Success Looks Like

High-quality Python code powering scalable data pipelines
Reliable, clean, and well-structured datasets for AI systems
Efficient ETL workflows with minimal manual intervention
Seamless support for ML and GenAI applications in production