Data Engineer (Python, Data Systems & AI Enablement)

KEY CONNECT RECRUITMENT PTE. LTD.

Job Title

Data Engineer (Python, Data Systems & AI Enablement)

Role Overview

We are looking for a Python-focused Data Engineer with strong hands-on coding skills in data-intensive systems. The role centers on building scalable data pipelines, processing large datasets, and enabling AI/Generative AI applications through well-structured data infrastructure.

Key Responsibilities

  • Build and maintain scalable data pipelines using Python
  • Write production-grade Python code for data processing, transformation, and ETL workflows
  • Perform data cleaning, preprocessing, and feature preparation for analytics and AI use cases
  • Use data analysis and manipulation tools to handle large datasets efficiently
  • Develop reusable Python modules for data ingestion and pipeline automation
  • Perform exploratory data analysis (EDA) to understand data patterns and quality issues
  • Optimize data workflows for performance, scalability, and reliability
  • Support data requirements for AI/ML and Generative AI systems
  • Build data services and APIs to support downstream AI applications
  • Ensure data quality, consistency, and observability across pipelines

Required Python & Data Libraries (Hands-on Experience Mandatory)

Candidates must have strong practical experience with:

  • pandas — data manipulation, transformation, and analysis
  • NumPy — numerical operations and array-based processing
  • Matplotlib — data visualization and reporting
  • scikit-learn — basic ML workflows and model evaluation
  • PyTorch — deep learning and AI model experimentation

AI / Generative AI Enablement

  • Prepare and structure datasets for ML and LLM-based systems
  • Support integration of AI models into data pipelines and applications
  • Enable workflows for Generative AI use cases (RAG systems, agent workflows)
  • Exposure to LLM providers and model families such as OpenAI, Anthropic, LLaMA, and Mistral
  • Exposure to AI orchestration frameworks such as LangChain, AutoGen, and CrewAI

Core Requirements

  • Strong hands-on Python coding expertise focused on data systems (critical requirement)
  • Ability to write clean, efficient, production-grade Python code
  • Strong understanding of data structures, ETL pipelines, and data workflows
  • Experience working with large-scale structured and unstructured data
  • Strong SQL skills for data extraction and manipulation
  • Understanding of data modeling and analytics workflows
  • Ability to support end-to-end data-to-AI pipelines

Preferred / Good to Have

  • Experience with big data or distributed processing systems
  • Understanding of vector databases and embedding-based retrieval systems
  • Experience building APIs or services for data/AI systems
  • Familiarity with cloud platforms (AWS, Azure, GCP)
  • Exposure to production monitoring and data observability tools

What Success Looks Like

  • High-quality Python code powering scalable data pipelines
  • Reliable, clean, and well-structured datasets for AI systems
  • Efficient ETL workflows with minimal manual intervention
  • Seamless support for ML and GenAI applications in production