AI Data Architect

KUOK (SINGAPORE) LIMITED

About the Role

We are seeking a passionate AI Data Architect to design and build the data foundation that makes AI work well across Kuok Group: specifically, the hybrid vector and knowledge graph layer (the enterprise semantic memory) that underpins RAG-based use cases across every business unit, as well as the embedding pipelines, ingestion workflows, and data schemas that keep it accurate and current. This role reports to the Principal AI Architect.

This is an architecture-first role. The successful candidate will make structural decisions about how the Group’s unstructured data is organised, retrieved, and made useful to AI systems — working closely with the Head, AI Platform on technical direction and with the Applied AI Engineers who depend on the data layer to build reliable, high-quality solutions.

The role sits at an exciting intersection of data engineering and AI infrastructure. Those who bring a strong data engineering background and a genuine curiosity about vector databases, knowledge graphs, and retrieval design will find a lot to build here.

Key Responsibilities

AI Data Foundation Architecture

Design and own the hybrid vector and knowledge graph layer that underpins RAG across all Kuok Group BUs — the enterprise semantic memory that AI use cases draw on
Make structural decisions on how unstructured data is organised for retrieval: chunking strategies, embedding approaches, metadata schemas, and knowledge graph ontologies
Work with the Head, AI Platform to align data foundation design with the broader AI platform architecture and the requirements of active use cases
Document architectural decisions clearly, capturing both the reasoning and the outcomes, so the wider team can work with confidence

Embedding Pipelines & Vector Infrastructure

Build and maintain embedding pipelines: document ingestion, chunking, embedding model selection, and vector DB write workflows
Own the vector database layer (Pinecone, Weaviate, or equivalent) — index management, refresh cadence, performance tuning, and cost management
Design retrieval patterns that serve the needs of applied use cases: similarity search, hybrid search, re-ranking, and metadata filtering
Ensure embedding pipelines are monitored, versioned, and recoverable — data foundation reliability is as important as application reliability

Knowledge Graph Design & Ingestion

Design the knowledge graph layer (Neo4j or equivalent) — ontology modelling, entity and relationship schema, and ingestion workflows from source systems
Work with domain experts across BUs to ensure the knowledge graph accurately reflects the entities, relationships, and terminology that matter in each business context
Build and maintain ETL pipelines that keep the knowledge graph current as source data changes
Knowledge graph capability is being built from the ground up at the Group — this role has a real opportunity to shape how it develops and set the direction for how it scales

Data Quality & Governance

Establish data quality standards for AI-ingested content — source freshness, deduplication, completeness checks, and validation pipelines
Work with BU Domain Data Stewards to validate that domain-specific data is accurate before it enters the AI data layer
Maintain clear data lineage across the AI data foundation — what source data feeds which index or graph, and when it was last refreshed
Partner with the AI Governance & Compliance Lead on data privacy requirements for AI-ingested content, particularly across BUs with sensitive operational data

Collaboration & Standards

Partner with Applied AI Engineers to understand the retrieval requirements of each use case and ensure the data foundation is designed to support them well
Work with the Lead Data Engineer (supporting functions) on the handoff boundary between structured data / BI pipelines and the AI data layer
Maintain documentation of the AI data foundation — schemas, pipeline specs, refresh schedules, and known limitations — so the team can work with the data layer confidently
Contribute to the broader AI Platform cluster's engineering standards and participate in code and design reviews

Requirements

Must-Have

Solid data engineering foundations: you have designed and built ETL / ELT pipelines at production scale, managed data quality, and worked with structured and semi-structured data in cloud environments
Hands-on experience with vector databases — you have built embedding pipelines, managed indexes, and designed retrieval patterns for RAG or semantic search applications
Understanding of RAG architecture from the data side: chunking strategies, embedding model selection, retrieval optimisation, and the effect of data quality on AI output quality
Experience designing schemas and data models for AI systems — with a strong appreciation for how data structure shapes retrieval quality and downstream AI output
Strong Python skills and comfort with the data engineering tooling ecosystem: pipeline orchestration, data validation, and working with cloud storage and databases
Clear, structured communication skills — you can explain data architecture decisions to both technical peers and non-technical stakeholders

Strong Advantage

Experience with knowledge graph design and ontology modelling — Neo4j or equivalent, including schema design, Cypher querying, and ETL into graph structures
Familiarity with enterprise data environments: federated data sources, multiple business domains, and working across teams with different data ownership models
Experience working closely with applied engineering teams as a data infrastructure provider — you understand how the data layer choices you make affect what engineers can build
Exposure to data governance and compliance requirements in an AI context: data lineage, PII handling, retention policies, and working with compliance stakeholders
Background in unstructured data processing — document parsing, OCR, text extraction, or working with content repositories as AI data sources