AI Data Architect

KUOK (SINGAPORE) LIMITED

About the Role

We are seeking a passionate AI Data Architect to design and build the data foundation that makes AI work well across Kuok Group: specifically, the hybrid vector and knowledge graph layer (the enterprise semantic memory) that underpins RAG-based use cases across every business unit, along with the embedding pipelines, ingestion workflows, and data schemas that keep it accurate and current. This role reports to the Principal AI Architect.

This is an architecture-first role. The successful candidate will make structural decisions about how the Group’s unstructured data is organised, retrieved, and made useful to AI systems — working closely with the Head, AI Platform on technical direction and with the Applied AI Engineers who depend on the data layer to build reliable, high-quality solutions.

The role sits at an exciting intersection of data engineering and AI infrastructure. Those who bring a strong data engineering background and a genuine curiosity about vector databases, knowledge graphs, and retrieval design will find a lot to build here.

Key Responsibilities

AI Data Foundation Architecture

  • Design and own the hybrid vector and knowledge graph layer that underpins RAG across all Kuok Group business units (BUs): the enterprise semantic memory that AI use cases draw on
  • Make structural decisions on how unstructured data is organised for retrieval: chunking strategies, embedding approaches, metadata schemas, and knowledge graph ontologies
  • Work with the Head, AI Platform to align data foundation design with the broader AI platform architecture and the requirements of active use cases
  • Document architectural decisions clearly, capturing both the reasoning and the outcomes, so the wider team can work with confidence

Embedding Pipelines & Vector Infrastructure

  • Build and maintain embedding pipelines: document ingestion, chunking, embedding model selection, and vector DB write workflows
  • Own the vector database layer (Pinecone, Weaviate, or equivalent) — index management, refresh cadence, performance tuning, and cost management
  • Design retrieval patterns that serve the needs of applied use cases: similarity search, hybrid search, re-ranking, and metadata filtering
  • Ensure embedding pipelines are monitored, versioned, and recoverable — data foundation reliability is as important as application reliability

Knowledge Graph Design & Ingestion

  • Design the knowledge graph layer (Neo4j or equivalent) — ontology modelling, entity and relationship schema, and ingestion workflows from source systems
  • Work with domain experts across BUs to ensure the knowledge graph accurately reflects the entities, relationships, and terminology that matter in each business context
  • Build and maintain ETL pipelines that keep the knowledge graph current as source data changes
  • Shape the Group's knowledge graph capability, which is being built from the ground up, and set the direction for how it develops and scales

Data Quality & Governance

  • Establish data quality standards for AI-ingested content — source freshness, deduplication, completeness checks, and validation pipelines
  • Work with BU Domain Data Stewards to validate that domain-specific data is accurate before it enters the AI data layer
  • Maintain clear data lineage across the AI data foundation — what source data feeds which index or graph, and when it was last refreshed
  • Partner with the AI Governance & Compliance Lead on data privacy requirements for AI-ingested content, particularly across BUs with sensitive operational data

Collaboration & Standards

  • Partner with Applied AI Engineers to understand the retrieval requirements of each use case and ensure the data foundation is designed to support them well
  • Work with the Lead Data Engineer (supporting functions) on the handoff boundary between structured data / BI pipelines and the AI data layer
  • Maintain documentation of the AI data foundation — schemas, pipeline specs, refresh schedules, and known limitations — so the team can work with the data layer confidently
  • Contribute to the broader AI Platform cluster's engineering standards and participate in code and design reviews

Requirements

Must-Have

  • Solid data engineering foundations: you have designed and built ETL / ELT pipelines at production scale, managed data quality, and worked with structured and semi-structured data in cloud environments
  • Hands-on experience with vector databases — you have built embedding pipelines, managed indexes, and designed retrieval patterns for RAG or semantic search applications
  • Understanding of RAG architecture from the data side: chunking strategies, embedding model selection, retrieval optimisation, and the effect of data quality on AI output quality
  • Experience designing schemas and data models for AI systems — with a strong appreciation for how data structure shapes retrieval quality and downstream AI output
  • Strong Python skills and comfort with the data engineering tooling ecosystem: pipeline orchestration, data validation, and working with cloud storage and databases
  • Clear, structured communication skills — you can explain data architecture decisions to both technical peers and non-technical stakeholders

Strong Advantage

  • Experience with knowledge graph design and ontology modelling — Neo4j or equivalent, including schema design, Cypher querying, and ETL into graph structures
  • Familiarity with enterprise data environments: federated data sources, multiple business domains, and working across teams with different data ownership models
  • Experience working closely with applied engineering teams as a data infrastructure provider — you understand how the data layer choices you make affect what engineers can build
  • Exposure to data governance and compliance requirements in an AI context: data lineage, PII handling, retention policies, and working with compliance stakeholders
  • Background in unstructured data processing — document parsing, OCR, text extraction, or working with content repositories as AI data sources
