Software Engineer, AI Platform

KRIS INFOTECH PTE. LTD.

You will own how the AI platform is built and run: backend services, the developer-facing portal, integrations with internal systems, and the infrastructure underneath.
The data science team remains closely involved as domain experts and primary internal customers.
You are the engineer who makes sure what they need actually ships and holds together in production.
This is a software engineering role first.
You should understand LLM and agent concepts — how tool calling works, why agent loops behave non-deterministically, how context management affects system design.

Where the platform is heading:

Multi-agent workflows — moving from single-agent execution to coordinated multi-agent systems that can handle complex, long-horizon tasks across business functions.
Core system integration — deeper connections with the company's operational and commercial systems, so AI agents can act on real data in real time rather than working in isolation.
Real-time decision support — expanding the platform's role from task automation toward live recommendations and decision assistance in high-stakes operational contexts.

What you'll work on:

Agent runtime & execution — session state, agent lifecycle management (pause, resume, plan mode), spec versioning, execution safety.
Auth, security & governance — OAuth integrations, credential rotation, PII controls, prompt injection mitigation, runtime guardrails.
Platform infrastructure — cost and usage tracking, distributed tracing (OpenTelemetry), scaling across business units.
Protocol & ecosystem integrations — MCP, A2A, WebSocket-based tooling, multi-cloud file handling.
Observability & developer tooling — metrics dashboards, debugging infrastructure, the developer portal that internal teams use to build and test agents.

Must-have:

Strong backend engineering experience in Node.js / TypeScript. Track record of owning services end to end in production — not just shipping features to a spec.
Comfortable across the stack. Able to pick up frontend work in React when the ticket calls for it, even if backend is your primary strength.
Solid API design skills. Familiar with authentication and authorisation patterns (OAuth and similar), and experienced in debugging production systems.
Familiarity with observability practices — logging, tracing, metrics. OpenTelemetry or equivalent experience preferred.
Conceptual fluency with LLM agent systems: tool calling, context windows, streaming, non-determinism. Enough to make sound architectural decisions in this domain. No ML or model training background required.
Comfortable with ambiguity. This team is small. You will often decide how something should be built, not just execute a spec.

Nice-to-have:

Hands-on experience with agent frameworks or protocols (LangChain, Google ADK, MCP, A2A, or similar).
Cloud infrastructure experience (AWS).
Background in platform or developer-tooling engineering — building for other engineers as your primary customer.
Interest in AI security: prompt injection, guardrails, and safe agent execution are real parts of this backlog.