Site Reliability Engineer

TECH AALTO PTE. LTD.

Job Summary

We are hiring Site Reliability Engineers who treat operations as a software problem. You'll keep production healthy, but more importantly you'll build the automation, tooling, and agentic workflows that make running our systems boring and predictable. This is an engineering role: if your instinct on a recurring issue is to write code that removes it, you'll fit in well.

Our client operates in a regulated capital-markets environment, so the bar for reliability, security, and operational rigour is high.

Job Responsibilities

Own production reliability (SLOs, capacity, incident response, postmortems) and turn every incident into a durable fix in code or automation.

Build the platform and tooling that make services easy to deploy, observe, and operate: CI/CD, infrastructure-as-code, observability stacks, runbooks-as-code.

Apply AI agentically across operations (triage, root-cause analysis, remediation, change review) and contribute to our internal agentic ecosystem.

Design and integrate the systems underneath our services: messaging (e.g. Kafka), orchestration (e.g. Kubernetes), and performance-sensitive infrastructure.

Partner with product engineers on release readiness, rollout strategy, and production hardening before things ship.

Continuously reduce toil: measure it, attack it with code, and raise the floor on what "easy to maintain" looks like.

Job Requirements

5+ years in SRE, platform, or infrastructure engineering, with a clear track record of replacing manual work with code

Strong programming ability in at least one modern language (e.g. Go, Python, Kotlin, TypeScript, Rust); you write production code, not just glue scripts

AI-native ways of working: real experience orchestrating agents for ops workflows, not just using AI for autocomplete

Deep hands-on experience with Kubernetes, IaC (Terraform or equivalent), CI/CD, and modern observability (metrics, logs, traces)

Production experience on a major cloud: GCP preferred, AWS acceptable

Solid foundations in distributed systems and the failure modes that matter in production

Incident-response maturity: calm under pressure, sharp on root cause, disciplined about follow-through

Comfort in complex, regulated environments

Familiarity with the FIX protocol or capital-markets domain

Experience building internal developer platforms or self-service tooling consumed by other engineers

When you apply, you voluntarily consent to the disclosure, collection and use of your personal data for employment/recruitment and related purposes in accordance with the Tech Aalto Privacy Policy, a copy of which is published at Tech Aalto’s website (https://www.techaalto.com/privacy/).

Confidentiality is assured, and only shortlisted candidates will be notified for interviews.
Tech Aalto Pte Ltd | 24S2130 EA | Pushpanjli Kir | R1657306.