Site Reliability Engineer

TECH AALTO PTE. LTD.

Job Summary

We are hiring Site Reliability Engineers who treat operations as a software problem. You'll keep production healthy, but more importantly you'll build the automation, tooling, and agentic workflows that make running our systems boring and predictable. This is an engineering role: if your instinct on a recurring issue is to write code that removes it, you'll fit in well.

Our client operates in a regulated capital-markets environment, so the bar for reliability, security, and operational rigour is high.

Job Responsibilities

  • Own production reliability (SLOs, capacity, incident response, postmortems) and turn every incident into a durable fix in code or automation.
  • Build the platform and tooling that make services easy to deploy, observe, and operate: CI/CD, infrastructure-as-code, observability stacks, runbooks-as-code.
  • Apply AI agentically across operations (triage, root-cause analysis, remediation, change review) and contribute to our internal agentic ecosystem.
  • Design and integrate the systems underneath our services: messaging (e.g. Kafka), orchestration (e.g. Kubernetes), and performance-sensitive infrastructure.
  • Partner with product engineers on release readiness, rollout strategy, and production hardening before things ship.
  • Continuously reduce toil: measure it, attack it with code, and raise the floor on what "easy to maintain" looks like.

Job Requirements

  • 5+ years in SRE, platform, or infrastructure engineering, with a clear track record of replacing manual work with code
  • Strong programming ability in at least one modern language (e.g. Go, Python, Kotlin, TypeScript, Rust); you write production code, not just glue scripts
  • AI-native ways of working: real experience orchestrating agents for ops workflows, not just using AI for autocomplete
  • Deep hands-on experience with Kubernetes, IaC (Terraform or equivalent), CI/CD, and modern observability (metrics, logs, traces)
  • Production experience on a major cloud: GCP preferred, AWS acceptable
  • Solid foundations in distributed systems and the failure modes that matter in production
  • Incident-response maturity: calm under pressure, sharp on root cause, disciplined about follow-through
  • Comfort in complex, regulated environments
  • Familiarity with the FIX protocol or capital-markets domain
  • Experience building internal developer platforms or self-service tooling consumed by other engineers

When you apply, you voluntarily consent to the disclosure, collection and use of your personal data for employment/recruitment and related purposes in accordance with the Tech Aalto Privacy Policy, a copy of which is published on Tech Aalto’s website (https://www.techaalto.com/privacy/).

Confidentiality is assured, and only shortlisted candidates will be notified for interviews.
Tech Aalto Pte Ltd | 24S2130 EA | Pushpanjli Kir | R1657306.