Site Reliability Engineer
TECH AALTO PTE. LTD.
Job Summary
We are hiring Site Reliability Engineers who treat operations as a software problem. You'll keep production healthy, but more importantly you'll build the automation, tooling, and agentic workflows that make running our systems boring and predictable. This is an engineering role - if your instinct on a recurring issue is to write code that removes it, you'll fit in well.
Our client operates in a regulated capital-markets environment, so the bar for reliability, security, and operational rigour is high.
Job Responsibilities
- Own production reliability (SLOs, capacity, incident response, postmortems) and turn every incident into a durable fix in code or automation.
- Build the platform and tooling that make services easy to deploy, observe, and operate: CI/CD, infrastructure-as-code, observability stacks, runbooks-as-code.
- Apply AI agentically across operations (triage, root-cause analysis, remediation, change review) and contribute to our internal agentic ecosystem.
- Design and integrate the systems underneath our services: messaging (e.g. Kafka), orchestration (e.g. Kubernetes), and performance-sensitive infrastructure.
- Partner with product engineers on release readiness, rollout strategy, and production hardening before things ship.
- Continuously reduce toil: measure it, attack it with code, and raise the floor on what "easy to maintain" looks like.
Job Requirements
- 5+ years in SRE, platform, or infrastructure engineering, with a clear track record of replacing manual work with code
- Strong programming ability in at least one modern language (e.g. Go, Python, Kotlin, TypeScript, or Rust); you write production code, not just glue scripts
- AI-native ways of working: real experience orchestrating agents for ops workflows, not just using AI for autocomplete
- Deep hands-on with Kubernetes, IaC (Terraform or equivalent), CI/CD, and modern observability (metrics, logs, traces)
- Production experience on a major cloud: GCP preferred, AWS acceptable
- Solid foundations in distributed systems and the failure modes that matter in production
- Incident-response maturity: calm under pressure, sharp on root cause, disciplined about follow-through
- Comfort in complex, regulated environments
- Familiarity with the FIX protocol or capital-markets domain
- Experience building internal developer platforms or self-service tooling consumed by other engineers
When you apply, you voluntarily consent to the disclosure, collection and use of your personal data for employment/recruitment and related purposes in accordance with the Tech Aalto Privacy Policy, a copy of which is published at Tech Aalto’s website (https://www.techaalto.com/privacy/).
Confidentiality is assured, and only shortlisted candidates will be notified for interviews.
Tech Aalto Pte Ltd | 24S2130 EA | Pushpanjli Kir | R1657306.