Lead Site Reliability Engineer (Quality Assurance)

BEATHCHAPMAN (PTE. LTD.)

Client introduction

Our client is an established fintech headquartered in Singapore, operating across payments and foreign exchange with a footprint spanning Asia and beyond, including a significant client base in Greater China. As the business scales towards its next stage of growth, they are building out the senior layer of their reliability function.

They are hiring for a newly created Lead Site Reliability Engineer (Quality Assurance) role, which sits within the SRE team as a key technical hire and co-lead with the Head of SRE.

Job responsibilities

  • Own production reliability across the FX and payments platforms - monitoring, observability, alerting, and the definition and tracking of SLIs and SLOs.
  • Lead incident response end to end, including war rooms, post-mortems, root-cause analysis and the upkeep of operational runbooks.
  • Strengthen quality assurance across the platform - improving test coverage and release sign-off standards, and modernising legacy test automation toward more automated and AI-assisted workflows.
  • Support client API integration from sandbox to production, including liquidity provider onboarding and conformance testing.
  • Act as the technical escalation bridge between clients, internal users and engineering, supporting China-based clients directly on production and integration issues.
  • Lead business continuity and disaster recovery testing, including failover, recovery and audit evidence preparation.
  • Contribute to DevOps and tooling improvements across reliability, testing and support.
  • Coach and uplift a team of junior SRE and QA engineers, setting standards and mentoring on best practice as the function matures.

Job requirements

  • At least 6 years of experience in site reliability, production support, platform engineering or technical operations, ideally within fintech, payments, FX, trading systems or another high-availability environment.
  • Strong hands-on troubleshooting across production systems, logs, APIs and application behaviour.
  • Hands-on quality assurance exposure - test automation, release support or regression testing - as the role spans both reliability and quality.
  • Working knowledge of API integration and comfort in client-facing or client-support situations; sandbox-to-production experience is an advantage.
  • Solid fundamentals across cloud and containers (AWS, Docker, Kubernetes), monitoring and observability tooling (Grafana, Prometheus, OpenSearch, CloudWatch), and scripting (Python, Java, Bash).
  • People management or team-lead experience is necessary.
  • Sound grounding in incident management, RCA and BCP/DR practice.
  • Professional spoken and written Mandarin proficiency is required to communicate directly with China-based clients and stakeholders, support production issues, and manage API integration activities.
  • Suited to someone hands-on today who wants to grow into a broader leadership role.
  • Candidates requiring relocation to Singapore are welcome to apply; relocation expenses will not be provided for this position.

Why you should join them

  • A newly created, high-visibility role as second-in-command within the reliability function, with a genuine path to grow into a deputy or co-leadership position through succession planning.
  • Direct exposure to senior engineering leadership, reporting to the Head of Engineering and working alongside the infrastructure team.
  • Broad ownership across reliability, quality, client integration and incident management, with room to shape how these run rather than inherit a fixed playbook.
  • A modern technical environment with real appetite for AI-assisted tooling across testing, RCA and support, plus a hybrid working arrangement.

JL

Reg. No. R1766249

BeathChapman Pte Ltd

Licence no. 16S8112