Lead Site Reliability Engineer (Quality Assurance)

BEATHCHAPMAN (PTE. LTD.)

Client introduction

Our client is an established fintech headquartered in Singapore, operating across payments and foreign exchange with a footprint spanning Asia and beyond, including a significant client base in Greater China. As the business scales towards its next stage of growth, they are building out the senior layer of their reliability function.

They are hiring for a newly created Lead Site Reliability Engineer (Quality Assurance) role, which sits within the SRE team as a key technical hire and co-lead with the Head of SRE.

Job responsibilities

Own production reliability across the FX and payments platforms - monitoring, observability, alerting, and the definition and tracking of SLIs and SLOs.
Lead incident response end to end, including war rooms, post-mortems, root-cause analysis and the upkeep of operational runbooks.
Strengthen quality assurance across the platform - improving test coverage and release sign-off standards, and modernising legacy test automation toward more automated and AI-assisted workflows.
Support client API integration from sandbox to production, including liquidity provider onboarding and conformance testing.
Act as the technical escalation bridge between clients, internal users and engineering, supporting China-based clients directly on production and integration issues.
Lead business continuity and disaster recovery testing, including failover, recovery and audit evidence preparation.
Contribute to DevOps and tooling improvements across reliability, testing and support.
Coach and uplift a team of junior SRE and QA engineers, setting standards and mentoring on best practice as the function matures.

Job requirements

At least 6 years of experience in site reliability, production support, platform engineering or technical operations, ideally within fintech, payments, FX, trading systems or another high-availability environment.
Strong hands-on troubleshooting across production systems, logs, APIs and application behaviour.
Hands-on quality assurance exposure - test automation, release support or regression testing - as the role spans both reliability and quality.
Working knowledge of API integration and comfort in client-facing or client-support situations; sandbox-to-production experience is an advantage.
Solid fundamentals across cloud and containers (AWS, Docker, Kubernetes), monitoring and observability tooling (Grafana, Prometheus, OpenSearch, CloudWatch), and scripting (Python, Java, Bash).
People management or team-lead experience is necessary.
Sound grounding in incident management, RCA and BCP/DR practice.
Professional spoken and written Mandarin proficiency is required to communicate directly with China-based clients and stakeholders, support production issues, and manage API integration activities.
Suited to someone hands-on today who wants to grow into a broader leadership role. Candidates requiring relocation to Singapore are welcome to apply. Relocation expenses will not be provided for this position.

Why you should join them

A newly created, high-visibility role as second-in-command within the reliability function, with a genuine path to grow into a deputy or co-leadership position through succession planning.
Direct exposure to senior engineering leadership, reporting to the Head of Engineering and working alongside the infrastructure team.
Broad ownership across reliability, quality, client integration and incident management, with room to shape how these run rather than inherit a fixed playbook.
A modern technical environment with real appetite for AI-assisted tooling across testing, RCA and support, plus a hybrid working arrangement.

Reg. No. R1766249

BeathChapman Pte Ltd

Licence no. 16S8112