System Lead

COMBUILDER PTE LTD

Key Responsibilities

1. Team Leadership & People Management

  • Lead and supervise a team, and ensure manpower coverage for 24/7 operations.
  • Provide mentorship, coaching, and skill development for junior engineers.
  • Conduct performance reviews, identify training gaps, and drive continuous improvement across the team.
  • Act as the point-of-escalation for all operational matters.

2. Data Centre Operations Management

  • Oversee smooth day-to-day operations across data centre infrastructure, systems, networks, security tools, and DC facilities to ensure efficiency, stability, and continuous service availability.
  • Ensure compliance with Data Centre SOPs, operational policies, and security guidelines.
  • Coordinate equipment movement, installation, decommissioning, and preventive maintenance activities.
  • Review shift reports and incident logs, and ensure proper documentation for audits.
  • Drive readiness, preventive maintenance, audit compliance, and operational excellence initiatives.

3. Incident & Problem Management

  • Act as the senior escalation point for critical incidents across systems, networks, and security technologies.
  • Provide expert guidance on infrastructure issues including Windows/Linux servers, virtualization, storage, security appliances, and network technologies.
  • Review RCA reports, lead problem management, and drive long-term remediation plans.
  • Oversee complex change requests, service requests, patching cycles, and integration activities.
  • Ensure alignment with ITIL processes and industry best practices.
  • Oversee triaging and root cause analysis (RCA), and ensure timely closure of incidents and service requests.
  • Ensure incidents, alarms and alerts are properly logged, categorised, prioritised, and tracked according to SLA.
  • Liaise with customers, vendors, and internal stakeholders for critical incidents and troubleshooting.

4. Systems & Network Operations Oversight

  • Oversee monitoring and maintenance of servers, operating systems, and storage systems.
  • Oversee network operations, including switches, routers, firewalls, VPNs, and load balancers.
  • Ensure routine patching, backup operations, and system health checks are carried out.
  • Guide the team on best practices for authentication, authorization, encryption and configuration management.

5. Vendor & Stakeholder Management

  • Coordinate with external vendors for maintenance, replacement, and enhancement activities.
  • Ensure timely follow-up on open tickets, service disruptions and preventive maintenance.
  • Communicate operational updates, risks, and issues to management and customers.

6. Compliance, Documentation & Reporting

  • Ensure operational documentation (SOPs, checklists, incident reports, RCAs, inventory, access logs) is up to date and accurate.
  • Drive audit readiness for ISO, IT security audits, and internal governance requirements.
  • Prepare periodic operational reports and dashboards for management.

Education & Experience

  • Diploma/Degree in Computer Science, IT, Engineering or related fields.
  • Minimum of 10 to 15 years of hands-on experience in IT and cloud infrastructure or data centre operations.
  • Minimum of 5 years of experience leading a technical team of Azure infrastructure engineers, or a team in an operations centre or shift-based environment.

Technical Skills

  • IT Infrastructure (Windows / Linux servers, virtualization, storage)
  • Network technologies (L2/L3 concepts, routing, switching, firewalls)
  • Azure Stack Hub / hybrid cloud environments
  • Storage: S2D, SAN/NAS, Unity/VNX
  • Windows HCI
  • Kubernetes / SDN networking knowledge
  • Security tools/products (e.g., BeyondTrust, SEPM, RSA, Palo Alto, Check Point, FortiGate, SafeNet)
  • Data Centre operations (monitoring tools, backup operations, tape management, hardware handling)
  • Backup operations / Tape library / DR procedures
  • Incident management and ITIL framework
  • Knowledge of authentication, encryption, and access management concepts

Key Competencies

  • Strong leadership, communication, and stakeholder management skills.
  • Proven ability to manage crisis situations, critical escalations, and high-severity incidents.
  • Strong analytical and problem-solving mindset.
  • Ability to develop team members, build processes, and drive operational excellence.
  • Able to support 24/7 operations when required (escalations or major incidents).