Network Automation & Reliability Engineer

RECRUIT EXPRESS PTE LTD

Network Automation and Engineering Excellence

  • Design, develop, and maintain production-grade automation frameworks using Ansible, Python, and CI/CD pipelines.
  • Build reusable Ansible collections, Python libraries, and REST API integrations for network and security platforms.
  • Implement GitOps practices for network configuration management, including version control, automated testing, and continuous deployment.
  • Integrate automation workflows with enterprise DevOps toolchains (e.g., GitHub, Ansible, Terraform, ITSM tools).
  • Develop automated validation and rollback mechanisms for network changes.
  • Perform hands-on development, testing, and deployment of automation workflows in production environments.
  • Act as the escalation point for complex network automation and reliability issues, performing deep-dive troubleshooting and root cause analysis.
  • Evaluate and implement emerging automation technologies to enhance scalability, reliability, and efficiency.

Foundational Network Architecture and Operations

  • Serve as the SME for core network domains — routing, switching, firewalls, load balancing, WAN optimization, and hybrid cloud connectivity.
  • Provide architectural oversight for critical network platforms including Cisco, Check Point, Palo Alto, F5, Zscaler, Symantec, and AWS.
  • Lead modernization initiatives to ensure high availability, security, and performance across the global network.
  • Drive capacity planning, performance optimization, and lifecycle management to maintain operational excellence.
  • Establish and enforce network reliability standards, ensuring minimal downtime and rapid recovery from incidents.

Firewall and Security Automation

  • Develop and maintain automation for firewall policy management, rule lifecycle, and compliance validation across Check Point, Palo Alto, Cisco, and cloud-native firewalls.
  • Integrate firewall automation workflows with change management and compliance systems (e.g., AlgoSec, ITSM tools).
  • Collaborate with Cybersecurity teams to embed security controls and compliance into network automation and operational processes.

Observability, Tooling, and Reliability Engineering

  • Design and implement observability pipelines for network telemetry (e.g., streaming telemetry, SNMP, NetFlow, sFlow) integrated with Grafana, Prometheus, and ELK.
  • Automate health checks, anomaly detection, and self-healing workflows using observability data.
  • Collaborate with SRE and platform teams to define and implement network SLOs, SLIs, and error budgets.
  • Develop automated testing frameworks for network configuration validation, compliance checks, and pre-deployment simulation (e.g., Batfish, NAPALM, pyATS).
  • Implement chaos and resilience testing for network automation workflows to ensure fault tolerance and recovery.

Cross-Functional Collaboration and Stakeholder Management

  • Collaborate with Cloud, Security, DevOps, and Application teams to ensure seamless integration of network services into enterprise platforms and workflows.
  • Partner with Cybersecurity and Compliance teams to ensure automation adheres to enterprise security and regulatory standards.
  • Manage vendor and service provider relationships, influencing technology roadmaps and ensuring alignment with client direction.
  • Communicate complex technical concepts to non-technical stakeholders, providing clear insights into network performance, risks, and strategic initiatives.
  • Mentor and upskill network engineers in automation best practices, coding standards, and tool usage.

Governance, Compliance, and Continuous Improvement

  • Establish governance frameworks for network automation and reliability, ensuring compliance with internal policies, regulatory standards, and industry best practices.
  • Define and track key performance indicators (KPIs) such as automation coverage, change success rate, and mean time to recovery (MTTR).
  • Lead post-incident reviews, root cause analyses, and continuous improvement initiatives to enhance operational resilience.
  • Ensure comprehensive documentation of automation workflows, network configurations, and operational procedures.

Interested applicants, please send your resume to ***email_hidden***

Venessa Goh Wee Ni

R24124686

Recruit Express Pte Ltd

EA License No: 99C4599

We regret that only shortlisted candidates will be contacted.