Job Description
Job Purpose
- We are seeking a highly skilled, hands-on System Operation Manager to manage the organization’s system operations and ensure their availability, performance, security, and maintenance. The role oversees system monitoring, incident response, vulnerability management, and compliance support, and drives continuous improvement of server performance.
- The role works closely with the cloud, application, security operations, and business teams to ensure stable and reliable IT operations.
Responsibilities
- Manage and maintain Linux and Windows server environments (physical, virtual, and cloud).
- Ensure high availability and optimal performance of servers and related infrastructure.
- Monitor system health, capacity, and resource utilization.
- Manage Active Directory, Group Policy, DNS, DHCP, and file services.
- Administer Linux services such as Apache, Nginx, SSH, Samba, and database servers.
- Implement and maintain system hardening standards.
- Ensure timely patch management and vulnerability remediation.
- Manage privileged access controls and security configurations.
- Support compliance requirements, audits, and security assessments.
- Monitor and respond to security incidents affecting servers and infrastructure.
- Establish proactive monitoring and alerting mechanisms.
- Lead incident response, troubleshooting, and root cause analysis.
- Coordinate major incident management and service restoration activities.
- Prepare post-incident reports and corrective action plans.
- Manage backup solutions and data recovery procedures.
- Conduct regular backup validation and restoration testing.
- Develop and maintain disaster recovery and business continuity plans.
- Ensure that recovery point objectives (RPO) and recovery time objectives (RTO) are met.
- Manage cloud platforms such as Microsoft Azure, Amazon Web Services, or Google Cloud.
- Administer virtualization platforms such as VMware and Microsoft Hyper-V, as well as container platforms.
- Monitor cloud costs and optimize infrastructure utilization.
Qualifications & Work Experience
- Bachelor’s degree in Information Technology, Computer Science, or a related field
- Experience in SOC operations and incident response
- Familiarity with cloud security environments (Azure, AWS, Google Cloud) preferred
- Strong knowledge of server operations, incident response, vulnerability management, network technologies, identity & access management, and cloud security
- Hands-on experience with Azure, AWS, Google Cloud, as well as Linux and Windows operating systems
- Relevant certifications preferred (e.g., CompTIA Linux+, LPI, Linux Foundation, Red Hat, Windows Server Hybrid Administrator, Azure, AWS, Google Cloud)
Skills
- 5–8 years of experience in cybersecurity or IT security operations, including 3–5 years in a leadership role
- Strong analytical and problem-solving skills
- Proven capability in crisis and incident management
- Effective communication and stakeholder management skills
- Ability to work under pressure and manage multiple priorities
- Continuous improvement mindset
SMRT Trains Ltd was incorporated in 1987 and operates Singapore’s first mass rapid transit system. Today, we manage and operate train services on the North-South Line, East-West Line, the Circle Line, the Thomson-East Coast Line, and the Bukit Panjang Light Rail Transit. With over 5,000 employees, more than 250 trains, and 141 km of rail tracks across 108 stations, we serve millions of commuters daily.