Job Description
Singtel Digital InfraCo’s RE:AI division is building Asia’s most advanced and sustainable AI infrastructure ecosystem. RE:AI enables enterprises, research institutions, and digital-native businesses to accelerate innovation through responsible, high-performance AI compute and connectivity solutions.

Be a Part of Something BIG!

As an Operations Engineer supporting Singtel’s GPU-as-a-Service (GPUaaS) platform, you will contribute to the implementation, integration, and day-to-day operations of data centre environments that enable customers’ AI and High-Performance Computing (HPC) workloads. The role involves exposure to both physical data centre operations and supporting software systems used in GPU-oriented facilities. This role offers opportunities to build and deepen expertise in advanced data centre technologies for AI and HPC environments within a dynamic and continuously evolving operational setting.

Responsibilities:

Data Centre Operations Management
- Respond to, attend to, and escalate incidents based on defined criticality, impact, and service level agreements (SLAs).
- Perform hands-on operations involving air-cooled and liquid-cooled systems, as well as electrical systems, within the data centre environment.
- Participate actively in continuous improvement initiatives for operational processes, with consideration of GPU-oriented data centre requirements.
- Coordinate and obtain necessary security clearances for visitors and vendors accessing the GPUaaS data centre.
- Manage vendor activities and ensure compliance with Workplace Safety and Health (WSH) requirements and site regulations.
- Participate in scheduled or on-call support outside standard working hours, including nights, weekends, and public holidays, as required.

Data Centre Facilities Management
- Monitor data centre facilities and infrastructure across upstream and downstream systems (e.g. power, cooling, leakage detection, environmental controls).
- Maintain and update data centre documentation, including preparation of operational and incident reports as required.
- Coordinate with internal and external stakeholders to resolve technical and process-related issues within the GPUaaS data centre.
- Ensure adherence to established Standard Operating Procedures (SOPs), Methods of Procedure (MOPs), and Emergency Response Procedures (ERPs).
- Apply knowledge of power and cooling requirements for air-cooled and liquid-cooled servers to support operational enhancements and capacity planning.
- Coordinate maintenance activities and system shutdowns with stakeholders and vendors to ensure system reliability and availability.
- Prepare monthly Facilities Management reports on overall data centre health and performance.
- Identify potential workplace safety and health risks within the data centre environment.
- Conduct visual inspections of servers and cooling distribution units.
- Perform server troubleshooting in collaboration with remote engineering teams.

Requirements
- Diploma in Mechanical Engineering, Electrical Engineering, Building Services, or a related discipline.
- Broad understanding of data centre electrical and mechanical infrastructure, including fire safety systems, building management systems (BMS), equipment maintenance, and space planning.
- Experience in maintaining and operating data centre equipment, with emphasis on electrical and mechanical systems.
- Ability to work effectively both independently and as part of a team.
- Organised, adaptable, and able to respond to changing operational requirements and schedules.
- Demonstrated willingness to learn and develop skills in GPU-oriented and mission-critical data centre technologies.

Rewards that Go Beyond
- Full suite of health and wellness benefits
- Ongoing training and development programs
- Internal mobility opportunities

Your Career Growth Starts Here. Apply Now!
