
System Administrator – Advanced Data Center and AI Infrastructure
Job Description
We’re seeking a hands-on System Administrator who thrives in complex data center environments powering next-generation AI and networking platforms. The role involves deploying and maintaining bare-metal and multi-node environments running NVIDIA networking, DGX, and other advanced computing systems. It focuses on firmware validation infrastructure, BMC management, regression lab automation, and the continuous availability of critical test platforms.
The ideal candidate brings deep Linux expertise and is comfortable solving issues at the hardware-firmware boundary, such as BIOS, BMC, NIC firmware, and debug interfaces. They also have experience with infrastructure-as-code and monitoring at scale. You’ll join a team supporting rapid silicon bring-up and pre-production validation, where uptime and automation directly accelerate product delivery.
What You’ll Be Doing:
Deploy, configure, and maintain NVIDIA DGX, GB, and HPC systems within our data center.
Monitor and maintain system health through preventive maintenance, upgrades, patching, and issue resolution across both physical and virtual environments.
Develop and maintain Bash and Python automation to streamline AI and HPC system administration.
Lead the integration, onboarding, and optimization of new hardware and edge technologies in collaboration with cross-functional teams.
Provide technical support and collaborate across teams to enable the rapid deployment and bring-up of new technologies.
What We Need to See:
A practical engineering diploma in electronics or software, any system administration certification, or equivalent hands-on experience.
3+ years of experience as a System Administrator managing large-scale data center, HPC, or AI infrastructure deployments.
Proven background in Linux server environments and hands-on experience with platforms such as NVIDIA DGX and GB systems or HPC clusters.
Solid grasp of system architecture, networking fundamentals, and enterprise storage operations.
Demonstrated experience automating system administration tasks and improving workflows for AI and HPC infrastructure.
Ways to Stand Out from the Crowd:
Extensive experience with cluster management, platform monitoring, and best practices in high-performance and GPU-accelerated environments.
Certifications in system administration, Linux, or enterprise HPC/AI infrastructure.
Practical experience with rack installation, high-density physical infrastructure, and scalable solutions for demanding workloads.
Proven troubleshooting skills and the ability to collaborate effectively in large-scale technical environments.