Bade | Posted 1 week ago
Full-time | On-site | Senior

Job Description

Qualifications:

  • Minimum 5 years of relevant experience in performance testing, system optimization, and HPC environments.
  • Proficiency in Linux system administration, including cluster setup and management.
  • Hands-on experience with Kubernetes (K8S) for container orchestration in AI/ML workloads.
  • Familiarity with CUDA and GPU configurations for AI/ML performance optimization.
  • In-depth knowledge of high-speed networking (e.g., InfiniBand, Ethernet) and related technologies.
  • Understanding of AI/ML frameworks such as PyTorch, TensorFlow, and deployment requirements for large language models (LLMs).
  • Ability to conduct performance testing and benchmarking for servers, GPUs, and HPC systems.
  • Capability to design, configure, and troubleshoot network topologies and components.
  • Experience with server troubleshooting and monitoring.
  • Familiarity with virtualization (KVM, etc.), network file servers, Linux commands and OS maintenance, and building and maintaining Docker services and K8S platforms.

5001-10000 employees
San Jose, CA, US
System Engineer at Super Micro Computer, Inc.