Lead Performance and Optimization Engineer
Job Description
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems.
Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary.
When you join AMD, you’ll discover the real differentiator is our culture.
We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.
Join us as we shape the future of AI and beyond. Together, we advance your career.
THE ROLE:
We are seeking a Performance Engineer with strong expertise in server-class CPUs, CPU microarchitecture, and ML inference, responsible for benchmarking, analyzing, and optimizing CPU inference performance using EPYC-optimized ML libraries (e.g., ZenDNN) with common frameworks (PyTorch, TensorFlow, ONNX Runtime).
The role includes hands-on work in performance debugging, OS/BIOS tuning, thread/core affinity, multi-instance execution, and Python/scripting-based automation.
THE PERSON:
The ideal candidate is passionate about software engineering and has the leadership skills to drive complex issues to resolution.
They communicate effectively and work well with different teams across AMD.
KEY RESPONSIBILITIES:
Performance Engineering & Optimization
- Run and optimize ML inference workloads on CPUs using EPYC-optimized libraries (ZenDNN), improving throughput/latency across single-instance and multi-instance scenarios.
- Configure and tune NUMA, HugePages, SMT, power/performance modes, CPU isolation, scheduler settings, scaling governors, and other OS/BIOS parameters.
- Design and validate thread/core affinity strategies for single-instance, multi-instance, multi-socket, and framework-level multi-instance execution models.
- Optimize workload behavior through NUMA-aware locality, thread scheduling/pinning, batch size tuning, operator-level parallelism, and other CPU-focused techniques.
- Contribute to multi-instance execution framework development, including policies for instance partitioning, core allocation, memory distribution, and orchestration of parallel runs on large EPYC systems.
Benchmarking & Analysis
- Develop and run structured benchmarks across EPYC SKUs, core counts, caching/topology variations, sockets, and diverse batch sizes.
- Analyze scaling for single-instance vs. multi-instance execution, instance placement strategies, and workload isolation.
- Use perf, VTune, ftrace/trace-cmd, PMU counters, and flame graphs to identify bottlenecks in compute, memory, thread scheduling, or