Job Description
- Design and build scalable infrastructure for both online and offline inference workloads.
- Lead integration of high-performance inference runtimes and serving frameworks, including TensorRT, vLLM, ONNX, and Triton.
- Drive architecture and technical decisions across Bloomberg’s inference platform, balancing latency, throughput, reliability, and cost.
- Partner across engineering teams to improve model deployment, observability, and production performance.
- Mentor junior engineers on system design, debugging, and performance optimization.
- 5+ years of professional software engineering experience.
- Experience designing, building, and operating production distributed systems.
- Strong systems intuition and a track record of debugging and optimizing performance-critical services.
- Ability to own problems end-to-end and quickly ramp up in unfamiliar technical areas.
- 4+ years of demonstrated experience working with an object-oriented programming language.
- A degree in Computer Science, Electrical Engineering, or equivalent practical experience.
- Experience deploying and operating machine learning systems at scale.
- Experience with inference optimization techniques such as batching, caching, request scheduling, or memory-aware serving.
- Familiarity with PyTorch and GPU software stacks such as CUDA and NCCL.
- Exposure to high-performance interconnects and distributed computing technologies such as NVLink, InfiniBand, or MPI.
- Experience with Kubernetes and cloud-native infrastructure.
- Experience with load balancing, request routing, or traffic management systems.
- Autoscaling a heterogeneous compute fleet to match supply and demand across diverse inference workloads.
- Building production-grade deployment pipelines to safely roll out new models to millions of users.
- Developing new inference capabilities such as structured sampling, prompt caching, and advanced serving optimizations.
- Analyzing observability data from real production workloads to improve latency, throughput, and resource efficiency.
We offer one of the most comprehensive and generous benefits plans available, with a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short- and long-term disability benefits, 401(k) + match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.