Member of Technical Staff - ML Infrastructure & Performance

San Mateo, CA · Posted 5 months ago

Full-time · Remote

Job Description

Introducing Moonlake, AI for creating real-time interactive content

Mission: Improve throughput, latency, and cost by deploying our models 2–10× faster and cheaper without quality regressions.

Scope of Work:

- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.

- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing.

- Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.

- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.

- Systems: Ray/Kubernetes/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.

Tech signals:

Previous experience at infra-heavy startups such as Databricks or Roblox.

We are committed to being an on-site, in-person team, currently based in San Mateo.
