Member of Technical Staff - Compilers

Palo Alto · Posted 1 month ago

Full-time · Remote

About Us

Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads and enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford Research, we blend AI with silicon; our founding team comes from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple, and Intel.

We're looking for staff/principal-level compiler engineers with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates have shipped production compilers at places like Apple, Google (XLA/TPU), Groq, Cerebras, Qualcomm, AMD, or similar.

What You'll Do

As a Member of the Technical Staff on the Compilers team at Architect, you'll own the compiler stack targeting our SIMD/VLIW NPU — from graph ingestion through code generation on production silicon. You'll work directly with the NPU architect to co-design the ISA, closing the loop between compiler needs and hardware decisions.

Own the compiler end-to-end: graph ingestion (ONNX, PyTorch) through IR optimization, AI-driven code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU.
Implement and own the memory management layer: for instance, SW-managed on-chip scratchpad memory, with the compiler handling data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks.
Design and iterate on mid-end and backend optimization passes: operator fusion, loop transformations, vectorization, and software pipelining to close the gap between peak and achieved throughput.
Co-design the ISA and instruction encoding with the architect and silicon team. Feed real workload performance data back into architectural decisions.
Support quantization and mixed-precision lowering (32-bit single-precision FP or INT, along with lower-precision INT8/4, BF16, and FP16/8/4) with correct numerics end-to-end.
Benchmark compiler output against cycle-accurate models, RTL simulation, and FPGA prototypes. Own QoR tracking.
Grow into a compiler team lead as the team scales.

What We'd Like to See

Qualifications & Skills:

Degree: Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a closely related field.
Experience: 5+ years building compilers or code generation toolchains for custom accelerators. Must have compiler experience targeting ML/AI hardware; general-purpose compiler experience (GCC/LLVM for CPUs) is not sufficient.
Domain Background: Hands-on experience with at least one of: Apple Neural Engine compiler, Google XLA / Edge TPU / TPU codegen, Groq TSP compiler (spatial scheduling, IR dialect design), Cerebras compiler stack, Qualcomm Hexagon NN / AI Engine, AMD AIE / Vitis AI, or an equivalent custom accelerator compiler.
Backend Mechanics: Strong grasp of instruction scheduling, register allocation, and software pipelining — especially for SIMD/VLIW or spatial architectures.
ML Optimizations: Experience with tiling strategies, loop nest optimization, and operator fusion for ML workloads (such as convolution, attention, element-wise ops, reductions, and transpositions).
SW-Managed Memory: Experience with scratchpad-type memory allocation, data layout, DMA orchestration, and multi-buffering.
Coding: Strong C++. Python proficiency. Familiarity with MLIR or LLVM infrastructure.
Leadership: Ability to lead and grow the compiler team over time.

Bonus:

HW/SW co-design experience: defining ISA features, instruction encodings, or hardware interfaces driven by compiler needs.
IR design for ML accelerators (custom dialects, MLIR-based flows, or graph-level IRs like XLA HLO).
ML framework experience (PyTorch, TensorFlow) and portable graph formats (ONNX).
Experience benchmarking and profiling compiler output on real hardware, FPGA, or cycle-accurate simulators.
Understanding of ML inference systems and workload-level optimizations: FlashAttention, RadixAttention, PagedAttention, continuous batching, speculative decoding, KV cache management, and prefill/decode scheduling.
Contributions to open-source ML compiler projects (TVM, MLIR, Triton, XLA).
Domain-specific expertise: Track record on energy-efficient, high-performance HW accelerator bring-up.

What We Offer

Competitive salary and meaningful equity stake
Fast-paced startup with autonomy and visible impact
Cutting-edge challenges at the intersection of AI and silicon design
Direct ownership of the compiler stack as we scale
