Job Description
About Hark
Hark is an artificial intelligence company building advanced, personalized intelligence: one that is proactive, multimodal, and capable of interacting with the world through speech, text, vision, and persistent memory.
We're pairing that intelligence with next-generation hardware to create a universal interface between humans and machines. While today's AI largely operates through chat boxes and decade-old devices, Hark is focused on what comes next: agentic systems that interact naturally with people and the real world.
To get there, we're developing multimodal models and next-generation AI hardware together, designed from the ground up as a single, unified interface for a new era of intelligent systems.
About the Role
We're hiring a Member of Technical Staff (Real-Time Audio) to join our Product Engineering team. Hark’s voice agent holds real-time, full-duplex conversations with people in homes, cars, and noisy rooms. That experience is only as good as the audio underneath it.
This role owns the real-time audio processing that makes conversations feel natural (echo cancellation, noise suppression, and voice activity detection), shipped as production code in our live client. This is neither a pure research role nor a DSP theory role. We're looking for someone who can both understand the signal processing and ship the code.
Responsibilities
- Own audio quality on the client: echo, self-interruption, dropouts, and clipping
- Build and tune the browser audio pipeline with the Web Audio API, AudioWorklet, and getUserMedia constraints
- Work the WebRTC audio path end to end: AEC, noise suppression, and VAD
- Ship DSP to the client as C++/Rust compiled to WebAssembly, and as TypeScript in the audio pipeline
- Tune endpointing, interruption, and turn-taking so the agent listens like a person
- Reduce conversational latency and artifacts across the streaming pipeline
- Work in our React/TypeScript client where audio meets the UI
- Manage features end-to-end from prototyping through production
- Collaborate with designers, platform engineers, and our speech team
Requirements
- 5+ years of software engineering experience
- Shipped real-time audio into a product used by real users
- Hands-on experience with WebRTC, AEC (echo cancellation), noise suppression, and VAD
- Strong DSP fundamentals: adaptive filtering, STFT, resampling, and gain control
- C/C++ or Rust for production DSP, and experience shipping it to the browser via WebAssembly
- Working knowledge of the browser audio stack: Web Audio API, AudioWorklet, and MediaStream constraints
- Comfort with latency, buffering, and sample rates in a streaming audio pipeline
- Track record of owning features end-to-end and working comfortably in a shared production codebase
Bonus Qualifications
- Experience working at a voice, speech, or video-conferencing company
- ML for audio: noise suppression, VAD, or source separation (e.g. RNNoise, DeepFilterNet, Silero VAD), and on-device inference (ONNX Runtime, Core ML)
- Familiarity with WebRTC internals (the Audio Processing Module, AEC3, Opus) and voice-agent frameworks (LiveKit, Pipecat)
- TypeScript and React, and comfort working across the product frontend
- Experience with target-speaker isolation, diarization, or barge-in and turn-detection systems for conversational AI
Compensation
The US base salary range for this full-time position is $170,000–$400,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.