Job Description
Key Responsibilities
- Research and develop state-of-the-art voice synthesis models (e.g., TTS, voice cloning, speech-to-speech).
- Build and fine-tune models using frameworks like PyTorch and Hugging Face.
- Design training pipelines and datasets for scalable voice model training.
- Explore techniques for emotional expressiveness, multilingual synthesis, and speaker adaptation.
- Work closely with product and creative teams to ensure models meet quality and production constraints.
- Stay on top of academic and industrial trends in speech synthesis and related fields.
Requirements
- Strong background in machine learning and deep learning, with a focus on speech/audio.
- Hands-on experience with TTS, voice cloning, or related voice synthesis tasks.
- Proficiency with Python and PyTorch; experience with libraries like torchaudio, ESPnet, or similar.
- Experience training models at scale and working with large audio datasets.
- Familiarity with vocoders and transformer-based architectures.
- Strong problem-solving skills and the ability to work autonomously in a remote-first environment.
Preferred Qualifications
- PhD in Computer Science or Machine Learning, with publications in top venues.
- Contributions to open-source speech research or participation in relevant benchmarks.
- Familiarity with adjacent areas like lip-syncing, audio-driven animation, or expressive speech control.
- Experience with voice datasets or proprietary pipelines.
About BRAHMA AI:
BRAHMA AI is the next generation of enterprise media technology, formed through the integration of Prime Focus Technologies and Metaphysic. By combining CLEAR®, CLEAR® AI, ATMAN, and VAANI into one ecosystem, BRAHMA AI enables enterprises to manage, create, and distribute content with intelligence, security, and efficiency.
Proven, scalable, and enterprise-tested, BRAHMA AI is helping global organizations accelerate growth, efficiency, and creative impact in the AI-powered era.
