LLM Fine-Tuning Engineer (Open-Weight Models / Secure Environments)
Job Description
We are hiring an LLM Fine-Tuning Engineer to help us adapt and evaluate open-weight language models for use in secure environments.
This role is focused on the full post-training lifecycle: dataset design, supervised fine-tuning, parameter-efficient tuning, synthetic data generation, evaluation, and deployment support. You will work closely with technical teams operating in advanced cybersecurity contexts, helping turn foundation models into reliable task-specific systems.
IMPORTANT NOTE: We are not looking for prompt engineers or “LLM whisperers.” We are looking for someone who understands post-training, data, evaluation, and deployment deeply enough to make open-weight models reliable in real operational environments.
What You’ll Do
Fine-tune and adapt open-weight LLMs for specialized use cases in secure/local environments
Design, run, and compare different post-training approaches, including:
supervised fine-tuning (SFT)
parameter-efficient fine-tuning (LoRA / QLoRA and related methods)
preference tuning approaches such as DPO where appropriate
full fine-tuning when justified by the use case
Build and improve high-quality datasets for training and evaluation
Generate synthetic data and use it responsibly to expand coverage, improve robustness, and accelerate iteration
Assess model quality beyond headline metrics
Work with engineering teams to operationalize tuned models for local or restricted deployments
Collaborate with domain experts to translate real operational needs into measurable model requirements
What We’re Looking For
Strong experience fine-tuning LLMs or adjacent foundation models in production or serious research environments
Practical experience with multiple tuning approaches, ideally including SFT, LoRA, and QLoRA
Experience building and curating datasets for post-training
Experience generating and validating synthetic data for model training
Strong Python skills and solid experience with:
PyTorch
Hugging Face Transformers
tokenization, data preprocessing, and training pipelines
Good understanding of GPU constraints, memory/performance tradeoffs, quantization-aware workflows, and practical training optimization
Strong debugging mindset and ability to investigate why a model improved, regressed, or failed
Nice to Have
Experience with secure, air-gapped, or otherwise restricted deployment environments
Experience deploying or serving tuned models
Background in cybersecurity, especially offensive security or vulnerability research
Experience with distributed training frameworks
How We Think About the Role
This is not a “prompt engineer” role, and it is not limited to running a few LoRA jobs. We want someone who can think deeply about:
when fine-tuning is the right tool versus prompting or orchestration
how to create data that improves behavior instead of just inflating metrics
how to evaluate models in ways that reflect real operational value
how to make open-weight models reliable in constrained environments