Job Description
The Opportunity:
We are looking for a Senior Machine Learning Engineer with strong technical expertise to design, build, and deploy scalable ML/AI solutions. In this role, you will be responsible for developing robust machine learning pipelines, optimizing model performance, and contributing to the design of ML/AI products that drive real business impact. You will collaborate closely with data scientists, product managers, and platform engineers to bring advanced ML capabilities into production.
Key Responsibilities:
- Lead and mentor a team of ML engineers, guiding technical decisions, best practices, and development processes.
- Develop and train machine learning models, including feature engineering, model selection, hyperparameter tuning, and evaluation.
- Drive the transition of ML models from proof-of-concept to production deployment.
- Design, build, and maintain model deployment pipelines, including CI/CD workflows, model versioning, and automated testing.
- Develop and maintain robust monitoring systems to track model performance, data drift, feature quality, and system health.
- Design and develop backend services and APIs to integrate ML models with product systems and user-facing applications.
- Apply strong MLOps principles to ensure reliability, reproducibility, and operational excellence of ML systems.
- Collaborate with cross-functional teams (engineering, product, data, analytics) to deliver impactful, high-quality ML solutions.
Qualifications:
- Master’s degree in Computer Science, Machine Learning, Data Science, or a related field.
- 10+ years of experience in ML engineering, model development, or related technical roles.
- Proven experience training, developing, and deploying ML models in real-world production environments.
- Experience leading or mentoring engineering teams and driving complex ML projects to successful production deployment.
- Strong experience with API development (e.g., FastAPI, Flask, Django) and backend service design.
- Proficiency in Python and standard ML frameworks (Scikit-learn, TensorFlow, PyTorch).
- Strong understanding of MLOps tools and practices, such as MLflow, Kubeflow, Airflow, Prefect, Docker, and Kubernetes.
- Experience building feature pipelines and data processing workflows, and optimizing model performance.
- Experience with monitoring tools and performance alerting (e.g., Prometheus, Grafana, custom monitoring).
Preferred:
- Experience with distributed computing frameworks (Spark, Ray, Dask).
- Familiarity with microservices, event-driven architectures, or message queues.
- Knowledge of cloud platforms (AWS/GCP) or platform-agnostic orchestration tools.
- Experience working in a fast-paced, experimentation-driven environment.
