About the role

Role Overview

Build, scale, and maintain core machine learning models and pipelines that power real-time decision-making systems. Translate applied research into high-throughput production services while meeting strict performance, latency, and reliability requirements.

Responsibilities

Design and deploy production-grade ML models for real-time inference, including:
- Recommendation systems
- Classification
- Predictive modeling
Build and optimize scalable data preprocessing and feature engineering pipelines using Apache Spark (PySpark) or Ray
Create and maintain automated CI/CD pipelines for ML training, evaluation, and deployment using tools such as Kubeflow, MLflow, or Argo Workflows
Implement real-time monitoring and alerting for:
- Model drift / concept drift
- System latency
Optimize inference performance using quantization, pruning, and GPU acceleration with TensorRT or ONNX Runtime
Collaborate with platform engineers to scale vector databases and feature stores for low-latency retrieval

Requirements

3–6 years of professional experience as an ML Engineer (or Software Engineer) working with production-grade ML systems
Strong proficiency in Python and hands-on experience with core ML frameworks such as PyTorch, TensorFlow, or XGBoost
Experience deploying models to cloud environments (AWS, GCP, or Azure) using Docker and Kubernetes
Solid software engineering fundamentals (e.g., version control, unit testing, design patterns)
BS/MS in Computer Science, Data Science, Mathematics, or a related field

Bonus

Experience with Triton Inference Server (including Triton CLI) or building custom Triton backends for high-throughput serving

About Scale.jobs

Scale.jobs is a hiring platform that connects candidates with roles in high-growth technology organizations. This role focuses on building production-grade machine learning systems, with an emphasis on scalable model pipelines and real-time decision-making infrastructure.

Machine Learning Engineer
