About the role

Role Overview

As an MLOps Engineer, you will bridge machine learning research with robust, production-grade systems. You’ll own the infrastructure, pipelines, and CI/CD workflows needed to deploy, monitor, and scale ML models across the organization, ensuring reproducible training and production inference that meets latency and reliability SLAs.

Responsibilities

Design, build, and maintain ML pipelines for automated retraining and batch inference (e.g., Kubeflow, Airflow, Prefect)
Develop and manage feature stores (e.g., Feast or Tecton) to keep feature engineering consistent between offline training and online serving
Deploy models as high-throughput, low-latency microservices using NVIDIA Triton, KServe, or FastAPI
Implement monitoring and alerting for model drift, data quality, and system performance using Prometheus, Grafana, and Evidently AI
Containerize ML workloads with Docker and orchestrate them on Kubernetes across multi-tenant environments
Build CI/CD pipelines for automated testing, integration, and deployment (e.g., GitHub Actions, GitLab CI), including GitOps-based release workflows

Requirements

3–6 years of experience in DevOps, MLOps, or software engineering, with a strong focus on ML infrastructure
Proficiency in Python and familiarity with shell scripting
Experience with Infrastructure as Code using Terraform
Familiarity with at least one major cloud platform: AWS, GCP, or Azure
Hands-on experience with Kubernetes and containerized application orchestration at scale
Familiarity with ML lifecycle tools such as MLflow, Weights & Biases, or SageMaker Pipelines
Strong software engineering practices: version control, automated testing, and code review

Bonus

Experience deploying LLMs (e.g., vLLM, Ollama)
Experience with distributed training and large-scale data processing frameworks (e.g., Ray, Spark)
BS/MS in Computer Science or a related field

About Scale.jobs

Scale.jobs is hiring for an MLOps role focused on building production-grade machine learning infrastructure. The position bridges ML research and operational systems, with ownership of the pipelines, deployment, monitoring, and scaling of ML models.