MLOps Engineer
Scale.jobs
mid, backend, devops · Atlanta, GA · 2 days ago via LinkedIn
Tags
MLOps, Python, CI/CD, Kubernetes, Docker, Terraform, AWS, GCP, Monitoring, Feature Stores
About the role
Role Overview
The MLOps Engineer bridges machine learning development and production operations by designing, building, and maintaining automated infrastructure and pipelines for reliable ML model deployment, monitoring, and scaling.
Responsibilities
- Design and implement automated CI/CD pipelines to move ML models from experimental code to production deployments.
- Build and maintain feature stores and training/data pipelines (e.g., Feast, Spark, dbt).
- Develop and manage model serving infrastructure using Kubernetes (EKS/GKE) and ML serving tools such as KServe, Triton Inference Server, or BentoML.
- Implement monitoring, logging, and alerting with Prometheus and Grafana to detect data drift, concept drift, and performance regressions.
- Optimize inference performance using techniques like quantization, pruning, or ONNX Runtime integration.
- Establish model governance practices, including model registry management, versioning, and automated audit trails.
Requirements
- 3–6 years of experience in DevOps/SRE/Software Engineering, including at least 2 years focused on MLOps.
- Strong Python skills.
- Solid experience with Docker and Kubernetes.
- Hands-on experience with orchestration/ML workflow tools: Kubeflow, Airflow, MLflow, or Prefect.
- Deep understanding of cloud infrastructure (AWS or GCP) and Infrastructure as Code with Terraform.
- Familiarity with ML frameworks and libraries: PyTorch, TensorFlow, scikit-learn.
Bonus
- Experience with LLMOps, vector databases, and/or Triton Inference Server.
About Scale.jobs
Scale.jobs is a company in the technology and recruiting space that helps connect talent with opportunities. The role described focuses on MLOps engineering, spanning machine learning deployment, monitoring, and production infrastructure at scale.