Machine Learning Operations (MLOps) Engineer
Jobgether
Full-remote · Senior · Permanent · Backend · DevOps · United States · Posted yesterday via LinkedIn
Tags
MLOps, Python, Docker, Kubernetes, MLflow, CI/CD, Observability, Snowflake, Distributed Systems, Model Serving
About the role
Role: Machine Learning Operations (MLOps) Engineer
Build and operate the infrastructure behind a modern machine learning platform that powers production-grade AI systems. You’ll design scalable end-to-end workflows for training, deployment, and monitoring, working across software engineering, DevOps, and ML.
Responsibilities
- Design, build, and maintain infrastructure and tooling across the full ML lifecycle (training → experimentation → deployment → monitoring)
- Build scalable ML infrastructure for training, evaluation, deployment, and inference
- Develop and maintain containerized systems using Docker and Kubernetes for distributed workloads
- Build and orchestrate distributed training pipelines and workflow automation
- Use and maintain ML lifecycle tooling such as MLflow (experiment tracking, versioning, reproducibility)
- Own and optimize production inference systems (low-latency, high-availability model serving)
- Implement and maintain CI/CD pipelines for ML models (automated deployment, versioning, rollback)
- Build and manage data pipelines integrated with Snowflake and related data systems
- Implement observability (monitoring, logging, alerting) for model performance, drift detection, and system health
- Partner with ML engineers to improve platform usability, reliability, and developer experience
Requirements
- 5+ years of experience in software engineering, DevOps, or MLOps
- Strong software engineering background with hands-on experience operating ML infrastructure at scale
- Python proficiency and experience building production-grade distributed systems
- Hands-on experience with Docker and Kubernetes
- Proven experience designing and maintaining CI/CD for production systems
- Familiarity with MLflow (or equivalent)
- Experience with data platforms such as Snowflake (or similar cloud data warehouses)
- Strong system design fundamentals (microservices, APIs, scalable architectures)
- Excellent debugging and troubleshooting skills in complex distributed environments
- Strong collaboration skills with ML engineers and data teams
Education
- Bachelor’s or Master’s degree in Computer Science/Engineering (or equivalent practical experience)
Benefits (Highlights)
- Fully remote work; unlimited vacation policy
- Comprehensive healthcare coverage (medical, dental, vision)
- 401(k) plan, paid parental leave
- Startup environment focused on engineering impact and automation
About Jobgether
Jobgether is an employment marketplace that uses an AI-powered matching process to connect candidates with job opportunities. This listing is posted on Jobgether on behalf of a partner company that is building and scaling a production-grade machine learning platform.
Scraped 4/15/2026