xelys jobs

Machine Learning Operations (MLOps) Engineer

Jobgether

Full remote · Senior · Permanent · Backend · DevOps · United States · Posted yesterday via LinkedIn

Tags

MLOps, Python, Docker, Kubernetes, MLflow, CI/CD, Observability, Snowflake, Distributed Systems, Model Serving

About the role

Role: Machine Learning Operations (MLOps) Engineer

Build and operate the infrastructure behind a modern machine learning platform that powers production-grade AI systems. You’ll design scalable end-to-end workflows for training, deployment, and monitoring, working across software engineering, DevOps, and ML.

Responsibilities

  • Design, build, and maintain infrastructure and tooling across the full ML lifecycle (training → experimentation → deployment → monitoring)
  • Build scalable ML infrastructure for training, evaluation, deployment, and inference
  • Develop and maintain containerized systems using Docker and Kubernetes for distributed workloads
  • Build and orchestrate distributed training pipelines and workflow automation
  • Use and maintain ML lifecycle tooling such as MLflow (experiment tracking, versioning, reproducibility)
  • Own and optimize production inference systems (low-latency, high-availability model serving)
  • Implement and maintain CI/CD pipelines for ML models (automated deployment, versioning, rollback)
  • Build and manage data pipelines integrated with Snowflake and related data systems
  • Implement observability (monitoring, logging, alerting) for model performance, drift detection, and system health
  • Partner with ML engineers to improve platform usability, reliability, and developer experience

Requirements

  • 5+ years of experience in software engineering, DevOps, or MLOps
  • Strong software engineering background, with hands-on experience operating ML infrastructure at scale
  • Proficiency in Python and experience building production-grade distributed systems
  • Hands-on experience with Docker and Kubernetes
  • Proven experience designing and maintaining CI/CD for production systems
  • Familiarity with MLflow (or equivalent)
  • Experience with data platforms such as Snowflake (or similar cloud data warehouses)
  • Strong system design fundamentals (microservices, APIs, scalable architectures)
  • Excellent debugging and troubleshooting skills in complex distributed environments
  • Strong collaboration skills with ML engineers and data teams

Education

  • Bachelor’s or Master’s degree in Computer Science/Engineering (or equivalent practical experience)

Benefits (Highlights)

  • Fully remote work; unlimited vacation policy
  • Comprehensive healthcare coverage (medical, dental, vision)
  • 401(k) plan, paid parental leave
  • Startup environment focused on engineering impact and automation

About Jobgether

Jobgether is an employment marketplace that uses an AI-powered matching process to connect candidates with job opportunities. This listing is posted on Jobgether on behalf of a partner company that is building and scaling a production-grade machine learning platform.

Scraped 4/15/2026