xelys jobs

Senior MLOps / ML Infrastructure Engineer

IMCS Group

full-remote, senior, contract, backend, devops

San Francisco Bay Area

Yesterday via LinkedIn

Tags

MLOps, ML Infrastructure, Distributed Systems, Pipeline Automation, GKE, Training and Serving, Feature Engineering, Knowledge Distillation, ML Workflows, Automation

About the role

Role Overview

This is a 6-month W2 contract, fully remote, for a Senior MLOps / ML Infrastructure Engineer joining the core platform team. The role builds scalable, standardized ML workflows and infrastructure to accelerate experimentation and deployment across the organization.

Responsibilities

  • Design, develop, and maintain scalable ML workflows and pipelines
  • Build and improve ML infrastructure for training and serving, including GKE-based systems
  • Automate end-to-end ML workflows to improve efficiency and reduce operational overhead
  • Develop robust data sampling and feature generation platforms
  • Standardize ML training, deployment, and knowledge distillation pipelines
  • Partner with researchers and engineers to enable large-scale ML experimentation
  • Drive foundational ML platform tooling and adoption across the org

Key Initiatives

  • Scalable ML workflows/pipelines for large-scale ML systems
  • Automation of end-to-end ML workflows
  • GKE-based training and serving infrastructure
  • Knowledge distillation and foundational training tooling

Qualifications

Must-have

  • 5–10+ years of experience in large-scale ML systems, MLOps, or ML infrastructure
  • Strong expertise in ML workflows, distributed systems, and pipeline automation
  • Experience with GKE and scalable ML training/serving platforms
  • Ownership-driven, collaborative, pragmatic approach; strong communication

Preferred

  • Experience at large-scale ML/AI companies (e.g., Google, Meta, Amazon, Microsoft)
  • Hands-on MLE/ML infra experience (not purely theoretical or pure DevOps)

Success Metrics / KPIs

  • Faster time-to-market for ML experiments
  • Improved training efficiency and infrastructure uptime
  • Pipeline reliability, cost optimization, and stable deployments
  • High platform adoption and reduced onboarding time for ML workflows

Scraped 5/15/2026
