Senior MLOps / ML Infrastructure Engineer
IMCS Group
full-remote, senior, contract, backend, devops; San Francisco Bay Area; Yesterday via LinkedIn
Tags
MLOps, ML Infrastructure, Distributed Systems, Pipeline Automation, GKE, Training and Serving, Feature Engineering, Knowledge Distillation, ML Workflows, Automation
About the role
Role Overview
Senior MLOps / ML Infrastructure Engineer (6-month W2 contract, remote) to join the core platform team. The role builds scalable, standardized ML workflows and infrastructure to accelerate experimentation and deployment across the organization.
Responsibilities
- Design, develop, and maintain scalable ML workflows and pipelines
- Build and improve ML infrastructure for training and serving, including GKE-based systems
- Automate end-to-end ML workflows to improve efficiency and reduce operational overhead
- Develop robust data sampling and feature generation platforms
- Standardize ML training, deployment, and knowledge distillation pipelines
- Partner with researchers and engineers to enable large-scale ML experimentation
- Drive foundational ML platform tooling and adoption across the org
Key Initiatives
- Scalable ML workflows/pipelines for large-scale ML systems
- Automation of end-to-end ML workflows
- GKE-based training and serving infrastructure
- Knowledge distillation and foundational training tooling
Qualifications
Must-have
- 5–10+ years of experience in large-scale ML systems, MLOps, or ML infrastructure
- Strong expertise in ML workflows, distributed systems, and pipeline automation
- Experience with GKE and scalable ML training/serving platforms
- Ownership-driven, collaborative, and pragmatic approach, with strong communication skills
Preferred
- Experience at large-scale ML/AI companies (e.g., Google, Meta, Amazon, Microsoft)
- Hands-on MLE/ML infra experience (not purely theoretical or pure DevOps)
Success Metrics / KPIs
- Faster time-to-market for ML experiments
- Improved training efficiency and infrastructure uptime
- Pipeline reliability, cost optimization, and stable deployments
- High platform adoption and reduced onboarding time for ML workflows
Scraped 5/15/2026