Principal MLOps Engineer
Raft
About the role
Role overview
Principal MLOps Engineer (U.S.-based)
Raft is building mission-critical AI and data platforms for the Department of Defense (DoD). You will help design, deploy, and mature Raft’s end-to-end ML platform and the MLOps infrastructure that supports model development, evaluation, deployment, monitoring, and lifecycle management across cloud and constrained environments.
Responsibilities
- Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems
- Mature internal ML platform capabilities across the model lifecycle (packaging, registry/catalog workflows, deployment, monitoring, operational support)
- Deploy and manage ML workloads on Kubernetes, including GPU-enabled clusters
- Build and maintain model serving and inference infrastructure for multiple ML use cases (traditional ML, computer vision, speech/audio, and LLM-based systems)
- Create and operate CI/CD workflows for ML services, model artifacts, and platform components
- Improve observability, reliability, security, and maintainability across ML infrastructure and services
- Standardize runtime patterns, serving frameworks, and deployment architectures for production ML workloads
- Contribute to infrastructure decisions across edge, on-prem, and cloud deployment environments
- Support compliance-driven deployment practices and secure software supply chain requirements in a defense environment
- Partner with ML engineers, software engineers, and product teams to move models from experimentation to reliable production deployment
Requirements
- 7+ years of hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related technical roles
- 5+ years of experience with Docker and Kubernetes in production
- 5+ years of experience supporting enterprise cloud infrastructure and applications in AWS, Azure, or similar environments
Nice-to-haves / additional signals
- Experience evaluating and standardizing deployment/serving runtime patterns for ML at scale
- Experience with secure production operations and compliance/supply-chain practices in regulated environments (defense-oriented)
- Familiarity with GPU infrastructure, model serving, and observability for ML systems
Location / eligibility
- U.S.-based role requiring U.S. citizenship and work performed within the continental U.S.
About Raft
Raft is a customer-obsessed, non-traditional defense tech company building AI/ML and data solutions for U.S. military and government agencies. The company focuses on autonomous data fusion and agentic AI, delivering cloud-native platforms and mission applications that support time-sensitive decision-making.
Scraped 4/23/2026