Principal MLOps Engineer
Raft
About the role
Role Overview
You will be a Principal MLOps Engineer helping design, deploy, and mature Raft’s end-to-end machine learning platform and MLOps infrastructure. Raft builds mission-critical AI/data platforms for the DoD, including low-latency pipelines, model lifecycle management, and secure production operations across cloud and constrained environments.
What You’ll Do
- Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems.
- Mature internal ML platform and model lifecycle capabilities (model packaging, registry/catalog workflows, deployments, monitoring, operational support).
- Deploy and manage ML workloads on Kubernetes, including GPU-enabled clusters.
- Build and operate model serving/inference infrastructure for multiple ML types: traditional ML, computer vision, speech/audio, and LLM-based systems.
- Create and maintain CI/CD workflows for ML services, model artifacts, and platform components.
- Partner with ML engineers, software engineers, and product teams to move models from experimentation to reliable production.
- Improve observability, reliability, security, and maintainability across ML infrastructure.
- Evaluate and standardize runtime patterns, serving frameworks, and deployment architectures.
- Contribute to infrastructure decisions across edge, on-prem, and cloud environments.
- Support compliance-driven deployment practices and secure software supply chain requirements (defense environment).
What We’re Looking For
- 7+ years of hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related roles.
- 5+ years of experience with Docker and Kubernetes in production environments.
- 5+ years of experience supporting enterprise cloud infrastructure/applications on AWS, Azure, or similar.
- Strong experience provisioning and operating production infrastructure for ML systems (the posting continues beyond the excerpt).
Preferred/Implied Fit
- Deep understanding of both production ML infrastructure and the practical needs of ML engineers.
- Experience with secure production operations, monitoring/observability, and model lifecycle practices for AI/ML at scale.
Security/Eligibility
- U.S. citizenship is required, and work must be conducted within the continental U.S.
About Raft
Raft is a customer-obsessed non-traditional defense tech company providing AI/ML and data solutions for U.S. military and government agencies. It focuses on autonomous data fusion and agentic AI, building cloud-native, large-scale distributed data and mission applications that support time-sensitive decision-making. The role is part of its work delivering mission-critical AI and data platforms for the Department of Defense.
Scraped 4/19/2026