xelys jobs

Principal MLOps Engineer

Raft

On-site · Lead · Permanent · DevOps · Backend · United States · Posted 4 days ago via LinkedIn

Tags

MLOps, Kubernetes, Docker, AWS, Azure, CI/CD, Model Serving, LLMs, GPU Infrastructure, Machine Learning Lifecycle

About the role

Role Overview

As a Principal MLOps Engineer, you will help design, deploy, and mature Raft’s end-to-end machine learning platform and MLOps infrastructure. Raft builds mission-critical AI/data platforms for the DoD, including low-latency pipelines, model lifecycle management, and secure production operations across cloud and constrained environments.

What You’ll Do

  • Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems.
  • Mature the internal ML platform and model lifecycle capabilities (model packaging, registry/catalog workflows, deployments, monitoring, and operational support).
  • Deploy and manage ML workloads on Kubernetes, including GPU-enabled clusters.
  • Build and operate model serving/inference infrastructure for multiple ML types:
    • traditional ML, computer vision, speech/audio, and LLM-based systems
  • Create and maintain CI/CD workflows for ML services, model artifacts, and platform components.
  • Partner with ML engineers, software engineers, and product teams to move models from experimentation to reliable production.
  • Improve observability, reliability, security, and maintainability across ML infrastructure.
  • Evaluate and standardize runtime patterns, serving frameworks, and deployment architectures.
  • Contribute to infrastructure decisions across edge, on-prem, and cloud environments.
  • Support compliance-driven deployment practices and secure software supply chain requirements (defense environment).

What We’re Looking For

  • 7+ years of hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related roles.
  • 5+ years of experience with Docker and Kubernetes in production environments.
  • 5+ years of experience supporting enterprise cloud infrastructure and applications on AWS, Azure, or similar platforms.
  • Strong experience provisioning and operating production infrastructure for ML systems (the posting continues beyond the excerpt).

Preferred/Implied Fit

  • Deep understanding of both production ML infrastructure and the practical needs of ML engineers.
  • Experience with secure production operations, monitoring/observability, and model lifecycle practices for AI/ML at scale.

Security/Eligibility

  • U.S. citizenship is required, and work must be conducted within the continental U.S.

About Raft

Raft is a customer-obsessed, non-traditional defense tech company providing AI/ML and data solutions for U.S. military and government agencies. It focuses on autonomous data fusion and agentic AI, building cloud-native, large-scale distributed data and mission applications that support time-sensitive decision-making. This role is part of Raft’s work delivering mission-critical AI and data platforms for the Department of Defense.

Scraped 4/19/2026
