Role Overview

Rackner is seeking an MLOps Engineer to manage the full lifecycle of production-grade AI/ML systems, from pipeline development through deployment and monitoring, in a secure, mission-focused environment. This is not a research role: the focus is on making models reliable, deployable, and auditable.

Responsibilities

Own the end-to-end ML lifecycle
- Build and operate production ML pipelines
- Orchestrate workflows with Kubeflow, Airflow, or Argo
- Implement model versioning, lineage, and reproducibility standards
Operationalize AI/ML systems
- Deploy models into secure, constrained environments
- Transition models from experimentation to containerized pipelines and production systems
- Support batch and real-time inference architectures
Engineer for reliability
- Ensure reproducibility, auditability, and stability
- Monitor model performance and system health using Prometheus, Grafana, and OpenTelemetry
- Detect and resolve issues like model drift and system degradation
Build cloud-native ML infrastructure
- Deploy and manage Kubernetes-based ML workloads
- Containerize pipelines with Docker
- Support scalable training and inference workflows
Establish data discipline
- Perform feature engineering and dataset preparation
- Implement data versioning and governance (e.g., lakeFS)
- Apply metadata and data management standards
Create repeatable systems
- Produce runbooks, playbooks, and documentation for operational sustainability

Requirements

Strong programming skills in Python
Experience deploying ML systems into production environments
Hands-on experience with:
- ML pipeline orchestration tools: Kubeflow, Airflow, or Argo
- Experiment tracking: MLflow or ClearML
Infrastructure & systems:
- Kubernetes and containerized systems (Docker)
- Familiarity with CI/CD pipelines
- Understanding of distributed systems and scalable architectures
Exposure to ML applications (with a deployment and integration focus) in one or both of:
- LLMs and other transformer-based models
- Computer vision systems (e.g., YOLO, Faster R-CNN)
Reliability-first mindset and ability to operate in complex, evolving environments

Clearance Requirements

Active TS/SCI clearance strongly preferred
Candidates holding a Secret clearance may be considered and supported through an upgrade
Non-cleared candidates must be U.S. citizens who are eligible to obtain and maintain a clearance and able to work in a CAC-enabled, secure environment

Why This Role

Build production systems rather than prototypes
Work across ML, infrastructure, and deployment pipelines
Develop high-demand MLOps expertise in constrained, high-trust environments

MLOps Engineer — AI/ML Systems & Deployment (TS/SCI Preferred)

About Rackner