Role overview

MLOps / Infrastructure Engineer (Remote, U.S.-based)

You’ll build and operate infrastructure for a real-time, ML-powered content moderation system that detects and triages abuse, threats, and edge-case language. The role is hands-on and sits at the intersection of machine learning, systems, and product delivery. You’ll partner with ML engineers, researchers, and clients.

Responsibilities

Design and maintain cloud infrastructure on GCP or AWS for real-time model serving and for data ingestion and evaluation workflows
Deploy and optimize APIs for low-latency ML model access and embedding search systems
Manage the end-to-end training data flow (sourcing, cleaning, preparing for model consumption) with a focus on accuracy, scalability, and efficiency
Build observability tooling for production ML pipelines (latency, error rates, request volumes, drift)
Automate model deployment, retraining, and evaluation pipelines using CI/CD for ML
Work with ML engineers to package models for serving
Manage and optimize vector databases and semantic search infrastructure (e.g., Pinecone, FAISS, Vertex Matching Engine)
Ensure security, compliance, and uptime for safety-critical infrastructure

Requirements

3–8 years of experience deploying ML systems or high-availability backend systems
Track record of shipping and maintaining production infrastructure at scale in support of ML workflows
Experience with GCP, AWS, or similar platforms (including managed ML services)
Proficient with Terraform, Docker, Kubernetes (or similar infrastructure tools)
Understanding of performance tradeoffs in model serving and embedding search pipelines
Ability to collaborate cross-functionally with ML, security, and product teams
Builder mindset and comfort working in ambiguous environments

Nice to have

Experience with vector databases or approximate nearest neighbor (ANN) systems, ideally on GCP or AWS
Experience serving LLMs or embedding-based models via API
Experience with monitoring, logging, and metrics tools (e.g., Prometheus, Grafana, Sentry)
Familiarity with trust & safety, abuse detection, or policy enforcement systems

First 3 months success criteria

Deployed and monitored a real-time ML inference system with clear observability
Implemented an API with under 200 ms latency for embedding/classifier inference
Streamlined deployment and retraining workflows with ML engineers
Built logging/monitoring to understand performance and classifier behavior

Compensation & benefits

$130K–$230K base salary (dependent on experience and location)
Performance-based annual bonus
Professional development support (education, conferences, training)
Fully remote, U.S.-based
Comprehensive health, dental, and vision; generous PTO
401(k) retirement plan
