MLOps / Infrastructure Engineer
10a Labs
About the role
Role overview
MLOps / Infrastructure Engineer (Remote, U.S.-based)
You’ll build and operate infrastructure for a real-time ML-powered content moderation system that detects and triages abuse, threats, and edge-case language. The role is hands-on and sits at the intersection of machine learning, systems, and product delivery, partnering with ML engineers, researchers, and clients.
Responsibilities
- Design and maintain cloud infrastructure on GCP or AWS for:
  - real-time model serving
  - data ingestion and evaluation workflows
- Deploy and optimize APIs for low-latency ML model access and embedding search systems
- Manage the end-to-end training data flow (sourcing, cleaning, preparing for model consumption) with a focus on accuracy, scalability, and efficiency
- Build observability tooling for production ML pipelines (latency, error rates, request volumes, drift)
- Automate model deployment, retraining, and evaluation pipelines using CI/CD for ML
- Help package models for serving alongside ML engineers
- Manage and optimize vector databases and semantic search infrastructure (e.g., Pinecone, FAISS, Vertex Matching Engine)
- Ensure security, compliance, and uptime for safety-critical infrastructure
Requirements
- 3–8 years of experience deploying ML systems or high-availability backend systems
- Shipped and maintained production infrastructure at scale, supporting ML workflows
- Experience with GCP, AWS, or similar platforms (including managed ML services)
- Proficient with Terraform, Docker, Kubernetes (or similar infrastructure tools)
- Understanding of performance tradeoffs in model serving and embedding search pipelines
- Ability to collaborate cross-functionally with ML, security, and product teams
- Builder mindset and comfort working in ambiguous environments
Nice to have
- Vector databases / ANN systems, ideally on GCP or AWS
- Experience serving LLMs or embedding-based models via API
- Monitoring/logging/metrics tools (e.g., Prometheus, Grafana, Sentry)
- Familiarity with trust & safety, abuse detection, or policy enforcement systems
First 3 months success criteria
- Deployed and monitored a real-time ML inference system with clear observability
- Implemented an API with <200ms latency for embedding/classifier inference
- Streamlined deployment and retraining workflows with ML engineers
- Built logging/monitoring to understand performance and classifier behavior
Compensation & benefits
- $130K–$230K base salary (dependent on experience and location)
- Performance-based annual bonus
- Professional development support (education, conferences, training)
- Fully remote, U.S.-based
- Comprehensive health, dental, and vision; generous PTO
- 401(k) retirement plan
About 10a Labs
10a Labs provides a safety and threat-intelligence layer for frontier and enterprise AI teams. It supports adversarial red teaming, model evaluations, and intelligence collection to help organizations deploy AI systems safely and reliably.
Scraped 4/11/2026