xelys jobs

Machine Learning Operations Engineer

K1X, Inc.

full-remote, senior, permanent, data, devops | United States | Today via LinkedIn

Tags

MLOps, Python, Docker, Kubernetes, MLflow, NVIDIA Triton Inference Server, CI/CD, Snowflake, Distributed Systems, Monitoring

About the role

Role Overview

K1X is hiring an experienced Machine Learning Operations (MLOps) Engineer to build and operate the ML infrastructure powering AI and machine learning across its products. This is a hands-on role focused on scalable systems, training and deployment pipelines, and reliable tooling in production.

Responsibilities

  • Design and build scalable ML infrastructure for training, evaluation, and deployment
  • Create and maintain containerized environments using Docker and Kubernetes
  • Build and manage distributed training pipelines and orchestration workflows
  • Implement ML lifecycle tooling, including MLflow for experiment tracking and reproducibility
  • Own and operate production inference systems, including NVIDIA Triton Inference Server
  • Design low-latency, high-availability model serving architectures
  • Implement CI/CD for ML deployment, versioning, and rollback strategies
  • Build and maintain data pipelines integrated with Snowflake
  • Provide monitoring, logging, and alerting for model performance, model drift, and system health
  • Partner with ML Engineers to improve developer experience and accelerate delivery

Requirements

  • BS/MS in Computer Science/Engineering or equivalent experience
  • 5+ years in software engineering, DevOps, or MLOps
  • Strong Python proficiency for production-grade systems
  • Hands-on experience with Docker, Kubernetes, and distributed systems
  • Experience building and maintaining CI/CD pipelines
  • Familiarity with ML lifecycle tools such as MLflow (or similar)
  • Experience with cloud-based data platforms such as Snowflake
  • Strong system design fundamentals, including APIs and microservice architectures
  • Proven ability to debug and troubleshoot issues across distributed systems

Nice-to-Haves

  • Experience with NVIDIA Triton Inference Server and inference infrastructure
  • Experience building large-scale GPU/distributed training infrastructure
  • Familiarity with feature stores, data versioning, and experiment tracking systems
  • Experience with NLP or document processing pipelines
  • Exposure to observability tools like Prometheus and Grafana
  • Experience in SaaS environments with high availability and performance needs
  • Strong bias toward automation, scalability, and continuous improvement
  • Collaborative cross-functional mindset with engineering and data teams

About K1X, Inc.

K1X, Inc. builds an all-digital K-1 experience by replacing legacy workflows with scalable software and AI-driven automation. The company is expanding its machine learning capabilities and investing in a production-grade ML platform for model development, deployment, and monitoring.

Scraped 4/9/2026
