xelys jobs

Machine Learning Engineer

Scale.jobs

hybrid, mid, permanent, backend, data | San Francisco, CA | Today via LinkedIn

Tags

Machine Learning, Python, PyTorch, Hugging Face Transformers, RAG, LLM Inference, MLOps, FastAPI, gRPC, Vector Search

About the role

Role Overview

Build and scale production machine learning services that bridge the gap between prototypes and highly available systems. You will work with product and data platform teams to deliver both traditional ML and modern LLM application patterns, including retrieval and agentic workflows.

Responsibilities

  • Develop, optimize, and maintain scalable ML pipelines for:
    • model training and validation
    • batch and streaming inference
  • Build and deploy LLM applications such as:
    • Retrieval-Augmented Generation (RAG)
    • agentic workflows
    • fine-tuning scripts for state-of-the-art LLMs
  • Implement robust ML evaluation and testing to detect:
    • regressions
    • hallucinations
    • performance degradation
  • Expose model inference via high-throughput, low-latency APIs with FastAPI or gRPC, collaborating with backend engineers
  • Establish automated MLOps for:
    • model monitoring
    • data drift detection
    • CI/CD for ML assets
  • Optimize inference latency and GPU utilization using techniques and tools such as:
    • quantization and pruning
    • model compilation and serving libraries (e.g., TensorRT, vLLM)

Requirements

  • 3–6 years of experience as a Machine Learning Engineer or Software Engineer on production-grade AI systems
  • Strong proficiency in Python
  • Solid ML framework experience with PyTorch, scikit-learn, and Hugging Face Transformers
  • Hands-on experience with vector search databases (e.g., Pinecone, Qdrant, Milvus, pgvector)
  • Experience with LLM orchestration tools such as LangChain or LlamaIndex
  • Understanding of relational and non-relational databases; experience building feature pipelines in SQL, pandas, or PySpark
  • Containerization and orchestration familiarity: Docker, Kubernetes
  • Cloud ML orchestration experience with AWS SageMaker, GCP Vertex AI, or Run:ai

Bonus

  • Experience with Triton Inference Server
  • Kubernetes-native ML tools (Kubeflow, KServe)
  • Contributions to open-source ML/LLM repositories

About Scale.jobs

Scale.jobs is a company operating in the AI and hiring space, focused on building and scaling software systems that leverage machine learning and LLM capabilities. The role description indicates they develop customer-facing products by embedding predictive intelligence and generative AI into production systems, supported by robust MLOps and scalable ML infrastructure.

Scraped 6/19/2026
