About the role

Role Overview

Build and scale production machine learning services that bridge the gap between prototypes and highly available systems. You will work with product and data platform teams to deliver both traditional ML and modern LLM application patterns, including retrieval and agentic workflows.

Responsibilities

Develop, optimize, and maintain scalable ML pipelines for:
- model training and validation
- batch and streaming inference
Build and deploy LLM applications such as:
- Retrieval-Augmented Generation (RAG)
- agentic workflows
- fine-tuning scripts for state-of-the-art LLMs
Implement robust ML evaluation and testing to detect:
- regressions
- hallucinations
- performance degradation
Collaborate with backend engineers to expose model inference through high-throughput, low-latency APIs built with FastAPI or gRPC
Establish automated MLOps for:
- model monitoring
- data drift detection
- CI/CD for ML assets
Optimize inference latency and GPU utilization using techniques and tools such as:
- quantization and pruning
- model compilation and serving libraries (e.g., TensorRT, vLLM)

Requirements

3–6 years of experience as a Machine Learning Engineer or Software Engineer on production-grade AI systems
Strong proficiency in Python
Solid ML framework experience with PyTorch, scikit-learn, and Hugging Face Transformers
Hands-on experience with vector search databases (e.g., Pinecone, Qdrant, Milvus, pgvector)
Experience with LLM orchestration frameworks such as LangChain or LlamaIndex
Understanding of relational and non-relational databases; experience building feature pipelines in SQL, pandas, or PySpark
Familiarity with containerization and orchestration (Docker, Kubernetes)
Cloud ML orchestration experience with AWS SageMaker, GCP Vertex AI, or Run:ai

Bonus

Experience with Triton Inference Server
Experience with Kubernetes-native ML tools (Kubeflow, KServe)
Contributions to open-source ML/LLM repositories

About Scale.jobs

Scale.jobs operates in the AI and hiring space, building and scaling software systems that use machine learning and LLM capabilities. According to the role description, the company develops customer-facing products by embedding predictive intelligence and generative AI into production systems, supported by robust MLOps and scalable ML infrastructure.