Backend Engineer - Remote
CodeGeniusRecruit
full-remote, mid, backend | United States | Today via LinkedIn
Tags
Distributed Systems, Python, Kubernetes, Docker, AWS, GCP, LLM Inference, Agent Orchestration, Observability, Backend APIs
About the role
Role overview
Backend Engineer (Remote), part-time (30–40 hours/week). You’ll design and build backend/distributed infrastructure to train, deploy, and scale AI agents on high-performance compute environments.
What you’ll be doing
- Design, build, and optimize distributed infrastructure for AI agent training, deployment, and scaling
- Develop core backend systems: services, APIs, and orchestration layers for agent lifecycles
- Support tool execution, memory access, and multi-agent coordination
- Collaborate with research and applied AI teams to integrate model-serving pipelines and agent reasoning loops into production
- Build and maintain agent runtime infrastructure:
  - task scheduling
  - state management
  - inter-agent communication
  - execution reliability
- Implement monitoring, observability, and fault tolerance for long-running agent processes and distributed workflows
- Evaluate and improve performance across compute, networking, storage, and inference layers; identify and resolve bottlenecks
- Participate in synchronous collaboration sessions (4-hour windows, 2–3 times/week) to review architecture, troubleshoot distributed systems, and iterate on designs
Requirements
- Strong foundation in Computer Science, Software Engineering, or Systems Design
- Experience building large-scale distributed systems
- Proficiency in one or more backend/systems languages: Go, Rust, Python, C++, Java, Scala, C#, Kotlin, TypeScript/JavaScript
- Experience with cloud infrastructure (AWS, GCP, or Azure)
- Experience with containerization/orchestration: Docker and Kubernetes
- Strong production experience designing backend services, APIs, and distributed systems
- Knowledge of networking, data streaming, caching, and performance optimization in distributed systems
- Excellent collaboration and communication skills
- Ability to commit 30–40 hours/week, including required synchronous sessions
Nice to have
- Familiarity with LLM inference pipelines, agent frameworks, multi-agent architectures, or reinforcement learning environments
Scraped 4/9/2026