xelys jobs

Network DevOps Engineer, RDMA Fabric Automation

Vultr

full-remote, senior, permanent, backend, devops, security | Anywhere in the World | 2 days ago via WWR
90,000 - 130,000 USD/annual

Tags

RoCEv2, RDMA, Ansible, Python, CI/CD, Telemetry, Prometheus, Kafka, NetBox, Linux Networking

About the role

Role Overview

As a Network DevOps Engineer (RDMA Fabric Automation) at Vultr, you’ll help evolve, automate, and operate RoCE-based Ethernet fabrics. This role sits at the intersection of network engineering, operations, automation, and observability, where you’ll build the tooling and telemetry pipelines that keep fabrics fast, deterministic, and reliable at global scale.

Key Responsibilities

  • Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers.
  • Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks.
  • Integrate automation with Vultr source-of-truth systems (e.g., NetBox, OpsMill) for intent-driven configuration and validation.
  • Develop telemetry ingestion/correlation pipelines for real-time network health and performance metrics (e.g., gNMI, Prometheus, Kafka, custom collectors).
  • Collaborate with platform/orchestration/product engineering teams to optimize:
    • RDMA performance
    • PFC/ECN behavior
    • Path symmetry across fabrics
  • Implement CI/CD workflows for network configuration changes, including validation, pre-checks, and rollbacks.
  • Investigate complex networking behavior across layers (e.g., flow hashing, congestion domains, ECMP, overlay interactions).
  • Contribute to the design of next-generation GPU/AI interconnect fabrics integrated into Vultr’s global network architecture.

Requirements

  • Strong understanding of modern data center networking: EVPN-VXLAN, BGP, MLAG, QoS, traffic engineering.
  • Deep familiarity with RoCEv2/RDMA transport tuning, ECN/PFC, and lossless Ethernet design.
  • Proven automation experience with Ansible and programming experience in Python (plus experience with Golang, Rust, or PHP).
  • Experience with telemetry/monitoring stacks such as Prometheus, Grafana, Loki, ELK.
  • Experience integrating with network source-of-truth tools such as NetBox, Nautobot, OpsMill.
  • CI/CD familiarity with GitHub Actions, Jenkins, ArgoCD.
  • Strong Linux networking background (e.g., namespaces, netlink, system-level debugging).

Compensation

  • $90,000 - $130,000 (varies by location, years of experience, background, and skills).

About Vultr

Vultr is a high-performance cloud infrastructure company providing global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage. Trusted by customers across 185 countries, Vultr operates 32 data center locations and focuses on making infrastructure accessible for enterprises and AI innovators. The company is privately held and operates with a mission-driven, high-growth engineering culture.

Scraped 6/11/2026
