DevOps Engineer
Drexel University
About the role
Role Overview
The DevOps Engineer will help build and operate the new shared computing platform of Drexel's University Research Computing Facility (URCF) for GPU-accelerated workloads, including AI model training. The platform is under active development and is transitioning from a traditional HPC environment to container-native tools and workflows.
Responsibilities
- Automation & cluster operations: Develop and maintain automation for provisioning, configuring, and managing the cluster (e.g., Ansible, Warewulf, Kubernetes manifests, shell scripting).
- Kubernetes platform layer: Contribute to Kubernetes networking, storage integration, security policies, and workload orchestration.
- Storage infrastructure & integrations: Help build storage systems, including iRODS and Globus/Globus Connect Server for data transfer, as well as integrations between storage and compute.
- End-to-end troubleshooting: Diagnose issues across the stack, from bare-metal boot problems to container orchestration bugs.
- Documentation: Write and maintain operational and user-facing documentation.
- Coordinate with IT: Work with Drexel IT on shared infrastructure topics such as networking, DNS, and firewall rules.
- User-facing portal: Contribute to developing a web portal that supports project management, permissions, and usage tracking.
Requirements
- Education: Bachelor’s degree in Computer Science/Engineering or related field (or equivalent education and work experience).
- Experience: 1–3 years.
- Skills:
  - Linux systems administration and/or configuration management
  - Containers and/or container orchestration
  - Comfort working in a terminal with Git, SSH, and a text editor
  - Proficiency in at least one scripting language (Python or Bash)
  - Strong written communication
  - Ability to work independently and manage time in a fully remote role
Preferred Qualifications
- Kubernetes experience
- Bare-metal provisioning and/or HPC cluster management experience
- Familiarity with one or more of: Ansible, Warewulf, RKE2, Cilium, Kubeflow, Weka, iRODS, or Globus, and with infrastructure-as-code practices in general
- Web application development experience
- Experience in an academic/research computing environment
Contract / Funding Notes
- Grant-funded position through September 1, 2027 (employment contingent on continued funding).
About Drexel University
Drexel University's University Research Computing Facility (URCF) is building a new shared, GPU-accelerated research computing platform for AI and other workloads. The platform combines GPU/CPU compute nodes, Kubernetes-based orchestration, and large-scale storage, metadata, and data-transfer systems to support research projects.
Scraped 4/1/2026