Senior Site Reliability Engineer
DriveWealth
About the role
Role Overview
As a Senior Site Reliability Engineer, you will design and modernize the reliability, scalability, and self-healing capabilities of the Brokerage-as-a-Service platform. The role focuses on reducing toil by building internal SRE platforms, automating workflows, and ensuring the Kubernetes-based ecosystem can handle global market demand.
Responsibilities
- Engineering & Automation: Build internal tools and SRE platforms to eliminate repetitive tasks and improve developer velocity.
- Infrastructure as Code: Maintain modular, reusable Terraform and manage GitOps workflows via ArgoCD.
- Observability & Reliability:
  - Implement OpenTelemetry standards.
  - Use the Grafana stack (Alloy, Loki, Tempo, Mimir) for deep system health insights.
  - Define and manage SLIs, SLOs, and error budgets.
- Platform Governance: Review architecture and Kubernetes metrics to drive high availability, capacity planning, and cost optimization across AWS regions.
- Incident Engineering: Lead incident response, perform complex root-cause analysis (RCA), and promote a blameless post-mortem culture.
- Collaboration: Partner with engineering teams to adopt new tools, security standards, and reliability best practices.
Requirements
- Linux & Networking: Strong Linux administration skills and deep knowledge of TCP/IP, the OSI model, and DNS; capable of advanced network troubleshooting.
- FinTech/Regulated Environment: Experience in regulated financial environments or with FIX/API connectivity.
- Production Kubernetes: Hands-on experience with production clusters (RBAC, autoscaling, Helm, multi-cluster patterns).
- AWS Cloud Native: Strong AWS fundamentals, including security and high-availability patterns; automation with boto3 and the AWS CLI.
- CI/CD & GitOps: Experience operating secure automated delivery pipelines and GitOps workflows with ArgoCD.
- Programming/Scripting: Proficiency in Python or Golang, plus Bash and Ansible.
- Security Mindset: Experience with secrets management, vulnerability scanning, and software supply chain security.
- AI Tooling (Nice to Have): Familiarity with using LLMs, public MCP (Model Context Protocol) servers, or Amazon Bedrock Agents to enhance SRE workflows.
Location / On-call
- New York, NY
- Includes critical on-call responsibilities supporting 24/7 global operations, although the role's primary mission is reducing the need for manual intervention.
About DriveWealth
DriveWealth is a global B2B financial technology company focused on democratizing access to financial independence. Through an API-based platform, it enables partners to deliver seamless investing and trading experiences worldwide, including equities, mutual funds, ETFs, fixed income, and options. The company operates with both fintech agility and Wall Street-level stability, discipline, and regulatory compliance.
Scraped 4/24/2026