xelys jobs

DevOps Engineer

Careflow

full-remote, senior, permanent, devops, security | San Francisco, CA | 3 days ago via LinkedIn

Tags

Google Cloud Platform (GCP), DevOps, Site Reliability Engineering (SRE), CI/CD, Observability, Monitoring & Alerting, Incident Response, Security, Reliability Engineering, Disaster Recovery

About the role

Role Overview

Careflow is hiring an experienced DevOps Engineer to own and improve its cloud infrastructure, security, observability, and operational reliability. You’ll work across the stack to keep the platform secure, scalable, performant, and highly available, while improving deployment, monitoring, incident response, and automation.

Responsibilities

Cloud Infrastructure & Operations

  • Manage and maintain the Google Cloud Platform (GCP) environment
  • Design and improve infrastructure for scalability, reliability, and cost efficiency
  • Own networking, compute, databases, storage, and other GCP services
  • Monitor system health and proactively address performance bottlenecks

Monitoring, Logging & Observability

  • Build and maintain centralized logging and monitoring
  • Create dashboards and alerts for system health, application performance, and business-critical workflows
  • Establish operational metrics and usage tracking
  • Lead incident response and root cause analysis

Security & Compliance

  • Implement and maintain security best practices for infrastructure and applications
  • Manage identity and access controls, secrets management, and environment security
  • Conduct security reviews and handle vulnerability remediation
  • Support compliance initiatives and audit readiness

CI/CD & Automation

  • Improve deployment pipelines and release processes
  • Automate infrastructure provisioning and operational workflows
  • Enhance development environments and deployment reliability
  • Reduce manual operational work through automation

Reliability Engineering

  • Improve uptime, resiliency, backup, and disaster recovery processes
  • Define service-level objectives (SLOs) and operational standards
  • Drive improvements in stability and performance

Cross-Functional Support

  • Partner with engineering, product, and leadership teams
  • Provide technical guidance on infrastructure and operational considerations
  • Participate in on-call and operational support rotation

Bonus Responsibilities

  • Troubleshoot and fix application-level issues when needed
  • Contribute code improvements and bug fixes across the platform
  • Assist with performance optimization and debugging

What Success Looks Like (First 90 Days)

  • Take ownership of GCP infrastructure and environments
  • Establish visibility into performance, reliability, and usage metrics
  • Improve monitoring/alerting and incident response processes
  • Identify and address security and operational risks
  • Reduce infrastructure-related issues and deployment friction
  • Become a trusted technical resource for operational excellence

Requirements

  • 5+ years of DevOps, Site Reliability Engineering, Cloud Engineering, or related experience
  • Strong hands-on experience with GCP
  • Experience building and maintaining CI/CD pipelines
  • Strong understanding of infrastructure monitoring, logging, and alerting systems

Role Details

  • Employment type: Full-Time
  • Location: Fully Remote
  • Schedule: Flexible, with availability for Saturday coverage offset by an additional weekday off
  • Reports to: Lead Architect

About Careflow

Careflow is a software company building and operating a cloud-based platform. The company focuses on reliability, security, scalability, and operational excellence as it grows its services in production environments.

Scraped 6/17/2026
