AI Data Engineer - Remote
ProfitSolv
Full-remote · Senior · Permanent · Full-stack · United States · 2 days ago via LinkedIn
Tags
AWS, Apache Iceberg, dbt, dbt Semantic Layer, Airflow (MWAA), Change Data Capture (CDC), RAG (Retrieval-Augmented Generation), OpenSearch, Terraform
About the role
ProfitSolv is hiring an AI Data Engineer (Remote) to build a greenfield centralized data platform on AWS. You’ll combine data engineering and AI engineering—e.g., writing dbt models in the morning and designing a RAG pipeline in the afternoon.
Responsibilities
- Build and maintain a Medallion Lakehouse (Bronze/Silver/Gold) on S3 using Apache Iceberg, AWS Glue Data Catalog, and dbt Cloud (Athena adapter)
- Configure and manage AWS DMS for ongoing CDC from ~1,000 SQL Server instances
- Ingest data using Amazon ECS Fargate tasks for SaaS API ingestion
- Orchestrate pipelines with Amazon MWAA (Airflow)
- Develop dbt Cloud transformations from Bronze → Silver → Gold
- Define business metrics in the dbt Semantic Layer for BI tools and AI agents
- Manage Redshift Serverless + Spectrum as the read engine
- Tune Iceberg table layouts, partitioning, and compaction for performance
- Implement Lake Formation tag-based governance for multi-product data isolation
- Onboard acquisitions to the platform in weeks, not months
- Build batch embedding pipelines for legal documents and client records
- Manage vector storage using OpenSearch Serverless or pgvector on Aurora
- Design and ship RAG pipelines for legal domain use cases (chunking, retrieval ranking, context management)
- Build MCP servers exposing the dbt Semantic Layer and platform APIs to AI agents (Claude, internal copilots, customer-facing features)
- Ensure compliance/security/governance with IAM roles, encryption policies, and metadata cataloging
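The RAG responsibilities above start with document chunking. A minimal, dependency-free sketch of fixed-size chunking with overlap (the function name and default parameters are illustrative, not from the posting):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries, which helps retrieval quality in RAG pipelines.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Production pipelines typically chunk on token or sentence boundaries rather than raw characters, but the overlap idea carries over directly.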
Requirements
- 5+ years of hands-on data engineering experience, focused on AWS (S3, Glue, Athena, Redshift or equivalent)
- Production-grade dbt experience (Core or Cloud), including testing, macros, and documentation best practices
- Experience implementing CDC patterns with AWS DMS, Debezium, or similar tools
- Ability to design and operate production Airflow DAGs (MWAA or self-hosted)
- Hands-on experience building at least one production-ready RAG pipeline (chunking, embeddings, vector storage, retrieval)
- Strong SQL (primary) and Python (data pipelines + AI workflows)
- Working knowledge of TypeScript for MCP server development
- Infrastructure-as-code experience (e.g., Terraform)
- Comfortable making architectural decisions independently in a high-autonomy environment
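The CDC requirement above reduces to applying an ordered stream of insert/update/delete events to a keyed target. A toy in-memory sketch (the event shape and field names are assumptions for illustration, not the actual AWS DMS output format):

```python
from typing import Any


def apply_cdc_events(table: dict[Any, dict], events: list[dict]) -> dict[Any, dict]:
    """Apply ordered CDC events to a keyed in-memory 'table'.

    Each event carries an operation ('insert', 'update', 'delete'),
    a primary key, and (for insert/update) the full row image --
    a simplified stand-in for the change records a CDC tool emits.
    """
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            table[key] = event["row"]
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return table
```

In a lakehouse, the same upsert/delete semantics are usually expressed as a MERGE into an Iceberg table rather than a Python dict, but the ordering and full-row-image concerns are identical.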
Nice to Haves
- Experience building MCP servers or similar AI tool-use integrations
- dbt Cloud Semantic Layer / MetricFlow experience
- Experience with Apache Iceberg
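Retrieval ranking, one of the RAG steps named in the responsibilities, is at its core cosine similarity over embedding vectors. A dependency-free sketch (the function name and toy vectors are illustrative; a real system would query OpenSearch or pgvector):

```python
import math


def top_k_by_cosine(
    query: list[float], docs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Rank document embeddings by cosine similarity to a query embedding.

    Returns the k highest-scoring (doc_id, score) pairs -- the core of
    the retrieval step in a RAG pipeline, here without a vector store.
    """
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```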
About ProfitSolv
ProfitSolv is a SaaS business services provider focused on the legal and accounting industry. The company builds platforms that unify data across portfolios and enable AI-driven experiences for customers.
Scraped 4/9/2026