AI Data Engineer - Remote
ProfitSolv
Full-remote · Senior · Permanent · Full-stack · United States · 2 days ago via LinkedIn
Tags
AWS, Apache Iceberg, dbt, dbt Semantic Layer, Airflow (MWAA), Change Data Capture (CDC), RAG (Retrieval-Augmented Generation), OpenSearch, Terraform
About the role
ProfitSolv is hiring an AI Data Engineer (Remote) to build a greenfield centralized data platform on AWS. You’ll combine data engineering and AI engineering—e.g., writing dbt models in the morning and designing a RAG pipeline in the afternoon.
Responsibilities
- Build and maintain a Medallion Lakehouse (Bronze/Silver/Gold) on S3 using Apache Iceberg, AWS Glue Data Catalog, and dbt Cloud (Athena adapter)
- Configure and manage AWS DMS for ongoing CDC from ~1,000 SQL Server instances
- Ingest data using Amazon ECS Fargate tasks for SaaS API ingestion
- Orchestrate pipelines with Amazon MWAA (Airflow)
- Develop dbt Cloud transformations from Bronze → Silver → Gold
- Define business metrics in the dbt Semantic Layer for BI tools and AI agents
- Manage Redshift Serverless + Spectrum as the read engine
- Tune Iceberg table layouts, partitioning, and compaction for performance
- Implement Lake Formation tag-based governance for multi-product data isolation
- Onboard acquisitions to the platform in weeks, not months
- Build batch embedding pipelines for legal documents and client records
- Manage vector storage using OpenSearch Serverless or pgvector on Aurora
- Design and ship RAG pipelines for legal domain use cases (chunking, retrieval ranking, context management)
- Build MCP servers exposing the dbt Semantic Layer and platform APIs to AI agents (Claude, internal copilots, customer-facing features)
- Ensure compliance/security/governance with IAM roles, encryption policies, and metadata cataloging
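The RAG responsibilities above start with document chunking. A minimal, dependency-free sketch of fixed-size chunking with overlap (the function name and default parameters are illustrative, not from the posting):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries, which helps retrieval quality in RAG pipelines.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Production pipelines typically chunk on token or sentence boundaries rather than raw characters, but the overlap idea carries over directly.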
Requirements
- 5+ years of hands-on data engineering experience, focused on AWS (S3, Glue, Athena, Redshift or equivalent)
- Production-grade dbt experience (Core or Cloud), including testing, macros, and documentation best practices
- Experience implementing CDC patterns with AWS DMS, Debezium, or similar tools
- Ability to design and operate production Airflow DAGs (MWAA or self-hosted)
- Hands-on experience building at least one production-ready RAG pipeline (chunking, embeddings, vector storage, retrieval)
- Strong SQL (primary) and Python (data pipelines + AI workflows)
- Working knowledge of TypeScript for MCP server development
- Infrastructure-as-code experience (e.g., Terraform)
- Comfortable making architectural decisions independently in a high-autonomy environment
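The CDC requirement above reduces to applying an ordered stream of insert/update/delete events to a keyed target. A toy in-memory sketch (the event shape and field names are assumptions for illustration, not the actual AWS DMS output format):

```python
from typing import Any


def apply_cdc_events(table: dict[Any, dict], events: list[dict]) -> dict[Any, dict]:
    """Apply ordered CDC events to a keyed in-memory 'table'.

    Each event carries an operation ('insert', 'update', 'delete'),
    a primary key, and (for insert/update) the full row image --
    a simplified stand-in for the change records a CDC tool emits.
    """
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            table[key] = event["row"]
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return table
```

In a lakehouse, the same upsert/delete semantics are usually expressed as a MERGE into an Iceberg table rather than a Python dict, but the ordering and full-row-image concerns are identical.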
Nice to Haves
- Experience building MCP servers or similar AI tool-use integrations
- dbt Cloud Semantic Layer / MetricFlow experience
- Experience with Apache Iceberg
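Retrieval ranking, one of the RAG steps named in the responsibilities, is at its core cosine similarity over embedding vectors. A dependency-free sketch (the function name and toy vectors are illustrative; a real system would query OpenSearch or pgvector):

```python
import math


def top_k_by_cosine(
    query: list[float], docs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Rank document embeddings by cosine similarity to a query embedding.

    Returns the k highest-scoring (doc_id, score) pairs -- the core of
    the retrieval step in a RAG pipeline, here without a vector store.
    """
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```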
About ProfitSolv
ProfitSolv is a SaaS business services provider focused on the legal and accounting industry. The company builds platforms that unify data across portfolios and enable AI-driven experiences for customers.
Scraped 4/9/2026