
AI Data Architect
Job Description
As a Data/AI Architect, you'll design and build data-driven cloud architectures on AWS, from S3 data lakes and Glue ETL pipelines to data warehouses and RAG-powered AI systems. You'll own the full data stack across a variety of industries and projects: on one engagement you're designing a Redshift data warehouse with medallion architecture processing 31M transactions/month; on the next you're building a Bedrock Knowledge Base with OpenSearch vector search. Real ownership, real variety.
What You'll Do:
- Design and build S3 data lakes with multi-zone organization, partitioning strategies, lifecycle policies, and encryption
- Implement medallion architecture (bronze/silver/gold) for data warehouses on Redshift, Snowflake, or Databricks
- Build AWS Glue ETL pipelines (Python Shell and Spark) with incremental extraction, Data Catalog management, and optimized Parquet output
- Design star/snowflake schemas, materialized views, and gold-layer models optimized for BI consumption (QuickSight, Power BI)
- Configure data warehouse platforms — Redshift with Zero-ETL from Aurora, Snowflake with Snowpipe, Databricks with Delta Lake and Auto Loader
- Design RAG systems using Bedrock Knowledge Base with OpenSearch Serverless vector search and Titan Embeddings
- Architect document AI pipelines using Textract, Comprehend, and Bedrock for entity extraction
- Design SageMaker ML pipelines for training, Model Registry, and inference
- Lead data discovery sessions with client stakeholders and present architecture recommendations to technical and business audiences
- Mentor delivery team members on data architecture patterns and AWS data services
- Contribute to R&D projects evaluating emerging AWS data and AI capabilities
Required Skills:
- 5+ years of professional IT experience and 2+ years of professional AWS experience
- At least one AWS Professional-level certification (Solutions Architect Professional or Data Engineer Specialty preferred)
- Python for data pipelines (Glue jobs, Lambda, SageMaker scripts) and PySpark for Glue Spark jobs
- SQL and NoSQL on AWS — Aurora PostgreSQL, RDS PostgreSQL, DocumentDB, DynamoDB — including schema design and query optimization
- Data modeling — conceptual, logical, and physical models for AWS data platforms; normalized silver-layer schemas, denormalized star/snowflake gold-layer schemas, data dictionaries
- Dimensional modeling and medallion architecture (bronze/silver/gold) on Redshift, Snowflake, or Databricks, including materialized views and incremental refresh patterns
- AWS Glue ETL (Python Shell and Spark), Glue Data Catalog, and crawlers
- S3 data lake architecture with partitioning, lifecycle policies, and encryption
Preferred:
- RAG systems with Bedrock Knowledge Base and OpenSearch Serverless vector search
- Amazon SageMaker for ML training, Model Registry, and inference
- AWS HealthLake, FHIR R4 transformation, and HIPAA-compliant data pipelines
- Document AI with Amazon Textract and Comprehend
- Amazon Athena, QuickSight, or Power BI integration
- Terraform or CloudFormation for data infrastructure as code
- Step Functions, EventBridge, and Lambda for event-driven pipeline orchestration