Innovative Solutions

AI Data Architect

Reports into Rochester, NY
$150K - $200K
Posted 11 months ago
Full-time, onsite

Job Description

As a Data/AI Architect, you'll design and build data-driven cloud architectures on AWS, from S3 data lakes and Glue ETL pipelines to data warehouses and RAG-powered AI systems. You'll own the full data stack across a variety of industries and projects: on one engagement you're designing a Redshift data warehouse with a medallion architecture that processes 31M transactions/month, on the next you're building a Bedrock Knowledge Base with OpenSearch vector search. Real ownership, real variety.

 


What You'll Do:

  • Design and build S3 data lakes with multi-zone organization, partitioning strategies, lifecycle policies, and encryption
  • Implement medallion architecture (bronze/silver/gold) for data warehouses on Redshift, Snowflake, or Databricks
  • Build AWS Glue ETL pipelines (Python Shell and Spark) with incremental extraction, Data Catalog management, and optimized Parquet output
  • Design star/snowflake schemas, materialized views, and gold-layer models optimized for BI consumption (QuickSight, PowerBI)
  • Configure data warehouse platforms — Redshift with Zero-ETL from Aurora, Snowflake with Snowpipe, Databricks with Delta Lake and Auto Loader
  • Design RAG systems using Bedrock Knowledge Base with OpenSearch Serverless vector search and Titan Embeddings
  • Architect document AI pipelines using Textract, Comprehend, and Bedrock for entity extraction
  • Design SageMaker ML pipelines for training, Model Registry, and inference
  • Lead data discovery sessions with client stakeholders and present architecture recommendations to technical and business audiences
  • Mentor delivery team members on data architecture patterns and AWS data services
  • Contribute to R&D projects evaluating emerging AWS data and AI capabilities
 

Required Skills:

  • 5+ years professional IT experience, 2+ years professional AWS experience
  • At least one AWS Professional-level certification (Solutions Architect Professional or Data Engineer Specialty preferred)
  • Python for data pipelines (Glue jobs, Lambda, SageMaker scripts) and PySpark for Glue Spark jobs
  • SQL and NoSQL on AWS — Aurora PostgreSQL, RDS PostgreSQL, DocumentDB, DynamoDB — including schema design and query optimization
  • Data modeling — conceptual, logical, and physical models for AWS data platforms; normalized silver-layer schemas, denormalized star/snowflake gold-layer schemas, data dictionaries
  • Dimensional modeling and medallion architecture (bronze/silver/gold) on Redshift, Snowflake, or Databricks, including materialized views and incremental refresh patterns
  • AWS Glue ETL (Python Shell and Spark), Glue Data Catalog, and crawlers
  • S3 data lake architecture with partitioning, lifecycle policies, and encryption
 

Preferred:

  • RAG systems with Bedrock Knowledge Base and OpenSearch Serverless vector search
  • Amazon SageMaker for ML training, Model Registry, and inference
  • AWS HealthLake, FHIR R4 transformation, and HIPAA-compliant data pipelines
  • Document AI with Amazon Textract and Comprehend
  • Amazon Athena, QuickSight, or PowerBI integration
  • Terraform or CloudFormation for data infrastructure as code
  • Step Functions, EventBridge, and Lambda for event-driven pipeline orchestration
