Job Description
PLEASE NOTE: This position requires an ACTIVE Top Secret/SCI Clearance with Polygraph. To be considered, you MUST currently hold this clearance.
Position Code: 10-SO0522-1
Location: Herndon, VA
- Design, develop, and maintain ETL/ELT pipelines that support data warehouse, analytics, and application needs. Must have experience with large data sets (hundreds of thousands of records; GB- to TB-scale data).
- Extract, transform, and load data from various sources into centralized storage solutions.
- Design and enhance search and discovery platforms across large volumes of structured and unstructured data
- Perform data ingestion, ETL, and integration across enterprise and multi-source environments
- Optimize ETL workflows for performance, scalability, and reliability.
- Conduct data validation, profiling, and quality checks to ensure accuracy and completeness.
- Troubleshoot and resolve data inconsistencies, pipeline failures, or performance bottlenecks.
- Build and maintain cloud-native solutions (AWS) aligned to secure and resilient architecture patterns
- Partner with mission operators, analysts, and senior stakeholders to define requirements and deliver mission-relevant analytics
- Translate mission needs into technical designs, architectures, and implementation roadmaps, ensuring alignment to operational objectives
- Deliver clear, compelling visualizations, dashboards, and executive-level briefings that communicate analytic insights and recommendations
- Provide technical leadership and mentorship, including hands-on development, code review, and team development
- Own delivery of analytic capabilities from concept through deployment, accreditation, and sustainment
- Support system accreditation, data governance, and security architecture, ensuring data integrity and compliance within classified environments
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or related field (or equivalent experience).
- Minimum 6-8 years working in the Linux operating system, including tuning the system for efficient parallel processing and understanding memory, storage, and data processing at scale
- Minimum 6-8 years in object-oriented programming. Python is the preferred development language
- Minimum 6-8 years of demonstrated experience with applications in the Commercial Cloud Services (C2S) environment or an Amazon Web Services cloud environment. Will consider substituting a minimum of 4-6 years of experience with other cloud computing platforms (Azure, Oracle, Google, etc.) for C2S experience.
- Minimum 4-6 years of demonstrated Extract, Transform, Load (ETL) experience with large structured and unstructured raw data sets. Strong experience with ETL tools such as Informatica, Talend, SSIS, AWS Glue, or Azure Data Factory.
- Proficiency in SQL, including complex queries and query optimization.
- 6-8 years of experience with the AWS platform, including an understanding of EC2 and RCS instance types
- Strong understanding of data warehousing concepts, data modeling, and schema design.
- Hands-on experience with scripting languages such as Python, Bash, or PowerShell.
- Familiarity with relational and NoSQL databases.
- Experience using version control systems such as Git.
- Experience working with big data technologies (e.g., Spark, Hadoop, Databricks).
- Experience with transformer-based models (e.g., BERT) and modern NLP architectures
- Background in document exploitation, e-discovery, or large-scale search platforms
- Experience with multi-modal analytics (OCR, image recognition, text + image fusion)
- Familiarity with search technologies (Solr, Elasticsearch, Lucene)
- Experience with containerization and DevSecOps pipelines (Docker, CI/CD)
- Cloudera or similar big data certifications
- Experience developing risk scoring, anomaly detection, or predictive analytic models
