
Site Reliability Engineer - Disaster Recovery & Business Continuity
Job Description
About Charles River Associates
For over 50 years, Charles River Associates has been a premier consulting firm that offers employees a place to learn from a diverse group of consultants, industry experts, and academics. At CRA you will be exposed to leading minds who use economic, financial, and business analysis to solve complex world problems for an impressive roster of clients, including major law firms, Fortune 100 companies, and government agencies. Through a collegial environment, formal and informal training opportunities, and a broad array of professional development resources, your experience at CRA will open doors for you throughout your career.
The Information Technology (ITS) department at Charles River Associates is currently a team of more than 40 professionals dedicated to enhancing, maintaining, and developing the firm's technology infrastructure and security. The team comprises four functions:
- Service Delivery & Telecom
- Enterprise Application Solutions
- Infrastructure, Networking and Cloud Solutions
- Information Security
Information Technology staff are based in the Boston, Chicago, London, Munich, New York, Oakland, San Francisco, College Station, and Washington, DC offices.
CRA is primarily a Microsoft environment and is looking to maximize the performance of its on-premises systems and hybrid infrastructure, so experience with cloud technologies is essential for this role.
Position Overview
The Site Reliability Engineer (SRE) helps ensure CRA’s critical business services are reliable, scalable, and performant across on-premises and cloud environments. This role blends software engineering and operations practices to reduce manual toil through automation, improve service observability, and strengthen incident response. The SRE partners closely with infrastructure, security, application, and service delivery teams to define measurable reliability targets (SLIs/SLOs), implement resilient architectures, and drive continuous improvement through blameless post-incident learning.
Key Responsibilities
- Systems Engineering: Apply hands-on experience with core enterprise infrastructure platforms and services, including Windows Server, VMware vSphere, VMware Site Recovery Manager (SRM), SAN technologies, and the Rubrik ecosystem, to understand dependencies, recovery workflows, and failure modes across on-premises and cloud environments.
- Service Ownership & Reliability Targets: Partner with service owners to define and maintain service level indicators (SLIs) and service level objectives (SLOs) for availability, latency, and performance; track error budgets and reliability risk.
- Observability: Implement and continuously improve monitoring, logging, alerting, and dashboards to provide actionable, symptom-based signals and reduce mean time to detect/respond (MTTD/MTTR).
- Blameless Postmortems & Continuous Improvement: Facilitate post-incident reviews, identify root causes and contributing factors, and drive remediation items to completion; standardize learnings into runbooks and operational practices.
- DR Testing Program Build-Out: Design and launch a scalable DR testing program (scope, test types, cadence, success criteria, and evidence capture) in partnership with application, infrastructure, and security teams; maintain runbooks and lead regular tabletop and technical recovery exercises to validate RTO/RPO assumptions and improve recoverability.
- DR Readiness: Contribute to reliability architecture and disaster recovery readiness for key services, including dependency mapping, recovery testing inputs, and validation of recovery procedures.
- Cross-Functional Collaboration: Work day-to-day with infrastructure, network, cloud, security, and application teams to improve operational excellence, reliability culture, and shared ownership of production outcomes.
Relevant Skills & Experience
- Experience operating and improving reliability of production services (on-prem and/or cloud), including incident response, operational readiness, and service ownership
- Working knowledge of SRE concepts and practices such as SLIs/SLOs, error budgets, monitoring/alerting strategy, and blameless postmortems
- Experience with observability tooling and practices (logs, metrics, tracing, dashboards) and using data to drive reliability and performance improvements
- Experience with disaster recovery orchestration and recovery testing using VMware Site Recovery Manager (SRM) and Azure Site Recovery (ASR) (or similar public cloud DR services)
- Proven experience building and operating a DR testing program, including dependency mapping, test planning, coordination across stakeholders, execution of tabletop and technical failover tests, documentation of results, and tracking remediation actions to closure
- Strong cross-functional communication and teamwork skills; comfortable partnering with engineering, security, and operations teams to drive shared outcomes
- Ability to document and standardize operational procedures (runbooks), participate in on-call rotations, and manage multiple priorities in a fast-moving environment
Career Growth and Benefits
- CRA’s robust skills development programs, including a commitment to offering 100 hours of training annually through formal and informal programs, encourage you to thrive as an individual and team member. Beginning with research and analysis skill building, training continues with technical training, presentation skills, internal seminars, and career mentoring and performance coaching from an assigned senior colleague. Additional leadership and collaboration opportunities exist through internal firm development activities.
- We offer a comprehensive total rewards program including a superior benefits package, wellness programming to support physical, mental, emotional and financial well-being, and in-house immigration support for foreign nationals and international business travelers.
Work Location Flexibility
CRA creates a work environment that enables our colleagues to benefit from being together in the office to best deliver on our promise of career growth, mentorship and inclusivity. At the same time, we recognize that individuals realize a range of benefits when working from home periodically. We currently expect that individuals spend at least 3 to 4 days a week working in the office (which may include traveling to another CRA office or to client meetings), with specific days determined in coordination with your practice or team.
Our Commitment to Equal Employment Opportunity
Charles River Associates is an equal opportunity employer (EOE). All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, age, disability, status as a protected veteran, or any other protected characteristic under applicable law.
Salary and other compensation
A good-faith estimate of the annual base salary range for this position is $130,000 - $150,000. Starting pay within this range may vary based on factors such as education level, experience, skills, geographic location, market conditions, and other qualifications of the successful candidate. This position may be eligible for additional bonus incentive compensation.
CRA offers a comprehensive benefits package, subject to eligibility requirements, which may include: medical, dental, and vision insurance; 401(k) retirement plan with employer match; life and disability insurance; paid time off (vacation, sick leave, holidays); paid parental leave; wellness programs and employee assistance resources; and commuter benefits.