Job Description
Benchmark dataset project that evaluates AI models on visual document understanding and instruction following in the Architecture (building) domain. Experts author complex, grounded tasks, each with a clear ground-truth output and an objective rubric. ~15–20 hrs/week, remote, US/Canada.
