Strategic Program & Ops Lead (Data Projects)
AI, etc.). Experience designing or managing subjective evaluation pipelines (pairwise ranking, rubric-based evaluation, or similar...
in-the-loop workflows in collaboration with product/UX and data engineering. Build preference and reward models: pairwise... offline/online metrics, pairwise and rubric-based human evals, red teaming, safety/guardrail tests, A/B experiments, and win...
Adaptation of methods for generalized pairwise comparisons (including DOOR, hierarchical outcomes, net benefit, win odds and win...
frameworks (e.g., model-graded evals, pairwise/rubric scoring, preference learning). Exposure to AI safety testing, including...
pairwise tests, Continuous Developmental Integration (CDI), Integrated Ground Test (GTI), Distributed Ground Test (GTD), cyber...
participates. Ground test types include pairwise tests, Continuous Developmental Integration (CDI), Integrated Ground Test (GTI...