Evaluation Scenario Writer - AI Agent Testing Specialist
... with structured formats like JSON/YAML for scenario description. Can define expected agent behaviors (gold paths) and scoring logic...
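To make the snippet above concrete, here is a minimal sketch of a JSON scenario description with a gold path and simple scoring logic. Everything in it is an assumption for illustration: the field names (`id`, `gold_path`, `scoring`, `pass_threshold`), the action names, and the scoring rule (the share of gold steps the agent completes in order) do not come from the posting.

```python
import json

# Hypothetical scenario description; all field names and values are illustrative.
SCENARIO_JSON = """
{
  "id": "refund-001",
  "description": "Customer asks for a refund on a delivered order.",
  "gold_path": ["lookup_order", "check_refund_policy", "issue_refund"],
  "scoring": {"pass_threshold": 0.66}
}
"""


def score_run(gold_path, actual_actions):
    """Return the fraction of gold steps completed in order.

    Counting stops at the first gold step the agent never performs,
    so later steps taken out of order earn no credit.
    """
    if not gold_path:
        return 1.0
    matched = 0
    for action in actual_actions:
        if matched < len(gold_path) and action == gold_path[matched]:
            matched += 1
    return matched / len(gold_path)


def evaluate(scenario, actual_actions):
    """Score one agent run against the scenario and apply its pass threshold."""
    score = score_run(scenario["gold_path"], actual_actions)
    threshold = scenario["scoring"]["pass_threshold"]
    return {"id": scenario["id"], "score": score, "passed": score >= threshold}


scenario = json.loads(SCENARIO_JSON)
full = evaluate(scenario, ["lookup_order", "check_refund_policy", "issue_refund"])
partial = evaluate(scenario, ["lookup_order", "issue_refund"])  # skips the policy check
print(full)
print(partial)
```

An in-order rule like this penalizes an agent that skips a required step even if it reaches the right final action. A real framework might instead weight steps or accept several alternative gold paths.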
... project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout... thinking - you might be a great fit. What you’ll be doing: Reviewing evaluation tasks and scenarios for logic, completeness...
... Critical Thinking - using logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions...
... Participate in planning meetings, design reviews, and progress evaluations. Ensure all schedule assumptions and logic are well...
... conduct data analysis, identify data quality issues, document reporting logic and data definitions, and support ad-hoc...
... Establish structural checks, scenario logic, and sanity thresholds for automated evaluation of forecasts. Make explicit the... modeling logic including: TPP → eligible pool → penetration → pricing/net → ramp curve → LOE. Ability to explain real vs...
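The chain in the last snippet can be sketched as a small calculation with structural checks. This is a sketch under stated assumptions, not the posting's method. It reads LOE as loss of exclusivity and TPP as target product profile, which the snippet does not spell out. It treats the TPP as a qualitative input that fixes the eligible pool, so it does not appear in the code. The function names, the multiplicative formula, and all numbers are hypothetical.

```python
def forecast_net_revenue(eligible_pool, penetration, net_price, ramp,
                         loe_year, loe_erosion):
    """Yearly net revenue: pool * penetration * ramp factor * net price.

    From `loe_year` onward (0-based index), revenue is reduced by `loe_erosion`.
    All parameters are hypothetical stand-ins for the steps named in the chain.
    """
    revenue = []
    for year, ramp_factor in enumerate(ramp):
        treated = eligible_pool * penetration * ramp_factor
        value = treated * net_price
        if year >= loe_year:
            value *= 1.0 - loe_erosion
        revenue.append(value)
    return revenue


def sanity_check(eligible_pool, penetration, net_price, ramp, loe_erosion):
    """Return a list of structural problems; an empty list means all checks pass."""
    issues = []
    if eligible_pool < 0:
        issues.append("eligible pool is negative")
    if not 0.0 <= penetration <= 1.0:
        issues.append("penetration is outside [0, 1]")
    if net_price < 0:
        issues.append("net price is negative")
    if any(not 0.0 <= r <= 1.0 for r in ramp):
        issues.append("ramp factor is outside [0, 1]")
    if any(later < earlier for earlier, later in zip(ramp, ramp[1:])):
        issues.append("ramp curve decreases")
    if not 0.0 <= loe_erosion <= 1.0:
        issues.append("LOE erosion is outside [0, 1]")
    return issues


# Illustrative inputs only.
inputs = dict(eligible_pool=1000, penetration=0.2, net_price=100.0,
              ramp=[0.25, 0.5, 1.0, 1.0], loe_erosion=0.5)
problems = sanity_check(**inputs)
revenue = forecast_net_revenue(loe_year=3, **inputs)
bad = sanity_check(eligible_pool=1000, penetration=1.4, net_price=100.0,
                   ramp=[0.5, 0.25], loe_erosion=0.5)
print(problems, revenue, bad)
```

Keeping the checks separate from the calculation lets an automated evaluator reject a malformed forecast before comparing any numbers.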