Evaluation Scenario Writer - AI Agent Testing Specialist
with structured formats like JSON/YAML for scenario description. Can define expected agent behaviors (gold paths) and scoring logic...
project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout... thinking — you might be a great fit. What you’ll be doing: Reviewing evaluation tasks and scenarios for logic, completeness...
, conduct data analysis, identify data quality issues, document reporting logic and data definitions, and support ad-hoc...