..., you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation...
Location: Seattle, WA | 04/06/2026 07:06:47 AM | Salary: $100 per hour | Company: SaidGig
..., you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation...
Location: Phoenix, AZ | 04/06/2026 07:06:47 AM | Salary: $100 per hour | Company: SaidGig
..., you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation...
Location: Miami, FL | 04/06/2026 07:06:47 AM | Salary: $100 per hour | Company: SaidGig
... and problem-solving gaps in target models. The work will involve creating robust, real-world tasks with executable Python tests... development environment, preparing all necessary components using Python. Evaluation and Analysis: Assess the target model...
Location: Austin, TX | 04/06/2026 07:06:47 AM | Salary: $110 per year | Company: SaidGig
... and problem-solving gaps in target models. The work will involve creating robust, real-world tasks with executable Python tests... development environment, preparing all necessary components using Python. Evaluation and Analysis: Assess the target model...
Location: Tampa, FL | 04/06/2026 07:06:47 AM | Salary: $110 per year | Company: SaidGig
... and problem-solving gaps in target models. The work will involve creating robust, real-world tasks with executable Python tests... development environment, preparing all necessary components using Python. Evaluation and Analysis: Assess the target model...
Location: Denver, CO | 04/06/2026 07:06:47 AM | Salary: $110 per year | Company: SaidGig