... you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work ... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation ...
Location: Tampa, FL | 02/06/2026 20:06:16 | Salary: $100 per hour | Company: SaidGig
... you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work ... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation ...
Location: Denver, CO | 02/06/2026 20:06:06 | Salary: $100 per hour | Company: SaidGig
... you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work ... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation ...
Location: Houston, TX | 02/06/2026 20:06:48 | Salary: $100 per hour | Company: SaidGig
... you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work ... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation ...
Location: Boston, MA | 02/06/2026 20:06:46 | Salary: $100 per hour | Company: SaidGig
... you will design and validate challenging benchmark tasks to help surface and diagnose reasoning gaps in a target model. Your work ... Identify tasks where the target model fails, specifically classifying failures in physics reasoning and mathematical derivation ...
Location: Atlanta, GA | 02/06/2026 20:06:28 | Salary: $100 per hour | Company: SaidGig