Freelance Agent Evaluation Engineer
...application codebase.
Write tests that accept all correct solutions and reject incorrect ones - neither too strict (breaking..., don't miss bad solutions, and don't break on good ones.
Review code written by agents, analyze why an agent failed...