Freelance Agent Evaluation Engineer
with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web...
experts to contribute to advanced AI evaluation and benchmarking projects focused on realistic terminal-based...
terminal-based benchmark tasks for AI evaluation systems
Create technically deep debugging and investigation scenarios...