AI Agent Evaluation Analyst (Freelance)
, or AI-generated content. Familiarity with QA or test-case thinking (edge cases, failure modes, “what could go wrong... and communicate in your field of expertise....
constraints, APIs, and CI/CD workflows. Iterate on prompt strategies using test results and human feedback to improve reliability...., GitHub Actions, GitLab CI, Jenkins). Deep understanding of testing principles and test-driven development (TDD...
looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate... structured test cases that simulate complex human workflows. Define gold-standard behavior and scoring logic to evaluate agent...
analysis, and test generation. Familiarity with cloud platforms (AWS, Azure), containers, and Kubernetes. Strong skills.... Influence how future AI models understand and communicate in your field of expertise....
definitions; creating or extending tools that writers and QAs use to test agents; working closely with infrastructure engineers... to ensure compatibility; occasionally helping with test writing or debug sessions when needed. Although we’re...
) or above. Proficient with code review, quality analysis, and identifying/fixing code smells, anti-patterns, and test gaps in Ruby codebases.... Experience with test integration in CI/CD environments (GitHub Actions, GitLab CI, Jenkins, or CircleCI). Experience using...
of AI-assisted tools for code generation, refactoring, and test automation. Strong skills in crash analysis, memory debugging... your portfolio. Influence how future AI models understand and communicate in your field of expertise....