Onlano
AI / Machine Learning | Remote

Freelance Agent Evaluation Engineer

Mindrift - Agency

Birmingham, West Midlands, United Kingdom | Freelance - Mid level (2-4 years) | 0 applicants | Closes Jul 31, 2026

Salary

GBP 67,501 / year

Job description

Job details

  • Location: Birmingham, West Midlands
  • Work mode: Remote
  • Employment type: Freelance (Not an internship)
  • Salary: GBP 67,501 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will focus on testing how AI models handle real-world developer tasks by creating challenging scenarios and rigorous evaluation criteria to improve AI system performance.

Job details

This is a remote freelance position based in Birmingham, West Midlands; it is not an internship. Compensation for this project-based engagement is GBP 67,501 per year.

Responsibilities

  • Develop complex, real-world developer tasks to test AI coding agent capabilities.
  • Establish clear and objective evaluation criteria for AI-generated code outputs.
  • Analyze model performance and provide detailed feedback for AI system improvement.
  • Create diverse datasets that challenge the reasoning and coding logic of AI models.
  • Collaborate with tech teams to refine the evaluation framework for coding agents.

Requirements

  • Proven experience in software development with strong coding proficiency.
  • Ability to design complex technical tasks and edge cases for software testing.
  • High level of English proficiency for documentation and communication.
  • Analytical mindset with the ability to evaluate code quality and efficiency.
  • Experience with AI models or LLM evaluation is highly desirable.

Benefits

  • Flexible remote work environment
  • Opportunity to work with leading tech companies
  • Engagement in cutting-edge AI development projects
  • Competitive project-based compensation

Keywords

AI Evaluation, LLM Testing, Coding Agents, Dataset Creation, Software Engineering, AI Training, Prompt Engineering