AI / Machine Learning, Remote

Freelance Agent Evaluation Engineer

Mindrift - Company

Birmingham, West Midlands, 🇬🇧 United Kingdom
Freelance - Mid level (2-4 years)
0 applicants
Closes Jul 23, 2026

Salary

GBP 67,501 / year

Job description

Job details

  • Location: Birmingham, West Midlands
  • Work mode: Remote
  • Employment type: Freelance (Not an internship)
  • Salary: GBP 67,501 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help improve the capabilities of AI coding agents. In this role, you will contribute to the development of high-quality datasets used to test how AI models handle complex, real-world developer tasks. You will be responsible for creating challenging technical scenarios and defining the criteria used to evaluate the accuracy and efficiency of AI-generated code.

Job details

This is a freelance, project-based position based in Birmingham, West Midlands. The role is remote and is not an internship. Compensation for this engagement is GBP 67,501 per year.

Responsibilities

  • Develop complex, real-world coding tasks to test AI agent performance
  • Create detailed evaluation criteria to measure model accuracy and reliability
  • Analyze AI-generated code to identify failures and areas for improvement
  • Collaborate with tech teams to refine dataset quality and diversity
  • Document edge cases and technical constraints for AI model training

Requirements

  • Strong proficiency in software development and multiple programming languages
  • Experience in testing, evaluating, or fine-tuning AI/ML models
  • Ability to design challenging technical benchmarks for coding tasks
  • Professional fluency in English for documentation and communication
  • Proven track record of solving complex developer-level problems

Benefits

  • Flexible project-based working hours
  • Opportunity to work with leading global tech companies
  • Remote work environment
  • Competitive freelance compensation

Keywords

AI Evaluation, LLM Testing, Software Engineering, Dataset Creation, Coding Agents, Quality Assurance, Python, Prompt Engineering