Freelance Agent Evaluation Engineer

Company: Mindrift

London, United Kingdom · Freelance · Mid level (2-4 years) · Closes Jul 23, 2026

Salary

GBP 71,783 / year

Job description

Job details

  • Location: London, UK
  • Work mode: Remote
  • Employment type: Freelance (Not an internship)
  • Salary: GBP 71,783 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will design complex, real-world developer tasks and establish rigorous evaluation criteria to measure how effectively AI models handle software engineering challenges. This is a project-based opportunity ideal for specialists looking to influence the development of next-generation AI systems.

Responsibilities

  • Develop challenging, real-world coding tasks to test the capabilities of AI agents.
  • Create detailed evaluation criteria and benchmarks for AI model performance.
  • Analyze AI-generated code for accuracy, efficiency, and adherence to best practices.
  • Contribute to the refinement of datasets used for training and testing AI systems.
  • Collaborate with tech teams to identify gaps in current AI coding capabilities.

Requirements

  • Proven experience in software development with strong coding proficiency.
  • Ability to design complex technical scenarios and edge cases for testing.
  • Professional-level proficiency in written and spoken English.
  • Strong analytical skills to evaluate the logic and correctness of AI outputs.
  • Experience with AI tools or LLM evaluation is highly desirable.

Benefits

  • Flexible remote work environment.
  • Opportunity to work with leading global tech companies.
  • Engagement in cutting-edge AI research and development.
  • Competitive project-based compensation.

Keywords

AI Evaluation, LLM, Coding Agents, Dataset Creation, Software Engineering, Python, Model Testing, Prompt Engineering