AI / Machine Learning, Remote

Freelance Agent Evaluation Engineer

Mindrift - Company

London, UK, 🇬🇧 United Kingdom
Freelance - Mid level (2-4 years)
Closes Jul 23, 2026

Salary

GBP 71,783 / year

Job description

Job details

Location: London, UK
Work mode: Remote
Employment type: Freelance (Not an internship)
Salary: GBP 71,783 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will design complex, real-world developer tasks and establish rigorous evaluation criteria to measure how effectively AI models handle software engineering challenges. This is a project-based opportunity ideal for specialists looking to influence the development of next-generation AI systems.

Responsibilities

Develop challenging, real-world coding tasks to test the capabilities of AI agents.
Create detailed evaluation criteria and benchmarks for AI model performance.
Analyze AI-generated code for accuracy, efficiency, and adherence to best practices.
Contribute to the refinement of datasets used for training and testing AI systems.
Collaborate with tech teams to identify gaps in current AI coding capabilities.

Requirements

Proven experience in software development with strong coding proficiency.
Ability to design complex technical scenarios and edge cases for testing.
Professional-level proficiency in English, both written and spoken.
Strong analytical skills to evaluate the logic and correctness of AI outputs.
Experience with AI tools or LLM evaluation is highly desirable.

Benefits

Flexible remote work environment.
Opportunity to work with leading global tech companies.
Engagement in cutting-edge AI research and development.
Competitive project-based compensation.

Keywords

AI Evaluation, LLM, Coding Agents, Dataset Creation, Software Engineering, Python, Model Testing, Prompt Engineering