AI / Machine Learning
Remote

Freelance Agent Evaluation Engineer

Mindrift - Company

UK, 🇬🇧 United Kingdom
Freelance - Mid level (2-4 years)
Closes Jul 1, 2026

Salary

GBP 77,221 / year

Job description

Job details

Location: UK
Work mode: Remote
Employment type: Freelance (Not an internship)
Salary: GBP 77,221 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help build datasets that evaluate AI coding agents and their ability to handle real-world developer tasks. This project-based opportunity involves creating challenging tasks and evaluation criteria for leading tech companies focused on testing and improving AI systems. The role is remote and available across the UK, offering freelance flexibility for specialists with strong coding and AI evaluation experience.

Job details

Location: UK (Remote)
Type: Freelance, project-based engagement
Salary: £77,221 per year
Requirements: CV must be submitted in English with English proficiency level indicated

This is not a permanent employment position but a project-based opportunity to work with cutting-edge AI evaluation systems.

Responsibilities

Create challenging tasks to evaluate AI coding agent performance
Develop evaluation criteria for real-world developer scenarios
Test and assess AI model capabilities on coding tasks
Build datasets for AI system improvement and benchmarking
Collaborate with tech companies on AI evaluation projects

Requirements

Strong coding experience in Python or similar languages
Understanding of AI systems and machine learning models
Fluent English (written and spoken)
Experience with software testing or quality assurance
Ability to design realistic developer task scenarios
Self-directed work style suitable for project-based engagement

Benefits

Competitive freelance rate of £77,221 per year
Fully remote work from anywhere in the UK
Flexible project-based schedule
Work with leading tech companies on AI innovation

Keywords

AI evaluation, machine learning, coding agents, dataset creation, Python, model testing, AI systems, developer tools