AI / Machine Learning
Remote

Freelance Agent Evaluation Engineer

Mindrift - Agency

Manchester, Greater Manchester, 🇬🇧 United Kingdom
Freelance - Mid level (2-4 years)
0 applicants
Closes Jul 31, 2026

Salary

GBP 67,833 / year

Job description

Job details

Location: Manchester, Greater Manchester
Work mode: Remote
Employment type: Freelance (Not an internship)
Salary: GBP 67,833 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help refine the capabilities of AI coding agents. In this role, you will contribute to the development of high-quality datasets used to test how AI models handle complex, real-world developer tasks. You will be responsible for creating challenging scenarios and defining strict evaluation criteria to ensure AI systems meet professional software engineering standards.

Job details

This is a freelance position based in Manchester, Greater Manchester, and is worked remotely. It is not an internship. Compensation for this project-based engagement is GBP 67,833 per year.

Responsibilities

Design and implement challenging coding tasks to evaluate AI agent performance
Develop comprehensive evaluation criteria and benchmarks for AI-generated code
Analyze AI model outputs to identify edge cases and failure modes
Create high-quality datasets that simulate real-world developer workflows
Collaborate with tech teams to improve the accuracy of AI coding systems

Requirements

Strong proficiency in software development and professional coding practices
Experience with AI models, LLMs, or machine learning evaluation
Ability to write clear, technical documentation and evaluation rubrics
Fluent English for technical communication
Proven track record of solving complex algorithmic or architectural problems

Benefits

Flexible freelance working arrangements
Opportunity to work with leading global tech companies
Exposure to cutting-edge AI development and testing
Remote work flexibility

Keywords

AI Evaluation
LLM Testing
Coding Agents
Dataset Creation
Software Engineering
Prompt Engineering
Quality Assurance