AI / Machine Learning, Remote

Freelance Agent Evaluation Engineer

Mindrift - Agency

Glasgow, Scotland, 🇬🇧 United Kingdom
Freelance - Mid level (2-4 years)
Closes Jul 31, 2026

Salary

GBP 66,355 / year

Job description

Job details

Location: Glasgow, Scotland
Work mode: Remote
Employment type: Freelance (Not an internship)
Salary: GBP 66,355 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will design challenging real-world developer tasks and establish rigorous evaluation criteria to measure how effectively AI models handle complex software engineering problems. This is a project-based opportunity ideal for specialists looking to influence the next generation of AI tools.

Responsibilities

Create complex, real-world developer tasks to test AI coding agent capabilities.
Develop detailed evaluation criteria and benchmarks for model performance.
Analyze AI-generated code for accuracy, efficiency, and security.
Collaborate with tech teams to refine dataset quality and diversity.
Provide expert feedback on model behavior to improve AI reasoning.

Requirements

Strong proficiency in software development and multiple programming languages.
Experience in testing, evaluating, or fine-tuning AI systems or LLMs.
Ability to design edge-case scenarios that challenge AI coding logic.
Professional-level English proficiency for documentation and reporting.
Proven track record of delivering high-quality technical work independently.

Benefits

Flexible project-based work schedule.
Opportunity to work with leading global tech companies.
Contribution to cutting-edge AI development.
Competitive freelance compensation.

Keywords

AI Evaluation, LLM, Coding Agents, Dataset Creation, Software Engineering, Prompt Engineering, Quality Assurance