Onlano
AI / Machine Learning | Remote

Freelance Agent Evaluation Engineer

Mindrift - Agency

Glasgow, Scotland, United Kingdom | Freelance | Mid level (2-4 years) | 0 applicants | Closes Jul 31, 2026

Salary

GBP 66,355 / year

Job description

Job details

  • Location: Glasgow, Scotland
  • Work mode: Remote
  • Employment type: Freelance (not an internship)
  • Salary: GBP 66,355 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will design challenging real-world developer tasks and establish rigorous evaluation criteria to measure how effectively AI models handle complex software engineering problems. This is a project-based opportunity ideal for specialists looking to influence the next generation of AI tools.

Responsibilities

  • Create complex, real-world developer tasks to test AI coding agent capabilities.
  • Develop detailed evaluation criteria and benchmarks for model performance.
  • Analyze AI-generated code for accuracy, efficiency, and security.
  • Collaborate with tech teams to refine dataset quality and diversity.
  • Provide expert feedback on model behavior to improve AI reasoning.

Requirements

  • Strong proficiency in software development and multiple programming languages.
  • Experience in testing, evaluating, or fine-tuning AI systems or LLMs.
  • Ability to design edge-case scenarios that challenge AI coding logic.
  • Professional-level English proficiency for documentation and reporting.
  • Proven track record of delivering high-quality technical work independently.

Benefits

  • Flexible project-based work schedule.
  • Opportunity to work with leading global tech companies.
  • Contribution to cutting-edge AI development.
  • Competitive freelance compensation.

Keywords

AI Evaluation, LLM, Coding Agents, Dataset Creation, Software Engineering, Prompt Engineering, Quality Assurance