Onlano
AI / Machine Learning, Remote

Freelance Agent Evaluation Engineer

Mindrift - Agency

Manchester, Greater Manchester, 🇬🇧 United Kingdom
Freelance - Mid level (2-4 years)
0 applicants
Closes Jul 31, 2026

Salary

GBP 67,833 / year

Job description

Job details

  • Location: Manchester, Greater Manchester
  • Work mode: Remote
  • Employment type: Freelance (Not an internship)
  • Salary: GBP 67,833 per year

Role overview

Mindrift is seeking a Freelance Agent Evaluation Engineer to help refine the capabilities of AI coding agents. In this role, you will contribute to the development of high-quality datasets used to test how AI models handle complex, real-world developer tasks. You will be responsible for creating challenging scenarios and defining strict evaluation criteria to ensure AI systems meet professional software engineering standards.

This is a freelance, remote position based in Manchester, Greater Manchester; it is not an internship. Compensation for this project-based engagement is GBP 67,833 per year.

Responsibilities

  • Design and implement challenging coding tasks to evaluate AI agent performance
  • Develop comprehensive evaluation criteria and benchmarks for AI-generated code
  • Analyze AI model outputs to identify edge cases and failure modes
  • Create high-quality datasets that simulate real-world developer workflows
  • Collaborate with tech teams to improve the accuracy of AI coding systems

Requirements

  • Strong proficiency in software development and professional coding practices
  • Experience with AI models, LLMs, or machine learning evaluation
  • Ability to write clear, technical documentation and evaluation rubrics
  • Fluent English for technical communication
  • Proven track record of solving complex algorithmic or architectural problems

Benefits

  • Flexible freelance working arrangements
  • Opportunity to work with leading global tech companies
  • Exposure to cutting-edge AI development and testing
  • Remote work flexibility

Keywords

AI Evaluation, LLM Testing, Coding Agents, Dataset Creation, Software Engineering, Prompt Engineering, Quality Assurance