Job description
Job details
- Location: Manchester, Greater Manchester
- Work mode: Remote
- Employment type: Freelance (Not an internship)
- Salary: GBP 67,833 per year
Role overview
Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this project-based role, you will test how AI models handle real-world developer tasks by creating challenging scenarios and rigorous evaluation criteria that help improve AI system performance.
Responsibilities
- Develop complex, real-world developer tasks to test AI coding agents
- Define clear evaluation criteria to measure model accuracy and efficiency
- Analyze AI-generated code to identify edge cases and failure points
- Collaborate with tech teams to refine dataset quality for AI training
- Provide detailed feedback on model performance across various coding challenges
Requirements
- Strong proficiency in software development and multiple programming languages
- Experience in testing, evaluating, or fine-tuning AI and LLM systems
- Ability to design challenging technical benchmarks for coding agents
- Professional-level fluency in English for documentation and reporting
- Proven track record of solving complex real-world engineering problems
Benefits
- Flexible remote work arrangement
- Opportunity to work with leading global tech companies
- Engagement with cutting-edge AI development projects
Keywords
AI Evaluation, LLM Testing, Coding Agents, Dataset Creation, Software Engineering, AI Quality Assurance