Job description
Job details
- Location: Birmingham, West Midlands
- Work mode: Remote
- Employment type: Freelance (Not an internship)
- Salary: GBP 67,501 per year
Role overview
Mindrift is seeking a Freelance Agent Evaluation Engineer to help improve the capabilities of AI coding agents. In this role, you will contribute to the development of high-quality datasets used to test how AI models handle complex, real-world developer tasks. You will be responsible for creating challenging technical scenarios and defining the criteria used to evaluate the accuracy and efficiency of AI-generated code.
Job details
This is a freelance, project-based position based in Birmingham, West Midlands. The role is remote and is not an internship. Compensation is GBP 67,501 per year.
Responsibilities
- Develop complex, real-world coding tasks to test AI agent performance
- Create detailed evaluation criteria to measure model accuracy and reliability
- Analyze AI-generated code to identify failures and areas for improvement
- Collaborate with tech teams to refine dataset quality and diversity
- Document edge cases and technical constraints for AI model training
Requirements
- Strong proficiency in software development and multiple programming languages
- Experience in testing, evaluating, or fine-tuning AI/ML models
- Ability to design challenging technical benchmarks for coding tasks
- Professional fluency in English for documentation and communication
- Proven track record of solving complex developer-level problems
Benefits
- Flexible project-based working hours
- Opportunity to work with leading global tech companies
- Remote work environment
- Competitive freelance compensation
Keywords
AI Evaluation, LLM Testing, Software Engineering, Dataset Creation, Coding Agents, Quality Assurance, Python, Prompt Engineering