Job description
Job details
- Location: Birmingham, West Midlands
- Work mode: Remote
- Employment type: Freelance (Not an internship)
- Salary: GBP 67,501 per year
Role overview
Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will focus on testing how AI models handle real-world developer tasks by creating challenging scenarios and rigorous evaluation criteria to improve AI system performance.
Responsibilities
- Develop complex, real-world developer tasks to test AI coding agent capabilities.
- Establish clear and objective evaluation criteria for AI-generated code outputs.
- Analyze model performance and provide detailed feedback for AI system improvement.
- Create diverse datasets that challenge the reasoning and coding logic of AI models.
- Collaborate with tech teams to refine the evaluation framework for coding agents.
Requirements
- Proven experience in software development with strong coding proficiency.
- Ability to design complex technical tasks and edge cases for software testing.
- High level of English proficiency for documentation and communication.
- Analytical mindset with the ability to evaluate code quality and efficiency.
- Experience with AI models or LLM evaluation is highly desirable.
Benefits
- Flexible remote work environment
- Opportunity to work with leading tech companies
- Engagement in cutting-edge AI development projects
- Competitive project-based compensation
Keywords
- AI Evaluation
- LLM Testing
- Coding Agents
- Dataset Creation
- Software Engineering
- AI Training
- Prompt Engineering