Job description
Job details
- Location: London, UK
- Work mode: Remote
- Employment type: Freelance (Not an internship)
- Salary: GBP 71,783 per year
Role overview
Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will focus on assessing how well AI models handle real-world developer tasks by creating complex scenarios and rigorous evaluation criteria. This is a project-based opportunity designed for specialists who want to contribute to the improvement of next-generation AI systems for leading tech companies.
Responsibilities
- Develop challenging real-world developer tasks to test AI coding agents
- Define clear and objective evaluation criteria for model responses
- Analyze AI-generated code for accuracy, efficiency, and security
- Collaborate with AI researchers to refine dataset quality
- Provide detailed feedback on model performance and failure points
Requirements
- Proven experience in software development and coding best practices
- Strong proficiency in English, both written and verbal
- Ability to create complex technical test cases and benchmarks
- Analytical mindset with a focus on edge-case detection
- Experience with AI models or LLM evaluation is highly preferred
Benefits
- Flexible remote work environment
- Opportunity to work with leading global tech companies
- Engagement in cutting-edge AI development projects
Keywords
AI Evaluation, LLM Testing, Coding Agents, Dataset Creation, Software Engineering, Prompt Engineering, Quality Assurance