Job description
Job details
- Location: Manchester, Greater Manchester
- Work mode: Remote
- Employment type: Freelance (Not an internship)
- Salary: GBP 67,833 per year
Role overview
Mindrift is seeking a Freelance Agent Evaluation Engineer to help refine the capabilities of AI coding agents. In this role, you will contribute to the development of high-quality datasets used to test how AI models handle complex, real-world developer tasks. You will be responsible for creating challenging scenarios and defining strict evaluation criteria to ensure AI systems meet professional software engineering standards.
Responsibilities
- Design and implement challenging coding tasks to evaluate AI agent performance
- Develop comprehensive evaluation criteria and benchmarks for AI-generated code
- Analyze AI model outputs to identify edge cases and failure modes
- Create high-quality datasets that simulate real-world developer workflows
- Collaborate with tech teams to improve the accuracy of AI coding systems
Requirements
- Strong proficiency in software development and professional coding practices
- Experience with AI models, LLMs, or machine learning evaluation
- Ability to write clear, technical documentation and evaluation rubrics
- Fluent English for technical communication
- Proven track record of solving complex algorithmic or architectural problems
Benefits
- Flexible freelance working arrangements
- Opportunity to work with leading global tech companies
- Exposure to cutting-edge AI development and testing
- Remote work flexibility
Keywords
- AI Evaluation
- LLM Testing
- Coding Agents
- Dataset Creation
- Software Engineering
- Prompt Engineering
- Quality Assurance