Job description
Job details
- Location: London, UK
- Work mode: Remote
- Employment type: Freelance (Not an internship)
- Salary: GBP 71,783 per year
Role overview
Mindrift is seeking a Freelance Agent Evaluation Engineer to help build high-quality datasets for evaluating AI coding agents. In this role, you will design complex, real-world developer tasks and establish rigorous evaluation criteria to measure how effectively AI models handle software engineering challenges. This is a project-based opportunity ideal for specialists looking to influence the development of next-generation AI systems.
Responsibilities
- Develop challenging, real-world coding tasks to test the capabilities of AI agents.
- Create detailed evaluation criteria and benchmarks for AI model performance.
- Analyze AI-generated code for accuracy, efficiency, and adherence to best practices.
- Contribute to the refinement of datasets used for training and testing AI systems.
- Collaborate with tech teams to identify gaps in current AI coding capabilities.
Requirements
- Proven experience in software development with strong coding proficiency.
- Ability to design complex technical scenarios and edge cases for testing.
- Professional-level proficiency in written and spoken English.
- Strong analytical skills to evaluate the logic and correctness of AI outputs.
- Experience with AI tools or LLM evaluation is highly desirable.
Benefits
- Flexible remote work environment.
- Opportunity to work with leading global tech companies.
- Engagement in cutting-edge AI research and development.
- Competitive project-based compensation.
Keywords
AI Evaluation, LLM, Coding Agents, Dataset Creation, Software Engineering, Python, Model Testing, Prompt Engineering