Job description
Role overview
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focusing on testing and improving AI systems. This freelance role involves building datasets to evaluate AI coding agents' performance on real-world developer tasks.
You'll design challenging technical tasks and create evaluation criteria that measure how effectively AI models handle software development work. The position is project-based and does not constitute permanent employment.
Responsibilities
- Design and implement evaluation datasets for AI coding agents
- Create realistic software development tasks to test AI capabilities
- Develop scoring criteria to measure AI model performance
- Collaborate with AI teams to improve evaluation methodologies
Requirements
- 2+ years of experience in AI/ML testing or software development
- Strong understanding of coding workflows and software development practices
- Proficiency in English (written/verbal communication)
- Experience with AI evaluation frameworks or methodologies
Benefits
- Project-based work with leading tech companies
- Contribute to cutting-edge AI evaluation research
- Flexible freelance arrangement
Keywords
AI evaluation, machine learning testing, coding agents, dataset creation, AI systems, developer tasks, model evaluation, English proficiency