AI Evaluation Specialist · Prompt Engineer

Dhruv Patel

Building evaluation infrastructure and training data for frontier AI systems.

01 About

Computer Engineering graduate with 1+ year of hands-on experience in LLM evaluation, prompt engineering, and AI training data creation.

Currently contracted across three AI platforms: executing structured side-by-side (SxS) evaluations and hallucination detection at Mercor AI, authoring SWE-bench-style tasks and gold-standard datasets at AfterQuery, and performing large-scale bilingual data annotation at Innodata.

Skilled in RLHF pipelines, Generative AI output assessment, AI Alignment, bias detection, and PII identification. Combines strong analytical evaluation skills with full-stack development expertise.

dhruvpatel2284@gmail.com
+91 9898570058
India (Remote)
20+ AI Projects
5 Companies
1+ Year

02 Experience

Mercor AI

San Francisco, USA — Remote

AI Evaluation Expert (Contract)

  • Evaluated LLM outputs across sports, general knowledge, audio, and video tasks to improve accuracy, reasoning, and safety.
  • Contributed to 6 specialized workflows: Cricket Expert (APAC), EU Football, Basketball, Audio, Short Video Captioning, Generalist.
  • Performed SxS evaluation benchmarking helpfulness, correctness, and contextual understanding.
  • Designed prompt-writing scenarios to test edge cases and real-world reasoning.
  • Identified hallucinations, logical inconsistencies, bias, and PII risks in AI-generated responses.
  • Evaluated multimodal outputs (text, audio, video) under strict annotation guidelines.
LLM Eval · SxS · Hallucination Detection · RLHF · Multimodal · Prompt Engineering

AfterQuery Expert

San Francisco, CA — Remote

SWE Benchmark Engineer (Contract)

  • Built and validated SWE-bench-style debugging and benchmark tasks for frontier AI coding agents across multiple evaluation projects.
  • Designed Dockerized environments, automated test harnesses, and reproducible bug-fix workflows for realistic AI agent evaluation.
  • Developed full-stack and terminal-based engineering tasks involving React, Node.js, Express, MongoDB, Python, Flask, Linux, and CI pipelines.
  • Performed regression testing, Git-based validation, patch verification, and infrastructure debugging for benchmark reliability and reproducibility.
  • Contributed to large-scale AI training and evaluation workflows by creating high-quality task specifications and human-generated datasets.
SWE-Bench · Docker · React · Node.js · Python · CI/CD · Git

Innodata India Pvt. Ltd.

Remote

AI Data Annotator & Bilingual Content Evaluator (Contract)

  • Performed large-scale annotation: text classification, sentiment analysis, and NER.
  • Evaluated prompt-response alignment for contextual accuracy and completeness.
  • Identified bias, safety issues, and hallucinations; verified factual correctness from credible sources.
  • Followed strict annotation guidelines for multilingual (English/Hindi) AI training datasets.
NER · Annotation · Sentiment Analysis · Bilingual · Quality Assurance

Venom Technologies

Anand, Gujarat

Web Developer Intern

  • Developed customer and admin modules using React.js and Firebase with secure authentication.
  • Integrated Razorpay payment gateway, improving checkout performance by ~40%.
  • Implemented Zustand state management to optimize application performance.
React · Firebase · Razorpay · Zustand

Bluebell Compuserve Pvt. Ltd.

Anand

Web Development Intern

  • Built a full-stack pizza ordering system using React.js and MongoDB.
  • Improved UI/UX design to enhance usability and customer experience.
React · MongoDB · UI/UX

03 Skills

AI Evaluation & Quality

LLM Evaluation · SxS Evaluation · Hallucination Detection · Bias & Safety · PII Identification · Prompt-Response Alignment · RLHF · AI Alignment · Benchmarking · Multimodal Eval

Prompt Engineering & Data

Prompt Engineering · Prompt Optimization · Task Authoring · SWE-Bench Creation · Data Annotation · NER · Sentiment Analysis · Human Feedback

Frontend

React.js · Next.js · TypeScript · HTML/CSS · Tailwind · Bootstrap

Backend & Databases

Python · JavaScript · Java · C++ · Firebase · MongoDB · PostgreSQL · MySQL · REST APIs

Tools & Platforms

Docker · Git · Linux · Postman · Mercor Studio · Airtable · LabelBox · Parimango

04 Projects

LN Dev Mart

Full-Stack Retail System

Retail dashboard with POS billing, real-time inventory tracking, and automated GST reporting. Reduced build time by 40% with Turbopack. Includes secure API routes and KPI-driven analytics.

Next.js · React · TypeScript · PostgreSQL · Tailwind

Attendance System

Face Recognition Pipeline

Real-time face recognition for automated attendance. Achieved >90% detection accuracy with live video processing and CSV-based logging.

Python · OpenCV · dlib

05 Education & Certifications

Charutar Vidya Mandal University

B.E. Computer Engineering · Aug 2021 – May 2025

CGPA 8.35 / 10

  • Machine Learning with Python — IBM
  • Introduction to DevOps — IBM
  • SQL — University of Michigan
  • Data Analysis & Visualization — University at Buffalo
  • IELTS — Band 7.0

English (Professional) · Hindi (Native) · Gujarati (Native)

Let's Connect

Open for AI evaluation, prompt engineering, and full-stack development opportunities.