Dhruv Patel — AI Evaluation Specialist & Prompt Engineer

01 About

Computer Engineering graduate with 1+ year of hands-on experience in LLM evaluation, prompt engineering, and AI training data creation.

Currently contracted across three AI platforms: executing structured side-by-side (SxS) evaluations and hallucination detection at Mercor AI, authoring SWE-bench-style tasks and gold-standard datasets at AfterQuery, and performing large-scale bilingual data annotation at Innodata.

Skilled in RLHF pipelines, Generative AI output assessment, AI Alignment, bias detection, and PII identification. Combines strong analytical evaluation skills with full-stack development expertise.

02 Experience

Mercor AI

San Francisco, USA — Remote

Jan 2026 – Present

AI Evaluation Expert (Contract)

Evaluated LLM outputs across sports, general knowledge, audio, and video tasks to improve accuracy, reasoning, and safety.
Contributed to 6 specialized workflows: Cricket Expert (APAC), EU Football, Basketball, Audio, Short Video Captioning, Generalist.
Performed side-by-side (SxS) evaluations, benchmarking responses on helpfulness, correctness, and contextual understanding.
Designed prompt-writing scenarios to test edge cases and real-world reasoning.
Identified hallucinations, logical inconsistencies, bias, and PII risks in AI-generated responses.
Evaluated multimodal outputs (text, audio, video) under strict annotation guidelines.

LLM Eval · SxS · Hallucination Detection · RLHF · Multimodal · Prompt Engineering

AfterQuery Expert

San Francisco, CA — Remote

Apr 2026 – Present

SWE Benchmark Engineer (Contract)

Built and validated SWE-bench-style debugging and benchmark tasks for frontier AI coding agents across multiple evaluation projects.
Designed Dockerized environments, automated test harnesses, and reproducible bug-fix workflows for realistic AI agent evaluation.
Developed full-stack and terminal-based engineering tasks involving React, Node.js, Express, MongoDB, Python, Flask, Linux, and CI pipelines.
Performed regression testing, Git-based validation, patch verification, and infrastructure debugging for benchmark reliability and reproducibility.
Contributed to large-scale AI training and evaluation workflows by creating high-quality task specifications and human-generated datasets.

SWE-Bench · Docker · React · Node.js · Python · CI/CD · Git

Innodata India Pvt. Ltd.

Remote

Feb 2026 – Present

AI Data Annotator & Bilingual Content Evaluator (Contract)

Performed large-scale annotation: text classification, sentiment analysis, and NER.
Evaluated prompt-response alignment for contextual accuracy and completeness.
Identified bias, safety issues, and hallucinations; verified factual correctness from credible sources.
Followed strict annotation guidelines for multilingual (English/Hindi) AI training datasets.

NER · Annotation · Sentiment Analysis · Bilingual · Quality Assurance

Venom Technologies

Anand, Gujarat

Jan – May 2025

Web Developer Intern

Developed customer and admin modules using React.js and Firebase with secure authentication.
Integrated the Razorpay payment gateway, improving checkout performance by ~40%.
Implemented Zustand state management to optimize application performance.

React · Firebase · Razorpay · Zustand

Bluebell Compuserve Pvt. Ltd.

Anand

May 2024

Web Development Intern

Built a full-stack pizza ordering system using React.js and MongoDB.
Improved UI/UX design to enhance usability and customer experience.

React · MongoDB · UI/UX

03 Skills

AI Evaluation & Quality

LLM Evaluation · SxS Evaluation · Hallucination Detection · Bias & Safety · PII Identification · Prompt-Response Alignment · RLHF · AI Alignment · Benchmarking · Multimodal Eval

Prompt Engineering & Data

Prompt Engineering · Prompt Optimization · Task Authoring · SWE-Bench Creation · Data Annotation · NER · Sentiment Analysis · Human Feedback

Frontend

React.js · Next.js · TypeScript · HTML/CSS · Tailwind · Bootstrap

Backend & Databases

Python · JavaScript · Java · C++ · Firebase · MongoDB · PostgreSQL · MySQL · REST APIs

Tools & Platforms

Docker · Git · Linux · Postman · Mercor Studio · Airtable · LabelBox · Parimango

04 Projects

LN Dev Mart

Full-Stack Retail System

Retail dashboard with POS billing, real-time inventory tracking, and automated GST reporting. Reduced build time by 40% with Turbopack. Includes secure API routes and KPI-driven analytics.

Next.js · React · TypeScript · PostgreSQL · Tailwind

Attendance System

Face Recognition Pipeline

Real-time face recognition for automated attendance. Achieved >90% detection accuracy with live video processing and CSV-based logging.

Python · OpenCV · dlib

05 Education & Certifications

Charutar Vidya Mandal University

B.E. Computer Engineering · Aug 2021 – May 2025

CGPA 8.35 / 10

Machine Learning with Python — IBM
Introduction to DevOps — IBM
SQL — University of Michigan
Data Analysis & Visualization — University at Buffalo
IELTS — Band 7.0

English (Professional) · Hindi (Native) · Gujarati (Native)

Let's Connect

Open for AI evaluation, prompt engineering, and full-stack development opportunities.

Email Phone LinkedIn GitHub