NLP Engineer & Computer Vision – Rebuild OCR→LLM Comic Translation Pipeline (Convex + Python) - Contract to Hire

Remote, USA · Full-time
We’re hiring an experienced Computer Vision + NLP Engineer to rebuild our entire Korean → English comic translation tool from scratch. The current system works, but we need a clean, modular, much faster, and more accurate version built on Convex instead of Supabase for real-time updates. You will replicate the existing workflow exactly and improve it across accuracy, performance, and architecture.

# Current Validated Workflow (What You Will Rebuild and Improve)

Our existing tool processes full chapters with this pipeline:

1. Upload – Chapter images are uploaded.
2. Text Detection – CRAFT generates bounding boxes around text.
3. Text Extraction (OCR) – Gemini 2.5 Pro extracts the Korean text inside each bounding box.
4. Panel Detection – OpenCV identifies comic panels in each image.
5. Panel Filtering – Gemini 2.5 Pro removes inaccurate or outlier panels.
6. Alignment – Remaining text boxes are matched to their correct panels.
7. Translation – Gemini 2.5 Pro produces English translations using panel and chapter context.
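Step 6 (alignment) can be approached in several ways. As a minimal sketch, assuming axis-aligned boxes in pixel coordinates, each text box could be assigned to the panel that covers the largest fraction of its area, and left unassigned when no panel covers enough of it. The `Box` class, the `assign_text_to_panels` function, and the 0.5 `min_coverage` threshold are hypothetical illustrations, not part of the existing tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right, in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def assign_text_to_panels(
    text_boxes: Sequence[Box],
    panels: Sequence[Box],
    min_coverage: float = 0.5,  # assumed threshold, would need tuning on real pages
) -> List[Optional[int]]:
    """For each text box, return the index of the panel covering the largest
    fraction of the box's area, or None if no panel reaches min_coverage."""
    assignments: List[Optional[int]] = []
    for tb in text_boxes:
        area = tb.area()
        best_idx: Optional[int] = None
        best_cov = 0.0
        for i, panel in enumerate(panels):
            cov = intersection_area(tb, panel) / area if area > 0 else 0.0
            if cov > best_cov:
                best_idx, best_cov = i, cov
        assignments.append(best_idx if best_cov >= min_coverage else None)
    return assignments
```

For example, with panels `Box(0, 0, 100, 100)` and `Box(0, 110, 100, 200)`, a bubble at `Box(20, 95, 60, 130)` that straddles the gutter lands in the second panel, because most of its area lies there. Unassigned boxes could then be routed to the LLM-based filtering step or a nearest-panel fallback.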
This workflow is already validated and must behave the same, just faster, cleaner, more accurate, and modular.

# Your Job in This Project

Rebuild this entire system from zero with a modern, maintainable architecture that gives us:

Better accuracy
• More precise bounding boxes
• Higher OCR accuracy (including stylized Korean fonts)
• Better panel detection and filtering
• More consistent, human-like translations

Much faster overall performance
• Dramatically reduced processing time per chapter
• Efficient batching and async operations
• Minimal latency from upload to final results

A modular, replaceable architecture
Every step must be isolated behind a clear interface so we can easily swap components:
• Replace CRAFT → PaddleOCR / Donut / YOLOv8 detector
• Replace Gemini → GPT or another LLM
• Replace the panel detector without touching text logic
• Swap OCR engines freely (Paddle, Donut, TrOCR, GPT fallback)
Modular means no rewrites when upgrading models.

Convex-based backend
• Real-time updates streamed to the frontend
• Job orchestration in Convex
• Stable state management
• Partial outputs instead of waiting for entire chapter completion

# What You Must Deliver for the $2,000 Milestone

1. Fully rebuilt pipeline implementing all steps (upload → detection → OCR → panels → alignment → translation).
2. Modular architecture where detection, OCR, panel logic, and translation can be swapped independently.
3. Convex integration for real-time syncing, job progress, and results.
4. Significant accuracy improvements over the current system.
5. Significant performance improvements (faster processing end-to-end).
6. Clean project structure with documentation for all modules and interfaces.
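One way to meet the "swap components without rewrites" requirement is to define each stage as a structural interface with Python's `typing.Protocol`, so any class with matching methods (a CRAFT wrapper, a PaddleOCR wrapper, a Gemini or GPT client) plugs in without inheritance. This is a hypothetical sketch: the interface names, method signatures, and stub implementations used to exercise it are assumptions, not the existing tool's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
BoxT = Tuple[int, int, int, int]


class TextDetector(Protocol):
    """Finds text regions on a page (e.g. a CRAFT or YOLOv8 wrapper)."""
    def detect(self, image: bytes) -> List[BoxT]: ...


class PanelDetector(Protocol):
    """Finds comic panels on a page (e.g. an OpenCV-based wrapper)."""
    def detect_panels(self, image: bytes) -> List[BoxT]: ...


class OCREngine(Protocol):
    """Reads the text inside one box (e.g. Gemini, PaddleOCR, TrOCR)."""
    def read(self, image: bytes, box: BoxT) -> str: ...


class Translator(Protocol):
    """Translates a batch of source lines that share context."""
    def translate(self, texts: List[str], context: str) -> List[str]: ...


@dataclass
class PagePipeline:
    """Depends only on the interfaces above, never on concrete models."""
    detector: TextDetector
    panels: PanelDetector
    ocr: OCREngine
    translator: Translator

    def run(self, image: bytes, context: str = "") -> Dict[str, list]:
        boxes = self.detector.detect(image)
        panel_boxes = self.panels.detect_panels(image)
        source = [self.ocr.read(image, b) for b in boxes]
        # One batched translation call per page rather than one call per box.
        target = self.translator.translate(source, context)
        return {"panels": panel_boxes, "lines": list(zip(boxes, source, target))}


# Stub implementations, used only to show that components are interchangeable.
class FixedBoxDetector:
    def detect(self, image: bytes) -> List[BoxT]:
        return [(0, 0, 10, 10), (0, 20, 10, 30)]


class WholePagePanels:
    def detect_panels(self, image: bytes) -> List[BoxT]:
        return [(0, 0, 100, 100)]


class StubOCR:
    def read(self, image: bytes, box: BoxT) -> str:
        return f"텍스트{box[1]}"


class StubTranslator:
    def translate(self, texts: List[str], context: str) -> List[str]:
        return [f"[EN] {t}" for t in texts]


pipeline = PagePipeline(FixedBoxDetector(), WholePagePanels(), StubOCR(), StubTranslator())
result = pipeline.run(b"")
```

Because `PagePipeline` relies only on method signatures, replacing the detector or translator means passing a different object to the constructor. Panel filtering and alignment could be added as further stages in the same style.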
# Tech Stack You Will Use

• Python – OCR, detection, panel processing, AI orchestration
• TypeScript – Convex + frontend integration
• Convex – backend database, jobs, and real-time sync
• OCR tools – CRAFT for text detection
• LLMs – Gemini, GPT

# Required Skills

Must have:
• Strong OCR experience
• Experience with LLM-based translation/localization
• Python + TypeScript proficiency
• Ability to design clean, modular system architectures
• Experience rebuilding/refactoring complex pipelines

# To Apply

Please include:
• Relevant projects (OCR, CV, LLM translation, or modular system rebuilds)
• Examples where you improved accuracy, performance, or architecture
• A short explanation of how you would:
  1. Design a modular detection → OCR → panel → translation pipeline
  2. Improve bounding boxes and OCR for stylized Korean fonts
  3. Integrate Convex for real-time progress streaming to the frontend
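For the performance and partial-output requirements, pages can be processed concurrently with a cap on in-flight model calls, and each page's result reported as soon as it finishes rather than at chapter end. This is a minimal `asyncio` sketch under stated assumptions: `process_chapter`, `process_page`, and the `report` callback are hypothetical names. In a real deployment, `report` might call a Convex mutation (for instance through Convex's Python client) so that subscribed frontends update in real time.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical per-page worker: runs detection -> OCR -> panels -> translation for one page.
ProcessPage = Callable[[int, bytes], Awaitable[dict]]
# Hypothetical sink for partial results (e.g. a wrapper around a Convex mutation).
ReportPartial = Callable[[int, dict], Awaitable[None]]


async def process_chapter(
    pages: List[bytes],
    process_page: ProcessPage,
    report: ReportPartial,
    max_concurrency: int = 4,  # assumed limit, e.g. to respect LLM rate limits
) -> Dict[int, dict]:
    """Process all pages concurrently, at most max_concurrency at a time,
    reporting each page's result as soon as that page completes."""
    sem = asyncio.Semaphore(max_concurrency)
    results: Dict[int, dict] = {}

    async def worker(idx: int, image: bytes) -> None:
        async with sem:  # bounds how many pages are in flight at once
            result = await process_page(idx, image)
        results[idx] = result
        await report(idx, result)  # partial output: no waiting for the whole chapter

    await asyncio.gather(*(worker(i, img) for i, img in enumerate(pages)))
    return results
```

The semaphore is released before `report` runs, so a slow write to the backend does not block the next page from starting.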