OCR Capability in Merlin
in progress
endu
Merged in a post:
Feature Request: Correct OCR Texts
Jens Leinenbach
ChatGPT is able to correct texts that were garbled by OCR. My feature request is a tool that can correct uploaded texts larger than 2 MB.
Example prompt:
You will receive text that likely contains errors introduced by Optical Character Recognition (OCR). Your primary objective is to reconstruct the text accurately, correcting only those errors clearly attributable to the OCR process. You must preserve the original meaning, structure, and formatting of the source as faithfully as possible.
Instructions:
- Character and Word-Level Corrections:
  - Identify and correct commonly confused OCR characters (e.g., “0” mistaken for “O”, “1” for “I”, “rn” for “m”).
  - Rectify misread letters or punctuation if the intended character or word is unambiguously clear from standard language use and spelling.
  - If multiple plausible corrections exist, choose the most likely valid word that does not alter the text’s original meaning.
- Line Order and Logical Structure:
  - For multi-column or complex layouts, reorder lines that have been misplaced by the OCR process so the text follows a coherent reading order.
  - Preserve paragraphs, headings, bullet points, and other formatting elements. If the structure is unclear, reconstruct it logically without changing the text’s intended content.
  - Retain incomplete or fragmented lines, placing them in the most sensible context rather than discarding them.
- Hyphenation and Word Splitting:
  - Remove end-of-line hyphens used solely to indicate line breaks, recombining words correctly.
  - Correct unintended internal hyphenations introduced by the OCR process.
- Punctuation and Typography:
  - Standardize and correct punctuation marks (e.g., commas, periods, semicolons, colons, question marks, exclamation points) and ensure proper spacing.
  - Replace incorrect quotation marks, dashes, and other special characters with their proper typographic equivalents, following conventional usage.
- Preserving Content Integrity:
  - Do not modify the text’s meaning, insert new content, or omit meaningful elements.
  - Align all substantive text elements coherently, even if incomplete or partially damaged, to reflect the original text’s intent as closely as possible.
Goal:
Deliver a clean, coherent, and typographically correct version of the text by fixing verifiable OCR errors. Ensure the final output remains true to the original content’s intent and structure, applying logical, context-based corrections where needed.
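For texts too large to send in one request (such as the >2 MB uploads mentioned above), one possible approach is to split the text at paragraph boundaries and run the prompt on each piece. The following is only a rough Python sketch under stated assumptions: `split_into_chunks`, `correct_large_text`, the `correct_chunk` callable, and the 8000-byte default are all hypothetical names and values chosen for illustration, not part of Merlin or any existing API.

```python
from typing import Callable


def split_into_chunks(text: str, max_bytes: int = 8000) -> list[str]:
    """Group paragraphs into chunks that stay within max_bytes (UTF-8).

    A single paragraph larger than max_bytes is kept whole as its own
    chunk rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for para in text.split("\n\n"):
        size = len(para.encode("utf-8")) + 2  # +2 for the "\n\n" separator
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(para)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def correct_large_text(
    text: str,
    correct_chunk: Callable[[str], str],
    max_bytes: int = 8000,
) -> str:
    """Apply correct_chunk to each chunk and rejoin the results.

    correct_chunk is a hypothetical stand-in for whatever sends the
    correction prompt plus one chunk to a model and returns the reply.
    """
    pieces = split_into_chunks(text, max_bytes)
    return "\n\n".join(correct_chunk(piece) for piece in pieces)
```

Splitting on blank lines keeps each paragraph intact, so the model sees complete sentences. The trade-off is that errors spanning a chunk boundary, such as misplaced lines from a multi-column layout, would not be fixed by this approach.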
Lucas
I'm seriously considering asking for my refund because your OCR tool is absolutely terrible. It's UNBELIEVABLE how NOTHING works when it comes to OCR. The whole thing is completely useless!
Merlin
Ethan Cohen, we discussed OCR capability with our tech team, and it turns out that OCR pipelines are not yet optimised for cost, quality and, most importantly, time efficiency. We're aware that Claude has added OCR recently, but we're still figuring out how to implement it in the best way possible. Unfortunately, it will have to wait for a while. We'll let this thread know when we make headway. Thanks!
Ethan Cohen
Hey team - do we have a timeline to release on this one? Feel like we're getting a lot of likes but haven't seen any movement.
Guy
It would be useful to be able to say "do XYZ based on the text in the image", for example.
endu
planned