- Tip of the Day Blog - FreeOCR.net - free optical character recognition program

FreeOCR.net - free optical character recognition program

Monday, January 10, 2011 at 08:37AM

Have you ever needed to get editable text out of an image or a PDF file created from a scanned document? What you need in this case is "Optical Character Recognition" (OCR) software that will literally "read" the document and try to identify characters and words visually, and FreeOCR.net is just such a program.

FreeOCR.net performs optical character recognition on images or on PDF files created from scans. It can process PDF, TIF, BMP, JPG, and PNG files and provides an acquire function for running documents through a scanner. The simple user interface allows you to exclude non-text elements (such as images or tables), although this has to be done manually.

For documents with multiple pages, you have to process each page separately, although FreeOCR will "pool" the output into a single text. FreeOCR.net is based on the open-source Tesseract OCR engine and comes with English support pre-installed; many other languages can be downloaded and added, including languages written in non-Latin scripts such as Japanese, Korean, and Chinese.
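For readers who are comfortable with a command line, the page-by-page workflow and the "pooling" of output described above can be approximated by calling the Tesseract engine directly. The sketch below is not how FreeOCR.net works internally; it assumes the `tesseract` command-line program is installed and on your PATH, that the language data you ask for (here `eng`) is installed, and that each page is already a separate image file. The function names (`tesseract_command`, `pool_pages`, `ocr_pages`) are made up for this illustration.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for one Tesseract CLI call.

    Tesseract writes its result to "<output_base>.txt".
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def pool_pages(page_texts):
    """Join per-page OCR text into one document, skipping empty pages."""
    return "\n\n".join(text.strip() for text in page_texts if text.strip())


def ocr_pages(image_paths, work_dir, lang="eng"):
    """Run Tesseract on each page image in order and pool the results."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    texts = []
    for index, image_path in enumerate(image_paths, start=1):
        output_base = work_dir / f"page_{index:03d}"
        # check=True raises an error if Tesseract fails on a page
        subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
        texts.append(output_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return pool_pages(texts)
```

With page images named, say, `page1.png` and `page2.png` (hypothetical file names), `ocr_pages(["page1.png", "page2.png"], "ocr_out")` would return the text of both pages as a single string.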

This is an excellent basic OCR app that can get the job done. It works really well for the occasional document, or at least for short ones. It can process long documents (e-books, etc.), but for those you would be better off with one of the more professional (and paid) apps that are out there.

PROS:

Powerful engine: produces excellent results in general, at least for English, which is the only language I tested. Note that scanning images at 200 dpi or more is recommended.
Supported formats: processes PDF and most image file types (and will not restrict you to TIF as some others do).
Supports a wide range of languages: English comes pre-installed, but other languages can be installed separately. Languages include French, Italian, German/Fraktur, Spanish, Dutch, Vietnamese, Bangla, Czech, Catalan, Polish, Lithuanian, Latvian, Bulgarian, Russian, Greek, Korean, Slovak, Ukrainian, Japanese, Indonesian, Norwegian, Hungarian, Serbian, Turkish, Tagalog, Romanian, Chinese (traditional & simplified), and Swedish.
Simple interface: lets you select the chunks of text to process, so you can skip pictures and other elements.

CONS:

Does not process pages in batch: it is designed to do one page at a time, which limits its usefulness for large documents.
No post-OCR processing: spellchecking, for example.
No user-assisted "learning": such as that employed by some commercial OCR packages.
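To give an idea of what a post-OCR processing step could look like, here is a minimal sketch that nudges misread words toward the closest entry in a word list, using Python's standard `difflib` module. This is an illustration only, not a feature of FreeOCR.net or Tesseract: the tiny vocabulary, the 0.8 similarity cutoff, and the function names `correct_word` and `correct_text` are all assumptions, and a real spellchecker would need a full dictionary and smarter handling of capitalization and proper names.

```python
import difflib
import re


def correct_word(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    if word.lower() in vocabulary:
        return word
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, vocabulary, cutoff=0.8):
    """Apply correct_word to every run of letters, leaving everything else as-is."""
    return re.sub(
        r"[A-Za-z]+",
        lambda match: correct_word(match.group(0), vocabulary, cutoff),
        text,
    )
```

A typical OCR slip is reading "m" as "rn". With the vocabulary `["document", "scanned", "editable", "text"]`, `correct_text("a scanned docurnent", vocabulary)` gives back "a scanned document", while words with no close match are left alone.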

The verdict: an excellent free OCR solution. If you need to convert the occasional scanned document to editable text, this will do the job. If you need to process hundreds of pages, however, it can do the job in theory but is likely to be too labor intensive (though much less labor intensive than re-typing!).

Although I only tested English, the multi-language support is quite noteworthy. If you do use it for another language (esp. a non-Latin one), please post about your experience in the comments section. Thanks.

Version Tested: 3.0

FreeOCR.net: free optical character recognition program converts images to text in multiple languages | freewaregenius.com

Miguel M. de la O