Optical character recognition (OCR)

1 - About

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text.


3 - Software

The open source OCR programs I found are all based on the Tesseract OCR engine. Tesseract itself has no GUI, but its project provides a list of third-party software that uses it.
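
Because these programs are front ends to Tesseract, you can also call the engine directly from a script. The sketch below is a minimal example, assuming Tesseract 4 or later is installed and on the PATH. The function names `build_tesseract_command` and `ocr_image` are hypothetical helpers, not part of any library. The sketch relies on the CLI form `tesseract imagename outputbase [options]`, where the output base `stdout` prints the recognized text instead of writing a `.txt` file.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str, lang: str = "eng", psm: int = 3) -> list[str]:
    """Build the argv for the tesseract CLI (hypothetical helper).

    "stdout" as the output base makes tesseract print the text
    instead of writing <outputbase>.txt. "--psm" needs Tesseract 4+.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_image(image_path: str, lang: str = "eng", psm: int = 3) -> str:
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,  # collect the text printed to stdout
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, `ocr_image("scan.png")` returns the text of a hypothetical `scan.png`. Page segmentation mode 3 is Tesseract's fully automatic default; a single block of text, such as a cropped screenshot, may do better with `psm=6`.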

3.1 - VietOCR

VietOCR is the only one I found that:

  • simply works,
  • is easy to use,
  • gives great results for screenshots. You still have to check “ScreenShot Mode” in the Image menu.

3.2 - Free OCR

For Windows, there is also FreeOCR 2.6. It works well, but you can't process more than one page at a time. Maybe that is the right approach anyway, because you always need to clean up the result.
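
Part of that clean-up can be automated. A minimal sketch, assuming the OCR output is plain text: the hypothetical `clean_ocr_text` below applies a few common heuristics (rejoining words hyphenated at a line end, unwrapping lines inside a paragraph, collapsing spaces). These rules are my assumption of what typically needs fixing, not something FreeOCR or Tesseract does itself.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Apply a few common clean-ups to raw OCR output (hypothetical helper)."""
    # Normalize Windows line endings.
    text = raw.replace("\r\n", "\n")
    # Rejoin words hyphenated across a line break: "recog-\nnition" -> "recognition".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Unwrap single line breaks inside a paragraph; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim spaces around the remaining line breaks.
    text = re.sub(r" *\n *", "\n", text)
    # Reduce three or more newlines to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


raw = "Optical char-\nacter recog-\nnition  turns\nimages into text.\n\n\n\nSecond   para."
print(clean_ocr_text(raw))
```

The heuristics are blunt: a real compound such as "well-known" split at a line break becomes "wellknown", so the result still needs a human read-through.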

4 - Documentation / Reference