Open Source OCR

Several years back I tried out gocr, an open source character recognition engine. I wasn’t thoroughly impressed, but it sort of worked. Yesterday I saw the news that Google has released Tesseract as an open source Optical Character Recognition (OCR) engine. It was originally developed by HP and had been shelved for some time; it’s supposed to be among the top 3 in accuracy according to testing by UNLV. The source code is available at the project’s page. It will be good to see this taken up and integrated as a backend by open source scanning applications. (Maybe even by office suites, as a “recognize text in image file” type option...)

