Tuesday, April 1, 2014

The OCR World

I didn't know that OCR was in so much trouble...

I'm doing some research on OCR technology because I have to use it in a project right now.
But all the open-source alternatives I tried performed poorly. Their demo videos promise results that they don't deliver in practice.

I understand, of course. After all, it's open source... Paid apps work much better, as usual.

For now I'll post this open-source JavaScript API, which doesn't work perfectly but is a good start and a very valuable piece of work.

http://kdzwinel.github.io/JS-OCR-demo/

Watch this video on how the developer got there: https://www.youtube.com/watch?v=9TzXcBBC1J8
And this other video introducing his final work: https://www.youtube.com/watch?v=ttn437BlEbo

I'll update this blog as soon as I find something interesting on either platform (Android or JavaScript).

** UPDATE **

Well, after fighting a bit with all the major, most professional OCR APIs out there (I was looking for something good for Android and/or JavaScript), I'm finally going with this one:

https://github.com/rmtheis/android-ocr

It is based on Tesseract.

You can also download the APK from Google Play and test it:
https://play.google.com/store/apps/details?id=edu.sfsu.cs.orange.ocr

For my project, I just need to recognize numbers. So I had to do a small tweak to get it done:

1) Download and build the project.
2) Go to the OCRTest Android project and open the following class:

edu.sfsu.cs.orange.ocr.OcrCharacterHelper

3) Go down to around line 217, where you will see the whitelisted characters defined for recognition (for the English language). Something like:

else if (languageCode.equals("eng")) { return "!?@#$%&*()<>_-+=/.,:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"; } // English

Comment out that part and replace it with this:

else if (languageCode.equals("eng")) { return "0123456789"; } // English

And that's it! The process will handle only numbers.
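
If you'd rather not edit the source every time you want to switch between full text and numbers, the same idea can be sketched as a small helper with a digits-only switch. This is only an illustration and not code from android-ocr: the class name WhitelistSketch, the method whitelistFor and the digitsOnly flag are all made up for this post, and returning an empty string for other languages is just my assumption for the sketch.

```java
// Hypothetical helper, NOT part of android-ocr. It mimics the whitelist
// lookup shown above but adds a digits-only switch.
public final class WhitelistSketch {

    // The only characters we want when recognizing numbers.
    static final String DIGITS = "0123456789";

    // Same English whitelist as in the original snippet above.
    static final String ENGLISH_FULL =
            "!?@#$%&*()<>_-+=/.,:;'\""
            + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "abcdefghijklmnopqrstuvwxyz"
            + DIGITS;

    private WhitelistSketch() {
        // Utility class, no instances.
    }

    // Returns the characters the recognizer should be allowed to output.
    // digitsOnly is a made-up flag for this sketch.
    static String whitelistFor(String languageCode, boolean digitsOnly) {
        if (digitsOnly) {
            return DIGITS;
        }
        if ("eng".equals(languageCode)) {
            return ENGLISH_FULL;
        }
        // Assumption for this sketch: no whitelist for other languages.
        return "";
    }

    public static void main(String[] args) {
        // Prints the digits-only whitelist for English.
        System.out.println(whitelistFor("eng", true));
    }
}
```

With something like this, the value could come from a setting instead of being hardcoded, so going back to full text recognition would not need a rebuild.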