I worked on this interactive digit recognizer a while ago. I planned to make it more comprehensive, but that will have to wait. The basic premise is to give the user a canvas on which to write a digit from 0 to 9, then run an algorithm to guess which digit was written. This is a very common problem and has been tackled thoroughly before (for examples, see post offices, license plate reading, and Google Books). In fact, current methods achieve over 99% accuracy on a variety of inputs, although accuracy is significantly lower on noisy data. I started with the basics and have a few ideas of my own on where I want to use it next.
The main challenge is the algorithm that recognizes the hand-drawn digit. We could use all kinds of machine learning (SVMs, neural networks, etc.), but instead I went with a simple Euclidean distance as a starting point. This works surprisingly well and is trivial to implement. The other important piece is downsampling the image, which can be seen in the example below. This is necessary for a few reasons. Mainly, we want to ignore the white space around the character, which changes depending on the size and position of the character drawn. A pixel-by-pixel comparison would actually be less accurate on the original image than on a downsampled one. The downsampling works by cutting a tight box around the colored pixels and breaking that box up into a grid. Any cell in the grid that contains a colored pixel is colored. Below is an example with my initials. The red lines crop the image so that the drawing fills the entire box. Implicitly, there is a grid in the center red square, which is where the downsampling is performed.
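To make the two steps concrete, here is a minimal Python sketch of the idea, not the code behind the application. Several things in it are my assumptions for illustration: the drawing is already a binary image (a list of rows of 0/1 values), the grid is 8 by 8, the downsampled grid is compared against one stored template per digit, and the function names `downsample`, `euclidean_distance`, and `classify` are hypothetical.

```python
import math


def downsample(image, grid=8):
    """Crop a binary image to its colored pixels and reduce it to grid x grid cells.

    image is a list of rows, each a list of 0/1 values (1 = colored).
    Returns a flat list of grid * grid values; a cell is 1 if any
    colored pixel falls inside it. The grid size is an assumption.
    """
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        # Nothing was drawn: return an empty grid.
        return [0] * (grid * grid)
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]

    # Bounding box around the colored pixels.
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    height = bottom - top + 1
    width = right - left + 1

    cells = [0] * (grid * grid)
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if image[r][c]:
                # Map the pixel's position inside the box to a grid cell.
                cell_row = (r - top) * grid // height
                cell_col = (c - left) * grid // width
                cells[cell_row * grid + cell_col] = 1
    return cells


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(cells, templates):
    """Return the label of the template closest to cells.

    templates maps a label (e.g. "7") to a downsampled vector of the
    same length. Using one stored template per digit is an assumption.
    """
    return min(templates, key=lambda label: euclidean_distance(cells, templates[label]))
```

Because the box is cropped before the grid is applied, the same shape drawn small in one corner or large in the middle of the canvas ends up as the same grid, which is what makes a plain distance comparison workable. One limitation of this sketch: a drawing with fewer pixels than grid cells along one axis will leave some cells empty.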
Have a play with a rudimentary application to see how the downsampling works. Though it's far from perfect, for a simple bit of code with no learning involved, it performs all right.
Some day I plan to:
- Have it train on user input
- Use different learning/classification techniques
- Open it up to more characters
- Repeat in Java – I want to make a simple Android app that can scan my receipts