What can you see in the image above? A seemingly innocuous picture of Tower Bridge and the London skyline? But there are also hidden instructions for a devious, top-secret scheme, and they can’t be detected just by looking at the picture.
Steganography is the art of hiding messages in plain sight, in such a way that the very existence of the message goes unnoticed by anyone who does not already know to look for it. While cryptography is used to encode messages, such encodings can arouse suspicion and provoke attempts to crack the code. Steganography, on the other hand, should be undetectable except by those in the know, even if shared through a public medium. Combining both can give powerful methods to distribute information discreetly.
It is not a new technique and here are some cool examples of historical uses:
- In ancient Greece, messages would be tattooed onto a messenger’s shaved scalp. Then, the hair would be allowed to regrow. Afterwards, the messenger could travel to their destination and upon arrival, have their head shaved to read the message. (Histiaeus)
- During WWII, the British Secret Service devised an ingenious way to smuggle escape kits to Allied POWs in Germany. They produced “special” Monopoly games that were sent with Red Cross care packages to Allied prisoners. These “special” games contained many useful items that appeared to be part of the game, including pieces that were metal files and magnetic compasses, silk maps of German positions, and even real money. (Monopoly)
- In the movie A Beautiful Mind, John Nash is often looking for patterns hidden in magazines, newspapers, etc. The Da Vinci Code (and its sequels) also have elements of steganography and cryptography.
There are many more examples but the idea is simple enough and it has direct applications in the world of clandestine operations and counter intelligence. The examples above are from the physical world, but the digital world has significantly advanced techniques in steganography and cryptography. So, back to the image at the top, the following image is hidden within the encoding:
Feel free to open the link at the top and the link to the original image here and see if you can spot any difference. Answer: you can’t.
How does it work?
First, we have to understand the structure of an image. There are a variety of image formats, but I will focus on PNG because it is lossless and supports 24-bit RGB color. A lossless format is required because the method depends on every pixel value being preserved exactly; lossy compression (as in JPEG) would alter the low-order bits and destroy the hidden data. An image consists of a grid of pixels, and each pixel has a color represented in RGB color space by three values ranging from 0 to 255. For example, pure red would have values of R(ed): 255, G(reen): 0, and B(lue): 0. These are decimal values, and each can be represented in 8 binary bits, as in R: 1111 1111, G: 0000 0000, B: 0000 0000. With 8 bits for each of the three colors, that makes the total of 24 bits mentioned above.
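To make the 24-bit layout concrete, here is a small sketch in Python (not the R used for this post). The helper name `rgb_to_bits` is made up for illustration; it simply formats each channel as 8 binary digits.

```python
def rgb_to_bits(r: int, g: int, b: int) -> str:
    """Format one RGB pixel as three 8-bit binary groups (hypothetical helper)."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("channel values must be in the range 0..255")
    return " ".join(format(value, "08b") for value in (r, g, b))


pure_red = rgb_to_bits(255, 0, 0)
print(pure_red)                        # 11111111 00000000 00000000
print(len(pure_red.replace(" ", "")))  # 24
```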
The method I used is known as LSB (Least Significant Bit) and it works because the human eye can barely differentiate between adjacent colors in the RGB color space. As an example, examine the image below of three different colors. Can you spot the boundary where one ends and the next begins?
Suppose the first pixel of an image has an R value of 250, or 1111 1010. The human eye cannot detect a difference when the R value is changed to 251, or 1111 1011. So the least significant bit (the last binary digit) can be used to store information. (In fact, we could even use the 2 least significant bits or more, but at some point the changes become discernible; I might experiment with this later.) This means each pixel can store 3 bits of information, one in each of R, G and B.
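As a rough illustration of the bit manipulation involved, the following Python sketch (again, not the original R code) replaces the lowest bit of a channel value. The function names `set_lsb` and `get_lsb` are assumptions chosen for this example.

```python
def set_lsb(value: int, bit: int) -> int:
    """Return value with its least significant bit replaced by bit (0 or 1)."""
    return (value & ~1) | (bit & 1)


def get_lsb(value: int) -> int:
    """Return the least significant bit of value."""
    return value & 1


red = 250                # 1111 1010
print(set_lsb(red, 1))   # 251, i.e. 1111 1011
print(set_lsb(red, 0))   # 250, unchanged because the bit was already 0
print(get_lsb(251))      # 1
```

Whatever bit is stored, the channel value moves by at most 1, which is why the change is invisible.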
The original image of London is 1024 * 768 pixels = 786,432 pixels * 3 bits of information (one LSB for each of RGB) = 2,359,296 bits of storage available.
The hidden image of Gnomes is 350 * 264 pixels = 92,400 pixels * 3 bytes (one for each of RGB) * 8 bits per byte = 2,217,600 bits of storage needed. Perfect.
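The capacity arithmetic can be checked with a few lines of Python. This is only a sketch; the function names are invented here, and it assumes one hidden bit per channel and an uncompressed 24-bit payload, as described above.

```python
def lsb_capacity_bits(width: int, height: int, channels: int = 3) -> int:
    """Bits available when one LSB per channel is used for hiding."""
    return width * height * channels


def raw_rgb_size_bits(width: int, height: int, channels: int = 3) -> int:
    """Bits needed to store an image as raw 8-bit channel values."""
    return width * height * channels * 8


available = lsb_capacity_bits(1024, 768)  # London cover image
needed = raw_rgb_size_bits(350, 264)      # Gnomes hidden image

print(available)            # 2359296
print(needed)               # 2217600
print(available - needed)   # 141696 bits to spare
```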
I chose this image as an example, but of course I could read in and hide any data: images, text, files, table of coordinates, credit card numbers, etc.
The algorithm is a straightforward implementation of the concepts above. I used R once again, as that seems to have become my language of choice now. The steps to hide an image:
- Read in original image as a giant matrix of pixels for each of RGB
- Read in data we want to encode and convert to stream of bits
- Iterate through the stream of bits and set the LSB of each successive R, G or B value in the matrix
- Convert RGB matrix back to image
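The hiding steps above can be sketched roughly as follows. This is Python rather than the R used for the post, and to stay self-contained it works on a flat list of channel values instead of reading a real PNG. The names `bytes_to_bits` and `embed`, and the most-significant-bit-first ordering, are assumptions made for this example.

```python
from typing import List


def bytes_to_bits(data: bytes) -> List[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def embed(channels: List[int], payload: bytes) -> List[int]:
    """Return a copy of channels with the payload bits written into the LSBs."""
    bits = bytes_to_bits(payload)
    if len(bits) > len(channels):
        raise ValueError("payload does not fit in the cover image")
    stego = list(channels)
    for index, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[index] = (stego[index] & ~1) | bit
    return stego


# A tiny stand-in for the flattened R, G, B values of a cover image.
cover = [250, 17, 128, 64, 255, 0, 33, 90] * 2
stego = embed(cover, b"Hi")

# No channel value moves by more than 1.
print(max(abs(a - b) for a, b in zip(cover, stego)))
```

With a real image, the flat list would come from whatever image library is used to read the PNG, and would be reshaped back into pixels before saving.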
The resulting image is different from the original, but the difference is imperceptible to the human eye. Furthermore, some bits may not have even been changed, as there is a 50% chance that a bit is already set to what is required. This is the image I posted at the top of the page.
And the image can similarly be “discovered”:
- Read in the encoded image as a giant matrix of pixels for each of RGB
- Get the least significant bit of each R, G and B value, up to the size of the expected hidden image
- Convert this bit stream to an RGB matrix
- Convert RGB matrix to image
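The recovery steps can be sketched in the same spirit. This Python snippet is an illustration only, not the post's R code; it assumes the bits were stored most significant bit first, and `extract` is a made-up name. Note that the caller has to say how many bytes to read.

```python
from typing import List


def extract(channels: List[int], n_bytes: int) -> bytes:
    """Rebuild n_bytes of hidden data from the LSBs of the channel values."""
    if n_bytes * 8 > len(channels):
        raise ValueError("not enough channel values for the requested length")
    out = bytearray()
    for start in range(0, n_bytes * 8, 8):
        byte = 0
        for value in channels[start:start + 8]:
            # Shift in one LSB at a time, most significant bit first.
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


# Channel values whose LSBs read 0 1 0 0 1 0 0 0, the bit pattern of "H".
stego = [250, 17, 128, 64, 255, 0, 32, 90]
print(extract(stego, 1))  # b'H'
```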
This recovers the hidden image exactly. Notice above that there is a requirement to know how many bits were originally encoded. Obviously this is not ideal, but it was easy to implement.
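One possible way around the known-length requirement (not what I implemented) is to store the payload length in the first few hidden bits. The Python sketch below assumes a 32-bit big-endian length header; the function names and the header format are illustrative choices, not a standard.

```python
import struct
from typing import List


def _to_bits(data: bytes) -> List[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def _from_bits(bits: List[int]) -> bytes:
    """Pack bits (most significant first) back into bytes, ignoring leftovers."""
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed_with_length(channels: List[int], payload: bytes) -> List[int]:
    """Hide a 4-byte length header followed by the payload in the LSBs."""
    bits = _to_bits(struct.pack(">I", len(payload)) + payload)
    if len(bits) > len(channels):
        raise ValueError("payload plus header does not fit in the cover")
    stego = list(channels)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & ~1) | bit
    return stego


def extract_with_length(channels: List[int]) -> bytes:
    """Read the length header, then exactly that many payload bytes."""
    lsbs = [value & 1 for value in channels]
    (length,) = struct.unpack(">I", _from_bits(lsbs[:32]))
    return _from_bits(lsbs[32:32 + 8 * length])


cover = list(range(256))  # stand-in for flattened channel values
stego = embed_with_length(cover, b"gnomes")
print(extract_with_length(stego))  # b'gnomes'
```

The header costs 32 bits of capacity, and a fixed header in a predictable place may make the hidden data slightly easier to spot.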
I spoke earlier about the historic usage of steganography, but today the landscape has changed. The amount of data being transferred regularly is immense. This sort of technique can be used to hide a message in a variety of media such as basic file transfers, YouTube videos, forum avatars, email signatures, etc. Additionally, cryptography can be used to encode the message. This way, even if spotted, the message is still tricky to decipher.
You may be wondering, how can such a hidden message be spotted? There is a whole field of study on this called steganalysis. For the method I used, the LSB of the encoded image will be much noisier than we would expect from a typical image. This is because instead of smooth color changes, where adjacent pixels may have similar LSB values, the pixels change according to the hidden data. This fact can be exploited to detect statistical anomalies.
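A very crude version of this idea can be sketched in Python. The measure below, the fraction of neighbouring values whose LSBs agree, is my own simplification and not a real steganalysis tool. The cover is a synthetic, artificially smooth list of values; real photographs have much noisier LSBs, so practical detectors rely on more careful statistical tests.

```python
import random
from typing import List


def lsb_agreement(values: List[int]) -> float:
    """Fraction of neighbouring values whose least significant bits match."""
    matches = sum((a & 1) == (b & 1) for a, b in zip(values, values[1:]))
    return matches / (len(values) - 1)


# Synthetic smooth cover: runs of 16 identical values (an assumption,
# standing in for flat regions of an image).
cover = [100 + (i // 16) % 50 for i in range(4096)]

# Overwrite every LSB with pseudo-random payload bits.
rng = random.Random(0)
stego = [(value & ~1) | rng.getrandbits(1) for value in cover]

print(round(lsb_agreement(cover), 3))  # close to 1 for the smooth cover
print(round(lsb_agreement(stego), 3))  # close to 0.5 once random bits are embedded
```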
One last item of interest: this article about storing information in DNA caught my eye last week. Perhaps, in the future, steganography could even be used to hide messages in our very own cells.