Hashing it out with Sha na na
The word SHA in the title is an actual term in the security business. The initials stand for Secure Hash Algorithm. SHA is a tool that is used whenever digital cinema keys are exchanged, which is often.
SHA-1 and SHA-2 both have several variations, and they are only a few of the techniques for making a hash. Logically, we therefore need to know what a hash is, and why it is so integral to security.
A hash isn’t a type of encryption, and it also isn’t a checksum, which we learned about in the IT section. Like a checksum, the hash uses a small amount of data to summarize a large piece of data. But a checksum isn’t designed to be so completely secure, or so random that two different inputs will practically never give the same result. As we will see, the need is different, so the technology gets more complex.
This doesn’t stop people from calling a hash a checksum, but don’t let them confuse you.
So, we know what a hash isn't – it is not encryption, it is not a checksum – but what is it?
Imagine that you want to check a file to make certain that it is the same as the original file. You could open both files and compare them line by line, word by word, symbol by symbol – quite laborious work. There are programs that do this, of course, but whether by hand or by computer this takes time.
So, we meet the hash, a tool of cryptography, the art and science of writing codes. Cryptography, which comes from the Greek words kryptos (hidden) and graphein (to write), means hidden writing. But the hash doesn’t hide the writing at all. It is a one-way tool. A piece of text is put into the tool, but the output isn’t meant to go the other way.
The output is called a digest, a type of condensation. Whichever tool and version you are using, the length of the digest it produces will always be the same. For example, the hash that predominated in the early 90’s was named MD5. An MD5 hash is 128 bits long. No matter how short or how long the original content, the digest length is always 128 bits.
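Here is a minimal sketch of that fixed-length property, using the hashlib module that ships with Python. The two sample inputs are made up for illustration, and MD5 appears only because it is the example in the text.

```python
import hashlib

# Hypothetical inputs of very different sizes.
short_text = b"a"
long_text = b"digital cinema " * 10_000

for data in (short_text, long_text):
    digest = hashlib.md5(data).digest()
    # digest is raw bytes: 16 bytes * 8 = 128 bits, whatever the input size
    print(len(data), "bytes in ->", len(digest) * 8, "bits out")
```

Both lines of output should report 128 bits, even though one input is a single byte and the other is 150,000 bytes.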
Since the method of creating the hash is completely computerized, we won’t need to get into the method of making a hash file. There are several software packages available to take input in and give the proper code out.
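As a rough sketch of the file-checking idea from a few paragraphs back, one such package (Python's standard hashlib) can reduce each file to a digest and compare the two digests instead of comparing the files symbol by symbol. The function names and the choice of SHA-256 are assumptions made for this illustration.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 digest of a file as a hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a very large file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a, path_b):
    """True if both files produce the same digest."""
    return file_digest(path_a) == file_digest(path_b)
```

In practice the original's digest is usually published once, so whoever receives a copy only needs to hash their own file and compare the two short strings.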
But what is a hash used for?
Here’s an example. I have a website that I keep secure. But security is not my specialty. So the program I use automatically creates a hash of any combined username and password that a user submits when they register.
The table below shows two password hashes that were created for two users with the same password but with slightly different usernames. You’ll notice that the username:password hashes of the two users are completely different even though only one letter of the input is different.
If someone were to break into my system and get the table of the passwords, this information is useless to them. It is even useless to me. I can’t get into the system with someone else's passwords because the hash doesn’t let me work backwards to figure it out.
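The effect described above, where a one-letter change produces a completely different hash, is easy to reproduce. This is only a sketch: the usernames and password are hypothetical, SHA-256 is an assumed choice, and real login systems normally add a salt and a deliberately slow hashing function, which is left out here.

```python
import hashlib

# Hypothetical users: same password, usernames differ by one letter.
entry_a = "alice:opensesame"
entry_b = "alicf:opensesame"

hash_a = hashlib.sha256(entry_a.encode("utf-8")).hexdigest()
hash_b = hashlib.sha256(entry_b.encode("utf-8")).hexdigest()

print(hash_a)
print(hash_b)

# Count how many hex characters happen to match, position by position.
matching = sum(1 for x, y in zip(hash_a, hash_b) if x == y)
print(matching, "of", len(hash_a), "hex characters match")
```

Only a handful of positions should agree, which is about what chance alone would give for two unrelated strings.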
This shows some of the advantages of a hash. In addition to the variability, a well designed hash technique is predictable in one very useful way: in practice it doesn’t produce duplicate hash values, even if the input values are nearly the same. So, when a user logs in, the computer runs the login information through the same hash-creating algorithm. If the result matches what is stored in the computer, the user gets in; if it doesn’t match, the computer will refuse entry.
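The login check just described can be sketched in a few lines. Everything named here (the stored table, make_hash, login, the sample user) is hypothetical, and SHA-256 is an assumed choice; this is not how any particular website package does it.

```python
import hashlib
import hmac

def make_hash(username, password):
    """Hash the combined username:password string."""
    combined = f"{username}:{password}".encode("utf-8")
    return hashlib.sha256(combined).hexdigest()

# Hypothetical stored table: only hashes are kept, never passwords.
stored = {"alice": make_hash("alice", "opensesame")}

def login(username, password):
    expected = stored.get(username)
    if expected is None:
        return False  # unknown user
    # compare_digest checks equality without leaking timing information.
    return hmac.compare_digest(expected, make_hash(username, password))

print(login("alice", "opensesame"))  # correct pair
print(login("alice", "opensesamf"))  # one letter off
```

The server never needs the original password again; it only needs to repeat the same calculation and compare results.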
The presumption is that there will never be two identical hashes created from different information. If two people with different username:password sets could somehow create the same hash, this is called a collision. This is something that a hacker could take advantage of, and therefore the hash program would be considered broken.
What are the odds of that?
MD5, which uses a 128-bit hash, can have 3.4 x 10^38 possible values, which is:
340,282,366,920,938,463,463,374,607,431,768,211,456 possible hashes.
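That figure is simply 2 raised to the 128th power, one doubling for each bit in the digest. A quick check in Python (the variable names are just for illustration):

```python
md5_bits = 128

# Each bit can be 0 or 1, so there are 2**128 possible digests.
possible = 2 ** md5_bits

print(f"{possible:,}")    # the full number, with thousands separators
print(f"{possible:.1e}")  # the same number in scientific notation
```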
Steve Friedl points out in his An Illustrated Guide to Cryptographic Hashes:
"...finding a hash collision by random guessing is exceedingly unlikely (it's more likely that a million people will correctly guess all the California Lottery numbers every day for a billion trillion years)."
At the time, the MD5 hash designers thought that it would take thousands of computers tens of thousands of years to find, by brute force, two inputs that would have the same hash. In 2004, a team in China figured out a way to make that time shorter, though it still took a ridiculously long time.
When this type of announcement is made, the cryptography community around the world gets busy checking everyone's work...if true, they know that the handwriting is on the wall. Even though the seemingly ridiculous amount of power and time required was only lessened by a fraction, they know that eventually the barrier of protection that the code gives to secure credit cards and military secrets...and your computer...is no longer secure. Within months a researcher from the Czech Republic figured out an even faster solution, although it still required thousands of computer hours (but with fewer processors). Not too long after that, he appeared at a cryptographers' convention with the ability to duplicate MD5 hashes on a laptop in less than an hour.
MD5 is still used because there are ways to add to its strength. One technique is called HMAC (Hash-based Message Authentication Code). The HMAC routines mix a secret key into the algorithm. This is now a common feature with other hash techniques.
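To give a feel for how a secret key gets mixed in, here is a minimal HMAC sketch using Python's standard hmac module. The key and message are made up, and the example uses HMAC-SHA-1 rather than HMAC-MD5; the same call works with other hash functions by changing the last argument.

```python
import hashlib
import hmac

# Hypothetical shared secret and message.
secret_key = b"hypothetical-shared-secret"
message = b"message to protect"

# The tag depends on both the message and the key.
tag = hmac.new(secret_key, message, hashlib.sha1).hexdigest()
print(tag)

def verify(key, msg, received_tag):
    """Recompute the tag and compare it to the one received."""
    expected = hmac.new(key, msg, hashlib.sha1).hexdigest()
    return hmac.compare_digest(expected, received_tag)

print(verify(secret_key, message, tag))    # right key
print(verify(b"wrong-key", message, tag))  # wrong key
```

Someone who can see the message but doesn't hold the key can't produce a matching tag, so a receiver who does hold the key can tell whether the message came from the right party and arrived unchanged.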
You'll remember the group in the United States named NIST, the National Institute of Standards and Technology. It provides a number of services, one of which is being the arbiter of cryptography used for government needs. Banks and other industries also use their standards. They develop hash and encryption standards by asking the cryptography community to submit possible techniques in a competition, then hand these techniques back to the public to comment on and test. In return, when the final choice is made, NIST controls the patent and releases it with a royalty-free license. These open source methods ensure that no one can sneak in a piece of code that will give them access to our information. This is one reason to always double-triple reconsider using a hash or password system that is based upon a proprietary algorithm.
In 1995, NIST determined that MD5 was out and SHA was in. The SHA formulas had gone through the vetting process and been approved a couple of years before. At the last second, before most people could implement the original SHA, they found an error and had to release a corrected version, SHA-1.
SHA-1 uses more bits, 160 to be exact. But more importantly it is also made stronger by using new and different techniques and algorithms to shuffle the internal components being "digested". The odds against finding a collision are astronomical. But in 2005 a researcher announced that the first walls protecting SHA-1 had been breached. The time it would take to fully break the hash using the computers of the day was still in the billions of years, but experience said it was time to change.
NIST determined that SHA-1 was still safe to use, and could be used until 2010, especially if used with variations like HMAC. But they decreed that top security requirements should shift to SHA-2. SHA-3 is in the competition phase now, but isn’t expected to be ready until 2012. It isn’t required to be more secure than SHA-2, but is required to be more efficient.
What does efficiency have to do with this?
There is a saying in the computer world that goes, "What Andy gives, Bill takes away." Andy Grove was the longtime head of Intel, and Bill Gates is a co-founder of Microsoft. A variation says: "What hardware gives, software takes away." It isn't entirely true, but it does point out that regardless of how much faster processors get, the software requirements seem to suck all the added gain away.
The d-cinema media player and the projector talk to each other constantly over the Ethernet cable between them. The machines tell each other who they are and that they are certified as being good guys. This is called handshaking. Every time that this is done with a SHA-2 hash, it takes time and processing power that are also needed for delivering the picture to the screen.
SHA-2 was created with variations that have different bit lengths and security levels. If you search the DCI Specification document for "SHA-", you will see that SHA-256 is what d-cinema uses, in addition to HMAC-SHA-1.
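The different bit lengths are easy to see for yourself. This short sketch asks Python's hashlib for SHA-1 and four of the SHA-2 variations and reports the digest size of each; the sample data is made up.

```python
import hashlib

data = b"hypothetical sample"
names = ("sha1", "sha224", "sha256", "sha384", "sha512")

# digest_size is in bytes, so multiply by 8 to get bits.
sizes = {name: hashlib.new(name, data).digest_size * 8 for name in names}

for name, bits in sizes.items():
    print(name, bits, "bits")
```

SHA-256, the one the text says d-cinema uses, produces a 256-bit digest, twice the length of MD5's.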
If you keep up with technical issues of d-cinema, you will hear about the updates to the DCI spec which have resulted from mandates from NIST. Since the SMPTE and DCI specifications refer back to a NIST document that is being changed, the SMPTE and DCI specs are also going to change. They will change again when SHA-3 becomes the new standard. We will cover this in another lesson. For now, just appreciate once more what your equipment manufacturing suppliers have to go through to keep up to date, often without any additional funding. Getting the required NIST approvals, which are separate from the DCI Compliance approvals, can cost over $30,000, plus all the costs of the engineering staff to supply NIST with the validation requirements.