Monday, 25 February 2019

More Basic Cryptography

First of all it might be an idea to explain some of the jargon that is used in cryptography (though some are probably pretty obvious):
  • Plain Text: This is the normal, readable, text.
  • Cypher Text: This is the plain text that has been encrypted. Normally it cannot be read, or if it can be read it usually takes some effort.
  • Encrypt: To turn the plain text into cypher text.
  • Decrypt: To turn the cypher text into plain text.
  • Key: The function that is used to encrypt or decrypt a message.
  • Symmetric Key: Where the same key is used to decrypt or encrypt a message.
  • Public/Private Key: Where different keys are used to encrypt or decrypt a message.
  • Public Key: A key that is known to everybody. Normally used to encrypt a message, but is also used to validate a signature.
  • Private Key: A key known only to the holder. Normally used to decrypt a message, but also used to create a signature.

 

Symmetric Key Cryptography

As mentioned above, symmetric key cryptography is where the key that is used to encrypt and decrypt the message is the same. We saw that in the previous post where we were looking at the Caeser cypher and also looking at the double transposition cypher. However, this is also used in other areas, and the one we will be looking at here is known as the 'one-time pad'.

The 'One-Time Pad' is where the key is used once, and once only, and is then discarded. When it comes to using it with computers, basically what we need to do is to convert the letters into numbers, and then XOR the numbers with the key to create the cypher text. The message is then passed on to the other side, where the key is used to decrypt the message. The key is then discarded. The following diagram may help you understand how this works:


Another idea we need to know, an idea that arose from one of my previous subjects, is the XOR. The XOR function is one of the boolean functions where if you XOR the same number (0 or 1) it will produce a 0 and if you XOR different numbers it will produce a 1. Now, I asked whether you could use other functions as well (OR, NOR, AND, NAND) but I really didn't get any answer to that, except for the statement that we use XOR. I'm not really sure, but to be honest, I don't see why we couldn't, but I guess we just use the XOR function because that is the way that they have always done it. Anyway, here is a table that better explains the XOR:

So, let us see it in operation. We have the message 'Attack Sparta' which we want to encrypt. So, using the one time pad, we convert the letters into binary using a code that only we know, and then we spew out a series of numbers which we then use the XOR operator which will produce another series of numbers. Using our code, we then convert the numbers back into letters, which produces the message "tlklkkrttacl'. The table below will show how we came to that:


Now, another interesting thing is that by interposing a different key we may even be able to produce a completely different message. Take the following for example:

Here we have a message, heil Hitler, being encoded, and by using the same key, we have basically decrypted it to produce the same message. However, what if we use a different key?

Well, it seems like we have a completely different message. Actually, you could even have it come out completely garbled, but that would probably defeat the purpose. Similarly if the message came out as 'Strawberries and Pancakes'. Honestly, why would anybody be wandering around with a coded message that says 'Strawberries and Pancakes'. It would sort of raise some alarms.

Now, the stream cypher is what is known as being 'provably secure'. This means that it is accepted that this form of encryption is basically secure. Now, we have absolutely no information about the message from the cypher text. In fact all plain text messages are equally likely, though for this to work the key must be used once, and only once, and is only known to the sender an the receiver. However, there is one exception - the length of the cypher text and the plain text is the same. Further, the key will also need to be as long as the message, which can also create problems, particularly with computer systems. For instance, a 2 GB message will need a key that is also 2 GB in length.

There is another problem - if you know both the cypher text and the plain text then you can easily work out the key. However, since they key is used only once, and if you already know both messages, then I'm not entirely sure why it is that you would also want to know the key. This is known as a plain-text attack.

So, let's consider some uses for this one time pad. Well, we've already seen one, but the above diagram, which was taken from my lecture notes, shows another use which is called a stream cipher. Basically a stream cipher is where a series of pseudo-random bits are generated to encode the message. In fact the message is encoded bit by bit, and the stream is only used once, and then discarded. I say pseudo-random because anything generated by a computer simply is not going to be random. This system is even used today in modern computers, due to the pseudo-random nature of the cipher. However, as mentioned above, the stream cipher is still as long as the message, which means it doesn't offer absolute security, and it also has the potential of taking up quite a lot of space (not to mention the problem of actually getting the key to the other party).

The next concept we have is the block cipher, where a block of plain text is encrypted using a deterministic key. Now, the difference is that the key is not necessarily as long as the block of plain text, and the entire block is encrypted with the key as opposed to the stream cipher where the bits are encrypted one by one. Now, when we have multiple blocks that are identical, the encrypted previous block is actually added to the new block, which then creates a completely different block, and this new block is then encrypted. With the first block, a random vector is generated and added to the block before it is encrypted. Hopefully another diagram from my lecture notes will help explain this:

However, there is one very big problem when it comes to symmetric key encryption, and that is actually exchanging the keys. You see, it is all well and good generating random, one time keys, however if there is no way to actually pass that key on to the recipient using a secure channel, then the whole concept basically hits a brick wall. There is a solution, but before we get to that, let's look as hashes.

Hash Functions

So, now we come to this concept known as the hash. A hash could be defined as being a digital digest of a computer file. Basically, what happens is that when you 'hash' a document, or something else, it will produce a number, usually in hexadecimal, of a pre-determined length. In fact each and every one of those hashes are of exactly the same length. So, let us create a simple document that says 'I woz 'ere'.

Before I continue I will point out that the following instructions are for a Linux command line terminal (which is basically the same for the Apple Mac). However, for Windows it is slightly different, and I also believe you have to download and install other software - this is one of the many reasons that I really don't like Windows.

Anyway, to generate the hash of the document you use the command 'md5sum' and the name of the document you wish to hash. In this instance:

md5sum document.txt

produces the following hash:

7b57ad265cd9bf3f40a52f50b91df07b

Now, let us hash a pdf of Hamlet Prince of Denmark:

md5sum hamlet.pdf

ad88c46a728443257ade0f3ad7f93730.

So, we have two documents of completely different sizes producing a hash that is exactly the same length. However, there is another function of the hash that is very interesting. Let us change 'I woz 'ere' to 'I was here':

aa126cf8d527030d92bc88c729b2beb1

Wow! Have a look at that. we changed one letter and all of a sudden the entire hash has changed. In fact let us change the ' to a " and see what happens.

6de00a205c2046a4ea262756308321e5

Yep, once again the entire hash has changed.

So, let us convert the hashes into binary, and have a look at what values have changed. Since they are ridiculously long, I have broken them down into 32 bits each, with the first message being the top, and the second message being the bottom. I'm sure you can see for yourself have much a single letter has changed the entire value.

10111001000111011111000001111011
00101001101100101011111010110001

01000000101001010010111101010000
10010010101111001000100011000111

01011100110110011011111100111111
11010101001001110000001100001101

1111011010101111010110100100110
10101010000100100110110011111000

Now there is something very, very, very useful with that, namely checking the validity of a document, or a file. In fact if you go to the Kali Linux website you will note that next to each of the files there is what is known as an sha256sum. Let us download one of the files (which will take a but of time, and then hash it using the sha256 algorithm. So, the hash on the website is:

61bc17ee83ffa12e674af35503181bb336e943ccefac90805807f4bf0137e4b2

Now let us hash the file:

61bc17ee83ffa12e674af35503181bb336e943ccefac90805807f4bf0137e4b2

Well, as we can see the hashes are the same, which means that the file that I downloaded was the file that I wanted to download.

So, the reason for this is that we are using the hash to make sure that the file that we have downloaded is the same as the file on the website. This is particularly useful when we are downloading the file from a completely different site, such as a mirror. In fact it is very useful to make sure that the file we have downloaded is what we want and not, say, some trojan horse, or some virus that is designed to completely melt your CPU.

The other thing about the hash is that the conversion is one way. This isn't like an encrypted message where we have a key to decrypt it - once a message has been hashed, there is no way of being able to reconstruct the message from the hash. Does that nullify it's usefulness - well, not quite, but we will get to that shortly because there is another problem with hashes, and that is collision.

Basically a hash collision is where two completely different messages produce exactly the same hash. Now, since the MD5 hash, which is 128 bits, can have 3.4 x 1038 different hashes, which is something along the lines of 340,282,366,920,938,463,463,374,607,431,768,211,456 different combinations, it is unlikely that there will be a collision, though it is still a distinct possibility. In fact a program has been written to verify the chances of hash collisions without going through every possible combination (known as a brute force technique).

Now, there are numerous uses for the hash, and one that is pretty much used everyday is passwords. Basically your password isn't stored in a server as a password, but as a hash (though not all websites actually do this, and as such I would be very wary of such sites). So, when you enter your password details, the encrypted password is sent to the server, hashed, and is then compared with the hash that is stored there. A match will grant you access. This is the main reason why, if you forget your password, IT won't be able to retrieve it for you, namely because nobody, other than you, knows what the actual password is.

Another use is in what is called a closed auction. What happens is that when the bids are made the bids are hashed, and thus only the bidder actually knows what they had bid. However, there is a slight problem with that namely because a rather unscrupulous person, who knows the range of the bids, can basically generate a hash for each of the numbers in that range and then work out what the other participants' bids are. There is a way of solving this though, and that is by salting the hash. You know how by changing a single character will dramatically change the actual hash. Well, you can do that with the bid by adding a word, such as 'puppy dog', that only the bidder knows, so that the bids are sealed even further.

One final use for the hash is through spam reduction, which is a similar method that is used with proof of work for bitcoin (which we will look at later). Basically this method makes it much more difficult for bulk emails to be sent. To send a bulk email, a hash needs to be generated that starts with a certain number of 0s. The more 0s that are required, the greater amount of work that is produced. This does change the dynamics because there was a time when sending out bulk emails was very, very cheap. However adding a proof of work to the process means that this is no longer the case.



Creative Commons License

More Basic Cryptography by David Alfred Sarkies is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This license only applies to the text and any image that is within the public domain. Any images or videos that are the subject of copyright are not covered by this license. Use of these images are for illustrative purposes only are are not intended to assert ownership. If you wish to use this work commercially please feel free to contact me

No comments:

Post a Comment