Tuesday, 4 September 2018

SECED - Fixing Up the Errors

The thing I've noticed about computer science which is somewhat different from my arts degree, is that things seem to build on the previous topics much more than they did with the social sciences. Okay, history does have this ability to compound, and later authors are influenced by authors that came before them, but it really seems as if this compounding is much more prevalent when it comes to the physical sciences, particularly a practical course such as the one I'm currently doing. The reason I suggest this is because all of that stuff we learnt about Boolean algebra (blame George Boole for that), comes into play here - namely bitmasking and error correcting (and it seems to keep on coming in later subjects as well).

Let's start with bitmasking.

Masking Bits

Actually, I'm not really all that sure as to why they call it bit masking, because what this process is doing is basically changing a bit from one state to another, namely from a 0 to a 1 or vice versa. So, we have three ways of doing it - setting, resetting, and flipping.

Setting: This is changing a bit from a 0 to a 1 and is done using the Or process, and you Or every bit with a zero, with the exception of the one you wish to set, which you Or with a 1. The reason we do this is because we don't want to touch any of the other bits while doing so. In this example, we will set bit 2 to change it from 0 to 1:

Resetting: Basically this is the opposite to setting a bit, namely you are changing it from a 1 to a 0. To do this you basically use the And. Once again, it is the opposite to setting a bit, you And each bit with a 1 with the exception of the one you wish to reset, where you And it with a 0. The reason being is that you don't want to change any of the other bits in doing so. The below tables hopefully will explain this more thoroughly.

Flip: Well, if setting changes a bit from 0 to 1 an resetting changes it from 1 to 0, then flipping does either or (that is from 0 to 1 or 1 to 0), and you do this using the method that pretty much seems like either or, namely Xor, and you Xor the bit with a 1, while Xoring the remainder with a 0. Now, in the example below, I've flipped two of the bangers (I'm not sure why I used that term, but our lecturer did, so I guess it okay), just so you can see how it worked.

So, what then if we need to set some bits, and then reset the rest, well we do that through two processes. Take this example for instance, where we wish to set bits 2 to 4, and the reset the rest. We set the first bits.

And then we reset the rest:

And Why!?!

So, I suspect the question that you are asking is why do we need to know all this. Well, the simple answer is because it is first year computer science. Then there is also the question of how do I know that you are asking the question. Well, you have got this far, and you are still reading so I assume that you have asked the question. Well, now that we have got this out of the way, bitmasking is basically used for correcting errors. Now, we are not talking about that wavy line under the English spelling of a word on an American word processor, but rather when information is passed between components in a computer, or even between computers themselves, sometimes things go wrong, and the computer needs to be able to correct them. If you have a look at some components, sometimes you will see ECC printed on them, meaning that they support error correcting code. Bitmasking is the way that computers correct these errors.

Now, we are only talking about one error here, namely because the computers are apparently so advanced that it is rare for there to be two errors in a code, and pretty much impossible for there to be three (at least when we are talking about the processes inside a computer, passing information between computers is a completely different story). However, the technology we have at the moment is that we can detect and repair a single error, and detect a second error. Computers do this through a process called parity. 

What this means, is that bits, known as parity bits, are inserted at specific points along the data to confirm whether the data that is being sent is correct. The number of bits required is determined using the following code:

Now, this really comes down to trial and error. So, m is the number of bits, and p is the number of parity bits that we might need. So, if the number on the left hand side is greater than or equal to the number on the right hand side, then we have our answer.

I'll leave the rest for you to work out.

So, the way it works, is that it places the parity bits into the code and then sends the code. When the code reaches its destination, it then passes through a checker. If the code is okay, then all is well. However, if it comes out as having an error, it is then passed through a correction algorithm, which uses bit masking, to correct it. The following diagram hopefully helps explain it:

Now, when setting the parity bits, they are placed into the parts of the code that represents powers of 2, ie 1,2,4,8 and so forth, and they cover the parts of the code that will be set to one. As with the example below, we will use the ASCII character P.

If you find this confusing, don't worry, you aren't alone because quite a lot of students were baffled by this. However, if you go back to your binary numbers, you can easily see the pattern. For the first bit, where there is a 1, it is covered, some for the second, and so on.

Now, the way it works is that the parity is set to even or odd. If it is even then when the code reaches the destination there should be an even number of 1s. if there isn't, then obviously there was an error and the code will need to be discarded. The same if the parity is set to odd - if there is an odd number then it is okay, if it is even then it isn't.

So, using an even parity, which means that each of the parity bits must make the bits that they cover even, will be 10100000010. Converting it into hexadecimal, it will be 502.

Now, the next step is how they are detected, and corrected. Each of the parity bits checks to see whether it is an even parity, that is whether it has an even number of 1s. Now, if there is an even number, then it is correct, however, if there is an odd number, then it is incorrect. So, what it does is that it compares the entries to the correct ones and if the parity above is correct, then that one is correct. This is done until the error is isolated. Here we will flip one of the bits to show how it is done:

Hopefully the colours help. So, what we have done is checked each of the rows and made sure that the parity is correct. This is not the case in row P3. So we then check each of the columns. Now, if one of the corresponding rows is correct, it means that the column is correct, so we move on to the next one until we get to a column where the corresponding rows is incorrect. Now, if one of the rows is correct, then the entire column is correct. In the above example, we have narrowed it down to column four, which is where the error lies.

Let us try it with two bits and see what happens:

Oh dear, now we have a bit of a problem. Where is the error? Is it in bit 1, 2, or 3. We know that there is an error, we just don't know where it lies. Is there a solution, other than the fact that we can simply assume that double bit errors are so rare we don't have to worry about them. Well, not quite.

The Magic of SECDED

Well, it isn't all that magical, but what it stands for is Single Error Correction, Double Error Detection. Well, the above one detected more than one error, but it happens to be a little bit more specific than that. It also came down to where I selected the error. If I had selected the error elsewhere, then we might have more than three possibilities. It tells us that there is an error, but not how many, or where. With SECDED, we can detect a double bit error, and correct a single bit error. We do this by adding a further column, called the parity column.

Basically, if the parity column says that there is an error, then it is likely to be a single bit error, and can be corrected as normal. However, if the parity column says that there is no error, but the code that comes across says that there is an error, then the indication is that it is a double bit error. 

Now, the parity covers all of the bits, so setting it to either 0 or 1 depends on whether it will make the overall bits even or odd (something that wasn't of a concern with the original).

Let us look at an example below:

Now, this is slightly different, because we have been told by the parity bit that there is an error, and it could be either in P1 or P4. So, we do a little arithmetic, being P0+P4+P1 = P5, which means the error is in row 5, so we flip it. Now, consider if the error is in bit 11. That would flag the parity bit, P1, P2, and P8. Doing our arithmetic, it would be P0+P1+P2+P8 which is, surprise, surprise, 11. So we flip that.

Now what if there are two errors? Lets try it.

Well, the parity says that everything is okay, but once we get to row P2 we are told that there is an error. It fingers column 2, but I can assure you that this isn't where the error is located. As such, we know that there are two errors, so this is discarded and resent.

How Common?

Well, not very, and not so much with the CPU. However, with the memory it is a little more problematic, so this is why they actually come with error correcting capacity. However, when I say problematic, I really mean very, very unlikely. Though considering how fast computers run these days, they probably happen a lot more often than we expect.

Creative Commons Licence

SECED - Fixing Up the Errors by David Alfred Sarkies is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This license only applies to the text and any image that is within the public domain. Any images or videos that are the subject of copyright are not covered by this license. Use of these images are for illustrative purposes only are are not intended to assert ownership. If you wish to use this work commercially please feel free to contact me

No comments:

Post a Comment