Monday, 21 January 2019

Storage for the Masses

No, I'm not going to be talking about data farms, or those huge buildings in the middle of nowhere that simply exist to hold all of your private data that the government has been slowly collecting (mainly because you have been willingly giving it to them by basically publishing your entire lives online). I'm not even going to be talking about cloud services, mainly because that will be covered somewhat later. No, I'm just going to be boring and talk about the storage devices in your computer.

We've already spoken about memory, which is, for the most part, volatile. The storage devices we will be looking at now are generally referred to as non-volatile memory, which means that it doesn't matter whether you turn your computer off, the data will still be saved.

I'm not going to be going too far back here, you know back in the days when basically everything was stored on paper cards, and you had to keep them in a specific order because if they got out of order then the program wouldn't run (and it was a nightmare when you were carrying a huge stack of them, and you tripped and fell). There were other forms, but I will mention a few here:

See what I mean - By ArnoldReinhold - Own work,

Tape: Ironically, this is still in use today. You generally don't find many tapes around now, but back in the glorious eighties they were everywhere. In fact kids like me would have drawers full of them containing songs that we had copied off the radio, and some of us even had computer games on them. Look, back then we complained that tapes were slow, but that actually wasn't the problem - they were just as fast as disks - it's just that they were read sequentially, so if you wanted to load something off the tape you had to wind it, or rewind it, to the appropriate spot. Ditto if you wanted to save something, and you had to make real sure that it didn't accidentally save over something you already had on there. There are other quirks, such as configuring the tape deck, but I might come back to those another time since I consider this medium to be actually quite fascinating.

Floppy Disks: Another medium that is basically obsolete. They were called floppy, as opposed to hard, because the actual recordable medium was quite floppy. Like tape, they store data magnetically, but not sequentially. They are the other form of storage - Random Access. The problem with these was that, well, they really couldn't hold all that much, and the older ones, such as the 5.25", could be damaged if you didn't watch out. They solved this with the 3.5" by placing the disk in a hard outer shell, and had a sliding piece of metal to cover the hole. Once again, this is also something I might return to at a later date.

Hard Disks: These are still in use today, and are characterised by the fact that the medium on which the data is written is hard. They store data magnetically and are also random access. These days hard disks are pretty cheap, and can literally hold terabytes of data (a 1 terabyte hard drive is now less than $100.00). The problem is that they are slow, very slow. Being random access, there is also a tendency for the data to fragment. Initially everything is stored in sequence, however once sections are deleted, the computer will go and write over those sections with other data. All of a sudden your data is scattered all over the disk.
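To see how that scattering comes about, here is a toy sketch in Python. It assumes a 'disk' of just twelve blocks and a very naive rule of always using the first free blocks it finds; real file systems are a lot smarter than this, and the function names are ones I have made up for the illustration.

```python
def write_file(disk, name, size):
    """Put `size` blocks of file `name` into the first free blocks found."""
    free = [i for i, block in enumerate(disk) if block is None]
    if len(free) < size:
        raise RuntimeError("not enough free blocks")
    used = free[:size]
    for i in used:
        disk[i] = name
    return used


def delete_file(disk, name):
    """Free every block belonging to file `name`."""
    for i, block in enumerate(disk):
        if block == name:
            disk[i] = None


def is_fragmented(disk, name):
    """True if the blocks of `name` are not all next to each other."""
    positions = [i for i, block in enumerate(disk) if block == name]
    return any(b - a > 1 for a, b in zip(positions, positions[1:]))


disk = [None] * 12            # an empty twelve-block "disk"
write_file(disk, "A", 4)
write_file(disk, "B", 4)
write_file(disk, "C", 2)
delete_file(disk, "A")        # leaves a four-block hole at the start
write_file(disk, "D", 6)      # fills the hole, then spills over to the end
print(disk)
print(is_fragmented(disk, "B"), is_fragmented(disk, "D"))
```

File D ends up in two pieces, one in the hole that A left behind and one at the far end, which is exactly the scattering described above.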

Modern hard drives are composed of platters, sort of multiple disks sitting one on top of the other. Data is stored by setting the magnetic polarity of a section of the disk, so it is N - S for 1 and S - N for 0. There are actually two heads on the arm, one for writing and one for reading. Hard drive platters are also divided into tracks and sectors, that is the track is the distance from the centre, and the sector is the segment of the platter. So, the hard drive locates which platter the data is on (the platters are double sided), then seeks the track, and then the sector. This does pose a problem when the data is spread over multiple platters.
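The platter side, track and sector lookup is traditionally written as a cylinder-head-sector (CHS) address, where the 'head' picks the platter side and the 'cylinder' is the track. Those two terms, and the geometry of 16 heads and 63 sectors per track, are my assumptions for this sketch rather than anything from a particular drive. The classic conversion to a single block number (LBA) looks roughly like this:

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder=16, sectors_per_track=63):
    """Convert a cylinder/head/sector address to a single block number.

    Cylinders and heads count from 0, sectors count from 1.
    The default geometry is an assumed, old-fashioned example.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder=16, sectors_per_track=63):
    """Reverse of chs_to_lba, using the same assumed geometry."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


print(chs_to_lba(0, 0, 1))   # very first sector on the drive
print(chs_to_lba(2, 3, 4))
print(lba_to_chs(2208))
```

Modern drives hide their real geometry and just hand out block numbers, so treat this as a picture of the idea rather than what your drive is doing internally.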

Anyway, here are some images to help you make sense of what I was talking about, firstly what the inside of a hard drive looks like:

Well, it looks as if hard drives do have multiple platters on the spindle. Anyway, an image of how the data is stored:

Finally, I felt that a diagram of the track and the sectors is much better than trying to explain it (while it says 'floppy drive' it is the same for a hard drive):

The other problem with hard drives (and this probably applies to the others that I have mentioned) is that they are mechanical, which means that they rely on moving parts that are much more susceptible to wear. However, hard drives deal with this through what is called head parking. Basically, when the drive is not in use, or the computer is powered down, the head moves off the platter into what is effectively a 'car park'. These are also called landing zones. Laptops even have what is called an accelerometer, which detects if the computer is falling, and will automatically park the heads. Now, when the head actually comes into contact with the platter, this is referred to as a head crash, which can pretty much make the drive unreadable. One of the reasons we should always unmount our external hard drives is that unmounting parks the head (and also finishes off anything that the drive happens to be doing). If you don't, the head will remain where it is, and if you drop the drive, the head could be damaged, and the data lost (as I know all too well).

The other issue is that it relies on magnetic media, which means that if a nuclear bomb is detonated nearby then the resulting electromagnetic pulse is basically going to wipe all of your data. Then again, if a nuclear bomb goes off in the vicinity, you probably have bigger problems to deal with.

Compact Discs: I'll include DVDs in this category as well, though the proper term is optical media. Initially CDs were read only, then you could get single use CDs (and DVDs), and then you could get multi-use ones. When we wrote something to a CD we would refer to it as 'burning', which is actually what was happening - a laser in the device was burning the information onto the CD. The original ones were made of plastic, but the later ones had a chemical coating that allowed the information to be rewritten. The data is stored as a series of pits. Strictly speaking, it is the edge of a pit that counts: where the surface changes from flat to pit (or back again) it is read as a 1, and where it doesn't change it is a 0, as such:

Unfortunately CDs are vulnerable to scratching, because if it is scratched suddenly the data changes. It also causes the CD to jump, as you may know if you have listened to a scratched CD (that is if you have ever listened to a CD). The other thing is that they are mechanical, which means the devices are slow and are prone to wear and tear. However, while they can hold substantially more data than a floppy disk, they still hold nowhere near as much as a hard drive.

The other thing about a CD is that it is a sequential medium, which means data is stored, and read, in order. The track on the CD is actually a spiral that winds out from the centre of the disc, much like the old vinyl records (though those run the other way, from the edge in). Oh, and they aren't magnetic either, which means the data can survive an EMP from a nuclear attack (which is why I would use CDs as a backup medium).

Solid State Drives: These drives are basically made up of a series of chips, much like the RAM circuits in the computer. The difference is that they hold on to their data when the power is switched off. They are substantially faster than hard drives, however they are also much, much more expensive. The other thing with solid state drives is that they suffer from wear, which means that the more you write to them the more wear they suffer. This is managed by a process called wear leveling, in that the entire drive will be written to before the drive starts rewriting over older sections that have been 'erased'.
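Here is a minimal sketch of the wear leveling idea, assuming the drive simply keeps a count of how many times each block has been erased and always picks the least-worn one. Real controllers use much more involved schemes, and the function names here are made up for the example.

```python
def pick_block(erase_counts):
    """Return the index of the least-worn block (ties go to the lowest index)."""
    return min(range(len(erase_counts)), key=lambda i: erase_counts[i])


def simulate_writes(num_blocks, num_writes, level=True):
    """Count how often each block gets rewritten.

    With level=True the least-worn block is always chosen;
    with level=False every write hammers block 0.
    """
    erase_counts = [0] * num_blocks
    for _ in range(num_writes):
        target = pick_block(erase_counts) if level else 0
        erase_counts[target] += 1
    return erase_counts


print(simulate_writes(4, 10))               # wear is spread across the drive
print(simulate_writes(4, 10, level=False))  # one block takes all the wear
```

With leveling switched on, no block ever gets more than one rewrite ahead of the others, which is the whole point.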

Also, unlike the other media, SSDs don't have any moving parts, but are controlled by a section of the drive called the controller. The controller actually determines the speed of the drive, and makes decisions on how to read, write, and clean up data that is on the drive. The drive uses a series of electrical cells that are arranged into grids, and these grids are separated into pages (where the data is stored), and these pages are then grouped into blocks.

SSDs don't actually write over the data as other drives do, but rather they search the drive for pages that are no longer being used, and make sure that the surrounding pages are also not being used. They then basically blank them, and then write the data onto the blank section. It is like a sheet of paper full of scribble - you simply can't write over the scribble and hope it remains legible. Instead, what you do is you rub the scribble out, and then write onto the paper (you still write things on paper, don't you?).
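The rub-out-then-write rule can be sketched with a toy model. I am assuming here that a block holds a handful of pages, that a page can only be written while it is blank, and that blanking happens a whole block at a time; the class and method names are hypothetical and nothing like real firmware.

```python
class ToyFlashBlock:
    """A block of pages that can be written once and then must be erased."""

    def __init__(self, pages_per_block=4):
        self.pages = [None] * pages_per_block    # None means blank
        self.stale = [False] * pages_per_block   # True means no longer needed
        self.erase_count = 0

    def write(self, data):
        """Write to the first blank page and return its index."""
        for i, page in enumerate(self.pages):
            if page is None:
                self.pages[i] = data
                return i
        raise RuntimeError("no blank page: the block must be erased first")

    def mark_stale(self, index):
        """Flag a page as no longer in use (the 'scribble')."""
        self.stale[index] = True

    def erase(self):
        """Blank the whole block, but only if nothing in it is still in use."""
        for page, stale in zip(self.pages, self.stale):
            if page is not None and not stale:
                raise RuntimeError("block still holds live data")
        self.pages = [None] * len(self.pages)
        self.stale = [False] * len(self.stale)
        self.erase_count += 1


block = ToyFlashBlock(pages_per_block=2)
first = block.write("draft 1")
block.mark_stale(first)           # the old version is now scribble
second = block.write("draft 2")   # the new version goes on a blank page
block.mark_stale(second)
block.erase()                     # everything is stale, so rub it all out
print(block.write("draft 3"), block.erase_count)
```

Notice that the update never overwrote anything in place; it went to a blank page, and the rubbing out happened later in one go.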

There is also a type of drive called a Hybrid, which is basically half an SSD and half a hard drive. I won't really go into any more detail because they seem to be trying to get the best of both worlds, but in reality are only getting half of none. In reality, you might as well have an SSD and a hard drive in your machine (like my laptop has).

Staying with SSDs, you have different types (of course), and that is usually to do with the number of layers (strictly speaking 'levels', meaning the number of bits stored in each cell). Single layers really only have two states, 0 and 1, and a lower threshold voltage. Remember how I mentioned that SSDs are prone to wear? Well, that is the threshold voltage, and the lower the better. You then have the dual layer, which has four states: 00, 01, 10, 11. Then there is the triple layer, which has eight states: 000, 001, 010, 011, 100, 101, 110, 111. Of course, there is even the quad layer, which has sixteen states, but I suspect (or hope) you get the picture. Anyway, the more layers, the higher the threshold voltage, which means the more likely they are to wear. Those flash drives you use to transfer data are actually triple layer SSDs, which means that they are actually quite prone to wear (which is why they are comparatively cheap).
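If you would rather not write the states out by hand, the pattern is simply that each extra bit per cell doubles the number of states, so n bits gives 2 to the power of n. A quick sketch (the function name is mine):

```python
from itertools import product


def cell_states(bits_per_cell):
    """List every bit pattern a cell with this many bits can hold."""
    return ["".join(bits) for bits in product("01", repeat=bits_per_cell)]


for bits in (1, 2, 3, 4):
    states = cell_states(bits)
    print(bits, len(states), states)
```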

Now, it seems that the term 'Threshold Voltage' has been bandied around without saying anything about what it actually means. Well, in technical terms, it is the amount of voltage that is required to force a connection in a transistor. Basically, in simple terms, it is the amount of force that is required to open the gate. Now, this is important for SSDs, because the deeper the layers, the more force that is required to open the gates. As such, while these deep layer SSDs (I should call them flash drives, because that is what a flash drive actually is, a three layered SSD) may be able to hold more, more force is actually required to 'open the gates' and as such they are more prone to wear and tear.

So, the SLC has one data bit per cell, has the fastest writing speed, the longest life, and is the most expensive. The MLC (or dual layer cell) has two bits, and the TLC has three bits. The QLC has four bits and is the slowest, has the shortest life, and is the cheapest. There is also the eMLC, or Enterprise Multilevel Cell, which is actually more robust than your typical MLC, but is generally only used in the commercial world. Also, SSDs are used in your mobile devices, and are generally TLCs (mainly due to the cost, but they are also nowhere near as bad as the QLCs).

I probably should say a few more things about the controller. Basically it is a processor in the SSD that executes firmware level code, and manages the SSD. However, in a hybrid it also performs the function of managing the hard drive as well. Still, your typical hard drive also has a controller which pretty much performs the same function as the SSD controller. It's sort of like the warehouse, archive clerk that knows where everything is, and has his own system for finding it (which is why you can't sack him).

Controllers also perform other functions, such as correcting errors, wear leveling, utilising a cache for items that are being retrieved, and also noticing and dealing with bad blocks. In some of the more advanced systems, they also perform encryption functions. SSDs are also over provisioned (which means they have more space than what is advertised) to give the controller room to perform its functions.

Understanding the Statistics

There are a few things that we need to understand about drives when we are looking at their stats (though once again, these are never actually printed, you have to dig around for them). One of them is the 'burst transfer rate'. This is the speed at which data is moved between the drive's controller and the rest of the computer, while the 'sustained transfer rate' is the speed at which data is moved from the platters into the controller. While the burst transfer rates are generally faster, the sustained transfer rate is actually more indicative of the drive's performance. Burst rates aren't going to be sustained if there are bottlenecks in the PC, or if the data is not sequential on the hard drive.

Spindle Speed: not really important, but hard drives spin faster than CDs, mainly because CDs aren't secured in place in the same way that hard drive platters are. Desktop hard drives spin faster than laptop hard drives because with laptops you have power considerations, particularly if it is unplugged. Finally, server hard drives spin faster than desktop hard drives because, well, of the noise factor: nobody has to sit next to a server. Sure, if you can deal with the noise, then go your hardest, otherwise just enjoy the sound of silence.
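Where spindle speed does show up is in how long you wait, on average, for the right sector to swing around under the head, which is half a revolution. The sketch below works that out; the spindle speeds in the loop are commonly quoted figures that I am assuming for illustration, not measurements of any particular drive.

```python
def average_rotational_latency_ms(rpm):
    """Average wait for a sector to come around: half of one revolution."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


# Assumed, typical-looking spindle speeds (laptop, desktop, server).
for rpm in (5400, 7200, 15000):
    print(rpm, round(average_rotational_latency_ms(rpm), 2), "ms")
```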

I mentioned a difference between the way data is stored on a CD and a hard drive. The reason is that they are read differently. CDs, particularly the older ones, have a spiral coming out from the centre, and they work on what is called 'constant linear velocity' (CLV). Hard drives, however, have concentric circles coming out from the centre, which are divided into sectors, and work on 'constant angular velocity' (CAV). With CAV the disk spins at the same speed whether the head is near the centre or the edge, which means the surface passes under the head faster at the edge than it does near the centre. If every track holds the same number of sectors, the data then ends up packed more tightly near the centre than at the edge. With CLV the surface always passes the laser at the same speed, so the disc has to spin faster when reading near the centre and slower when reading near the edge.
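The relationship between the two is just circle geometry: the surface speed is the circumference at that radius multiplied by the revolutions per second. The sketch below assumes a CD-like surface speed of 1.3 metres per second and a data area running from about 25 mm to 58 mm from the centre; those are approximate figures chosen for illustration.

```python
import math


def rpm_for_clv(linear_speed_m_per_s, radius_m):
    """Spin speed needed to keep the surface passing at a fixed linear speed."""
    revolutions_per_second = linear_speed_m_per_s / (2 * math.pi * radius_m)
    return revolutions_per_second * 60


def linear_speed_for_cav(rpm, radius_m):
    """Surface speed under the head when the spin speed is fixed."""
    return (rpm / 60) * 2 * math.pi * radius_m


# Assumed CD-like figures: 1.3 m/s, inner radius 25 mm, outer radius 58 mm.
print(round(rpm_for_clv(1.3, 0.025)), "rpm near the centre")
print(round(rpm_for_clv(1.3, 0.058)), "rpm near the edge")
```

With those assumed numbers the disc has to spin more than twice as fast near the centre as it does at the edge, whereas a CAV drive would keep the spin fixed and let the surface speed change instead.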

IOPS: This is basically Input/Output Operations Per Second, and measures how many read/write requests the drive can perform each second.

Throughput: This is the speed at which data is transferred into or out of the device, and is usually quoted in megabytes per second (MB/s).

Latency: Here is that word again - it seems to appear everywhere, and pretty much measures how long it takes for a device to begin a task.

I'll finish off with another screenshot, which is an IOPS test comparing a hard drive and a flash drive.

Notice that there are a number of tests performed. First is 16MB, which is a large sequential file. The next is 4K, which is a small random file, and the final one is 512B, which is also a random read, but the data is more scattered. This is the IOPS measurement, but you also have them for throughput and latency.
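IOPS and throughput are tied together by the size of each request: throughput is roughly the number of operations per second multiplied by how much data each operation moves. The sketch below uses the three block sizes from the test, but the IOPS figures are made-up numbers purely to show the arithmetic, not results from any drive.

```python
# Block sizes from the test above, in bytes.
BLOCK_SIZES = {"16MB": 16 * 1024 * 1024, "4K": 4 * 1024, "512B": 512}


def throughput_mb_per_s(iops, block_size_bytes):
    """Rough throughput in MB/s: operations per second times bytes per operation."""
    return iops * block_size_bytes / 1_000_000


# Hypothetical IOPS figures, invented for illustration only.
made_up_iops = {"16MB": 10, "4K": 5_000, "512B": 8_000}

for test, size in BLOCK_SIZES.items():
    mb_per_s = throughput_mb_per_s(made_up_iops[test], size)
    print(test, made_up_iops[test], "IOPS is about", round(mb_per_s, 1), "MB/s")
```

This is why a drive can post a huge IOPS number on tiny requests and still move very little data, while a handful of large sequential requests can shift far more.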

Creative Commons License

Storage for the Masses by David Alfred Sarkies is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This license only applies to the text and any image that is within the public domain. Any images or videos that are the subject of copyright are not covered by this license. Use of these images is for illustrative purposes only and is not intended to assert ownership. If you wish to use this work commercially please feel free to contact me.
