Friday, 27 September 2019

Introduction to Databases

Okay, now we come to one of the fundamental uses of computers, and that is basically storing data. In fact, data is pretty much everywhere and like it or not, we are not only using it constantly, but we are also generating it. For instance, every time we look for a place on Google Maps we are accessing data, and in turn, if we have 'location sharing' turned on, and our phone in our pocket (or in our hands scrolling through our newsfeeds), then pretty much everywhere we go is being recorded. Actually, to add a further point, every search we enter into Google, and every website we visit is not only being recorded, but is also being stored in a database. Sure, we might be using DuckDuckGo instead of Google, and we might have private browsing turned on, but the thing is that all of this information is still being stored, namely in the databases managed by our ISPs (namely because the government has specifically legislated that this is done).

So, this subject basically covers two things - how data is stored, and how data is accessed, and to say that these two things are vitally important in our modern society is somewhat of an understatement. Look, I could go into the ethics and problems with all of this, but at this stage what I'll be writing about is the fundamentals of Database Management, as well as information regarding SQL (Structured Query Language), which is the language that is used to search databases (and while many people pronounce it as Sequal, I still like to call it Squirrel).

Anyway, to get an idea of how a database is set up consider the diagram below.


As you can see above, there are three layers to the database - the user, the Database Management System (DBMS), and the databases. The data in the database is actually stored on the hard drive, and the DBMS is the program that is used to store, access, and change the data in the database, but first of all let us consider the user.

Users

So, you could say that there are three types of users: the Administrator, the Developer, and the End User.

Administrator: This person pretty much has complete control over the database. They will be the one who builds the database from the ground up, and they also define the schema, which is basically how the database is laid out, how the data is stored, and the structure that the database takes. The administrator is also the one that grants privileges to other users, and you could say that they are the givers, and they are also the ones who take away.

Developer: Some of the major websites are a bit sly when it comes to handing out the role of the developer. The reason I say that is that these databases, despite the fact that they do rake in huge amounts of money, manage to get people to do this for free. Mind you, the role of the librarian on Goodreads, and the actual developer problem to drift apart in places.

Anyway, the developer basically manages the database, and have certain roles assigned to them from the administrator, including updating data and also being able to retrieve data that an end-user might not be able to. For instance, a Facebook user can only perform certain searches on the website, whereas a developer would be able to perform much more detailed and extensive searches, such that a user might not be able to do.

End-User: Yep, these people are us, though in some cases, such as the insurance company where I used to work, they would also be the average employee. Once again, it is difficult to assign specific roles considering that some end-users, such as the claims consultant, will tend to have much greater access to the database than the customer, who will only have access to information relating to their particular insurance policy. However, all things being simple, users generally fall into one of these three categories.


Schema

This word was thrown about above, and basically, as mentioned, is how information is stored in the database. The schema is determined by the administrator and deals with what sort of data is being stored. Further, the schema also sets out restraints on the data that is being stored in the database. However, to explain it better, let us use a real-world example, namely a customer database.

So, a company will have a list of all their customers (and note that companies tend to go to great lengths to build such databases, and will do many things to attempt to get you onto that database). So, a customer will have a customer number, and the restraint will be that each customer has to have it and that it has to be unique. Then there will be other things, such as a name, an address, an email address, and maybe even a date of birth.

Now, this will be written as such:

customer(custNo, name, address, email, DoB)
 
This is known as the relationship schema, basically how the data on the customers is recorded. 

Metadata

Now, there is also this concept of metadata, something that has basically be bandied around a bit in the halls of power of late (we don't want your data, we just want your metadata - George Brandis - though you can find out more about what these laws mean here.). So, what is metadata - well, it actually isn't your data, it is just that which identifies data - namely the schema is a form of metadata. Further, there is also the type of data that is stored here. So, the name will be text, as will be the address. The email is a specific type of text, while the date of birth will be a date.

Metadata also deals with constraints on data, as mentioned above, and also includes user privileges and the like. Though, honestly, now that I know what metadata actually is, it sort of makes me wonder how that information is going to be at all useful for the government to keep track of us, though this does sort of bring us to our next point - SQL. One thing that metadata allows us to do is to search and retrieve data.

SQL - The Language of Search

As I mentioned above, SQL stands for 'Structured Query Language' but as it turns out, it is more than just a simple language of search. However, before I continue, the best way to learn any programming language is through practice, and one of the best tools available happens to be W3 Schools. Unlike other sites, this site is completely free and allows you to practice to your heart's content. So, while I will in later posts be talking about SQL and how to use it, these websites are actually much better for practice.

The thing with SQL, as I have already mentioned, is that it is more than just a language for searching for things in a database. It is also a 'Data Definition Language' meaning that it is a language in which you can design databases and set out the schema. It is a 'Data Query Language' which basically means that you can use it to search for information in the database. Finally, it is also a 'Data Manipulation Language' which means that you can use it to change data in the database.

Oh, and you may have seen this cartoon:

Source: https://xkcd.com/327/
Well, our wonderful mother is actually using SQL here in its data definition form. In fact, as you get to know SQL much more you will be regularly encountering the phrase DROP TABLE.

The other thing about SQL is that it is a 'declarative' language, meaning that you use it to tell the computer the problem you wish to solve. This differs from a 'procedural' language in that there you tell the computer how to solve the problem. The thing with declarative languages is that they tend to be short and simple, though the problem does arise where you don't actually know whether you have entered the instructions correctly or not.

To give you a bit more of an example of the difference, say we want to count the number of customers we have in our database. Well, we would use the following command:

SELECT count(custNo) FROM customer;

Here is the same program in the C programming language:



Concurrency

Databases operate on the principle of concurrency, which means that multiple users need to be able to access it at the same time. Actually, I still remember a database I used to use at work, and one of the biggest problems we had was that two people could not access the same claim at the same time - it would throw up errors. They did eventually solve the problem, but it did take quite a while.

Now, when an operation is performed on a database, a process known as a transaction, it must result in an all or nothing approach. That means that it is either fully completed, or nothing happens at all. There are no half measures when it comes to databases. The other thing is that once the operation has been performed, the database needs to be saved, otherwise everything will be lost. Then again, that goes with quite a lot of other programs as well.

So, this comes down to a system that is known as ACID:

Atomicity - this is the all or nothing approach.
Consistency - like atomicity, when a transaction is performed it must go from a consistent state to a consistent state, meaning that after a transaction, the results cannot be negative.
Isolation - basically each transaction performs in isolation from all the other transactions being performed. Namely, the database needs to prevent the users from interfering with each other.
Durability - the effect of the transaction, once completed, can never be lost.  
So, I'll finish off here with another diagram, just so you get the picture.

 
Creative Commons License

Introduction to Databases by David Alfred Sarkies is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This license only applies to the text and any image that is within the public domain. Any images or videos that are the subject of copyright are not covered by this license. Use of these images are for illustrative purposes only are are not intended to assert ownership. If you wish to use this work commercially please feel free to contact me

No comments:

Post a Comment