Sunday, 14 April 2019

Using the Cloud - Preserving Privacy

One thing is that many of us really don't see much beyond the time when we enter our username and password into a website and then go about our normal business. Sure, every so often we hear about how a website has been hacked and data stolen, but generally many of us don't give much of a thought about how, or even if, our data is secure. This is very much the case now with the cloud, particularly since the cloud is able to provide much more computing power, at a much cheaper price, than either our personal desktops, or even the company server. However, while the cloud might be pretty powerful, we need a way of being able to use it while maintaining the privacy of our information.

However privacy goes much further beyond making sure that people we don't want looking at our Facebook statuses and updates don't (though the solution to that is to basically not post anything at all) and to real information that many of us wouldn't want being made public - such as our medical records or our financial situation. The is where the concept of privacy preserving computations come into play. The thing with privacy is that if it is breached it could be used to commit fraud, or even worse, particularly if your medical records somehow land up in the hands of, say, your employer (who really should be allowed to have access to it anyway).

The other thing is that we have these companies that mine data - Facebook is a classic example. As one person suggested, if you are getting a service for free, then the product is you. The thing with this data mining is that it can be used to generate targeted advertising, or even worse. For instance, with the amount of personal information many of us post on Facebook, it is scarily easy for somebody to assume our identity. However, many applications are very data heavy, and need extra computing power to process the information, but the problem is that the cloud simply cannot be trusted. As such we need a way of using the cloud, while maintaining our privacy.


So, basically we simply can't use the cloud to perform its functions on unsecured data because, well, there are privacy issues to take into account, and honestly, if the data is unsecured then pretty much anybody can look at it. However, we can't just encrypt the data because when it is encrypted then we simply are unable to do anything with it, except maybe to unencrypt it. So, what we need is a way to perform these functions and to perform them in the way that the data is, and remains, secure. The thing is that if we are encrypting the data, and then performing the operations on the data, it is also going to take longer, and require more power, so we will also need a way to distribute this workload.

What we thus need is a way to encrypt the data, send it into the cloud, have the cloud perform the function on the data, and then either return the result to us, or forward it to the third party. Now, we will basically be looking at how we can perform addition and multiplication, namely because all of mathematics boils down to these two concepts - well not quite because multiplication is actually a form of addition, though since we have methods of securely performing multiplication, we can include that as well.

Homomorphism

So, homomorphism is what is known in algebra as a structure preserving map, or in computer science the word algorithm is probably a better way of describing it. So, we have partially homomorphic proceedures, but we don't as yet have a full-homomorphic proceedure. For instance, RSA and El-Gamal can perform multiplication, but not addition while Pallier can perform addition, but not multiplication (which sort of doesn't really make sense since multiplication is basically an extension of addition). However, even if we did have a fully homomorphic proceedure, the problem would be the speed at which the operation is performed. The more complicated the operation, then the slower the procedure takes to complete. Actually, we do have some fully homomorphic schemes, but that are so slow that we might as well not bother with them and simply do the procedure by hand.

RSA Multiplication

Okay, let us have a look at some of these procedures in operation, beginning with RSA, which happens to be the easiest.

So, we have M1 = 3 and M2 = 4, and we want to multiply these numbers together. So, the public key n=33 and e = 7 and the private key d=3 are generated. Now that we have these, it is time to encrypt the message:

C1 = M1e mod n =  37 mod 33 = 9.
C2 = M2e mod n =  47 mod 33 = 16.

So, now that the values are encrypted, they are sent to the cloud where the numbers are multiplied, producing the answer 144. This is then sent to the receiver (whether it be the original computer or not), where the answer will be decrypted.

MA = CAd mod n = 1443 mod 33 = 12


El-Gamal Multiplication

Okay, that we pretty easy, but now it is time to move it up a notch and have a look at another process, this time using El-Gamal:

So, first of all we select a prime number, p, which in this situation will be 2879. We then select the generator g, which is 2585. Then we select the secret key x=47 and from there generate y, which is:

y= Gx mod p = 258547 mod 2879

The numbers that we want to add are then sent to two different servers (I never said that this was going to be easy) where the random numbers are chosen. Server 1 chooses r1 = 154 and Server 2 chooses r2 =96. They then encrypt m1= 5 and m2=6 as follows:

C11 = gr1 mod p = 2585154 mod 2879 = 1309
C12 = m1*yr1 mod p = 5*2585154 mod 2879 = 199

On server 2, we do a similar thing:

C21 = gr2 mod p = 258596 mod 2879 = 1138
C22 = m2*yr2 mod p = 6*258596 mod 2879 = 2433

Now that we have encrypted the data they are then sent into the cloud to perform the calculation.

C3 = C11*C21 mod p = 1309*1138 mod 2879 = 1199
C4 = C12*C22 mod p = 199*2433 mod 2879 = 495

Now that the functions have been performed we note that we still have two numbers, when in reality we are only looking for one. Well, obviously this isn't quite over, so we need to do something else. However this isn't done in the cloud, but rather it is performed on the client computer.

C4 mod p  = 495 mod 2879       = 30
C3x mod p    119947 mod 2879

So, MA = 30, which we will note is the correct answer.

If you thought that was a little complicated, let us move on to the final one, and that is addition with Pallier:

Pallier Addition

Well, we have the two numbers M1=4 and M2=1. Now for this to work we need some other numbers, so if you refer back to the post on Pallier, you will see how we derived them, but so as not to go over old ground, we will simply produce them as follows: p=5, q=7, n=35, 入 = 12, g=164, and μ = 23. We also need two random numbers, so we have r1=6 and r2=17.

Now, we encrypt the numbers using gmrn mod n2. This produces C1=416 and C2=127.

Now that we have encrypted the numbers, we can send them into the cloud to add them. Well, we aren't actually adding them, rather we are multiplying them, so C1*C2 = 416*127 = 52832, and this number is then sent to the receiver (whoever that may be) to be decoded.

So, by using the formula m = L(cλ mod n2)*μ mod n, where L(u) = (u-1)/n, we come back to the value of 5, which is 4+1.

In Action - E-Voting

Well, e-voting seems to be pretty controversial, particularly by the people who support the losing candidate. However, this is one area where this concept works, and that is the secure addition of numbers. Basically, when the vote is cast, it is sent into the cloud where the results are all compiled, and this is then sent down to the voting authority who will decrypt the data and thus produce the result. The thing is that because computers really are mysterious entities that seem to do things out of sight of prying eyes, people do get concerned over the authenticity of the result. For instance, here in Australia, votes are counted manually, and scrutineers from the various parties will be in the room making sure that the right vote for the right candidate is counted - this is something that isn't all that possible when it comes to electronic voting, though some would argue that this removes the need for scrutineers since ambiguity is something that can be done away with - you either push one button, or the other, you can't just sort of push it, but not.


In our example, we will have two candidates, say Hillary Clinton and Donald Trump. Well, in this instance we will only have five voters, just for simplicity's sake. Now, each of the votes is represented by a 4 bit number, so if you vote for Clinton, your vote will be 0100, which if you vote for Trump, your vote will be 0001. Now, When these votes are cast, they will be encrypted, and sent to the cloud where they are all added together. 


Basically, each of the voters has a private number, which is used to encrypt the vote. Once encrypted, the votes are collated in the cloud using the Pallier encryption system above, and the answer is then sent to the voting authority, where the result is decrypted. In this example, the result comes back as 14. Now, that doesn't actually mean anything, that is until we turn it into binary: 1110, which is then split in two to produce 11 for Hillary, and 10 for Donald. Thus Hillary got three votes while Donald only got two.

Well, we could easily say that Hillary was the winner, except that in the United States the election is determined by the electoral collage and not by a simple majority, so despite the fact that Hillary got more votes, Donald still wins because, well, that's just the way things happen.

There are other applications as well, such as electronic meter reading for electricity usage. Normally some guy just comes along, opens up a cabinet at the side of the house, reads the meter, makes a note of the reading, and then moves on. This method means that electricity usage can be monitored and read across the entire day, and we can also get more accurate readings and indications of usage. However, this type of information we don't want people to get hold of, because electricity usage charts can give people a pretty good idea of when somebody will be home, and when they won't.

Consider another option - say you have multiple bank accounts, and funds spread out across these accounts. Say then that you want to purchase something, and while you have the funds, you don't have the funds in a single account. As such, this system can access the bank accounts and tally up all of the amounts and then determine whether you have enough to purchase the item, all in the while not actually knowing how much is in each of the accounts.

This could so be used to store biometric data for security, or even protecting your location through the use of the GPS system (since all phones these days are GPS equipped). So, as you can see, this whole concept of being able to perform calculations in the cloud, or in fact doing anything in the cloud, requires there to be strong security, and in a way it does actually go beyond simply adding two numbers together.


Creative Commons License

Using the Cloud - Preserving Privacy by David Alfred Sarkies is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This license only applies to the text and any image that is within the public domain. Any images or videos that are the subject of copyright are not covered by this license. Use of these images are for illustrative purposes only are are not intended to assert ownership. If you wish to use this work commercially please feel free to contact me

No comments:

Post a Comment