Sunday, June 20, 2010

RSA - Public Private Key Algorithm - Cryptography

Rivest, Shamir and Adleman, the inventors of the algorithm. Rivest Shamir Adleman (RSA) Authentication Mechanism is used to simplify the security environment for the Flexible Management Topology. It supports the ability to securely and easily register new servers to the Flexible Management topology. With the Flexible Management topology, you can submit and manage administrative jobs, locally or remotely, by using a job manager that manages applications, performs product maintenance, modifies configurations, and controls the application server runtime. The RSA authentication mechanism is only used for server-to-server administrative authentication, such as admin connector and file transfer requests.

RSA Key Length:
When you create an RSA key pair, you specify a key length in bits, as generally you would for other algorithms. Specifically, the key length of an RSA key specifies the number of bits in the modulus. In this article we specified a key length of 2048 bits. But in practice, what RSA key length should we choose?

First the short answer:

  • A RSA key length of 1024 bits is sufficient for many medium-security purposes such as web site logins.
  • For high-security applications1 or for data that needs to remain confidential for more than a few years, you should use at least a 2048-bit key, and consider having a contingency plan for migrating to larger key sizes. 
  • To keep data confidential for more than the next two decades, RSA recommends a key size larger than 2048 bits (see below).
So, why not just make the key much longer, say 4096 bits or even 8192 bits? Well, as usual, there's no such thing as a free lunch. A larger key increases the maximum number of bytes that we can encrypt at once, and also the security of the encryption. But it has a serious problem in practice:
  • With every doubling of the RSA key length, decryption is 6-7 times times slower.
Figure 1 shows how decryption time increases with modulus length. The timings were made on a 2GHz Pentium.


Figure 1: RSA decryption time by key length.
The key length also affects the speed of encryption, but it's usually the speed of decryption that we're more concerned about because (a) that's the part that takes place on the server, and (b) decryption is much much slower than encryption, because the decryption exponent is huge (whereas the encryption exponent is typically small).
If we use a 4096-bit modulus, it takes around a second of CPU time to decrypt a block of data. Even if you were able to sacrifice this amount of CPU to every log on, it leaves us with the problem that an attacker can effectively burn a second of CPU time on our server by firing some random data at it. With a 1024-bit key length, decryption takes just 25 milliseconds; with suitable restrictions on the rate of login attemps (and thus decryptions) we allow per remote client, protecting against a "CPU burn" attack is more feasible.

How secure is an n-bit RSA key?
As ever, judging the security of a key of a given size is a complex issue. With current knowledge, "breaking" an RSA key by brute force effectively means factoring the modulus. The largest number that has been factored publically to date is RSA-640, a 640-bit number put up as a challenge by RSA and factored in 2005. This number took "only" around 350 CPU hours (using a cluster of 80 2.2 GHz Opterons). Put another way, you can rent that CPU time from Amazon for about 50 dollars. This is a simplistic view: it doesn't take into account memory and data transfer requirements. And the experimental software used by the team isn't exactly a "plug and play RSA cracker": it surely requires considerable configuration by somebody well versed in number theory.
Factoring RSA 512-bit keys is now squarely within the reach of anyone who is determined enough. As testimony to this, several 512-bit RSA keys used to sign the operating systems of Texas Instruments calculators were recently factored, reportedly within "several months".
So what about 1024-bit keys? Generally, this size will keep your data safe now from an adversary with modest resources. But it's not sufficient for keeping data confidential much into the future, or for keeping it secret from an adversary prepared to devote a few million dollars to the problem. To see why, we'll look below at some estimates on the difficulty of breaking 1024-bit RSA encryption.

One estimate is made by Shamir & Tromer (2003) in their hypothetical TWIRL device. They suggested that for "a few dozen million US dollars", a hardware device could be built to break a 1024-bit RSA key within around a year. Franke et al (2005) similarly estimate a cost of 200 million dollars2 for a machine to factorise a 1024-bit number in one year. If these cost estimates are accurate, it's safe to assume that the NSA has built such a machine (unless they have another way of breaking RSA more efficiently). And by Moore's Law alone, we'd assume that their machine takes considerably less than a year.

Based on Shamir & Tromer's estimate, Kaliski (2003)— see reference in footnote 1— recommends the following RSA key lengths depending on how long data is intended to remain confidential:


Recommended RSA key sizes depending on lifetime of confidential data.

Lifetime of dataRSA key size
Up to 20101024 bits
Up to 20302048 bits
Up to 2031 onwards3072 bits

Shamir & Tromer considered hardware because they estimated that a solution in software would not scale beyond around 500 bits. Thorsten Kleinjung (one of the tem that broke RSA-640) estimates that around 8.4 million CPU years are needed to factorise a 1024-bit number in software3 (his estimate is specifically 8.4 million uniprocessor PCs, taking into account memory and data transfer requirements). Using my favourite crude approximation, that's a million or so dollars of rented CPU time in 2009. It's not clear if and how this would scale to, say, several thousand 256-core machines (bearing in mind that that could be a fairly modest botnet by, say, 2020).
Ferguson & Schneier (2003) in Practical Cryptography are actually more conservative than the RSA recommendations:
    "The absolute minimum size for n is 2048 bits or so if you want to protect your data for 20 years. [...] If you can afford it in your application, let n be 4096 bits long, or as close to this size as you can get it." (p. 233)

They also recommend checking that your software supports keys up to 8192 bits, "just in case". To my knowledge, Sun's RSA implementation does in principle support this size, but at present it is impractical performance-wise.

RSA Encryption in Java
We introduced the notion of asymmetric encryption, in which a key needed to encrypt data is made public, but the corresponding key needed to decrypt it is kept private, for example in a file on the server to which clients connect. In principle, such a system solves the problem of how to send a temporary encryption key securely to the server when opening a secure connection*. A very common asymmetric encryption system is RSA, named after inventors Rivest, Shamir & Adleman.

*If we just used a symmetric encryption system such as AES on its own, we'd have the problem of how the two parties agree on the key in advance. This is sometimes possible, but usually, we want to transmit a temporary key at the moment of connecting. Generally, we use an asymmetric encryption system such as RSA to transmit the secret key, then for the rest of the conversation, use a regular symmetric encryption system using that secret key.

Basic algorithm and terminology:
RSA encryption and decryption are essentially mathematical operations. They are what are termed exponentiation, modulo a particular number. Because of this, RSA keys actually consist of numbers involved in this calculation, as follows:
  • The public key consists of the modulus and a public exponent.
  • The private key consists of that same modulus plus a private exponent.
Those interested in more details of the RSA algorithm should see this separate page.

Creating an RSA key pair in Java
As a one-off process, we need to generate an RSA key pair that from then on, we'll use for all conversations between our clients and server.
Creating an RSA key pair essentially consists of picking a modulus, which is based on two random primes intended to be unique to that key pair, picking a public exponent (in practice, the common exponent 65537 is often used), then calculating the corresponding private exponent given the modulus and public exponent. Java provides the KeyPairGenerator class for performing this task. The essential idea is as follows:
  • we create an instance of KeyPairGenerator suitable for generating RSA keys.
  • We initialise the generator, telling it the bit length of the modulus that we require (see below).
  • We call genKeyPair(), which eventually returns a KeyPair object.
  • We call getPublic() and getPrivate() on the latter to pull out the public and private keys.
The code looks as follows:
KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
kpg.initialize(2048);
KeyPair kp = kpg.genKeyPair();
Key publicKey = kp.getPublic();
Key privateKey = kp.getPrivate();


Notice that we specify a key length of 2048 bits. Common values are 1024 or 2048. Choosing an RSA key length is a tradeoff between security and performance.

We now have two Key objects. Whilst we could immediately use these to start encrypting and decrypting, in most practical uses, we want to store these two keys for later use. For that, we use a KeyFactory to pull out the components (modulus and exponents) of the keys.

Saving the public and private key
In practice, we need to store the public and private keys somewhere. Typically, the private key will be placed on our server, and the public key distributed to clients. To store the key, we simply need to pull out the modulus and the public and private exponents, then write these numbers to some file (or put in whatever convenient place).

The Key interface allows us to pretend for a second that we don't need to worry about the algorithm-specific details of keys. But unfortunately, in practice we do. So there also exist "key specification" classes— RSAPublicKeySpec and RSAPrivateKeySpec in this case— with transparent methods for pulling out the parameters that make up the key. Then, a KeyFactory allows us to translate between Keys and their corresponding specification. It's a bit clumsy, but the code ends up as follows:
KeyFactory fact = KeyFactory.getInstance("RSA");
RSAPublicKeySpec pub = fact.getKeySpec(kp.getPublic(), RSAPublicKeySpec.class);
RSAPrivateKeySpec priv = fact.getKeySpec(kp.getPrivate(), RSAPrivateKeySpec.class);
saveToFile("public.key", pub.getModulus(), pub.getPublicExponent());
saveToFile("private.key", priv.getModulus(), priv.getPrivateExponent());

To save the moduli and exponents to file, we can just use boring old serialization, since the modulus and exponents are just BigInteger objects:
public void saveToFile(String fileName, BigInteger mod, BigInteger exp) throws IOException {
  ObjectOutputStream oout = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(fileName)));
  try {
    oout.writeObject(mod);
    oout.writeObject(exp);
  } catch (Exception e) {
    throw new IOException("Unexpected error", e);
  } finally {
    oout.close();
  }
}

In our example, we end up with two files: public.key, which is distributed with out clients (it can be packed into the jar, or whatever); meanwhile, private.key, is kept secret on our server. Of course, we needn't even bother saving the keys to a file: they're just numbers, and we could embed them as constant strings in our code, then pass those strings to the constructor of BigInteger. Saving the keys to file simply makes it a bit easier to manage keys in some cases (for example, we might change keys periodically and distribute new key files to the clients; distributing a few bytes is often more practical than distributing the entire code base).

Now we've got a mechanism to generate a key pair and save those keys for the future, we're ready to consider how to actually perform RSA encryption/decryption, reading in the key files we generated.

The RSA Algorithm:
For performing RSA encryption with Java, you luckily don't need to know all the gory details of how RSA works. But it's worth having an overview, at least so that you can understand the terminology. (Note that outside of Java, you do need to know some of these details to use RSA securely; luckily, at least Sun's implementation solves some potential security issues that you could otherwise run into if you just used RSA "without thinking about it".) OK, hold on to your hat...

Basic RSA equation:
The essential idea is as follows:
  • we pretend that our message to encrypt is a number (although conceptually, we want to encrypt/decrypt some arbitrary series of bytes, we can always put them together and treat them as a big number).
  • We pick two large prime numbers, call them p and q.
  • We multiply p and q together to form the modulus (n).
  • It is then mathematically possible to find two exponents, generally called e and d, so that.
    • med (mod n) = m (mod n) [where m is the number representing the message we want to encrypt.]
In other words, if we raise m to the power e, taking the answer modulo n, there exists some other number d, so that raising the result to d (and taking the answer modulo n again) gets us back to the original number. Thus, we have a way to perform encryption then decryption, each using a different number: e in one case and d in the other.

Using the RSA equation in practice
Once we've picked values for p and q (remember, these are random large primes), we need to pick suitable corresponding values for e and d. In practice, it's easiest to just use some small value of e that we know is suitable (65537 is commonly used), and then calculate the corresponding d. Then:
  • To encrypt a message, we calculate me (mod n).
  • To get back to the original message, the person on the other end will take this number and raise it again to the power d (overall, that means between them they'll have calculated med and gotten back to the original message).
  • The person encrypting needs to know only the values of e and n— these two numbers therefore become the public key. 
  • Only the person decrypting needs to know the value of d (and the value of n)— these two numbers therefore form the private key.
How to calculate d
The method of calculating d actually turns out to be the basis of the security of RSA:
It is mathematically easy to find d if and only if you know p and q, but otherwise, it is extremely difficult.
Remember that p and q are initially chosen by the person creating the key pair, but then these numbers are discarded (or at least, kept secret) once d has been calculated; only the result of multiplying the two numbers together (n = pq) is made public (along with e), and in practice, this multiplication cannot be undone if p and q are large enough.

Terminology
The following terminology is generally used, and is useful to know when working with the Java RSA classes:
  • n is called the modulus.
  • e is the public exponent.
  • d is the private exponent.
  • the public key consists of the pair (e, n).
  • the private key consists of the pair (d, n)— though in fact n is publically known.
Security of RSA
Purely in theory, RSA could be attacked as follows:
  • Somebody could factorise n to get the corresponding p and q, but there is no publically known way to factorise numbers efficiently, so that this becomes impossible in practice if n is sufficiently large (say, a few thousand bits). 
  • Somebody could come up with "some other way" of calculating d given e and n— again, no such method is known (or at least, not publically known) at present.
  • If the original message is predictable, somebody could make repeated guesses at the original message, calculating me (mod n) each time— where m is the guess in question— until the result matched the encrypted version.
To get round the latter problem, it is important to make sure that the message is unpredictable by appending random padding. Sun's implementation handles this (but another implementation might not by default). Adding padding also solves a problem with very short messages: if the message is so short that me is less than n, then an attacker can simply calculate the eth root of m to retrieve the message. (Once it exceeds n, and so the number is actually reduced modulo n, such a calculation isn't feasible.)

Symmetric Key Encryption in Java:
Having introduced the the notion of an encryption key, we turn our attention to symmetric-key encryption. This is the case where the same key is used to both encrypt and decrypt. It sometimes called secret-key encryption, single-key encryption or simply symmetric encryption.

In traditional fashion, we'll assume that Alice wants to send some data securely to Bob. This means Alice is going to encrypt the data, and then when he receives it, Bob will decrypt it again. Firstly, somehow or other, Alice and Bob have to agree between them on what the secret key will be. We'll assume that the key is some sequence of bytes of an appropriate length. (We mentioned in our introduction to encryption keys that in fact, with certain algorithms, you have to make sure the key has a certain structure and/or avoid "weak" keys; in fact, it turns out that for our particular example this won't be a problem.) In a moment, we'll come back to the problem of how the key is chosen, what length it should be, and how Alice and Bob can agree in advance on what that key is. For now, let's pretend they "just know" what the secret key is, and worry about the Java code that they each run to actually encrypt and decrypt their data.

In this example, we're going to use a widely-adopted encryption algorithm called AES. (We'll consider one or two other algorithms later. But in general, if you need to use a symmetric encryption system and you're not sure which algorithm to pick, AES is probably the one you'll end up using unless you have a strong reason to use something else.) Then, as a first attempt (which we will need to revise), this is essentially the pattern for encryption/decryption:
Very basic AES encryption/decryption functions in Java.
N.B. In reality, this code contains some insecurities that we would need to revise for various common uses (see below).

Alice's encryption codeBob's decryption code
byte[] key = //... secret sequence of bytes
byte[] dataToSend = ...

Cipher c = Cipher.getInstance("AES");
SecretKeySpec k =
new SecretKeySpec(key, "AES");
c.init(Cipher.ENCRYPT_MODE, k);
byte[] encryptedData = c.doFinal(dataToSend);

// now send encryptedData to Bob...
byte[] key = //... we know the secret!
byte[] encryptedData = //... received from Alice

Cipher c = Cipher.getInstance("AES");
SecretKeySpec k =
new SecretKeySpec(key, "AES");
c.init(Cipher.DECRYPT_MODE, k);
byte[] data = c.doFinal(encryptedData);

// do something with data

If you want to test this code, just pick any 16-byte sequence as the key. If you put the same key into each side, you should find that data encrypted by Alice's code can be decrypted by Bob's. And if you try printing out the encrypted bytes, you should find that it looks like a series of undecipherable random bytes. Indeed, in principle at least, these bytes will be computationally undecipherable without the key.

But unfortunately, that's not the whole story... To see why, we need to first consider how AES and similar algorithms work, and in particular the notion of block ciphers.

Symmetric Key Encryption in Java: AES and block ciphers:
The basic Java encryption code we've just looked at gives us the very basic building block for securely storing or transmitting data. Unfortunately, like all good building blocks, it's generally not enough on its own. To get proper security, we need to use the building blocks properly.
The first thing we need to consider is that AES, like other common encryption algorithms, is a block cipher. That means that it always operates on units or "blocks" that are some fixed number of bytes in size. In the case of AES, a block is 16 bytes, or 128 bits. (Coincidentally, we're also using a 16-byte key in our examples, but the block size and key size aren't necessarily the same. AES can be used with other key sizes. And another algorithm, DES, has a 7-byte key but an 8-byte block size.) AES and various other block ciphers essentially take each n-byte block of bytes and "mix" that block together with the key, thus outputting an encrypted block.
Mixing typically involves simple bitwise operations such as the XOR operation and bitshifts, possibly arithmetic operations (addition and multiplication), plus a technique called "substitution" (whereby at points during the process, parts of the encryption buffer are used as indicies into an array of alternative "random" bytes/bits for which they are swapped). Mixing occurs in "rounds": the mixing procedure is repeated some number of times on each block, often with some variation on each round (e.g. different bytes might be used from the key on different rounds).
The upshot is that:
By default with a block cipher, a given block-length sequence will always result in the same sequence of encrypted bytes when encrypted with a given key.
To illustrate this, consider Alice's encryption code from the previous page. Supposing we define the data to be encrypted as follows (in other words, bytes 16-31 of the block will be a repetition of bytes 0-15):
byte[] plaintext = new byte[32];
for (int i = 0; i < 32; i++) {
  plaintext[i] = (byte) (i % 16);
}

Now, we pick a random key (if you're not familiar with it, see this site's explanation of the SecureRandom class; for reasons discussed there, never use poor-quality generators such as java.util.Random for generating random keys):
byte[] key = new byte[16];
SecureRandom r = new SecureRandom();
r.nextBytes(key);


But let's see what happens when we encrypt plaintext using key, which, remember, contains 16 random bytes. We run Alice's encryption code, save the encrypted bytes, and look at them in a hex editor (or write a little Java method to print them out). What we actually get is something such as this:
6BA71D86E3115D2D49F54A12B5800291 k?.??.]-I?J.??.?
6BA71D86E3115D2D49F54A12B5800291 k?.??.]-I?J.??.?
6F91863A50238AD46AACC5064E902B8D o??:P#??j??.N?+?


Although each individual line contains undecipherable garbage as we'd hope, the first and second lines of encrypted data are identical. Because our plaintext consisted of a repeated pattern, where the pattern itself is exactly the length of a block, the encrypted result contains two identical blocks. (The third block is something called padding: we'll come back to that.) In other words, the cipher is leaking information about the structure of our plaintext. Depending on our application, this could be really bad news.
Unfortunately, you'll see that this happens with the "default" options. When we constructed our AES Cipher object, we didn't set any special option to say "please operate in insecure mode". And yet that's what it decided to do. So I'm just guessing there could be one or two commercial systems out there handling sensitive data with this insecurity. What's the solution?
Well, the solution is essentially quite low-tech at first glance. We simply arrange things so that by the time our data is fed into the encryption algorithm, we "never encrypt the same block twice". And to do that, we choose an appropriate option from what are called block modes.

Encryption Key Size
As well as choosing an encryption algorithm, you must generally choose a key size to use with that algorithm. In a loose way, the key size determines the "strength" of encryption, assuming that the actual algorithm is otherwise secure. Specifically, the key size generally determines: the number of guesses that an attacker would need to make in order to find the key by "brute force", assuming that all possible keys are equally likely. Relatedly, it also determines the chance and feasibility of a collision attack.
Note that on this page, we are concerned with the key size of symmetric encryption algorithms— that is, where a shared secret key is used by all parties. In geneal, asymmetric algorithms require a larger key size measured bit-for-bit. We discuss RSA key sizes on a separate page.

How to set the key size in Java
If you use the KeyGenerator class, then you can set the key size in bits via the init() method. The following example shows how to generate a 256-bit AES key:
KeyGenerator gen = KeyGenerator.getInstance("AES/CTR/PKCS5PADDING");
gen.init(256);
SecretKey k = gen.generateKey();
Cipher ciph = Cipher.getInstance("AES");
ciph.init(Cipher.ENCRYPT_MODE, k);

...
Note that attempting to use this key with Sun's JDK "out-of-the-box" will throw an InvalidKeyException due to an (overridable) policy restriction that limits symmetric keys to 128 bits. Below, we describe how to remove this 128-bit description, and how to determine if a given client has this restriction.

Which key size?In general, the minimum required key size is some combination of the maximum key size that can be attacked now for a given algorithm, extrapolated to the number of years that you need to keep the encrypted data confidential. Needless to say, such extrapolation is extremely difficult beyond a few years. A couple of pointers are:
  • Lenstra & Verheul (2001)1 take data points from attacks on keys of various lengths and extrapolate via Moore's Law. They conclude that in 2030, it will seem as difficult to brute-force a 93-bit symmetric key as it did to brute-force DES in 1982, and that in 2050 this will be true for a 109-bit key. From this, we might conclude that, ignoring collision attacks, a 128-bit key is sufficient to keep data confidential for the next few decades.
  • Taking into account collision attacks, Ferguson & Schneier (2003)2 recommend using a 256-bits key size (p 65).
  • NIST (2006)3, presumably based on similar types of calculation, recommend a minimum of "128 bits of strength" (p. 66) to keep data confidential "beyond 2030".
  • The Visa credit card company's Data Field Encryption best practices guide suggets that for the purpose of transmitting credit card numbers, "keys should have the strength of at least 112 equivalent bit strength". For this purpose, they actually judge 128 bits (as in the minimum AES key size) to be "stronger than needed" (presumably because of the relatively short lifespan of credit cards).
  • In principle, a quantum computer could square root the effort of a brute-force key search, so that the bit strength of the key is halved (i.e. a 256-bit key in quantum computing has the strength of a 128-bit key in classical computing).
The conclusions I draw from this are:
  • If possible, use a 256-bit key; if no great revolution occurs, this will probably keep your data safe for at least 20 or 30 years.
  • A 128-bit key may be suitable if you've thought carefully about collision attacks and how they affect the future of your data and application, and if you don't see quantum computing as a threat within the lifetime of your data (in the transmission of credit card details, for example, it seems a fairly safe bet that all existing cards will have expired by the time practical quantum computers become a reality!).
Practical problems with using 256-bit keys in JavaThere are a couple of practical problems with using 256-bit keys in Java:
  • Current versions of the JDK ship with a 128-bit policy restriction on key sizes.
  • The SecureRandom generator has a 2160 period— in other words, it can only generate 2160 out of the 2256 possible 256-bit keys.
If your deployment model allows it, the 128-bit restriction can be removed by downloading and manually installing Unlimited Strength Jurisdiction Policy Files provided by Sun.