Friday, May 20, 2005

Comparison of ciphers

Java supports a number of encryption algorithms out of the box. Which encryption algorithm to use can depend on a number of criteria:
  • How secure the algorithm is currently judged to be in the cryptographic literature.
  • The performance characteristics of the algorithm (e.g. the "raw speed" of the algorithm, and whether it supports parallel encryption).
  • how politically safe a decision it is to use a particular algorithm (paradoxically, this doesn't necessarily depend directly on the algorithm's security).
  • Whether you have to interact with a legacy system.
Summary of algorithms
We compare measured speed of encryption with various algorithms available as standard in Sun's JDK, and then give a summary of various other characteristics of those algorithms. The encryption algorithms we consider here are AES (with 128 and 256-bit keys), DES, Triple DES, RC4 (with a 256-bit key) and Blowfish (with a 256-bit key).

PerformanceFirst, the easy bit. Figure 1 shows the time taken to encrypt various numbers of 16-byte blocks of data using the algorithms mentioned.

Figure 1: Comparison of encryption times for various common symmetric encryption algorithms provided as standard in Java 6.
It's important to note right from the beginning that beyond some ridiculous point, it's not worth sacrificing speed for security. However, the measurements will still help us make certain decisions.

CharacteristicsTable 1 gives a summary of the main features of each encryption algorithm, with what I believe is a fair overview of the algorithm's current security status.
AlgorithmKey size(s)SpeedSpeed depends on key size?Security / comments
RC440-1024Very fastNo Of questionable security; may be secure for moderate numbers of encrypted sessions of moderate length. RC4 has the redeeming feature of being fast. However, it has various weaknesses in the random number sequence that it uses: see Klein (2008)1.
Blowfish128-448FastNoBelieved secure, but with less attempted cryptanalysis than other algorithms. Attempts to cryptanalyse Blowfish soon after publication are promising (Schneier, 19952 & 19963). But, unlike AES, it doesn't appear to have received much attention recently in the cryptographic literature. Blowfish has been superseded by Twofish, but the latter is not supported as standard in Java (at least, not in Sun's JDK).
AES128, 192, 256FastYesSecure, though with some reservations from the crypto community. It has the advantage of allowing a 256-bit key size, which should protect against certain future attacks (collision attacks and potential quantum computing algorithms) that would have 264 complexity with a 128-bit key and could become viable in the lifetime of your data.
DES56SlowInsecure: A $10,000 Copacobana machine can find a DES key in an average of a week, as (probably) could a botnet with thousands of machines.
The simple answer is: "Don't use it– it's not safe". (RFC 4772).
Triple DES112/168, but equivalent security of 80/112Very slowNo Moderately secure, especially for small data sizes. The 168-bit variant estimated by NIST (2006) to keep data secure until 20304.
Triple DES performs three DES operations (encrypt-decrypt-encrypt), using either two or three different keys. The 168-bit (three-key) variant of Triple DES is generally considered to offer "112 bits of security", due to a so-called meet-in-the-middle attack. AES offers a higher level of security for lower CPU cost.

Remarks on AESAES stands for Advanced Encryption Standard and is actually an algorithm that was originally called Rijndael, after its inventors Rijmen & Daemen. It is the algorithm that most people will end up using unless they have a strong reason to use anything else:
  • It is a politically safe decision: the encryption standard of the US National Institute of Standards and Technology (NIST), and the US government reportedly approves AES with 192 or 256-bit keys for encrypting top secret documents (or put another way, your boss won't sack you for choosing AES...).
  • It is largely considered secure, though with some reservations:
    • Nobody yet has (publicly) a full attack on AES, or a partial attack that is practical (though some impractical partial attacks exist5).
    • however, AES is algebraically simpler than other block ciphers: effectively, it can be written as a series of mathematical equations, and there is a worry that somebody could demonstrate a way to solve those equations (see Ferguson & Schneier, Practical Cryptography, pp. 57-58).
    • the NSA may have chosen Rijndael as they secretly know how to break it, or secretly estimated that they could develop a way to break it. 
  • It is fast (Ferguson & Schneier (op cit), p. 59, argue that this is the reason it was picked over Serpent as the AES standard.

Sunday, March 6, 2005

C O M P L E X I T Y

Know Thy Complexities!

Hi there!  This webpage covers the space and time Big-O complexities of common algorithms used in Computer Science. When preparing for technical interviews in the past, I found myself spending hours crawling the internet putting together the best, average, and worst case complexities for search and sorting algorithms so that I wouldn't be stumped when asked about them.

GoodFairPoor

Searching

AlgorithmData StructureTime ComplexitySpace Complexity
AverageWorstWorst
Depth First Search (DFS)Graph of |V| vertices and |E| edges-O(|E| + |V|)O(|V|)
Breadth First Search (BFS)Graph of |V| vertices and |E| edges-O(|E| + |V|)O(|V|)
Binary searchSorted array of n elementsO(log(n))O(log(n))O(1)
Linear (Brute Force)ArrayO(n)O(n)O(1)
Shortest path by Dijkstra,
using a Min-heap as priority queue
Graph with |V| vertices and |E| edgesO((|V| + |E|) log |V|)O((|V| + |E|) log |V|)O(|V|)
Shortest path by Dijkstra,
using an unsorted array as priority queue
Graph with |V| vertices and |E| edgesO(|V|^2)O(|V|^2)O(|V|)
Shortest path by Bellman-FordGraph with |V| vertices and |E| edgesO(|V||E|)O(|V||E|)O(|V|)

Sorting

AlgorithmData StructureTime ComplexityWorst Case Auxiliary Space Complexity
BestAverageWorstWorst
QuicksortArrayO(n log(n))O(n log(n))O(n^2)O(n)
MergesortArrayO(n log(n))O(n log(n))O(n log(n))O(n)
HeapsortArrayO(n log(n))O(n log(n))O(n log(n))O(1)
Bubble SortArrayO(n)O(n^2)O(n^2)O(1)
Insertion SortArrayO(n)O(n^2)O(n^2)O(1)
Select SortArrayO(n^2)O(n^2)O(n^2)O(1)
Bucket SortArrayO(n+k)O(n+k)O(n^2)O(nk)
Radix SortArrayO(nk)O(nk)O(nk)O(n+k)

Data Structures

Data StructureTime ComplexitySpace Complexity
AverageWorstWorst
IndexingSearchInsertionDeletionIndexingSearchInsertionDeletion
Basic ArrayO(1)O(n)--O(1)O(n)--O(n)
Dynamic ArrayO(1)O(n)O(n)O(n)O(1)O(n)O(n)O(n)O(n)
Singly-Linked ListO(n)O(n)O(1)O(1)O(n)O(n)O(1)O(1)O(n)
Doubly-Linked ListO(n)O(n)O(1)O(1)O(n)O(n)O(1)O(1)O(n)
Skip ListO(log(n))O(log(n))O(log(n))O(log(n))O(n)O(n)O(n)O(n)O(n log(n))
Hash Table-O(1)O(1)O(1)-O(n)O(n)O(n)O(n)
Binary Search TreeO(log(n))O(log(n))O(log(n))O(log(n))O(n)O(n)O(n)O(n)O(n)
Cartresian Tree-O(log(n))O(log(n))O(log(n))-O(n)O(n)O(n)O(n)
B-TreeO(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(n)
Red-Black TreeO(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(n)
Splay Tree-O(log(n))O(log(n))O(log(n))-O(log(n))O(log(n))O(log(n))O(n)
AVL TreeO(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(n)

Heaps

HeapsTime Complexity
HeapifyFind MaxExtract MaxIncrease KeyInsertDeleteMerge
Linked List (sorted)-O(1)O(1)O(n)O(n)O(1)O(m+n)
Linked List (unsorted)-O(n)O(n)O(1)O(1)O(1)O(1)
Binary HeapO(n)O(1)O(log(n))O(log(n))O(log(n))O(log(n))O(m+n)
Binomial Heap-O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))O(log(n))
Fibonacci Heap-O(1)O(log(n))*O(1)*O(1)O(log(n))*O(1)

Graphs

Node / Edge ManagementStorageAdd VertexAdd EdgeRemove VertexRemove EdgeQuery
Adjacency listO(|V|+|E|)O(1)O(1)O(|V| + |E|)O(|E|)O(|V|)
Incidence listO(|V|+|E|)O(1)O(1)O(|E|)O(|E|)O(|E|)
Adjacency matrixO(|V|^2)O(|V|^2)O(1)O(|V|^2)O(1)O(1)
Incidence matrixO(|V| ⋅ |E|)O(|V| ⋅ |E|)O(|V| ⋅ |E|)O(|V| ⋅ |E|)O(|V| ⋅ |E|)O(|E|)

Notation for asymptotic growth

letterboundgrowth
(theta) Θupper and lower, tight[1]equal[2]
(big-oh) Oupper, tightness unknownless than or equal[3]
(small-oh) oupper, not tightless than
(big omega) Ωlower, tightness unknowngreater than or equal
(small omega) ωlower, not tightgreater than
[1] Big O is the upper bound, while Omega is the lower bound. Theta requires both Big O and Omega, so that's why it's referred to as a tight bound (it must be both the upper and lower bound). For example, an algorithm taking Omega(n log n) takes at least n log n time but has no upper limit. An algorithm taking Theta(n log n) is far preferential since it takes AT LEAST n log n (Omega n log n) and NO MORE THAN n log n (Big O n log n).SO
[2] f(x)=Θ(g(n)) means f (the running time of the algorithm) grows exactly like g when n (input size) gets larger. In other words, the growth rate of f(x) is asymptotically proportional to g(n).
[3] Same thing. Here the growth rate is no faster than g(n). big-oh is the most useful because represents the worst-case behavior.
In short, if algorithm is __ then its performance is __


algorithmperformance
o(n)< n
O(n)≤ n
Θ(n)= n
Ω(n)≥ n
ω(n)> n