Cryptology
Cryptography is the art of providing secure communication over insecure channels and cryptanalysis is the dual art of breaking into such communication. Cryptology is the combined arts of cryptography and cryptanalysis. The very first electronic computers in history were built in England for the purpose of cryptanalysis. Consult the books of D. Kahn, in particular [a13], for a detailed historical perspective. For more information and additional references on the topics treated here, see [a3].
The purpose of a cryptosystem is to encipher an intelligible cleartext (also called plaintext), thus producing an unintelligible ciphertext (also called cryptogram). The intended receiver must be able to decipher the ciphertext, thus recovering the cleartext. However, eavesdroppers (also called (passive) cryptanalysts) must be unable to decrypt the ciphertext.
A cryptosystem is restricted if its security is based on keeping secret the nature of the enciphering and deciphering algorithms. Restricted cryptosystems are usually designed by amateurs and are almost always childplay for qualified cryptanalysts. More importantly, they can be of no use in the modern context of a large number of users. Codes (cf. Code) are instances of restricted cryptosystems. These are not discussed any further here.
A cryptosystem is general if its security lies not in the secrecy of the enciphering and deciphering algorithms, but rather in a relatively short secret parameter known as the key. It should be easy for individual users to come up with their own keys so that even the designer of the cryptosystem cannot break it if he does not know which key has actually been used. General cryptosystems should allow a very large number of possible keys, so as to discourage exhaustive search (trying systematically to decipher a given ciphertext using each possible key until meaningful cleartext emerges). However, there is no safety in large numbers alone.
A general cryptosystem is secret-key if some secret piece of information (the key) has to be agreed upon ahead of time between any two parties that wish to communicate. This implies a need for secure key distribution, which can be a serious problem if a large number of users are involved.
A general cryptosystem is public-key if it provides secure communication over insecure channels (cf. Communication channel) between two totally unacquainted parties. In such systems, each user selects a private key from which she obtains a pair of algorithms. She makes one of them available to everyone as her public enciphering algorithm, whereas she keeps secret the other one, which is the corresponding private deciphering algorithm. This allows even a complete stranger to use her public algorithm to encipher a message for her; yet only she can decipher it through the use of her private algorithm.
Cryptographic techniques have a large number of applications beyond secure communication, such as authentication, digital signature, password protection and minimum disclosure proofs, which are discussed below. In addition, the following applications are mentioned in [a3]: certified mail, coin flipping, contract signing, secure distributed computing, election schemes, playing electronic poker, computing with encrypted data, exchange of secrets, oblivious transfer, privacy amplification, the protection of privacy, secret sharing and software protection.
Secret-key systems.
A secret-key cryptosystem consists of a key space and, for each , of a cleartext message space , a ciphertext message space and a pair of functions and such that for each plaintext message . Given any key , it must be easy to obtain efficient algorithms for computing and . The cryptosystem is used as follows for the purpose of secure communication. If and expect that they might eventually have to communicate privately, they must initially agree on some secret key . Whenever wishes to send a specific to , she uses the enciphering algorithm to produce ; she sends to over an insecure channel; and uses algorithm to recover . In many practical cryptosystems, both and are finite sets, often independent of the key (such as the set of all eight-character strings). In this case, it could be that the actual message is too long to be enciphered directly. If this occurs, must be broken in pieces and must be used several times.
Consider a cryptosystem in which the message space consists of -bit blocks. The obvious solution to encipher a longer message is to cut it into -bit slices and encipher each of them independently using the same secret key. This idea, known as the Electronic Code Book mode, is to be avoided as much as possible. Its most obvious weakness is that two identical slices of ciphertext indicate to the cryptanalyst that the two corresponding slices of plaintext are also identical.
There exist several other modes of operation [a17]. Only the Cipher Block Chaining mode is described here. In this mode, the secret key is supplemented by a -bit block (whose secrecy is not essential). The plaintext is split into -bit blocks . For each , ciphertext block is computed as , where denotes the bitwise exclusive-or. The resulting ciphertext is . Given the ciphertext and knowledge of and , decipherment is achieved by computing . Errors do not propagate when using this mode: plaintext block depends only on ciphertext blocks and . A transmission error can therefore disrupt only two blocks of deciphered cleartext. Nonetheless, each block of ciphertext depends on the entire cleartext up to that point.
Secret-key cryptography distinguishes three levels of cryptanalytic attack.
1) Ciphertext only attack: the cryptanalyst is given . He is to infer or, lacking this ability, as many among as possible.
2) Known plaintext attack: the cryptanalyst is given as above, but also the corresponding . He is to infer or, lacking this ability, from some new ciphertext .
3) Chosen plaintext attack: the cryptanalyst gets to select plaintext messages and he is given the corresponding . He is to infer or, lacking this ability, from some ciphertext .
The main cryptanalytic threat on secret-key cryptosystems comes from the high redundancy of the source language. This allows several kinds of statistical attacks [a8]. C. Shannon suggests two basic cryptographic methods "for frustrating a statistical analysis" : diffusion and confusion.
The purpose of diffusion is to dissipate the cleartext redundancy by spreading it out over the ciphertext. A transposition cipher rearranges the order in which letters are written in the message. Another approach to diffusion is to have each letter of the ciphertext depend on as many letters of the cleartext as possible. The cipher block chaining mode is excellent at creating diffusion.
The purpose of confusion is to make the relation between the key and the ciphertext as complex as possible. As a result, the cryptanalyst should not gain much useful information on the key from statistical studies of the ciphertext. This is usually brought about by the technique of substitution. It is advisable to use substitution on blocks of several letters, and/or to use a different substitution for each position in the cleartext.
Although the cryptanalyst's ultimate goal is to figure out exactly the key or at least the plaintext message, he may be satisfied to learn some probabilistic information about the latter. The cryptanalyst may have a priori information about the plaintext even before looking at the ciphertext. For instance, he knows that "hello" is more probable than "xykph" . Cryptanalysis tries to sharpen this a priori information by modifying the probabilities associated with each possible plaintext message, thus making the correct plaintext more probable, although not necessarily certain. Shannon's information theory formalizes this notion [a19].
A cryptosystem achieves perfect secrecy if knowledge of the ciphertext yields no information whatsoever on the corresponding plaintext, with the possible exception of its length. In other words, the a posteriori probabilities after seeing the ciphertext are exactly the same as the a priori probabilities. Such systems exist, but the problem of key-distribution is amplified because perfect secrecy is only possible if the key is at least as long as the cleartext and used once only.
If a cryptosystem does not offer perfect secrecy, knowledge of the ciphertext yields some information on the corresponding cleartext message. The unicity distance of a cryptosystem is the length of ciphertext that is required in order to expect that there exists only one meaningful decryption for it. For classic cryptosystems with fixed key space, the unicity distance is approximated by the formula , where is the key space entropy (roughly the logarithm base 2 of the number of keys) and measures the redundancy of the plaintext source language in bits (cf. Bit) per letter (about 3.5 for English). A cryptosystem offers ideal secrecy if its unicity distance is infinite. With such cryptosystems, some uncertainty normally remains after a ciphertext-only cryptanalysis, however long the ciphertext is, but cryptanalysis may nonetheless bring out some information on the corresponding plaintext.
The one-time pad offers perfect secrecy. Let be a cleartext message, such as "hello" , and let be a purely random string of letters of equal length, such as "iwpbu" . Each character of is subjected to a cyclic shift in the alphabet, whose magnitude is given by the corresponding character of . In our example, plaintext letter "h" is replaced by "q" , the nineth following letter in the alphabet, because the corresponding key letter is "i" , the nineth letter in the alphabet. Continuing this way, the resulting ciphertext is "qbbnj" . It is easy to recover the plaintext from the ciphertext provided the key is known. Without this information, however, any plaintext of the same length could be accounted for by an appropriate key, assuming the key is indeed purely random and used once only.
A sequence is pseudo-random if it appears to be random, although it was in fact produced by a deterministic process known as a pseudo-random generator (cf. Pseudo-random numbers). Such generators are given a random starting sequence known as the seed, and they are to deterministically produce from it a much longer pseudo-random sequence. A pseudo-random generator is cryptographically strong if the sequence it produces from a short secret seed is essentially as good as a truly random sequence for the purpose of being used as a one-time pad. Such generators can be used to implement a secret-key cryptosystem if both parties agree on the secret seed they are going to use. However, the same seed should not be used directly more than once. The advantage over using a true one-time pad is that the secret can be much shorter than the message. An efficient pseudo-random generator that is cryptographically as strong as factoring is hard is described in [a1].
Diffusion and confusion work best when they are combined. A prime example of this phenomenon is displayed by the widely used Data Encryption Standard (DES), which enciphers -bit blocks of data using a -bit key. The DES algorithm transforms the key into sixteen -bit partial keys through a key scheduling algorithm that uses each of the key bits several times. After a standard initial permutation, a -bit block of data goes through sixteen followed by the inverse initial permutation. Each round performs one step of confusion (using the corresponding partial key and the so-called S-boxes) followed by one step of diffusion. The DES algorithm is designed so that deciphering is performed by the exact same process, except that the order of the partial keys is reversed in the key scheduling. Therefore, the same device can be used both for enciphering and deciphering. Very high encryption and decryption speeds can be achieved with the DES when implemented in hardware (more than 14 megabits per second in 1987). The algorithm is described in [a16].
There is a controversy about the safety of DES, which comes mostly from having a key space small enough to make exhaustive search feasible. One million processors working in parallel, each trying one million keys per second, would exhaust the key space within twenty hours. Nonetheless, it appears to be quite adequate for privacy considerations in small to medium security applications. A simple technique can be used to make exhaustive search more difficult, and DES should not be used without it. Instead of using one -bit key, use three such keys , and . Then encipher a -bit block of plaintext as . The use of DES inverse in the second stage of this formula is designed to offer compatibility with single encryption by making all three keys equal. It has been suggested to use only two keys by setting in the above formula or by computing simply , but these alternatives are vulnerable to more sophisticated attacks.
Public-key systems.
A public-key cryptosystem consists of a key space and, for each , of a cleartext message space , a ciphertext message space and a pair of functions and such that for each plaintext message . Given any key , it must be easy to obtain efficient algorithms for computing and . The difference with secret-key systems is that it must be infeasible to infer any efficient algorithm for computing from knowledge of the algorithm for computing .
A public-key cryptosystem is used as follows. Once and for all, each user selects a random . He uses it to obtain both algorithms for and . He makes publicly available but keeps secret. Whenever another user wishes to send him a message, she looks up in the directory for his public enciphering algorithm and uses it to encipher her message. Using his secret, only the legitimate receiver can decipher the message. This notion was invented independently by W. Diffie and M.E. Hellman [a6] and by R.C. Merkle [a14].
The cryptanalyst's ability to come up with ciphertexts for cleartext messages of his choice causes the collapse of the classic levels of attack discussed about secret-key cryptosystems. However, the following meaningful attack can sometimes be mounted:
4) Chosen ciphertext attack: against a given user whose functions are and , the cryptanalyst gets to select ciphertext messages and he is given the corresponding , provided they exist. He is to infer or any efficient algorithm for computing or, lacking this ability, he is to infer from some ciphertext .
Consider two arbitrary sets and . A function is one-way if it is easy to compute for every , whereas it is hard for most to figure out any such that (provided at least one such exists). Although the current state of knowledge does not allow one to prove the existence of one-way functions, there are functions that are believed to be one-way. Consider for instance integer multiplication. It is easy to multiply very large integers, whereas even the most powerful computer with the best known algorithm is incapable (within the lifetime of the universe) of factoring a mere two hundred digit number that is the product of two prime numbers of roughly equal size.
A function is trap-door one-way if it can be computed efficiently both forwards and backwards, yet one can disclose an efficient algorithm to compute such that even complete knowledge of how this algorithm works does not enable one to figure out an efficient algorithm to reverse the computation. The secret that enables one to obtain both efficient algorithms is known as the trap-door. An example candidate of a trap-door one-way function is squaring modulo a composite integer. Let be the product of two distinct large primes. Let (), where both and are between and . Computing from and is easy. However, computing from and (or any other such that ()) can be done efficiently if and only if the factorization of is known or easy to obtain. Squaring modulo a composite integer is thus a trap-door one-way function provided factoring is hard, and the trap-door consists in the factors of the modulus.
Public-key cryptosystems can only exist if both one-way and trap-door one-way functions exist. Indeed, the enciphering functions must be trap-door one-way and the function that maps the keys into the enciphering algorithms must be one-way. Therefore, one has to be content with public-key cryptosystems whose security has not been proven. The best-known such candidate is the RSA cryptosystem, named after its inventors: R.L. Rivest, A. Shamir and L. Adleman [a18]. It is based on computational number theory and is at most as secure as it is hard to factor large integers (but it could be weaker). Both hardware and software implementations are available. Note that Merkle and Hellman's so-called knapsack cryptosystem [a15] has been broken by Shamir [a5].
A drawback of public-key cryptography is that eavesdropping on the ciphertext always leaks some information about the corresponding plaintext. For instance, an eavesdropper can use the public enciphering algorithm to verify any specific guesses as to what the cleartext might be. The purpose of probabilistic encryption, a notion invented by S. Goldwasser and S. Micali [a11], is to encipher messages in a way that it is essentially as hard to obtain any partial information on the cleartext as it is to break the cryptosystem completely. The enciphering algorithm in a probabilistic encryption scheme is probabilistic: the same cleartext message can give rise to a large number of distinct ciphertexts.
M. Blum and Goldwasser's implementation of probabilistic encryption [a2] is based on the trap-door one-way function and the cryptographically strong pseudo-random generator mentioned previously. Intuitively, the generator is used by the sender to produce a pseudo-random one-time pad of appropriate length from his randomly selected seed. The receiver's ability to extract square roots (based on his trap-door information) allows him to recover the pad and figure out the cleartext. The difficulty of breaking this cryptosystem is equivalent to the difficulty of factoring large integers.
Despite the advantages of public-key cryptography and probabilistic encryption, none of the implementations proposed thus far can compete in speed with secret-key systems such as the DES. A hybrid cryptosystem uses a public-key cryptosystem once in order to share a short piece of information that is then used as key to encipher and decipher the actual message through a secret-key cryptosystem. If the message is sufficiently long, it is advisable to use the public-key cryptosystem several times during the transmission, so as to change secret keys often. In this context, it may be preferable to use a public key distribution system [a6] instead of a public-key cryptosystem.
The inability to prove that secure public-key cryptography or probabilistic encryption can exist is linked to the fundamental question of whether P equals NP [a9]. Indeed, such systems would not be possible should because guesses about secret keys can always be verified efficiently against the published information. Moreover, there are theoretical reasons to believe that there cannot even exist a public-key cryptosystem whose cryptanalytical task is NP-complete.
Authentication and signature.
Until now, only the notion of a passive cryptanalyst, that is, someone whose purpose is merely to eavesdrop on the communication channel, has been discussed. An active cryptanalyst goes further: he may also inject messages of his own in the hope of deceiving the receiver into believing it was sent by someone else. He may also modify genuine messages in transit.
An authentication scheme consists of a key space and, for each , of a message space , a tag space and an authentication function . The function does not have to be one-to-one, and it may even be compressive. Given any key , it must be easy to obtain an efficient algorithm for computing . However, given and only, it must be infeasible to infer or even for some message different from . The authentication scheme is used as follows. If and expect that they might eventually have to authenticate messages between them, they must initially agree on some secret key . Whenever wishes to authenticate some message for , she computes and sends it along together with . In order to verify the authenticity of the message, also computes and compares it with the tag he received. Of course, this does not preclude the encryption of if both privacy and authenticity are sought.
Although an authentication scheme allows to gain high confidence that the message she received originated from , it does not allow her to convince anyone else that actually sent it. This makes authentication schemes suitable between two mutually trusting parties, but it cannot be used to settle a dispute between them. The stronger notion of digital signature was invented by Diffie and Hellman [a6]. If sends a digitally signed message to , the latter will not only be convinced that the message was signed by none but the former, but she will also be able to prove to a judge that actually signed that very message.
A public-key cryptosystem offers signature capability if, for each key , and if for each message . It is used as follows for signature purposes. Let be some user 's private key and let and be her public enciphering and private deciphering functions, respectively. Consider some cleartext message and let . Anyone can compute and find out that its value is . However, only has the knowledge to come up with some such that . In this sense, can be considered as 's signature of message . If secrecy is also desirable, the digitally signed message should be enciphered with the receiver's public enciphering algorithm , yielding . If the message is very long, it may be preferable to sign a one-way compression of it.
Password protection.
Whenever a would-be user seeks the services of a computer (or a customer wants to use an automatic teller, or the teller wishes to access the bank's central computer), how can the latter make sure the former is not forging a false indentity? The classic solution to this problem is through the use of passwords. Assuming the user and the computer share some confidential piece of information, the latter can request this information from the former whenever his identity is questionable. The safety of this solution depends crucially on the ability to keep passwords secret, which points at two weak links: the existence of a password table in the computer and the threat of wire taps while the user enters his password.
One-way functions can be used to overcome the vulnerability of a password table: store in the computer the image of the users' passwords under some fixed one-way function. Because it is one-way, the function itself needs not be kept secret. Whenever a user wishes to prove his identity, he transmits his actual password on which the computer immediately applies the one-way function. The result can then be validated against the computer's table. An enemy would find useless a listing of the table stored in the computer because of his inability to reverse the one-way function, hence to determine the actual passwords. This scheme is rather weak if the users are likely to use easy-to-guess passwords, although a simple technique known as salting can help.
Despite the use of a one-way function, passwords are vulnerable to interception either while they are in transit towards the computer or inside the computer memory before they have been transformed by the one-way function. Intelligent terminals or, preferably, personal smart cards can be used to solve this problem. For this, each user of the system chooses a private random key from some public-key cryptosystem. He uses it to generate a corresponding pair of enciphering and deciphering algorithms. He reveals the enciphering algorithm to the computer and he programs the deciphering algorithm into his own terminal or smart card. Whenever the computer wishes to assert the identity of a user, it generates a random message, computes the alleged user's enciphering function on that input and sends the result to the user's terminal as a challenge. The latter runs its secret program to recompute the original message, which is sent back to the computer for validation. This identification scheme can be used in a continuous challenge mode, in which the computer challenges the terminal at regular intervals. For a practical implementation, it is best if the challenges can be constructed very quickly even if meeting them requires more computation. This is so because the mainframe may have a large number of terminals to attend, whereas each individual terminal can work on its challenges in the background, while still functioning at full speed from its user's point of view.
This identification problem can also be solved by minimum disclosure techniques [a7], a notion described below.
Minimum disclosure proofs.
Assume I have found a proof of, e.g., Fermat's last "theorem" . How could I convince you that I know such a proof without disclosing anything about it beyond its existence and my knowledge of it? More generally, let us say that an information is verifiable if its correctness can be established efficiently by the use of some publicly agreed upon verification procedure. In order to convince you that I know some verifiable information, I could simply show it to you and let you run the verification procedure yourself. This would be a maximum disclosure proof of this knowledge because it results in your learning all the information. By contrast, a minimum disclosure proof allows me to convince you beyond any reasonable doubt that I have the information I claim, but in a way that does not help you determine this information.
The original notion of zero-knowledge interactive proofs was put forward by Goldwasser, Micali and C. Rackoff [a12]. They showed the existence of such proofs for specific number-theoretic problems such as being a quadratic residue or not. Under cryptographic assumptions, it was later discovered that the technique extends to membership to any language in NP (and even a probabilistic generalization of NP) [a10], [a4]. More specifically, consider any and any for which I know an NP certificate that gives . There is a protocol by which I can convince you that in a way that you do not learn anything about my certificate. Using such techniques, for instance, I can convince you that I know the factors of a given large integer without helping you to factor it. This is a very significant tool for the construction of safe crypto-protocols.
The general technique consists of an interactive protocol. In each round of the protocol, I allow you to ask me one of two possible challenges in a way that I could answer both of them only if I had the secret I claim to have, but also in a way that answering either one challenge does not give you any information about this secret. Because I cannot predict ahead of time which challenges you will issue, each round increases your confidence in my claim. In fact, I could only hope to fool you in successive rounds with exponentially vanishing probability . One calls such techniques cut-and-choose because each round is similar to the classic "protocol" by which two children split a piece of cake — one of them cuts and the other one chooses. The great utility of such cut-and-choose is that it gives an exponential increase in security at the cost of only a linear increase in the number of rounds.
References
[a1] | L. Blum, M. Blum, M. Shub, "A simple unpredictable pseudo-random number generator" SIAM J. Computing , 15 : 2 (1986) pp. 364–383 |
[a2] | M. Blum, S. Goldwasser, "An efficient probabilistic public-key encryption scheme which hides all partial information" , Proc. Crypto 84 , Springer (1985) pp. 289–299 |
[a3] | G. Brassard, "Modern cryptology: a tutorial" , Springer (1988) |
[a4] | G. Brassard, D. Chaum, C. Crépeau, "Minimum disclosure proofs of knowledge" , Techn. Report PM-R8710 , CWI (1987) (Revised version) |
[a5] | E.F. Brickell, "The cryptanalysis of knapsack cryptosystems" , Proc. 3-rd SIAM Discrete Mathematics Conf. |
[a6] | W. Diffie, M.E. Hellman, "New directions in cryptography" IEEE Transact. Inform. Th. , IT-22 (1976) pp. 644–654 |
[a7] | U. Feige, A. Fiat, A. Shamir, "Zero knowledge proofs of identity" , Proc. 19th ACM Symp. Theory of Computing (1987) pp. 210–217 |
[a8] | W.F. Friedman, "Elements of cryptanalysis" , Aegean Park Press , Laguna Hills, CA (1976) (Written earlier) |
[a9] | M.R. Garey, D.S. Johnson, "Computers and intractability: a guide to the theory of NP-completeness" , Freeman (1979) |
[a10] | O. Goldreich, S. Micali, A. Wigderson, "Proofs that yield nothing but their validity and a methodology of cryptographic protocol design" , Proc. 27th IEEE Symp. Foundations of Computer Science (1986) pp. 174–187 |
[a11] | S. Goldwasser, S. Micali, "Probabilistic encryption" J. Computer and System Sciences , 28 (1984) pp. 270–299 |
[a12] | S. Goldwasser, S. Micali, C. Rackoff, "The knowledge complexity of interactive proof-systems" , Proc. 17th ACM Symp. Theory of Computing (1985) pp. 291–304 |
[a13] | D. Kahn, "The codebreakers: the story of secret writing" , Macmillan (1967) |
[a14] | R.C. Merkle, "Secure communications over insecure channels" Comm. ACM , 21 (1978) pp. 294–299 |
[a15] | R.C. Merkle, M.E. Hellman, "Hiding information and signatures in trapdoor knapsacks" IEEE Transactions on Information Theory , IT-24 (1978) pp. 525–530 |
[a16] | "Data encryption standard" , Federal Information Processing Standard , FIPS PUB , 46 , Nat. Bureau of Standards, U.S. Department Commerce (1977) |
[a17] | "DES modes of operation" , Federal Information Processing Standard , FIPS PUB , 81 , Nat. Bureau of Standards, U.S. Department Commerce (1980) |
[a18] | R.L. Rivest, A. Shamir, L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems" Comm. ACM , 21 (1978) pp. 120–126 |
[a19] | C.E. Shannon, "Communication theory of secrecy systems" Bell System Technical Journal , 28 (1949) pp. 656–715 |
Cryptology. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Cryptology&oldid=46559