How many words did Shakespeare know?

Statisticians Bradley Efron and Ronald Thisted set out to estimate how many words Shakespeare actually knew. Their analysis, carried out in 1976, required advanced statistical methods.

Efron and Thisted took the complete known works of Shakespeare as their corpus and tabulated how many distinct words are used exactly once, exactly twice, exactly three times, and so forth.

Table 1. The number of words in the complete works of Shakespeare:

In his complete works, Shakespeare used 31,534 different words and a grand total of 884,647 words, counting repetitions. Counting words at this scale is a nontrivial task; the results are compiled in a concordance of the works of Shakespeare.
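The tabulation described above can be sketched in a few lines of Python. This is only an illustration on a toy sentence, not the procedure behind the concordance: the function name `frequency_of_frequencies` and the crude tokenizer (lowercased runs of letters and apostrophes) are assumptions made for the example.

```python
from collections import Counter
import re


def frequency_of_frequencies(text):
    """Return (total words, distinct words, {k: number of words used exactly k times})."""
    # Assumed tokenizer: lowercase the text and keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # word -> number of times the word is used
    word_counts = Counter(tokens)
    # k -> number of distinct words used exactly k times
    freq_of_freq = Counter(word_counts.values())
    return len(tokens), len(word_counts), dict(freq_of_freq)


sample = "to be or not to be that is the question"
total, distinct, table = frequency_of_frequencies(sample)
print(total, distinct, table)
```

In this toy sentence, "to" and "be" are each used twice and the other six words once, so the table has two entries. For Shakespeare, the corresponding totals are the 884,647 words and 31,534 distinct words quoted above.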

The statisticians expected Shakespeare to have known many more words than he actually used in his works.

“Suppose a second, new and different, sample of Shakespeare’s works was discovered of the same size as the first sample. How many words could we expect to find in the second sample that were not used in the first sample? We would expect there to be fewer new words in the second sample, because in the first sample, every first occurrence of a word is new, even a common word like the; in the second sample, those common words are no longer new. But how many fewer new words would be expected in the second sample? Efron and Thisted were able to estimate that 11,430 words would appear in the second sample that did not appear in the first sample.”
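The passage does not say how the figure of 11,430 was computed. One standard estimator for this "second sample" question is the Good-Toulmin alternating sum, which works directly from the counts of words used once, twice, three times, and so on. The sketch below assumes that estimator; the function name `expected_new_words` and the toy counts are hypothetical, and the real Shakespeare counts are not reproduced here.

```python
def expected_new_words(freq_of_freq, t=1.0):
    """Good-Toulmin estimate of the number of new distinct words expected
    in a further sample that is t times the size of the original one.

    freq_of_freq maps k to the number of distinct words used exactly k times.
    """
    # Alternating series: n_1*t - n_2*t^2 + n_3*t^3 - ...
    return sum((-1) ** (k + 1) * (t ** k) * n_k for k, n_k in freq_of_freq.items())


# Toy counts (not Shakespeare's): 6 words used once, 2 words used twice.
toy_table = {1: 6, 2: 2}
estimate = expected_new_words(toy_table, t=1.0)
print(estimate)
```

With t = 1, a second sample of the same size, the estimate reduces to (words used once) minus (words used twice) plus (words used three times), and so on. The series becomes unstable for t greater than 1, which is part of why extrapolating to a full vocabulary needs more careful methods.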

In the end, they concluded that, in addition to the 31,534 words that Shakespeare knew and used, there were approximately 35,000 words that he knew but didn’t use. Adding the two figures, we can estimate that Shakespeare knew roughly 66,500 words (31,534 + 35,000 = 66,534).

