Poe's Prodigious Prose: A Study in Type-Token Ratio

by Derek LeBlond Classics
poe vocabulary ttr type-token-ratio linguistics

When analyzing The Fall of the House of Usher through Prose Parser, one metric immediately jumps off the page: its Type-Token Ratio (TTR) of 0.30 is nearly three times that of the next-highest work in our Classics Library.

But what does this actually mean? And more importantly, what doesn't it mean?

What is Type-Token Ratio?

Type-Token Ratio is one of the oldest and most intuitive measures of vocabulary richness. The formula is simple:

TTR = Unique Words (Types) / Total Words (Tokens)

A TTR of 1.0 would mean every word in the text is unique—no repetition whatsoever. A TTR approaching 0 would indicate extreme repetition of a small vocabulary set.

In The Fall of the House of Usher, Poe uses 2,134 unique words across 7,101 total words, yielding a TTR of 0.30. Averaged over the whole story, three of every ten words Poe writes are words he hasn't used before.
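As a minimal sketch of the calculation above: the tokenizer here is an assumption (lowercased runs of letters, with internal apostrophes kept), not necessarily how Prose Parser splits words, and `tokenize` and `type_token_ratio` are illustrative names.

```python
import re

def tokenize(text):
    # Assumed tokenization: lowercase, letters only, internal apostrophes kept.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def type_token_ratio(tokens):
    # Types are distinct tokens; an empty text has no meaningful ratio, so return 0.0.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Toy sentence for illustration (not a quotation from the story).
sample = "The house was dark, and the house was still."
tokens = tokenize(sample)
print(len(set(tokens)), "types /", len(tokens), "tokens")
print(round(type_token_ratio(tokens), 2))

# The counts reported for Usher, plugged into the same formula.
usher_ttr = 2134 / 7101
print(round(usher_ttr, 2))
```

A different tokenizer (for example, one that treats hyphenated compounds as single words) would shift the counts slightly, so small differences from the published figures are to be expected.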

Poe vs. the Classics

Here's how Poe stacks up against other works in the library:

| Work | Author | TTR | Word Count |
| --- | --- | --- | --- |
| The Fall of the House of Usher | Edgar Allan Poe | 0.300 | 7,101 |
| Alice's Adventures in Wonderland | Lewis Carroll | 0.105 | 26,463 |
| A Scanner Darkly | Philip K. Dick | 0.103 | 85,180 |
| Frankenstein | Mary Shelley | 0.095 | 75,166 |
| The Picture of Dorian Gray | Oscar Wilde | 0.090 | 79,199 |
| Moby-Dick | Herman Melville | 0.088 | 214,532 |
| The Sun Also Rises | Ernest Hemingway | 0.075 | 67,897 |
| Dune | Frank Herbert | 0.068 | 200,461 |
| Dracula | Bram Stoker | 0.065 | 160,973 |
| Great Expectations | Charles Dickens | 0.062 | 185,545 |
| Sense and Sensibility | Jane Austen | 0.055 | 119,728 |

Poe's number looks extraordinary, but look at the Word Count column on the right. That's where things get complicated.

The Elephant in the Room: Text Length

Here's the inconvenient truth about TTR: it's heavily dependent on text length.

As a text grows longer, words inevitably repeat. There are only so many ways to say "the," "said," or "was." The longer you write, the more your TTR will decline, regardless of your actual vocabulary.

Poe's story is 7,101 words. Melville's Moby-Dick is over 214,000. If we extracted a 7,000-word sample from Moby-Dick, that sample's TTR would likely be far higher than the novel's overall 0.088.

This is a fundamental limitation of TTR, and it's why linguists have developed alternative metrics.
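To make the length effect concrete, here is a rough sketch on synthetic data. The "novel" below is a hypothetical stream of made-up word forms drawn with Zipf-like weights, not any real text. The second function shows one common length-controlled alternative, the mean segmental TTR (average TTR over fixed-size chunks); whether Prose Parser offers anything like it is not stated here.

```python
import random

def ttr(tokens):
    # Plain type-token ratio; 0.0 for an empty list.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def prefix_ttr(tokens, n):
    # TTR of the first n tokens only, so texts can be compared at equal length.
    return ttr(tokens[:n])

def mean_segmental_ttr(tokens, window):
    # Average TTR over consecutive, non-overlapping windows.
    # A trailing partial window is dropped.
    starts = range(0, len(tokens) - window + 1, window)
    segments = [tokens[i:i + window] for i in starts]
    if not segments:
        return 0.0
    return sum(ttr(s) for s in segments) / len(segments)

# Synthetic stand-in for a long novel (assumption: 2,000 hypothetical
# word forms, with frequent words much more likely than rare ones).
rng = random.Random(0)
vocab = [f"w{i}" for i in range(2000)]
weights = [1 / (rank + 1) for rank in range(len(vocab))]
novel = rng.choices(vocab, weights=weights, k=200_000)

# The same "author" scores lower and lower as the sample grows.
for n in (7_000, 50_000, 200_000):
    print(n, round(prefix_ttr(novel, n), 3))

# Scoring in 7,000-token chunks puts the long text on Usher's scale.
print("mean segmental TTR:", round(mean_segmental_ttr(novel, 7_000), 3))
```

The vocabulary never changes in this simulation, yet the ratio falls as the sample grows, which is why comparing raw TTR across texts of very different lengths is misleading.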

Beyond TTR: The Hapax Ratio

One metric that provides additional insight is the hapax ratio—the percentage of unique words that appear only once in the text (called hapax legomena, Greek for "said only once").

| Work | Hapax Ratio |
| --- | --- |
| The Fall of the House of Usher | 70.6% |
| A Scanner Darkly | 57.1% |
| Dracula | 55.7% |
| The Picture of Dorian Gray | 55.6% |
| Moby-Dick | 54.2% |
| Alice's Adventures in Wonderland | 54.0% |
| The Sun Also Rises | 52.9% |
| Dune | 52.4% |
| Great Expectations | 52.4% |
| Frankenstein | 51.2% |
| The Wizard of Oz | 47.6% |
| Sense and Sensibility | 45.8% |

Poe leads again. This metric is less susceptible to text length than TTR, though not immune to it: shorter texts still tend to score somewhat higher. Over 70% of Poe's unique words appear exactly once. He uses a word, then moves on to another. This is consistent with the idea that Poe deliberately avoided repetition in his vocabulary choices.
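The hapax ratio as defined above (words used exactly once, divided by all unique words) can be sketched in a few lines. The function name and the toy sentence are illustrative, and splitting on whitespace is an assumed tokenization.

```python
from collections import Counter

def hapax_ratio(tokens):
    # Share of distinct words (types) that occur exactly once.
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)

# Toy example: "the" and "raven" repeat; the other four words are hapaxes.
tokens = "the raven sat and the raven spoke nevermore".split()
print(f"{hapax_ratio(tokens):.1%}")
```

Note that the denominator is the number of unique words, not the total word count, so this figure is not directly comparable to TTR.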

What This Tells Us About Poe

Poe was famously meticulous about word choice. In his essay "The Philosophy of Composition," he described constructing "The Raven" with mathematical precision, selecting each word for maximum effect.

The data suggests this wasn't just talk. Poe's writing exhibits:

  • Extreme vocabulary diversity: A TTR of 0.30 means about three distinct words for every ten words on the page
  • Minimal word recycling: Over 70% of his vocabulary appears only once
  • Dense, demanding prose: His Flesch readability score of 48.2 places him at "Difficult (College)" level
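For context on the readability figure, the standard Flesch Reading Ease formula depends only on average sentence length and average syllables per word. The sketch below assumes Prose Parser uses that standard formula, which is not confirmed here, and the counts passed in are hypothetical, not Usher's actual figures.

```python
def flesch_reading_ease(words, sentences, syllables):
    # Standard Flesch Reading Ease formula; higher scores mean easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def in_difficult_band(score):
    # Conventional 30-50 band, commonly labeled "Difficult" (college level).
    return 30.0 <= score < 50.0

# Hypothetical counts for illustration: 25-word sentences, 1.6 syllables per word.
score = flesch_reading_ease(words=100, sentences=4, syllables=160)
print(round(score, 1), in_difficult_band(score))

# The score reported for Usher falls in the same band.
print(in_difficult_band(48.2))
```

Because the formula only sees sentence length and syllable counts, a low score points to long sentences and polysyllabic words rather than to rare vocabulary as such.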

This aligns with Poe's gothic aesthetic. His ornate, deliberately unusual word choices create the atmosphere of creeping dread that defines his work. Words like "insufferable," "pestilent," "arabesque," and "phantasmagoric" appear once, do their work, and vanish—leaving an impression without becoming repetitive.

The Honest Conclusion

Is Poe's vocabulary genuinely richer than Melville's or Dickens's? The data can't definitively answer that question. TTR's length dependency means we're not comparing apples to apples.

What we can say is that within the confines of a 7,000-word short story, Poe demonstrates remarkable vocabulary diversity. His hapax ratio suggests deliberate avoidance of repetition. And his Flesch score, which reflects sentence length and syllable count, is consistent with word choices that lean toward the unusual and elevated.

For writers, the lesson isn't to chase a high TTR. The lesson is to recognize that word choice creates atmosphere. Poe's vocabulary isn't just extensive; it's purposeful. Every unusual word serves the mood of decay, dread, and psychological dissolution.

Try It Yourself

Want to see how your vocabulary measures up? Run your own writing through Prose Parser to discover your Type-Token Ratio, hapax legomena, and other vocabulary metrics.

Explore the full Usher analysis →

Or analyze your own writing to discover your verbal fingerprints.
