The Silmarillion: Who Speaks?


Methodology and Definitions

The spreadsheet is organized into three datasets (Speech Acts, Dialogue, and Silence), followed by a series of tabs with statistics compiled from the datasets.

Definitions

The following terms are used throughout to refer to characters' speech:

  • dialogue: quoted speech
  • indirect speech: speech that is reported but not quoted
  • speech: dialogue and indirect speech
  • speech act: a unit of speech (dialogue or indirect) that serves a distinct function

Speech act theory originates in J.L. Austin's 1955 lecture series, published as How to Do Things with Words. Before Austin's work, speech was assumed to be descriptive; Austin proposed that speech could itself serve as an action ("I thee wed") or compel an action from someone else. Since then, speech act theory has been applied to literature and picked up by computational linguists, who use it to classify the purpose of speech when "training" computers. I have used a heavily modified version of Martin Weisser's DART Annotation Scheme to classify speech acts in The Silmarillion, partly because of its broad applicability (many annotation schemes have a very narrow focus, such as annotating professional meetings) and partly because it was developed with adaptation for broad use in mind. See the section "Speech Act Tag" below for the annotation system I modified from DART.

Columns

ID/SID

Each row is provided with a unique ID used for counting in certain formulas. In the Speech Acts and Dialogue sheets, this is the ID column; in Silence, it is the SID column. The ID columns match between Speech Acts and Dialogue; they do not match with Silence.

Chapter (Ch)

The chapter, abbreviated. For the corresponding full chapter title, see the Stats - Chapter tab.

Paragraph (Para)

I count each indentation as a new paragraph, which includes single lines of dialogue and stanzas within long poems. This differs from how Kane (see Source, below) counts paragraphs, so there are discrepancies between my work and his.

Be forewarned that if there are errors in the dataset, this is likely where they are. If you do a careful count and come up with different paragraph numbers, using my methodology, let me know so that I can correct the datasets.

Type

Type refers to the type of speech:

  • dialogue-init: The first speech act in a new instance of dialogue.
  • dialogue-cont: Subsequent speech acts in an instance of dialogue.
  • non-dialogue: An instance of indirect speech.
  • generalized: Generalized speech refers to speech occurring over an extended period of time rather than a single instance. For example, consider the line from the Valaquenta: "But mostly Ulmo speaks to those who dwell in Middle-earth with voices that are heard only as the music of water." Ulmo is clearly speaking here, but it is not a single instance.
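
For readers working with an export of the dataset, the Type values above can be used to group rows into instances of dialogue. The following is a minimal sketch, not the spreadsheet's own formula. It assumes the rows are in text order and that each dialogue-cont row belongs to the most recent dialogue-init row; the function name is hypothetical.

```python
from typing import Iterable


def group_dialogue_instances(types: Iterable[str]) -> list[list[int]]:
    """Return one list of row indices per instance of dialogue.

    Assumption: rows are in text order, and a dialogue-cont row continues
    the most recent dialogue-init row.
    """
    instances: list[list[int]] = []
    for index, row_type in enumerate(types):
        if row_type == "dialogue-init":
            # A dialogue-init row opens a new instance of dialogue.
            instances.append([index])
        elif row_type == "dialogue-cont":
            if not instances:
                raise ValueError(f"row {index}: dialogue-cont before any dialogue-init")
            instances[-1].append(index)
        # non-dialogue and generalized rows are not part of a dialogue instance.
    return instances


rows = ["dialogue-init", "dialogue-cont", "non-dialogue", "dialogue-init", "generalized"]
print(group_dialogue_instances(rows))  # [[0, 1], [3]]
```

Under these assumptions, the number of dialogue instances is simply the number of dialogue-init rows.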

Passage

The quoted dialogue or a snippet of the text surrounding the indirect speech. I've bolded the word or words in the indirect speech that indicate that it is speech. Note that I use an unauthorized digital copy of The Silmarillion that emerged from a scan made for a similarly unauthorized translation project in the mid-2000s. While I've fixed errors where I've found them, I have not done a thorough side-by-side comparison with the published text. I do not recommend using the dataset for quotations.

Word Count

  1. Single lines of dialogue interrupted by dialogue tags or other prose are indicated with an ellipsis (...) with a space only at its end (so as not to mess up the word-count formula).
  2. End punctuation is retained. If there is no end punctuation (because there is a comma leading into non-dialogue prose), then no punctuation is used.
  3. Dialogue within dialogue (when the speaker is attributed): The ellipsis (see #1) removes the embedded dialogue, which is counted separately for that speaker.
  4. New paragraphs are counted as new dialogue (unless uninterrupted dialogue is divided into multiple paragraphs without intervening action).
  5. Thoughts in the form of dialogue are not included.
  6. The following are not included: remembered dialogue that repeats earlier dialogue (e.g., Ulmo's warning to Turgon), imagined speech (e.g., Túrin's conviction that he hears Finduilas calling him), or speech that repeats or refers back to earlier documented speech (e.g., "Thus it was that as Mandos foretold to them in Araman the Blessed Realm was shut against the Noldor").
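
To illustrate why rule 1 attaches the ellipsis to the preceding word, here is a minimal sketch of a whitespace-based word count. The spreadsheet's actual formula is not shown here, so treating a word as any run of characters between spaces is an assumption, and the sample sentence is invented for illustration rather than taken from the text.

```python
def word_count(passage: str) -> int:
    """Count whitespace-separated tokens (an assumed stand-in for the spreadsheet formula)."""
    return len(passage.split())


# Invented example sentences, not quotations from The Silmarillion.
attached = "Come with me... and bring the lamp."        # ellipsis with a space only at its end
free_standing = "Come with me ... and bring the lamp."  # ellipsis with a space on both sides

print(word_count(attached))       # 7
print(word_count(free_standing))  # 8
```

With a space on both sides, the ellipsis becomes its own token and inflates the count by one; attaching it to the preceding word keeps the count equal to the number of actual words.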

Turn

Turn is used to document turn-taking within conversations. The number gives the position of the turn within its conversation. A number ending in X marks the final turn of that conversation. Turns can include both dialogue and indirect speech.
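
When working with an export of the dataset, a Turn value can be split into its number and its final-turn marker. This is a minimal sketch that assumes each cell holds a whole number optionally followed by an uppercase X (for example, 1X); the names Turn and parse_turn are hypothetical.

```python
import re
from typing import NamedTuple


class Turn(NamedTuple):
    number: int     # position of the turn within its conversation
    is_final: bool  # True when the cell ends with X


# Assumed cell format: digits, optionally followed by an uppercase X.
_TURN_PATTERN = re.compile(r"(\d+)(X?)")


def parse_turn(cell) -> Turn:
    """Parse a Turn cell such as 2 or '3X' into its number and final-turn flag."""
    match = _TURN_PATTERN.fullmatch(str(cell).strip())
    if match is None:
        raise ValueError(f"unrecognized Turn value: {cell!r}")
    return Turn(number=int(match.group(1)), is_final=match.group(2) == "X")


print(parse_turn("1X"))  # Turn(number=1, is_final=True)
print(parse_turn(2))     # Turn(number=2, is_final=False)
```

A value of 1X describes a one-turn conversation: the first turn is also the last.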

Speech Act Tag

As noted above, the thirty-two speech act tags used in this dataset were adapted from Martin Weisser's Dialogue Annotation and Research Tool (DART), version 3.

A speech act is defined as speech with a single, distinct purpose. Multiple speech acts can occur within a single sentence, or multiple sentences may constitute a single speech act. When the subject or topic of the speech changes, this becomes a new speech act, even if the tag is the same. For example, "Is it still raining out? Did you see him on your way in?" would be a single instance of dialogue containing two question speech acts.

The following speech act tags are used in the dataset. Bold text indicates changes or additions to the DART taxonomy.

  • accept: responding in an active positive way or signaling explicit agreement
  • answer: answering a question
  • boast: predicting future success, without evidence, for rhetorical purposes
  • correct: correcting what the interlocutor has said or expressing disagreement
  • counsel: collaborating with other characters to share information or develop a plan
  • direct: eliciting an action from another character with the assumption that the direction will be followed
  • express approval: expressing appreciation or approval
  • express disapproval: expressing objection, criticism, or disapproval (including curses)
  • express discouragement: expressing that something can't or won't be done or won't happen, or expressing regret
  • express hope: expressing hope that something can or will be done, or will happen
  • express opinion: expressing an opinion/evaluation
  • express uncertainty: expressing uncertainty regarding something
  • express wish: expressing a wish or desire (including prayers)
  • greet: greeting the interlocutor at arrival or departure
  • hallow: blessing with divine power
  • insult: insulting the interlocutor
  • name: providing a being or object with a name
  • negotiate: attempting to influence another character's choice or behavior
  • offer: offering a service to benefit the interlocutor (including offering friendship)
  • perform: performing a song, dance, or recitation
  • persuade: attempting to influence another character's opinion, belief, or feelings
  • pledge: committing to future action
  • prophesy: using foresight to describe a future event
  • question: posing a question in order to gain information (i.e., does not include rhetorical questions)
  • refuse: responding negatively to an offer, etc.
  • report: reporting what others, including the interlocutor, have said (including rumor)
  • state: conveying information/awareness without attempting to change another character's perceptions, emotions, or choices
  • suggest: proposing potential action
  • teach: providing knowledge or instruction in a skill to another character
  • thank: expressing gratitude
  • threaten: promising future harm
  • warn: providing information on a harmful condition
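
If you edit a copy of the dataset, the tag list above can double as a validation set. The sketch below is one possible check and not part of the dataset itself; the function name is hypothetical, and it assumes tags are compared in lowercase with surrounding spaces removed.

```python
from typing import Iterable

# The thirty-two speech act tags listed above.
SPEECH_ACT_TAGS = frozenset({
    "accept", "answer", "boast", "correct", "counsel", "direct",
    "express approval", "express disapproval", "express discouragement",
    "express hope", "express opinion", "express uncertainty", "express wish",
    "greet", "hallow", "insult", "name", "negotiate", "offer", "perform",
    "persuade", "pledge", "prophesy", "question", "refuse", "report",
    "state", "suggest", "teach", "thank", "threaten", "warn",
})


def unknown_tags(tags: Iterable[str]) -> list[str]:
    """Return the distinct tags that are not in the list above, sorted alphabetically."""
    normalized = {tag.strip().lower() for tag in tags}
    return sorted(normalized - SPEECH_ACT_TAGS)


print(len(SPEECH_ACT_TAGS))                       # 32
print(unknown_tags(["warn", " Teach ", "curse"]))  # ['curse']
```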

An important feature of speech act theory (and the broader field of discourse analysis to which it belongs) is that speech is analyzed within a social context, and different contexts can change the function of otherwise identical words. For example, consider what may be the shortest possible speech act in the English language: "No." This can function differently within different contexts:

  • "Did you feed the dog?" "No." (answer)
  • "Melkor made the Silmarils." "No." (correct)
  • After dropping a full, fresh gallon of sun tea onto the floor. "Nooo!" (express discouragement)

Given this, I applied speech act tags using the context around the speech itself.

Character

This column includes the following special designations:

  • Group: two or more characters are credited with the speech. Note that when there are two named characters credited for indirect speech (e.g., Gelmir and Arminas's warning to Orodreth), they are given separate speech acts. When two characters are credited with quoted speech, it is tagged as "Group," since it is highly unlikely that those characters recited the speech in unison.
  • Unknown: speech is indicated but the speaker is unknown; this is especially used for the frequent references to characters receiving news or tidings.
  • Unnamed: a single speaker is identified but not named (e.g., Dorlas's wife, messenger of Gondolin's guard). The speaker is identified more fully in the Notes column.

Gender

Options are male (M), female (F), and no or unknown gender (N/A).

Group

This column classifies characters by the broad "race" group they belong to.

Sub-Group

This column classifies characters further within the broader grouping.

Source (Primary)

This section is deeply indebted to Douglas Charles Kane and his book Arda Reconstructed, which identifies the texts from which Silmarillion materials originate.

Source (Secondary)

See "Source (Primary)."

Notes

Additional details or clarifications about data in other columns.

Validation

These data have been validated at least three times by:

  • Rereading The Silmarillion for dialogue, speech, and silence.
  • Reviewing speech acts to ensure that they are tagged correctly and that tagging is consistent.

I've also cross-compared the statistics and spot-checked formulas with hand counts.

This does not mean that the data are perfect. I'm confident that mistakes remain. However, the data have been carefully reviewed at many different points over the past year to check for errors and inconsistencies.

As noted above, errors are most likely in the Paragraph and Passage columns. Double-check these columns before using them outside of personal or informal contexts.


Dataset

Dataset for The Silmarillion: Who Speaks? (v.2)

These data are available for anyone to use under a CC Attribution Non-Commercial Share-Alike license. Credit Dawn Walls-Thumma (scholarly contexts) or Dawn Felagund (fannish contexts) with a link to dawnfelagund.com if you do. Contact me at DawnFelagund@gmail.com for uses outside the CC license.

Make a copy of the dataset for your own use.

The original dataset is now obsolete but can be found here.


FAQs

I found a mistake. What should I do?

If you find a mistake, I appreciate you letting me know so that I can fix the dataset for others as well! Please email me at DawnFelagund@gmail.com and let me know what you found.

If you disagree with how I've labeled or interpreted a particular piece of data, I appreciate you keeping that to yourself, making a copy of the dataset, and doing whatever makes you happy and feel like it's right.

Can I use these data?

Yes! I spent over a year putting this dataset together, and that effort feels more worthwhile if others use it for their work, edification, or enjoyment.

These data are available for anyone to use under a CC Attribution Non-Commercial Share-Alike license. Credit Dawn Walls-Thumma (scholarly contexts) or Dawn Felagund (fannish contexts) with a link to dawnfelagund.com if you do. Contact me at DawnFelagund@gmail.com for uses outside the CC license.

Do not email me to ask permission to use the work under the terms of the CC license. The CC license is me telling you it's okay.

Please do email me to share anything cool that you post or publish using the data!


Updates and Changes

Version 2.1.1

Date: 31 January 2026

Changes:

  • SpeechActs, IDs 375 and 376, "Turn" is corrected to 1X on both

Works Cited

Austin, J.L. How to Do Things with Words: The William James Lectures Delivered at Harvard University in 1955. Oxford University Press, 1962.

Kane, Douglas Charles. Arda Reconstructed: The Creation of the Published Silmarillion. Lehigh University Press, 2009.

Weisser, Martin. "The DART Scheme." MartinWeisser.org, https://martinweisser.org/DART_scheme.html. Accessed 22 August 2025.

