
Quantitative Formalism: The “Genre” Potential of Political Rhetoric

Michael Santoro

Queens College English Department


Introduction to Quantitative Formalism

“‘Distant reading’, I have once called this type of approach; where distance is however not an obstacle, but a specific form of knowledge…”






Quantitative formalism is the idea that the form of a text can be understood through the analysis of quantitative data gathered by software.

It depends on the advent and spread of computer technology; before that, the concept would have been moot, even with the soundest of theories behind it.



The digital transcription and collection of data is the foundation of digital literary analysis. Software built for each mode of study forms the rest of the structure in which theories can be tested.

Quantitative formalism does not only use software to sort texts into our existing categories; it sometimes reveals new ways of thinking about how we define texts.


Quantitative Formalism at Work

Five scholars (including Franco Moretti) conducted studies on authorship attribution and genre recognition using LAT and MFW analyses, and published the results in 2011.

LAT (Language Action Type) – A method of categorizing words or groups of words with labels that indicate modes of speech, types of content, or any other parameter supplied to the software.

MFW (Most Frequent Word) – The diversity and frequency of words within a text or texts.
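As a rough illustration of what an MFW tally involves, the following is a minimal Python sketch, not the software used in the 2011 study. The function name `mfw_profile` and the simple tokenizer are assumptions made for this example; real tools handle punctuation, spelling variants, and large corpora with more care.

```python
import re
from collections import Counter


def mfw_profile(text, n=10):
    """Return the n most frequent words with their relative frequencies.

    Hypothetical helper for illustration only.
    """
    # Lowercase the text and keep alphabetic tokens
    # (apostrophes are allowed inside a word, e.g. "don't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # Relative frequency = occurrences of the word / all tokens in the text.
    return [(word, count / total) for word, count in counts.most_common(n)]


sample = "The cat and the dog and the bird."
for word, share in mfw_profile(sample, n=3):
    print(f"{word}: {share:.3f}")
```

Relative frequencies are used rather than raw counts so that texts of different lengths can be compared on the same scale.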



The software “DocuScope” gathered LAT data, and the scholars ran a comparative analysis of that data to assess both authorship attribution and genre.

An MFW approach that included the most common words (often “the”, “a”, “and”, etc.) was able to identify novelistic genre.



Genre = Rhetoric?

MFW research was able to determine genre from identifiers that would be all but invisible to any human close reader.

These consistencies imply that, beyond the author’s intent, the MFW counts of a novel can betray traits of style or content.

Could this same approach not be used to determine the political persuasion of rhetorical text?
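One way such a comparison might be set up is sketched below in Python. This is an assumption-laden toy, not the method of the 2011 study or of any particular tool: it builds a relative-frequency profile for each labeled text, restricts the comparison to the most frequent words of the labeled pool, and assigns an unlabeled text to the label whose profile is closest by cosine similarity. The names `tokenize`, `relative_frequencies`, `cosine_similarity`, and `closest_label`, and the labels `group_a` and `group_b`, are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def relative_frequencies(text):
    tokens = tokenize(text)
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}


def cosine_similarity(a, b, vocabulary):
    # Compare two frequency profiles over a shared vocabulary.
    dot = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in vocabulary)
    norm_a = math.sqrt(sum(a.get(w, 0.0) ** 2 for w in vocabulary))
    norm_b = math.sqrt(sum(b.get(w, 0.0) ** 2 for w in vocabulary))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_label(unknown_text, labeled_texts, n=50):
    """Return (best_label, scores) for an unlabeled text.

    labeled_texts maps a label to one representative text.
    """
    # The vocabulary is the n most frequent words of the labeled pool.
    pooled = Counter()
    for text in labeled_texts.values():
        pooled.update(tokenize(text))
    vocabulary = [word for word, _ in pooled.most_common(n)]

    unknown = relative_frequencies(unknown_text)
    scores = {
        label: cosine_similarity(unknown, relative_frequencies(text), vocabulary)
        for label, text in labeled_texts.items()
    }
    return max(scores, key=scores.get), scores


# Toy data only; a real corpus would hold full columns or books per label.
labeled = {
    "group_a": "we must we must we shall",
    "group_b": "they will they will they can",
}
label, scores = closest_label("we must act and we shall", labeled)
print(label, scores)
```

Whether word-frequency profiles actually separate political persuasions is the open question here; the sketch only shows the mechanics of asking it.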



The corpus – The closest pool of political text I can think of that resembles the novel in terms of size comes from the field of political punditry.



Software – Voyant

Tools – Voyant offers a variety of tools for arranging data collected from text, from simple MFW tallies to correspondence analysis, along with plenty of visualization options that help illustrate any interesting consistencies observed across a text or group of texts.


[Figure: Voyant scatter plot example]



If consistencies of any kind are observed, I plan to open the corpus up incrementally. Punditry is a fine starting point, but if the observed consistencies seem to point to inherent qualities of texts that land squarely on one side of the US political dichotomy, I would like to extend the corpus to more ambiguous political texts, as well as to political and philosophical texts originally written in languages other than English.









