The Breakup
Now that I had a handle on what the computer thinks of my corpus as a whole, I split my corpus into pieces in three ways:
Time: I grouped pieces together from four eras of my life: 2008-2013 (7); 2015-2017 (10); 2020 (11); and 2021-2023 (9). (I don’t have anything from 2014, 2018, or 2019, so I omitted those years here.)
Audience: As noted above, my corpus can be pretty easily split between “for school” and “for a wide audience.” I added the piece I wrote for a magazine to the “self-published” section for a split of 25 and 12.
Topic: I grouped pieces together based on their primary assigned topic per Topic Modeling Tool. One piece had two primary assigned topics, Topic 4 and Topic 10, so I selected Topic 4 since Topic 10 was otherwise represented in every piece. (More on Topic 10 at the end.)
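For the curious, the time split can be sketched in a few lines of Python. This is only an illustration: the era boundaries come from the list above, but the `ERAS` table and the `era_for` function are hypothetical names, not anything I actually ran.

```python
# Era boundaries as described in the post; the names here are hypothetical.
ERAS = [
    ("2008-2013", 2008, 2013),
    ("2015-2017", 2015, 2017),
    ("2020", 2020, 2020),
    ("2021-2023", 2021, 2023),
]


def era_for(year):
    """Return the era label for a year, or None if the year falls in a gap."""
    for label, start, end in ERAS:
        if start <= year <= end:
            return label
    return None  # e.g. 2014, 2018, 2019: no pieces from those years


if __name__ == "__main__":
    print(era_for(2016))
    print(era_for(2014))
```

Returning `None` for the gap years keeps them out of every group instead of forcing them into a neighboring era.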
Then, I created a spreadsheet that displayed several metrics from Voyant for every piece in the corpus, including word count, vocabulary density, words per sentence, and readability, along with the topic(s) that Topic Modeling Tool most associated with each piece. In addition to laying out this data piece by piece, I calculated the average of each metric across the pieces in each time, audience, and topic group. I added some conditional formatting in the form of a color scale for each metric, and bam, I had a very basic heat map.
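I did all of this in a spreadsheet, but the same idea can be sketched in Python. Treat the following as a rough illustration under stated assumptions: the rows are made-up placeholder values (not my real data), and `group_averages` and `color_scale` are hypothetical helper names. The color scale here is a simple min-max rescale, which is roughly what a spreadsheet's color-scale formatting does.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows with invented values, purely for illustration.
pieces = [
    {"title": "A", "audience": "school", "word_count": 1200, "vocab_density": 0.40},
    {"title": "B", "audience": "school", "word_count": 800, "vocab_density": 0.50},
    {"title": "C", "audience": "wide", "word_count": 600, "vocab_density": 0.55},
]


def group_averages(rows, key, metrics):
    """Average each metric over the rows that share the same value for `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: {m: mean(r[m] for r in members) for m in metrics}
        for group, members in groups.items()
    }


def color_scale(values):
    """Rescale values to 0..1 (lowest -> 0, highest -> 1) for heat-map shading."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # all equal: use the middle of the scale
    return [(v - lo) / (hi - lo) for v in values]


averages = group_averages(pieces, "audience", ["word_count", "vocab_density"])

if __name__ == "__main__":
    for group, stats in averages.items():
        print(group, stats)
    print(color_scale([p["word_count"] for p in pieces]))
```

Swapping `"audience"` for a time or topic column would give the other two views of the same data.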
Naturally, there was a lot I wanted to explore here, and I knew I wouldn’t be able to devote the time I wanted to all of it. To revisit my research questions:
Did my writing get better (or worse) over time?
What do “better” and “worse” mean?
Can I identify any idiosyncrasies that changed (or stayed the same) over time?
What differences exist between the mediums of publication — e.g., a piece I wrote knowing others would see it vs. a piece I wrote for an instructor — if any?
Yeah, let’s just get into as much of this as is holistically possible.