Earlier I showed how to extract the postings from a given Facebook page. Here, I will show you how to do some basic text mining on the posts you found. For practice, I will use the messages of a local neo-Confederate group called ACTBAC (“Alamance County Taking Back Alamance County”). Their antics have been covered in local media, but with their re-branding in light of the Trump election and the rise of the alt-right, many people in our area are still wondering just what this group is all about. Perhaps text mining can help illuminate some of their beliefs and strategies for us.
I ran the script on their “ALAMANCEOURS” Facebook page, and it yielded 1017 messages beginning in June 2015. Here is the spreadsheet (actbac.csv) in case anyone wants to play around with it.
Top 50 words most used in their FB posts
I wrote a program to count frequencies and remove stopwords (stopwords are boring words like ‘a’, ‘to’, ‘it’, ‘is’). Then I highlighted the most interesting words (to me) in yellow. Each word is shown with its count next to it.
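For anyone who wants to try this at home, here is a minimal sketch of the counting step. It is not the exact program I ran: the stopword list is a tiny illustrative one, the sample messages are invented for the example, and the function name top_words is just a label I picked.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "it", "is", "be", "and",
             "of", "in", "for", "we", "our", "will"}

def top_words(messages, n=50, stopwords=STOPWORDS):
    """Count lowercase word tokens across all messages, skipping stopwords."""
    counts = Counter()
    for message in messages:
        tokens = re.findall(r"[a-z']+", message.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Invented sample messages, standing in for the real posts.
sample = [
    "We will stand for our county.",
    "Stand up for the southern cause.",
    "It is our county and our state.",
]

for word, count in top_words(sample, n=5):
    print(word, count)
```

With the real data, you would pass in the message column from the spreadsheet instead of the sample list.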
From these, we can see many predictable words for a county-based neo-Confederate group (county, state, southern, cause, carolina). However, I was most intrigued by the prominence of the word ‘stand’.
Usage of the word ‘stand’
Stand can be both a noun (“take a stand”) and a verb (“stand up for yourself”). With this group, ‘stand’ is the most common verb used in their messages (not counting stopwords like ‘be’ or ‘is’). My hypothesis is that, as a verb, this word ‘stand’ conveys a lot of the power of their movement. Why?
To help understand how they use ‘stand’, I wrote a program to generate a concordance to show how the word is used in their messages. The first few lines of the concordance look like this:
The word of interest (shown in red) is placed in the center of each line. The concordance then shows the words that come immediately before and after it in each message.
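A concordance like this (sometimes called “keyword in context”) can be sketched in a few lines. This is a simplified version under my own assumptions, not the exact program I used: the function name concordance, the 30-character window, and the sample messages are all made up for illustration.

```python
import re

def concordance(messages, keyword, width=30):
    """Return one line per match, with the keyword starting at the same column."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for message in messages:
        text = " ".join(message.split())  # collapse newlines and extra spaces
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-justify the left context so every keyword lines up.
            lines.append(f"{left:>{width}}{match.group()}{right}")
    return lines

# Invented sample messages, standing in for the real posts.
sample = [
    "We will stand for our county.",
    "Take a stand. Stand up!",
]

for line in concordance(sample, "stand"):
    print(line)
```

Because the pattern uses word boundaries, this only matches ‘stand’ itself, so ‘stood’ and ‘standing’ need their own passes.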
From this, I learned that the word ‘stand’ is used 291 times in 1017 messages, most commonly as follows:
In addition, there are another 41 uses of “stood” and 86 uses of “standing”.
It would be interesting to compare this usage to other Confederate and non-Confederate groups to see whether it is unique to ACTBAC (I doubt it) or, more likely, a rhetorical device used more broadly by neo-Confederate groups. I would guess that their defensive “stand up for your beliefs, no matter how unpopular” plea has great power in a neo-Confederate setting. After all, the “Lost Cause” narrative also describes a heroic, virtuous South fighting against all odds, and ultimately unfairly defeated in the American Civil War.
Next, just for fun, I wrote a program to build a topic model of the postings. A topic model tells us what words frequently co-occur in sentences, and tries to make groupings of those words into possible “topics”. Inside the program, you can fiddle with the number of topics, and the number of words generated for each.
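To give a feel for the co-occurrence idea, here is a small sketch that simply counts which words show up in the same message as a word of interest. To be clear, this is not a topic model and not the program I ran; it is a much simpler stand-in for the intuition behind one. The stopword list, the function names, and the sample messages are all invented for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption).
STOPWORDS = {"the", "for", "with", "is", "our", "a", "to", "and"}

def cooccurrence(messages, stopwords=STOPWORDS):
    """Count how many messages each pair of distinct words appears in together."""
    pairs = Counter()
    for message in messages:
        words = sorted(set(re.findall(r"[a-z']+", message.lower())) - stopwords)
        pairs.update(combinations(words, 2))
    return pairs

def companions(pairs, word, n=4):
    """Return the n words that most often share a message with `word`."""
    related = Counter()
    for (a, b), count in pairs.items():
        if a == word:
            related[b] += count
        elif b == word:
            related[a] += count
    return related.most_common(n)

# Invented sample messages, standing in for the real posts.
sample = [
    "Southern people stand for the state.",
    "The people stand with the flag.",
    "The battle flag is our heritage.",
]

pairs = cooccurrence(sample)
print(companions(pairs, "stand"))
```

A real topic model goes further than this by grouping words into several topics at once, but raw co-occurrence counts are a handy sanity check on whatever topics come out.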
After running a few experiments, I settled on 3 topics with 4 words each. These topics weren’t terribly exciting, as you can see below, but we can still learn a few things from them. First, when ‘stand’ is mentioned, it is often used with ‘southern’ and ‘state’, and it seems to be ‘people’ who are doing the standing (makes sense). Additionally, the topic we could call ‘Confederate battle flag’ emerges (labeled Topic 3 below):
Finally, I looked at how difficult the text was to read. These are fairly simple analyses based on sentence structure, number of “difficult” words, and how many syllables are in the words.
The FKRE is the Flesch-Kincaid Reading Ease metric, which scores how “easy” a document is to read (higher scores mean easier text). The score here, 71.55, falls in the “fairly easy” band, which corresponds to roughly a 7th grade reading level. I also ran an overall readability summary, which combines several other difficulty measures with FKRE. It also puts this text at around a 6th or 7th grade level.
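If you are curious what goes into these numbers, here is a rough sketch using the standard published formulas for Flesch Reading Ease and the Flesch-Kincaid grade level. It is not the tool I used. In particular, the syllable counter is a crude vowel-group heuristic I am assuming for illustration, so its scores will not match the 71.55 above exactly.

```python
import re

def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Return (reading_ease, grade_level) from the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade

ease, grade = readability("The cat sat. The dog ran.")
print(round(ease, 2), round(grade, 2))  # prints 119.19 -2.62
```

Short sentences and short words push the ease score up and the grade level down, which is why the toy text above lands off the end of both scales.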
I hope you enjoyed this quick tour of text mining – perhaps you will find some interesting techniques to use on your own projects!