#teaparty -- they talk a lot

Well, they really write a lot. #teaparty is a hashtag, and the best way to think about hashtags is as locations for communication. They are not geographic locations, but locations of interest. If I include the hashtag in my Twitter message then everyone who is interested in #teaparty can find my message. There is no monopoly on the use of a hashtag; anyone can use it. People write messages both approving and disapproving of Teaparty and of Teaparty views of politics.

Teaparty is a political movement. Its origin was early 2009. It was inspired by the election of President Obama and the Democratic majorities in the House and the Senate and what they intended to do in office [Beth Rowen]. And where Teaparty exists is on Twitter. The Washington Post went on a search for local Teaparty organizations -- attempting to contact every local organization in the US in the fall of 2010 [Amy Gardner]. They found only small, disconnected local organizations.

But #teaparty is very active. The figure shows the number of messages a day from September 7, 2010 through the election.

There were 620,195 messages during the two months with an average of almost eleven thousand a day. And the day of the election they really got with it -- to the tune of 22,000+ messages.

That is the volume; volume is easy. You capture the messages including #teaparty every day and then count. But the variety of the strands of messages is more difficult to track. Are they only communicating about a few topics in the 11,000 messages a day? Are there many topics? This analysis is one way to answer that question.

I will examine all of the messages for September 21 [12,223 messages], 22 [12,811 messages], and 27 [11,780 messages]. The dates were chosen to be able to examine the overlap between two contiguous days and then between the first day and the seventh day.

The subject of a message seems quite straightforward, but it is not. For example, if a message notes that Sarah Palin endorsed Christine O'Donnell, is the subject Palin or O'Donnell or endorsement? Well, it is all three. Counting streams of messages when each message might have three or more subjects is not a very satisfactory procedure. If the object were to examine messages about Sarah Palin or about Barack Obama, one could simply extract them from the flow and characterize the extracted set. There were 2,340 messages that included Palin and 5,198 that included Obama. But that is only 20.5% of all of the messages for the three days. There is 80% left over.
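The extraction described above can be sketched as a simple keyword filter. This is only an illustration: the function name `count_mentions` and the sample messages are hypothetical, and the matching rule (a case-insensitive substring match) is an assumption, since the text does not say how messages that "included" a name were identified. The last lines redo the share arithmetic from the counts reported here.

```python
def count_mentions(messages, keyword):
    """Count messages containing keyword (case-insensitive substring match; an assumed rule)."""
    needle = keyword.lower()
    return sum(1 for message in messages if needle in message.lower())

# Hypothetical sample messages, for illustration only.
sample = [
    "Sarah Palin endorses Christine O'Donnell #teaparty",
    "Obama wants to spread the wealth #teaparty",
    "Stop the Dream Act #teaparty",
]
palin_in_sample = count_mentions(sample, "palin")
obama_in_sample = count_mentions(sample, "obama")

# Share arithmetic using the counts reported in the text.
total_messages = 12_223 + 12_811 + 11_780   # September 21, 22, and 27
named_messages = 2_340 + 5_198              # Palin plus Obama
share_percent = 100 * named_messages / total_messages
print(f"{share_percent:.1f}% of {total_messages:,} messages")
```

Adding the two counts treats any message that names both Palin and Obama as two messages, so the combined share is an upper bound.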

An alternate procedure is to look for phrases. This takes advantage of the 140-character limit of Twitter; it is difficult to include more than one phrase in 140 characters. If you search for phrases you will come close to no more than one per tweet. The phrases were defined as 4 or 5 words in length that appeared at least 3 times in the text. So you would have phrases about Palin and O'Donnell, Palin and Dancing with the Stars, and Palin and an appearance at some location. Each would be counted as a distinct phrase. And the number of distinct phrases, by this definition, would constitute the number of subjects being written about.
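The phrase definition above (4 or 5 words, appearing at least 3 times) can be sketched as a word n-gram count. This is a rough approximation under stated assumptions, not a description of how the phrase-finding software works: the function name `extract_phrases`, the tokenizing rule, and the sample messages are all hypothetical.

```python
import re
from collections import Counter

def extract_phrases(messages, lengths=(4, 5), min_count=3):
    """Count word n-grams of the given lengths; keep those seen at least min_count times."""
    counts = Counter()
    for message in messages:
        # Assumed tokenizer: lowercase words, keeping apostrophes, hashtags, and mentions.
        words = re.findall(r"[#@]?\w+(?:'\w+)*", message.lower())
        for n in lengths:
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

# Hypothetical messages, for illustration only.
messages = ["Sarah Palin endorses Christine O'Donnell"] * 3 + ["make a difference today"]
phrases = extract_phrases(messages)
for phrase, count in sorted(phrases.items()):
    print(count, phrase)
```

Note that this naive version counts every overlapping n-gram, so a single message can contribute several phrases. A real phrase finder would probably need some rule for merging or dropping overlapping candidates.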

When you do this search procedure the number of distinct phrases found is substantial. The search was done using WordStat from Provalis Research. There were 1,657 for September 21, 1,662 for the 22nd, and 1,504 for the 27th. If you define phrases as 3 to 5 words there are about a third more. When I read them the four-word phrases seemed more distinct than the three-word phrases. These are distinct phrases and each appeared at least 3 times in the text. The 4,823 distinct phrases appeared 32,332 times, which is an average of 6.7 appearances per phrase. There was, of course, considerable variation in the number of appearances per phrase. The standard deviation is 8.4; the standard deviation can be larger than the average when the distribution is highly skewed.

In this case there are a few phrases that occur many times and many phrases that occur only a few times.
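The average can be checked from the reported totals, and a toy example shows how a standard deviation can exceed the mean. The per-phrase counts are not reported here, so `toy_counts` below is an invented, hypothetical distribution chosen only to have the same mean of 6.7; it is not the actual data and does not reproduce the reported standard deviation of 8.4.

```python
import statistics

# Reported totals: distinct phrases per day and total appearances.
distinct_phrases = 1_657 + 1_662 + 1_504
total_appearances = 32_332
mean_appearances = total_appearances / distinct_phrases
print(distinct_phrases, round(mean_appearances, 1))

# Hypothetical skewed distribution: nine phrases at the minimum of 3, one at 40.
toy_counts = [3] * 9 + [40]
toy_mean = statistics.mean(toy_counts)
toy_std = statistics.pstdev(toy_counts)  # population standard deviation
print(toy_mean, round(toy_std, 1))
```

In the toy data one heavily repeated phrase pulls the standard deviation well above the mean, which is the same pattern of a few frequent phrases and many rare ones.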

The table presents the top ten phrases per day. The number of occurrences ranges from 56 to 114. There were roughly 12,000 messages and 1,600 distinct phrases a day, and the most frequently used phrase appeared 114 times. Relative to the total, no phrase appears in the text very frequently.

9/21                              Count   9/22                              Count   9/27                              Count
O'Donnell win a victory              85   O'Donnell win a victory              90   til its over endrun project         114
win a victory for freedom            85   spread the wealth                    90   DE goes red money bomb               86
DE goes red money bomb               72   win a victory for freedom            90   retweet this if you make             85
O'Donnell for senate                 68   O'Donnell for senate                 85   O'Donnell for senate                 83
Sarah Palin endorses Christine       67   Sarah Palin endorses Christine       73   spread the wealth                    71
spread the wealth                    62   Ohio Dem party chair calls           73   Sarah Palin endorses Christine       70
stop the Dream Act                   60   bomb over Oh delphi workers          68   O'Donnell win a victory              68
make the most generous donations     57   make a difference                    59   win a victory for freedom            68
generous donations lets win          56   editors apply today                  57   great news site needs volunteers     66
lets win this in November            56   site needs volunteers to serve       57   serve as editors apply today         66

Christine O'Donnell had won the Republican primary election just a week earlier, and she is the only candidate mentioned. The forthcoming election is the primary focus even though it was only September. The only legislation to make it into the top 10 was the Dream Act. There is quite a lot of overlap from one day to the next in the most frequently used phrases.

It takes only a little sorting to find the phrases that appear on two days. On September 21 there were 12,233 Twitter messages and 1,657 distinct phrases. On September 22 there were 12,811 Twitter messages and 1,662 distinct phrases. And 270 phrases appeared on both days. 270 of 1,657 or of 1,662, about 16 percent, is a very small overlap between phrases on the two days. On September 27 there were 11,780 Twitter messages and 1,504 distinct phrases. Only 148 phrases appeared in the text of both September 21 and September 27.
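The sorting amounts to a set intersection. A minimal sketch follows; the function names and the toy phrase sets are hypothetical, and only the counts in the last lines come from the text.

```python
def shared_phrases(day_a, day_b):
    """Return the phrases that appear in both days' phrase collections."""
    return set(day_a) & set(day_b)

def overlap_percent(shared_count, distinct_count):
    """Shared phrases as a percentage of one day's distinct phrases."""
    return 100 * shared_count / distinct_count

# Hypothetical toy phrase sets, for illustration only.
day_one = {"spread the wealth", "stop the Dream Act", "O'Donnell for senate"}
day_two = {"spread the wealth", "make a difference", "O'Donnell for senate"}
both_days = shared_phrases(day_one, day_two)

# Overlap percentages from the counts reported in the text.
next_day = overlap_percent(270, 1_657)     # September 21 phrases also found on September 22
week_later = overlap_percent(148, 1_657)   # September 21 phrases also found on September 27
print(sorted(both_days), round(next_day, 1), round(week_later, 1))
```

Dividing by the other day's total instead changes the figures only slightly: 270 of 1,662 is about 16.2 percent, and 148 of 1,504 is about 9.8 percent.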

What to conclude from this analysis? One, there is high variety, which has an unexpected consequence for what counts as 'big' and 'small'. When there are 12,000 messages and the most frequently used phrase appears only 85 times, there is a lot of variety. The overlap from one day to the next is very small and after a week it is even smaller. So, 85 or 100 is big in these streams. That 85 would count as big in these streams is not at all obvious.

Two, finding coherence in these streams of messages is going to be more a matter of interpretation than counting. For example, there is little doubt that "taking back" is a dominant theme in the Teaparty movement. This is one statement of that theme:

RT @PMgeezer: "We're not trying to take back our country, we ARE our country!" Christine O'Donnell (Patriot) #ocra #DEsen #teaparty

In this message "take back" is associated both with Christine O'Donnell and patriot. Much of the enthusiasm for the candidate O'Donnell grew from the sense that she was part of taking back the country. She had defeated a 'standard' Republican in the primary election. She was taking the country back even from RINOs or 'Republicans in name only'. And this taking back is the ultimate in patriotism. However, the phrase "take back" appears only 37 times in the messages of these three days. If it is a dominant theme it is not dominant because of constant repetition. It is, rather, dominant because it is the base from which grew support for O'Donnell and the other candidates who reflected the same values.

And #teaparty -- after the 2010 election they are confident that:

TSA degrading policy is Obama's trial balloon on pure despotism.

They will rid their party of RINOs, and

Sarah Palin will win the presidency in 2012

© G. R. Boynton,
November 21, 2010