*“I think you hit a reset button for the fall campaign. Everything changes. It’s almost like an Etch A Sketch. You can kind of shake it up and restart all over again"*
-- Eric Ferhnstrom, advisor to Mitt Romney

The American presidential election 2-stage format requires candidates to obtain their party's nomination in order to compete in the general election against the other party’s nominee. Voters during primary elections tend to hold more extreme views than voters during the general election; many of the more moderate (or disinterested) voters only vote during the more important general election.

This format results in shifting incentives for presidential candidates over the election season. During the primary election, candidates can benefit from pandering to their party base – overly centrist candidates may conceal their true ideology in order to appear more in line with the median voter of their party. After securing their party’s nomination, candidates then have an incentive to moderate their platforms to appeal to independent voters and voters on the margin.

One of the early frameworks for thinking about party elections is the Hotelling-Downs model of two-party competition (Downs 1957). I illustrate in the figure below, in the style of Hotelling-Downs, the motivation for why we may expect candidates in a 2-stage election to adopt a more partisan platform in the primary election and then moderate their platform for the general election. This setup relies on the key assumptions that voter turnout is uniform across the partisan spectrum and that voters have single peaked preferences – that is, that they have a certain bliss point along a single dimension of party platforms and will vote for whatever platform is closet to their bliss point. As illustrated, during the primary the centrist challenger (B) would be set to win the general election; however, a more extreme challenger (A) of the same party is positioned to win the primary. If A were to win the primary, they would then lose the general election. By shifting to position of the median of primary voters, challenger B can win the primary election. However, at his new position he would then lose the general election against the more centrist incumbent. But by again shifting his position, this time to the median general election voter, challenger B can win the general election.

I use text analysis of candidates speeches, debates, statements, and press releases to identify shifts in ideology over the course of the 2012 election. My method provides the first high-frequency estimates of *any* individual's ideology, even when we do not have a past voting record.

For members of Congress, ideology can be estimated by examining roll calls of votes. There is a deep literature in Political Science on how best to turn voting records into estimates of an individuals partisan lean. One of the most popular methods is called DW-NOMINATE, originally developed by Rosenthal and Poole (1980). Details and estimates for every congress are made available by Voteview.

Many presidential candidates, however, are not members of Congress. Further, DW-NOMINATE is estimated only infrequently as it requires many votes -- it cannot provide, for example, monthly estimates of someones ideology. If an individual's ideology shifts over time, it will take awhile to show up in DW-NOMINATE scores.

To estimate candidates DW-NOMINATE scores, without relying on legislative votes, I do the following:

- Collect the text of all speeches given by members of Congress (Senate and House) given on the congressional floor by scraping Congress's website, which includes transcripts
- Train a model to predict DW-NOMINATE scores of a member of Congress based on these speeches
- Collect the text of all speeches, statements, fundraiser talks, and debates given by presidential candidates by scraping the American Presidency Project's speach tracker
- Predict each candidates DW-NOMINATE score in a given month based on the text for that month

This method is inspired by Gentzkow and Shapiro (2010), which estimates ideologies of different newspapers using Congress as training data.

Text is messy and requires substantial cleaning. I complete the following for both the text of congressional speeches and text from presidential candidates, most of which is quite standard:

- Remove any procedural speech, such as "Madam speaker"
- Remove stop words (e.g., “a”, “as”, “not”, “of”, ...) according to a list provided by the
*nltk*package - Stem each word according to the Porter stemming algorithm, also available as part of the
*nltk*package. Stemming algorithms reduce words of similar meaning to a common stem – “voting”, “vote”, and “voters” all share the stem “vote.” - Convert text into vectors of frequency counts of two-word phrases (bigrams). Bigrams are a 1-order Markov chain of the text – the phrase, “The quick brown fox” would have a bigram representation of "The quick", "quick brown", "brown fox."
- Limit to bigrams that: appear in at least 15 speeches and are used by 10 different congresspeople. This leaves about 10,000 bigrams

While removing stop words and stemming each words helps reduce the dimensionality of the data and combine similar phrases, it can also combine aggregate phrases with different meaning – “I do not approve of this war” and “I approve of this war” both become “approv war.”

Using count vectors assumes the text is a ‘bag of words’; there is no consideration for order of bigrams or grammatical structure.

The data is now at the individual-speech level. Each row is an individual-speech and each column is a bigram.

Next, I use the count vectors of bigrams from Congress to build a model to predict DW-NOMINATE scores.

I model polarization using Multinomial Inverse Regression (MNIR), a relatively new framework for text analysis developed in Taddy (2013) and applied to a similar problem in Gentzkow, Shapiro, and Taddy. The MNIR framework allows dimension reduction for text data while preserving estimates of sentiment (in this case, DW-NOMINATE or party affiliation). Further, MNIR accommodates the intrinsic multinomial structure of text; politicians do not iteratively (or independently) choose each phrase in a speech, but rather simultaneously choose a large combination of different phrases.

Consider count vectors $\mathbf{x}_i = [x_{i1}, \dots, x_{ip}]^T$ for each document $i$, where $p$ represents the number of bigrams in the representation. Define $m_i \equiv \sum_{j=1}^px_{ij}$ to be the total number of bigrams spoken.

It is useful to assume that each $\mathbf{x}_i$ is independent. In the case of political speeches, that is a difficult assumption to defend given that all documents come from a small set of speakers, such that count vectors are likely to be strongly correlated across documents from the same speaker. I mitigate this by collapsing count vectors by speaker $s$ into a single count vector representing all speech by a given legislator, implicitly assuming that each speaker represents a unique and independent text-generating function: $\mathbf{x}_s = \sum_{i: s_i=s}\mathbf{x}_i\ \forall s \in S$, the set of all speakers. Similarly, $m_s = \sum_{i: s_i = s} m_i \ \forall s \in S$.

Then $\mathbf{x}_s \sim MN(\mathbf{q}_s, m_s)$,where \begin{align*} &q_{sj} = \frac{\exp[\eta_{sj}]}{\sum_l^p\exp[\eta_{sl}]} \\ &\eta_{sj} = \alpha_j + \gamma_{j}D_{s} + \varphi_{j}r_s \end{align*} where $D_s$ is the DW-NOMINATE score for speaker $s$ and $r_s$ is an indicator variable taking value $1$ if the speaker is Republican. The coefficients $\gamma_{j}$ and $\varphi_j$ can be interpreted as the lift in utility from being Republican or having a higher DW-NOMINATE score for speaker $s$ using phrase $j$.

As described, estimation of the above multinomial regression is infeasible due to computational limitations and the high dimensionality of text data; however, it can be approximated using a Poisson approximation, which allows for distributed computing. Details of the Poisson approximation for MNIR can be found in Taddy 2013; I use the corresponding R package *textir* for estimation. The Poisson approximation implementation in *textir* imposes substantial sparsity on loadings through a penalized minimization -- the result is that over half the loadings on bigrams for party affiliation and DW-NOMINATE are zero. This helps limit overfitting.

I can now look at the bigrams that are most predictive of Liberal or Conservative speeches

A gut-check confirms that these are sensible: liberal bigrams include “comprehens.immigr,” “colleg.cost,” “decent.wage,” and “deni.health.” while conservative bigrams include “unborn.child,” “burdensom.regul,” “job.keyston,” and “kill.regul.”

There are also a few that are more surprising – conservative legislators apparently talk a lot about spotted owls. It turns out, spotted owls were put on the endangered species list in 1994. There has been a long-standing debate about whether the human cost was worth it; a number of timber towns in the Pacific Northwest shutdown due to government regulation of the spotted owl’s habitat. One of the top Google results for “spotted owls and republicans” is an article titled “George Bush Hates the Spotted Owl."

We still want to get to a lower dimension. Under conditions detailed in Taddy 2013, a sufficient reduction (SR) score is given by $$z_s = \frac{(\pmb{\gamma} + \pmb{\varphi})x_s}{m_s}$$ where $r_s, D_s \ \perp \mathbf{x}_s, m_s \mid z_s$. Given these SR projections, we can ignore the full count vectors $\mathbf{x}_s$ and model text-sentiment (e.g. party affiliation) against the SR projection values.

The final step of MNIR is to fit a forward-regression model of $y_s \sim z_s$, where $y_s$ can be either DW-NOMINATE scores or party affiliation. I use a simple linear regression to estimate DW-NOMINATE scores and a Logit model for party affiliation.

I estimate DW-NOMINATE scores for each candidate-month of the 2012 election season, using count vectors of all text from speeches, debates, press releases, and statements for that month.

There are a number of notable inferences to draw from the plotted DW-NOMINATE estimates. First, there is no dramatic race-to-the center following the end of the primary election; Mitt Romney’s estimated DW-NOMINATE remains fairly flat at just below 0.4 until August 2012, when it drops closer to 0.2.

There a few ‘gut checks’ to help confirm that this accurately captures candidates ideology. While we may not have definite priors about time trends, it is useful to consider the cross-candidate ranking of ideology. In particular, we can see that Michele Bachmann is consistently ranked further right than all other candidates. Given that she has been dubbed the "Queen of the Tea Party" and has strongly opposed same sex marriage, abortion, universal healthcare, govern- ment regulation, and essentially all other ‘liberal’ ideas, this is not surprising. On the opposite end of the spectrum is Obama who, as the only Democrat, is exactly where we’d expect him to be.

I also divide all text into pre- and post- May 1st. By May 1st, the Republican primary was effectively over and Romney could safely shift focus towards the general election.

Here we slightly more succinct evidence that Romney moved towards the center in how he spoke following the primary.

Finally, the SR projections of the text data can be used for other types of models. The figure below instead estimates whether someone is a member of the GOP.

The results largely track the month-by-month trends of the DW-NOMINATE predictions, but with magnified variances. This is due to the party affiliation predictions being, essentially, a non-linear transformation of the DW-NOMINATE scores; the time trends are largely unchanged, but the peaks and valleys are magnified. The SR projections of $DW-NOMINATE_s$, $party_s$, and $m_s$ for each speaker $s$ being closely correlated with the predictions of the forward regression of those projections onto true DW-NOMINATE/party affiliation. For DW-NOMINATE, the predictions are a linear combination of the SR projections. For party affiliation, the predictions are from a logistic function, pushing predictions closer to 1 or 0.

Given that party affiliation predictions are largely just a scaling of DW-NOMINATE predictions, the results should not be interpreted as Romney effectively obfuscating his party affiliation during the general election (down to only $\sim 69\%$ probability of being Republican) while his ideology, as estimated by DW-NOMINATE, remained largely the same. Rather, the party affiliation results shed light on the significance of the apparently moderate change in DW-NOMINATE scores, which are not generally interpretable in terms of magnitudes or percent decreases (only in comparison to the distribution of other legislators). The party affiliation results suggest that the decrease in DW-NOMINATE corresponds to being almost positive that Romney is Republican ($\sim 94\%$) to there being a $\sim 30\%$ chance that he is Democrat, based on this speech.

Presidential candidates almost certainly engage in some "cheap talk," spouting what the voters want to hear. But the population of voters changes between the primary and general election. This may lead to sharp changes in how candidates address the voters -- there may be value to appearing more centrist once the general election begins to sway marginal, independent voters.

There are a number of meaningful improvements that could be made and it is difficult to make a causal argument with only a single observation (Romney) of a candidate in primary and general election. There are two extensions that would help make steps towards a causal argument. First, I could expand the sample size to include more presidential elections -- The American Presidency Project has text transcripts for the 2008 and 2016 elections. Second, I could more directly use polling data as a proxy for how competitive the election is, to see how much a competitive primary predicts shifts to the right.