It has been a while. As long as I'm being boring this week, let's really stretch our legs and take the dog for a walk.
Attributing the characteristics of a group to its individual members lies at the root of a vast portion of the bad logic in this world (and certainly a majority of the bad social science). In the most basic sense, aggregation destroys data. And whether in life, politics, or academia, we often want but cannot have that data. We don't know what each member of an audience thinks of a performance; all we know is that the crowd applauded at the end. We want to know what kind of people voted for Obama, but we know only that 53% of voters did. We want to know if liberals or conservatives are experiencing more home mortgage foreclosures, but the only information we have is general foreclosure statistics.
A basic fallacy of aggregation infers the actions or decisions of an individual from those of a group. Mike was at the concert and the crowd cheered loudly, so Mike must have enjoyed it. White people like Arrested Development, so Ed likes it. These may be good guesses, but playing the odds is not the same as being logical. Knowing that Jim Inhofe is a Senator and that the Senate is about to confirm Sotomayor, would we conclude that Jim Inhofe will vote to confirm her? Yeah, not really.
A second kind of fallacy is particularly prominent in sociological, political, and economic research because of the preponderance of pooled data (election results, jury verdicts, unemployment rates, the Gross National Product, etc.). A famous sociologist termed it the "Ecological Fallacy" in 1950. An ecological fallacy takes a correlation between pieces of aggregate data and treats it as a relationship among the individuals behind the data, without evidence that any such relationship exists. Here is a basic example.
In the 2008 election, residents of a particular state were asked to vote on Proposition 1. In City A, Prop 1 got 5% Yes votes. In City B, the Yes votes were 40%. Fine. But we also note that 5% of the population in City A is Latino, as are 40% of the residents of City B. In each city, the percentage of Yes votes equals the percentage of Latinos. Ergo we conclude, seemingly quite logically, that Latinos voted for Prop 1 and non-Latinos didn't. Lacking detailed data about who voted for what and why, commentators often make leaps of faith along these lines.
Here's the problem. What if Prop 1 is, "Should public services throughout the state, including schooling, be conducted solely in English?" City A has few Latinos, the ethnic group most likely to want or need services offered in a different language. Because there are few people who are likely to be native Spanish speakers in City A, voters there neither think nor care much about English-only laws. But in City B, the large Latino population makes the issue highly contentious and polarizes non-Latino voters. So City B is 40% Latino, but the 40% voting for Prop 1 are the white people who feel threatened by the multilingual environment in which they live. The fact that the Yes percentage and the Latino percentage are equal in each city does not imply that Latinos cast the Yes votes. The true relationship among the data, in this hypothetical, is entirely different from what the numbers would suggest.
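For anyone who wants to see the arithmetic, here is a rough sketch of the Prop 1 hypothetical in Python. It is only an illustration: the city population of 1,000, the function name `aggregate`, and the per-group Yes rates are all made up for the example, chosen so that two opposite stories about who voted Yes produce the same city-level numbers.

```python
from fractions import Fraction

# Hypothetical inputs: (population, Latino share) per city. The shares come
# from the Prop 1 example; the population of 1,000 is an arbitrary assumption.
CITIES = {"A": (1000, Fraction(5, 100)), "B": (1000, Fraction(40, 100))}


def aggregate(latino_yes_rate, other_yes_rate):
    """Return {city: (Latino share, Yes share)}: all an outside observer sees."""
    seen = {}
    for city, (population, latino_share) in CITIES.items():
        latinos = population * latino_share
        others = population - latinos
        yes_votes = latinos * latino_yes_rate[city] + others * other_yes_rate[city]
        seen[city] = (latino_share, yes_votes / population)
    return seen


# Story 1: every Latino votes Yes and nobody else does.
story_1 = aggregate(
    latino_yes_rate={"A": Fraction(1), "B": Fraction(1)},
    other_yes_rate={"A": Fraction(0), "B": Fraction(0)},
)

# Story 2: no Latino votes Yes; all Yes votes come from non-Latinos
# (1 in 19 of them in City A, 2 in 3 of them in City B).
story_2 = aggregate(
    latino_yes_rate={"A": Fraction(0), "B": Fraction(0)},
    other_yes_rate={"A": Fraction(1, 19), "B": Fraction(2, 3)},
)

for city in CITIES:
    latino_share, yes_share = story_1[city]
    print(f"City {city}: {float(latino_share):.0%} Latino, {float(yes_share):.0%} Yes")

print("Aggregates identical under both stories:", story_1 == story_2)
```

Both stories hand the observer exactly the same pair of percentages for each city, so the city-level totals alone cannot tell us which one (if either) actually happened.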
The foreclosure maps that are popping up in newspapers and around the interwebs are just too tempting for many people. Note the county-level foreclosure rate, throw in some election data, and start drawing conclusions about partisan balance and imprudent lending. Were it that simple, I and most of my colleagues would be out of a job and contributing mightily to the foreclosure landscape. The data lost in aggregation are often of great interest, but no amount of rationalization can recreate them from the pale substitute of numerous data points smashed together in one big, indistinct pile.