Friday, December 08, 2017

Searching for truth

This week I read a remarkable book by a 30-something named Seth Stephens-Davidowitz, Everybody Lies. Seth, as I shall call him for obvious reasons, has a graduate degree in economics and writes for the New York Times, but he is not mainly interested in purely economic issues any more. Instead he is interested in what “Big Data”—the enormous amounts of digital data that go along with our new way of life—can tell us about various aspects of our life. Seth has a very quick mind, a flair for simple, punchy sentences, and a sense of humor. His work reminds me somewhat of the early work by Bill James, the founder of modern sabermetrics, or statistical baseball analysis, with whom he clearly feels some affinity. The book has a lot to say about different areas of American life, but I will focus on what he has to say about two sensitive topics, sex and politics. (Religion, apparently, will have to wait at least until his next book.)

Seth’s findings about sex illustrate the difficulties of matching public morality and real human behavior. Both the political right and the political left have very definite ideas nowadays about relations between the sexes (and between consenting adults of the same sex), but neither one of them seems to match what the bulk of the population is doing. Seth uses two main sources: statistics on Google searches, which we can all access, it turns out, using a readily available app called Google Trends, and statistics he acquired from pornhub.com, one of the nation’s leading pornography sites. He points out that Google searches are especially useful because people will ask Google questions that they literally would not ask any human being face to face. (Thinking about this carefully, I realized I had done this more than once myself.) People lie on Facebook all the time—one reason for the title of the book—but they give an accurate idea of what they are worried about on Google. That leads to interesting findings.

Using search data from pornhub.com, Seth estimates the gay male population of the country to amount to 4-5% of all men—a much lower estimate than Alfred Kinsey’s 10%, which was probably skewed, he explains, by his sample, which included a good many prison inmates. This is a very straightforward calculation: about 5% of all porn searches by men on pornhub ask for gay porn. The search data also suggests that there are roughly as many gay men in red states as in blue. Other data from Google, as we shall see, tends to confirm that there are far more men in the closet in red states than blue.

What do people worry about concerning sex? Men, few will be surprised to learn, are very concerned about the size of their penises, and often wonder how they might be increased. Straight women, on the other hand, are much more likely to complain about excessively large penises than excessively small ones. Women’s concerns about their appearance, meanwhile, have shifted. The posterior has replaced the breasts as the main source of anxiety during the last decade, and women are more likely to worry that their rear end isn’t big enough. I couldn’t help but wonder, although Seth never brings this up, whether this might have had something to do with one of the most visible women in the United States, the last First Lady, who was well endowed in this respect. A great many women are also worried about certain bodily odors, but I shall leave the details of those concerns to those who want to read the book for themselves. There is one very encouraging piece of data from porn searches that more people should take to heart. Just as the New Yorker Facebook movie page, where I now contribute, tells us that there is no movie so bad that it does not have at least a few fans out there, it turns out that there is no aspect of physical appearance, including obesity, that does not excite some measurable portion of the population. In fact, Seth has commented in an interview I read that if men and women were willing to use dating sites to search for what they really wanted, instead of what society tells them they should want, they would be a lot happier.

I was very struck, however, by the data Google provides on sex within relationships. Relatively few people seem to ask Google why their partners want so much sex; instead, they (and especially women) often ask why their partner seems to want so little. A surprisingly popular question among women is, “how do I tell if my husband is gay?”—and this question is especially popular in the deep red South, perhaps (see above) because there are quite a few closeted gay men there. I did get the feeling that the sexual revolution and new courtship customs for 20-somethings may have done a lot of harm to marital sex. People, especially better-educated people, now tend not to get married until the most intensely sexual phase of their relationship is over. I also wondered about the effect of infinite, free, readily available hard core pornography on the sex lives of young Americans. It does not seem to have improved them, and indeed, I know from other sources that there is a measurable population now, including some women, who are so addicted to masturbation while watching it that they can no longer do the real thing.

For whatever reason, Seth had much less to say about what big data tells us about women’s sexuality, but he does report one very politically incorrect fact. I quote: “Fully 25 percent of female searches for straight porn emphasize the pain and/or humiliation of the woman—‘painful anal crying,’ ‘public disgrace,’ and ‘extreme brutal gangbang,’ for example. Five percent look for nonconsensual sex—‘rape’ or ‘forced’ sex—even though those videos are banned on PornHub. And search rates for all these terms are at least twice as common among women as among men.” I will leave it to my readers to absorb that data on their own—except to say that those findings would not have surprised my favorite author on relationships and sex, the late Nancy Friday, whose compilations of female sexual fantasies include quite a few along those lines.

Now let us turn to politics—where Seth gives us only one major finding, but one which has really set me thinking.

That finding has to do with race in politics, and its apparent impact in each of the last presidential elections.  During the last year and a half we have all had occasion to ask whether racism is in fact a much more powerful force in the US than we had come to believe. The answer is yes.

Although Seth is an Xer, not a Boomer, he shares the Boomer view of 50 years ago that words do not kill, and does not sugar coat his message. I won’t either. The main instrument he uses to measure racism in the US is searches using the word “nigger,” of which there are about seven million every year in the US. That is comparable to the number of searches for “economist” or “migraine.” Sadly, 150 years after the Civil War and 50 after the great civil rights acts, no other racial or ethnic prejudice in the US remotely compares to racism against black Americans. I quote: “Searches for ‘nigger jokes’ are seventeen times more common than searches for ‘kike jokes,’ ‘gook jokes,’ ‘spic jokes,’ ‘chink jokes,’ and ‘fag jokes’ combined.” [emphasis added.] Such searches tend to spike whenever black Americans are in the news—most notably, of course, during the presidential elections of 2008 and 2012. And when the searches were broken down geographically, a clear pattern emerged. There was a negative correlation in 2008 and 2012 between the number of searches for “nigger” in a given area and the vote for Barack Obama, and a positive correlation in 2016 between the number of those searches and the vote for Donald Trump. And those areas included upstate New York, western Pennsylvania, eastern Ohio, West Virginia (all those, actually, part of a contiguous region), industrial Michigan, and rural Illinois, as well as Appalachia and parts of Louisiana and Mississippi. Overall, comparing Obama’s vote in 2008 and 2012 to John Kerry’s in 2004, Seth estimated that racism had cost Obama about four percentage points in the national vote. When he and a collaborator first wrote an article presenting those results, quite a few academic journals refused to publish it, but one eventually did.

Now there are some possibly complicating factors which Seth might have paid more attention to.  In these mostly white areas, Obama’s race cost him perhaps 4% of the national vote.  But his race seems to have earned some additional votes elsewhere, not only from his fellow black Americans, but also from young people who were inspired to vote by his status as the first major non-white presidential candidate.  Analysis of the 2016 vote has shown that many of those voters did not turn out to vote for Hillary Clinton.  We are dealing here, it seems to me, with an effect of partisanship. A candidate who appeals to his or her party’s activists and seems out of the mainstream, such as Obama or Trump, will draw more votes from his own side, but may be less effective at winning votes from the other side.  That dynamic, if it exists, will make it very difficult for any candidate to be elected with more than a narrow majority.

The other huge question this data raised in my mind, however, was the role of sexism in the 2016 election. I went to Google Trends myself, and I did searches for “Hillary Clinton” and “bitch.” Searches for Clinton’s name alone, clearly, would not distinguish supporters from opponents. Now the time line for those searches showed huge spikes during two recent years—2008 and 2016, when Clinton was running for President. The 2008 spike took place in the early spring, at the height of the primary season, and the 2016 one took place in the fall. The largest number of 2016 hits occurred in some of the same places the ones indicating racism did—in Mississippi, Louisiana, Kentucky, Pennsylvania, and Indiana, but also in Wisconsin and Minnesota. But when I used another feature of the program and compared the numbers to searches for “migraines,” the term that had been about as frequent as the racist searches when Obama was running, they were much less numerous. A few other abusive search terms turned up even fewer hits. Unless I simply failed to think of the most revealing search, it seems that overt sexism is now much less of a factor in our politics than overt racism. Clinton seems to have lost critical states not because so many people hated her, but because too few people were moved to come out and vote for her.

A good deal of Everybody Lies is devoted to exploring uses and potential uses of big data and what it might do to us. Seth effectively describes one of its pitfalls, the curse of dimensionality, which has led to some misleading articles tying particular genes to particular human characteristics or diseases. Essentially, there are so many thousands of variables within our chromosomes that, given a large sample of human beings with a particular problem, chance will produce an apparent match or two. But those will be random matches with no predictive value for the future. Everybody Lies does not seem to have been very widely reviewed. I recommend it.
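
The dimensionality problem described above can be seen in a small simulation. This is only a minimal sketch, not anything taken from the book: the sample size, the number of markers, the coin-flip model of a "gene," and the function name best_chance_match are all illustrative assumptions. Every marker is pure noise, yet the best of them still appears to track the trait in the original sample.

```python
import random


def best_chance_match(n_people=100, n_markers=2000, seed=0):
    """Search many random markers for the one that best 'matches' a random trait.

    Returns (index of best marker, its agreement with the trait in the
    original sample, agreement of a marker with a trait in a fresh sample).
    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # The trait is a fair coin flip per person, unrelated to any marker.
    trait = [rng.randint(0, 1) for _ in range(n_people)]

    best_idx, best_agree = -1, -1.0
    for j in range(n_markers):
        # Each marker is also a fair coin flip per person.
        marker = [rng.randint(0, 1) for _ in range(n_people)]
        agree = sum(m == t for m, t in zip(marker, trait)) / n_people
        if agree > best_agree:
            best_idx, best_agree = j, agree

    # "Replication": in a new group of people the winning marker is still
    # noise, so it is modeled as fresh coin flips against a fresh trait.
    fresh_trait = [rng.randint(0, 1) for _ in range(n_people)]
    fresh_marker = [rng.randint(0, 1) for _ in range(n_people)]
    fresh_agree = sum(m == t for m, t in zip(fresh_marker, fresh_trait)) / n_people

    return best_idx, best_agree, fresh_agree


if __name__ == "__main__":
    idx, in_sample, fresh = best_chance_match()
    print(f"best marker #{idx}: {in_sample:.0%} agreement in the original sample")
    print(f"same kind of marker in a fresh sample: {fresh:.0%} agreement")
```

With thousands of candidates, the in-sample winner typically agrees with the trait well above the 50% expected by chance, while the fresh-sample figure usually falls back toward 50%. That gap is the sense in which such matches have no predictive value.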


Energyflow said...

I have occasionally commented on your blog and made a comment using particular forbidden terms within a certain political context and in a moment of anger but I never expected to see so many here at once plus so much sex talk openly. Maybe you will get some broader young readership:). As to % of gays I recently saw a lecture video discussing gender concept, sexuality, etc. The scientist gave, iirc, 3 or 4 % gay plus 1% confused. There is a lot of talk about mixed genders also being pushed in school curricula and he tried to pin that down as nonsense. Being gay can be a mainly y chromosome defect, anomaly. We are all conceived female and men are 'defective' then some of us males develop wrong completely and some don't get a clear message. He tried to make a neutral scientific message as it seems some biologists are tired of progressive politics hijacking their field, as if everyone could wake up each morning and choose one of 50 or more genders before breakfast. Now taking that further, if gayness is normal to a small percent in all mammals then it must have a reasonable function like artist, shaman, etc. in ancient cultures and not all individuals in species reproduce it is useful to have certain types in between to help communicate, share between sexes. Particularly in difficult transitional historical phases where sexuality is a dominant theme, 60s-70s, summer in generational theory. Where yin energy is most dominant before men return to power in cycle. I don't use facebook but I guess lying in casual conversation is normal. I recall my research on penis size coming up with averages for races, Euro-white 12 cm, asian 7 or 8 cm and black african 18 cm. Of course length is not width and watching porn can be eye opening on all strange subjects. It is also monotonous. Like any short theater pieces the human element is all that really gives it any saving grace. Sex in relationship can also become monotonous.
Sex actresses last six months apparently before burning out and are quite young. This takes a lot of intense emotional energy of course and so much contact, hundreds, gets to one emotionally. It obviously also gets to viewers, particularly younger generations, starting now in grade school, who form their sexual concepts from it, instead of through playboy magazines of their fathers, dirty jokes shared on playground, etc. I have heard they cannot properly enjoy sex as the expectations are all there and nobody can meet them (like young anorexic women looking at models). We older people had relationships before this internet scourge and certainly only very progressive groups like gays talked massively about all various techniques, anal, golden showers, etc. which were widely avoided by the masses but popularized through internet, news stories. Apparently most sex videos are fantasy based, forbidden sex, public sex, sex with other races, etc. What is not allowed you want to watch at least once, apparently. Maybe women seek rape videos due to abuse, like s+m men had a dominant mother, etc. As to race, attitudes change extremely slowly on average, as over decades it swings back and forth to a central gravity, like with yin-yang, there is a natural, scientific center. Until we are all a bit black then it will be an issue of contention over millennia perhaps. Male/female will always exist, thank god. Vive la difference!

RML said...

This post, while interesting, has quite a bit of lazy analysis. I haven't had a chance to read this book, so I'm not sure about the author's own analysis and I can only hope it is better than your own. I've added it to my reading list and will hopefully have a chance to read it and inspect for myself.

The idea that simply looking at percentages of people that look at gay porn accurately predicts the amount of gay people is predicated on the idea that only people who want a particular sex act look at it. However, given the popularity of "lesbian" as a search term among men, it's hard to believe that. Maybe I'm wrong and only gay men watch gay porn (and no gay men choose to watch different porn) but there's no evidence to support that (and again, I admit I have not read the book so there may be some evidence there).

Your own foray into the world of big data to find out if Clinton faced sexism is also questionable from a methodological point of view. While I'm pleasantly surprised to find that you didn't find overt sexism in your search term, I would point out a few things:

1. In my searches, bitch was much more common than migraine, though I concur that it does not spike in Hillary's campaigning years. I'm not sure if I've misread your sentence somehow, but you seem to suggest the opposite.
2. Bitch is used in a number of different contexts (in AAVE, in reclamation of the term, etc.) whereas the n-word has no use other than an insult - they are not comparable words.
3. You jump from "it seems that overt sexism is now much less of a factor in our politics than overt racism" to "Clinton seems to have lost critical states not because so many people hated her, but because too few people were moved to come out and vote for her" with no explanation. Given the number of men in the media who have been revealed to be serial harassers of women, I find it hard to believe that sexism wasn't a huge factor in her loss, even if overt sexism was not.

It is easy enough to play around with big data and come to some conclusions, but the reality is that coming to methodologically sound conclusions from big data is hard work. I would suggest reading Big Data, Little Data, No Data by Christine L. Borgman for a better idea of the pitfalls.

Bozon said...

Very interesting topic and post.

Watching his presentation in Britain today, I was struck by how his race findings, plural, relate to racist findings across all state and party lines dotted across America. This is what I have been saying on my blog, whereas you have usually painted white racism as a continuing Southern white racist problem.

Here is your comment on his one finding:
"Now let us turn to politics—where Seth gives us only one major finding, but one which has really set me thinking...That finding has to do with race in politics, and its apparent impact in each of the last presidential elections."

All the best

David Kaiser said...


The author in fact does say that racism is concentrated in certain areas, not "all across state and party lines." He says it's very low in most of the west. You are correct that those areas are not all in the South, but it isn't evenly distributed.

Bozon said...


Thanks for this correction. That was my impression, from his talk only.
Maybe 'dotted' is too vague for this context, too.
Here is the lecture I mentioned. Maybe the lecture and the book differ as well.


All the best