{"id":399,"date":"2008-10-27T00:50:47","date_gmt":"2008-10-27T05:50:47","guid":{"rendered":"http:\/\/www.jasonmorrison.net\/content\/?p=399"},"modified":"2008-10-27T01:22:20","modified_gmt":"2008-10-27T06:22:20","slug":"baby-name-significance-and-other-gratuitous-statistics-puns","status":"publish","type":"post","link":"http:\/\/www.jasonmorrison.net\/content\/2008\/baby-name-significance-and-other-gratuitous-statistics-puns\/","title":{"rendered":"Baby Name Significance (and other gratuitous statistics puns)"},"content":{"rendered":"<p><a class=\"tt-flickr tt-flickr-Small\" title=\"Twisted tree branches\" href=\"http:\/\/www.jasonmorrison.net\/content\/photos\/photo\/2826252987\/twisted-tree-branches.html\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" src=\"http:\/\/farm4.static.flickr.com\/3072\/2826252987_58b1610cec_m.jpg\" alt=\"Twisted tree branches\" width=\"240\" height=\"161\" \/><\/a><\/p>\n<p>Now that we have more than 10,000 votes in <a href=\"http:\/\/www.jasonmorrison.net\/content\/2008\/hey-internet-help-us-name-our-child\/\">our baby name poll<\/a> I can start doing some basic statistical analysis.\u00a0 One of the things I&#8217;d like to do is figure out which names are popular in our poll, but still relatively unique compared to all those other babies being named out there.<\/p>\n<p>Before I get to that, though, I want to make sure that our vote totals are significantly different from random.<\/p>\n<p><em>Heads up:\u00a0 What follows is a basic intro to some concepts in statistics that I&#8217;m writing mainly to keep myself sharp.\u00a0 I haven&#8217;t done much research recently and I don&#8217;t want to get rusty.\u00a0 Feel free to read along, at the end I&#8217;ll show you how to detect the influence of Australians.<\/em><\/p>\n<p>Since the data for names included in the poll is completely different from the write-in votes, we&#8217;ll concentrate on the pre-selected names for 
now.<\/p>\n<p><!--more--><\/p>\n<p><strong>Testing for statistical significance<\/strong><\/p>\n<p>We want to make sure that the vote counts we see for Olivia vs. Ada vs. Erin, etc., are very unlikely to be the result of some random process.\u00a0 The null hypothesis (which is another way to say &#8220;most boring result&#8221;) is that any differences in the vote totals can be attributed to chance.\u00a0 That doesn&#8217;t mean the votes for all names would be exactly equal, but it does mean that the variance would be low enough that we couldn&#8217;t say with a high degree of certainty that it wasn&#8217;t just chance.<\/p>\n<p>Confused yet?\u00a0 Let&#8217;s look at a simpler example &#8211; instead of having 10,000 people choose between 8 or 9 names, think about flipping a coin.\u00a0 On average you should get heads 50% of the time and tails 50% of the time. If you don&#8217;t hit 50% heads, and instead get 60% or 80% or some other number, then you might think something is up.<\/p>\n<p>But you can&#8217;t quite stop there and say the coin is unfair.\u00a0 Think about it this way &#8211; if you only do two flips you have a 25% chance of getting heads twice.\u00a0 So even though there&#8217;s a big difference between what you expected (50%) and what you got (100%), if you ran the same two-flip experiment again and again you should expect to see this result 25% of the time.\u00a0 Now, if you flip a coin 10,000 times, and get heads every single time, it&#8217;s very, very unlikely that the result is due to chance.\u00a0 Even if you do the 10,000-flip experiment 100 times, getting all heads even once is still extremely unlikely.\u00a0 You would be pretty justified in investigating the coin further.<\/p>\n<p>In academic research we generally want to set a pretty high standard to make sure the results are significantly different from random chance.\u00a0 So we want data that is statistically significant at the .05 level.\u00a0 That means that if I did my 
coin flipping experiment 100 times with a fair coin, I would expect to see results at least as lopsided as my data fewer than 5 times.\u00a0 It&#8217;s even better if your data is significant at the .01 level.<\/p>\n<p>So, back to baby name votes.\u00a0 I plugged my data into SPSS and ran a <a href=\"http:\/\/faculty.chass.ncsu.edu\/garson\/PA765\/chisq.htm\">Chi-Square test<\/a>:<\/p>\n<table border=\"0\">\n<tbody>\n<tr>\n<th>Name<\/th>\n<th>Observed N<\/th>\n<th>Expected N<\/th>\n<\/tr>\n<tr>\n<td>Ada<\/td>\n<td>1110<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Alexandria<\/td>\n<td>614<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Alexis<\/td>\n<td>859<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Althea<\/td>\n<td>380<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Ozma<\/td>\n<td>234<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Athena<\/td>\n<td>817<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Cassia<\/td>\n<td>782<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Erin<\/td>\n<td>869<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Olivia<\/td>\n<td>1607<\/td>\n<td>808.0<\/td>\n<\/tr>\n<tr>\n<td>Total<\/td>\n<td>7272<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Chi-Square: 1593, Significant at the .01 level.<\/p>\n<table border=\"0\">\n<tbody>\n<tr>\n<th>Name<\/th>\n<th>Observed N<\/th>\n<th>Expected N<\/th>\n<\/tr>\n<tr>\n<td>Aaron<\/td>\n<td>609<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Alexander<\/td>\n<td>1274<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Dylan<\/td>\n<td>1082<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Eric<\/td>\n<td>551<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Isaac<\/td>\n<td>1202<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Levi<\/td>\n<td>1107<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Nikolas<\/td>\n<td>1000<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Sydney<\/td>\n<td>712<\/td>\n<td>942.1<\/td>\n<\/tr>\n<tr>\n<td>Total<\/td>\n<td>7537<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Chi-Square: 578, Significant at the .01 level.<\/p>\n<p>So I can be pretty confident that these vote totals really 
do express some kind of preference by the voting population, and aren&#8217;t likely to be just random.\u00a0 This is great news, because it means I can go ahead with my comparison between the poll here and nation-wide baby name popularity.<\/p>\n<p><strong>Crosstabs and correlations<\/strong><\/p>\n<p>Another interesting thing to look at is correlation.\u00a0 If a person votes for Cassia on the girls&#8217; side, are they more likely to vote for any particular boy&#8217;s name?<\/p>\n<p>Since you can&#8217;t put the baby names into any kind of numerical order, they are considered nominal variables. Nominal variables often look like categories: things like marital status, gender, favorite candidate in an election.<\/p>\n<p>One way to compare two variables is to look at a big table with all the girls&#8217; names along the rows and the boys&#8217; names along the columns, and see how many voters end up in each cell. \u00a0 This is a cross-tabulation, or crosstab.<\/p>\n<p>Cramer&#8217;s V is a measure of association between nominal variables.\u00a0 It ranges from 0 to 1, and the closer the result is to 1, the more the two variables are associated.\u00a0 I ran a Crosstabs analysis in SPSS and got a Cramer&#8217;s V of .098, significant below .01.\u00a0 So the association is statistically detectable, but with V that close to 0 it is very weak: the two name choices are pretty independent.<\/p>\n<p>This doesn&#8217;t mean that we didn&#8217;t have some interesting intersections.\u00a0 For example, people who voted for Ozma were more likely to vote for Sydney (63 total, when the expected value is about 20).\u00a0 This is probably explained by the huge surge in votes from Australia after the story was featured in the Sydney Morning Herald.<\/p>\n<p>People who voted for Alexandria also voted for Alexander more than expected (158 rather than 95), which seems pretty reasonable, and similarly Erin went with Aaron (118 vs. 
64 expected).<\/p>\n<p><a href=\"http:\/\/spreadsheets.google.com\/pub?key=ppevxmL24UqlF5Y4J2KmNIg&amp;output=html&amp;gid=12&amp;single=true\">You can see the whole crosstab here<\/a>.<\/p>\n<p>Next up:\u00a0 how to pick a baby name that&#8217;s popular with voters but not too common.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now that we have more than 10,000 votes in our baby name poll I can start doing some basic statistical analysis.\u00a0 One of the things I&#8217;d like to do is figure out which names are popular in our poll, but still relatively unique compared to all those other babies being named out there. Before I [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[19],"tags":[349,496,515,517,516,462,464,514,513],"class_list":["post-399","post","type-post","status-publish","format-standard","hentry","category-blog","tag-academic-research","tag-baby-names","tag-chi-square","tag-cramers-v","tag-crosstabs","tag-poll","tag-spss","tag-statistical-analysis","tag-statistical-significance"],"aioseo_notices":[],"_links":{"self":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/399","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/comments?post=399"}],"version-history":[{"count":5,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/399\/revisions"}],
"predecessor-version":[{"id":403,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/399\/revisions\/403"}],"wp:attachment":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/media?parent=399"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/categories?post=399"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/tags?post=399"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}