Hard News: The Big 2012 US Election PAS Thread
David Hood, in reply to
That is a scientist with deep commitment to his research!
As occasional reading I am gradually working my way through the long out-of-print "Ignition! An Informal History of Liquid Rocket Propellants" by John D. Clark, which is a somewhat hair-raising memoir of the development of rocketry.
"Phil Pomerantz, of BuWeps, wanted me to try dimethyl mercury, Hg(CH3)2, as a fuel. I suggested that it might be somewhat toxic and a bit dangerous to synthesize and handle, but he assured me that it was (a) very easy to put together, and (b) as harmless as mother's milk. I was dubious, but told him that I'd see what I could do. I looked the stuff up, and discovered that, indeed, the synthesis was easy, but that it was extremely toxic, and a long way from harmless. As I had suffered from mercury poisoning on two previous occasions and didn't care to take a chance on doing it again, I thought that it would be an excellent idea to have somebody else make the compound for me. So I phoned Rochester, and asked my contact man at Eastman Kodak if they would make a hundred pounds of dimethyl mercury and ship it to NARTS. I heard a horrified gasp, and then a tightly controlled voice (I could hear the grinding of teeth beneath the words) informed me that if they were silly enough to synthesize that much dimethyl mercury, they would, in the process fog every square inch of photographic film in Rochester, and that, thank you just the same, Eastman was not interested." 
Rich of Observationz, in reply to
That's an awesome book. I believe I heard of it from Derek Lowe's "things I won't work with" blog series.

Thanks Rich!
Another blog to while away my morning coffees with 
Islander I am shocked......spell check please! You whiley young thing!
Edit: Shame shame shame on me. But I DO see it is now acceptable both ways!!!
Edit 2: oh dear...it is worse than that!! <hole in ground>
Martin Lindberg, in reply to
Eric Dondero, who I think Russell linked to earlier, really is full of the win.
Surely the guy is some kind of political performance artist.

Islander, in reply to
8>))))
I wish there was a function whereby I could put tiny wee pointed teeth in between my smiles )
but  nemmind

steve black, in reply to
In case anybody is still interested in the 92% all these hours later...
I'm your friendly resident statistician. That's the good news. The bad news is I haven't read Nate Silver's methodology yet. But if the description of it as a Monte Carlo simulation is right then I can help you. I used that technique for my Masters Thesis, although they made me do "real work" for my PhD because "simulation" wasn't considered real enough back then before the revolution.
Then they add up the number of times they see a given result, in this case 92% of the times they ran the model it said Obama won.
Exactly so. The 92% is an enumeration of the outcomes for the simulation in which Obama won as a percentage of all outcomes. This stands in for a probability. Again, the important thing is that the model is properly constructed and range testing on assumptions is adequate. It is even possible (but much more work) to construct confidence limits for the 92%.
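For anyone who wants to see the counting step, here is a minimal sketch in Python. It is not the 538 model: the toy "election" is a single made-up win probability (`p_win`), and the run count and seed are assumptions chosen for illustration. It shows the share of simulated runs won, plus rough confidence limits for that share using the usual normal approximation for a proportion.

```python
import math
import random


def simulate_win_share(p_win, n_runs, seed=0):
    """Run n_runs toy elections, each won with probability p_win.

    Returns the share of runs that were won. p_win is a made-up input,
    standing in for whatever a real model would produce per run.
    """
    rng = random.Random(seed)
    wins = sum(rng.random() < p_win for _ in range(n_runs))
    return wins / n_runs


def approx_limits(share, n_runs, z=1.96):
    """Approximate 95% limits for a simulated share (normal approximation)."""
    half_width = z * math.sqrt(share * (1 - share) / n_runs)
    return max(0.0, share - half_width), min(1.0, share + half_width)


n_runs = 10_000
share = simulate_win_share(p_win=0.92, n_runs=n_runs)
low, high = approx_limits(share, n_runs)
print(f"share of runs won: {share:.3f}; approx 95% limits: {low:.3f} to {high:.3f}")
```

Note that these limits only describe simulation noise, which shrinks as you add runs. They say nothing about whether the model and its assumptions are right, which is the harder part.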
But you simply can’t treat it like a dice roll or coin toss and say they’ll be wrong once every 8 or 9 elections. It just isn’t that kind of stats.
Spot on. The concerns about being wrong 1 in N elections are not well founded. Sounds like the law of averages at work to me. 2016 is not another "coin flip" following on from 2012, and there is no chain of probability. Once you get past very simple probability models it is advisable for professionals like doctors and lawyers to consult professional statisticians. We wouldn't want another Meadow incident would we?
And I wouldn't be as kind (diplomatic?) as Russell is:
I would just like to point out that when Corin Dann explained this story to One News viewers this week, he said that Nate Silver had forecast that “Obama will win by 92%”.
This data journalism thing clearly has a way to go yet.
This kind of reporting does not meet the professional standards for accuracy. Allowing "Obama has a 92% chance of obtaining enough Electoral College votes to become the President" to morph into “Obama will win by 92%” isn't good enough. Sloppy. Not uncommon. But still Sloppy.

David Hood, in reply to
The bad news is I haven’t read Nate Silver’s methodology yet
Allow me to point you in the right direction.
That said, he has been criticised for keeping the finer detail of some of his adjustments private, unlike some of the academic-based sites, which are completely open in their processes.
steve black, in reply to
Thank you. Now I have no excuse not to get on it right away.
Keeping some of the finer detail of his adjustments private isn't playing the academic game properly, but fair enough if he wants to sell his results to somebody to earn a crust (he did say he was a liberal libertarian or something, was it?).

Brad Luen has put together an open script for aggregating NZ's polling data, weighted to actual election results and DimPosted. It's an excellent effort. Unfortunately we lack the fine polling that the US has, which would be equivalent to polling at the electorate level. Does anyone know if this has been done by parties in NZ?

David Hood, in reply to
Please don’t mention Bayesian Statistics
How about bringing it up in comic form, with a very timely new xkcd

Danielle, in reply to
It just seemed to me in the end that my guy simply hated Obama.
"I don't want a black guy to be president" isn't really an acceptable response in polite company these days.*
*I'm not saying your guy consciously thinks that. But the irrationality on display in some quarters seems so obviously informed by underlying racist anxieties.

So having read the methodology for the 538 project (thanks David Hood), it is based more on doing good meta-analysis than on simulation. The point estimates are done by "traditional statistical methods", as are most of the confidence limits for each point estimate.
The use of simulation seems to be limited to the construction of confidence limits in the case where we want to know the confidence limits for a compound event (N things happening together). This use of simulation is the domain of resampling methods which is what I alluded to earlier. It is basically a way to get confidence limits for a statement like "Obama will achieve N Electoral College delegates" which are corrected for the fact that we are using point estimates for a number of independent races (within each state) but each of the state level point estimates is itself subject to confidence limits.
So we start out with a stand alone proposition like "Obama takes the Electoral College delegates for California" and then we keep adding propositions which creates longer and longer chains of compound events such as "Obama takes the Electoral College delegates for California" AND "Obama takes the Electoral College delegates for Illinois" AND... which leads to having enough delegates to win. As that NY Times decision tree had it, there are a number of scenarios to consider about who takes which state.
This is why concerns about being "wrong N times out of so many coin tosses" get tossed out. That has been corrected for in the calculation of the confidence limits for the proposition "Obama will achieve N Electoral College delegates" as a compound event. Note that there may well be two levels of this correction going on, one for counties (districts within states; I guess that's "electorates" to us) and one at the state level. I may have simplified that discussion too much.
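To make the "chains of compound events" idea concrete, here is a small sketch that walks every scenario for a handful of swing states and adds up the probability of the scenarios that reach 270. Everything in it is hypothetical: the state labels, the vote counts, the per-state win probabilities and the number of "safe" votes are invented for illustration, and the states are treated as independent, which a real model would not assume.

```python
from itertools import product

# Hypothetical inputs, not real 2012 figures.
SAFE_VOTES = 237          # votes assumed already in the bag
NEEDED = 270              # Electoral College votes needed to win
SWING = {                 # label: (electoral votes, probability of winning)
    "A": (29, 0.5),
    "B": (18, 0.75),
    "C": (13, 0.8),
    "D": (10, 0.9),
}


def win_probability(safe_votes, swing, needed):
    """Sum the probability of every won/lost scenario that reaches `needed`.

    Assumes the swing states are independent of each other.
    """
    states = list(swing.values())
    total = 0.0
    for outcome in product([True, False], repeat=len(states)):
        prob = 1.0
        votes = safe_votes
        for won, (ev, p) in zip(outcome, states):
            prob *= p if won else 1 - p
            if won:
                votes += ev
        if votes >= needed:
            total += prob
    return total


print(f"chance of reaching {NEEDED}: {win_probability(SAFE_VOTES, SWING, NEEDED):.4f}")
```

With four states there are only 16 scenarios, so exact enumeration is easy. With a dozen or more contested states, and correlations between them, the scenario count blows up and drawing random scenarios becomes the practical option.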
Would that I could express it better so that the "intelligent reader" would not have their eyes glaze over.

steve black, in reply to
Brad Luen has put together an open script for aggregating NZ’s polling data, weighted to actual election results and DimPosted. It’s an excellent effort. Unfortunately we lack the fine polling that the US has, which would be equivalent to polling at the electorate level. Does anyone know if this has been done by parties in NZ?
Good old R. I quit having anything to do with political polling a few elections ago. I did work on behalf of a party, but all such details are confidential of course. And that's one of the problems: I am 95% confident (you asked a statistician after all *wink* ) that polling of individual electorates is still done in "marginal seats" by parties with enough money. Perhaps they can now afford to do all electorates. But that sort of thing is never made public.
The real fly in the ointment for predicting outcomes in our MMP environment goes deeper than just lack of electorate level polling. We can't always predict how a particular party might cobble together enough seats from other parties to form a Government. That adds another level of uncertainty which is currently in the domain of the Pundits. I wouldn't know where to begin to do a computer simulation of that. Maybe Nate Silver could; he seems much more clever than I am.

BenWilson, in reply to
This is why concerns about being "wrong N times out of so many coin tosses" get tossed out.
Which, on my understanding, is a rewording of the very definition of "X% likely". Is there a better word for what this kind of confidence used in the 538 modeling is? Or is a phrase like "92% likely" simply something that doesn't convey any clear information because the definition of %likely is not set?

BenWilson, in reply to
I wouldn't know where to begin to do a computer simulation of that.
The number of combinations of parties isn't too huge. The number that could form a government is even less. But how to work out what the chances are of one of the combinations being more likely than another is where I lose the plot. A matrix of friendliness between the parties?
Perhaps a matrix like that could be polled off the population, using the assumption that people that vote for a party are represented by the representatives, so some kind of averaging of their individual matrices of preference could represent what the representatives are likely to feel. How to assign confidence to these figures, though? Would need a lot more past data than I can presume exists.
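As a rough sketch of the enumeration part, the following Python lists every minimal coalition that reaches a majority and ranks them by a "friendliness" score. All of it is hypothetical: the party labels, seat counts and friendliness numbers are invented, and scoring a coalition by its least friendly pair is just one arbitrary choice. It does nothing to solve the hard problem of where trustworthy friendliness figures would come from.

```python
from itertools import combinations

# Hypothetical seat counts for a 121-seat Parliament.
SEATS = {"P1": 59, "P2": 34, "P3": 14, "P4": 8, "P5": 3, "P6": 3}
MAJORITY = sum(SEATS.values()) // 2 + 1

# Hypothetical pairwise friendliness (0 = hostile, 1 = cosy).
# Pairs not listed fall back to a neutral default.
FRIENDLINESS = {
    frozenset(("P1", "P5")): 0.9,
    frozenset(("P1", "P2")): 0.1,
}


def minimal_coalitions(seats, majority):
    """List coalitions with a majority where every member is needed."""
    parties = sorted(seats)
    found = []
    for size in range(1, len(parties) + 1):
        for combo in combinations(parties, size):
            total = sum(seats[p] for p in combo)
            if total < majority:
                continue
            # Minimal: dropping any one member loses the majority.
            if all(total - seats[p] < majority for p in combo):
                found.append((combo, total))
    return found


def weakest_link(combo, friendliness, default=0.5):
    """Score a coalition by its least friendly pair of parties."""
    scores = (friendliness.get(frozenset(pair), default)
              for pair in combinations(combo, 2))
    return min(scores, default=1.0)  # single-party government has no pairs


viable = minimal_coalitions(SEATS, MAJORITY)
ranked = sorted(viable, key=lambda item: weakest_link(item[0], FRIENDLINESS),
                reverse=True)
for combo, total in ranked:
    print(combo, total, weakest_link(combo, FRIENDLINESS))
```

Even this toy shows the point made above: the list of workable combinations is short, and all the difficulty sits in the friendliness numbers.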

Sacha, in reply to
Brad Luen has put together an open script for aggregating NZ's polling data, weighted to actual election results and DimPosted.
And those orange dots scattered along the bottom identify topical events when you point to them.

Sacha, in reply to
the assumption that people that vote for a party are represented by the representatives
heh

steve black, in reply to
The number of combinations of parties isn’t too huge. The number that could form a government is even less. But how to work out what the chances are of one of the combinations being more likely than another is where I lose the plot. A matrix of friendliness between the parties?
Perhaps a matrix like that could be polled off the population, using the assumption that people that vote for a party are represented by the representatives, so some kind of averaging of their individual matrices of preference could represent what the representatives are likely to feel. How to assign confidence to these figures, though? Would need a lot more past data than I can presume exists.
Actually, I was thinking more of having to create a neural simulation of the brain of Winston Peters, given that I seem to remember he has prior form in saying something like "Vote for me. I would never go into coalition with Party A" and later going into coalition with Party A. The idea of that puts me right off trying.
Which, on my understanding, is a rewording of the very definition of “X% likely”. Is there a better word for what this kind of confidence used in the 538 modeling is? Or is a phrase like “92% likely” simply something that doesn’t convey any clear information because the definition of %likely is not set?
I don't know exactly what the 92% likely refers to. Hopefully it means there is a 92% chance that the statement "Obama gets 270 or more Electoral College votes" is true. What's missing of course is reporting of confidence intervals and such. Remember that when I talked of enumeration of scenarios before, that was when I hadn't read the methodology section and thought more of it was done by simulation modeling based on what had been said here.
Maybe the whole basis of what the 92% means has been formally stated somewhere and I haven't seen it. But in most situations you wouldn't lay out the careful definition of all your probability statements in something for general public consumption. I'm sorry to be the bearer of bad tidings, but Journalists can't handle something as complex as percentages and more generally, numbers with denominators in many cases.
But while we are linking to StatsChat here is an interview re: predicting things accurately here in NZ

And then I remembered this about what the 72% means, back from when the number du jour was 72% rather than 92%. There are some interesting links there to follow up.
Note I don’t have trouble with anybody saying “72% chance of winning” and “too close to call” and believing in both (Red Queen anybody?) for a different reason (at least I think it’s different) from the ones given. I’d just say the “too close to call” is in fact a personal probability statement which reflects how much somebody might bet on the outcome (if at all) and at what odds. For me that’s a different domain from the 72%. The “too close to call” comes from how risky you as an individual are willing to be – and that gets us back to “how certain do you want to be that the real percentage is 72%?” Every “hard number” turns into a pool of probability underneath if you look closely enough. Fortunately in this case the cat is still alive when the box is opened. Even if the fox isn’t even aware that it is dead.
Some context is that in "gold star" science you usually want to be 95 or 99 percent certain. Nate went with 92%, but he is more of a poker player I suspect.

Matthew Littlewood, in reply to
Some context is that in "gold star" science you usually want to be 95 or 99 percent certain. Nate went with 92%, but he is more of a poker player I suspect.
Funnily enough, Silver's admitted he's used his skills as a poker player to clean up in online and "friendly" tournaments. He once joked that he became interested in politics when various states started outlawing online poker so he began writing to Congressmen about it :)
Meanwhile, it looks like the first of the Obama/House Republicans face-offs has already started. I would expect Obama not to budge as easily as he has done in the past.

Peter Green, in reply to
The graph in that second link was made with my script here. Brad disagrees with some of the assumptions used (e.g. fixed "house effect" and constant smoothing parameter), so I don't want to tar him with any of my simplifications.

BenWilson, in reply to
Hopefully it means there is a 92% chance that the statement "Obama gets 270 or more Electoral College votes" is true. What's missing of course is reporting of confidence intervals and such.
So 92% isn't the confidence interval itself, relating to the statement "Obama gets 270 or more Electoral College votes", calculated from a probability density function of how many votes he will get (however they came by that p.d.f)?

David Hood, in reply to
So 92% isn’t the confidence interval itself, relating to the statement “Obama gets 270 or more Electoral College votes”,
You are kind of sorta correct that it is not the case that 92% is the confidence interval. Let's try this summary of Step 7 of Nate's methodology:
From the polls (plus some in-house adjustments known only to Nate), each state race has an expected result, and some uncertainty.
Random numbers are generated for the potential electoral vote on the basis of that uncertainty, and many, many, many mock elections are held (there is a bit more to this in the way the state results connect together, so each state's uncertainty is not independent of the other states).
Looking at every single result of the mock elections, 92% of the time Obama got 270 or more of the electoral college votes. The average number of electoral college votes across all potential results was 313, but the single most likely result was 330 (the mode; with Florida likely to go Democrat, Nate Silver seems to have been 1 of 2 analysts I have seen who had that as the most likely outcome number).
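The three steps above can be sketched roughly like this. It is a toy, not Nate's actual model: the state labels, vote counts, expected margins, standard deviations and the shared "national shift" (my stand-in for the states not being independent) are all invented for illustration.

```python
import random
from collections import Counter
from statistics import mean

# Hypothetical inputs, not real polling figures.
SAFE_VOTES = 237       # votes assumed safe
NEEDED = 270
NATIONAL_SD = 2.0      # sd of an error shared by every state (in points)
STATES = {             # label: (electoral votes, expected margin, state sd)
    "A": (29, 1.0, 3.0),
    "B": (18, 2.5, 3.0),
    "C": (13, 3.0, 3.0),
    "D": (10, 4.0, 3.0),
}


def run_mock_elections(n_runs, seed=1):
    """Hold n_runs mock elections and return the vote total from each."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        # One shared draw per election links the states together.
        national_shift = rng.gauss(0.0, NATIONAL_SD)
        votes = SAFE_VOTES
        for ev, margin, sd in STATES.values():
            if margin + national_shift + rng.gauss(0.0, sd) > 0:
                votes += ev
        totals.append(votes)
    return totals


totals = run_mock_elections(20_000)
win_share = sum(v >= NEEDED for v in totals) / len(totals)
most_common_total, _ = Counter(totals).most_common(1)[0]
print(f"share of mock elections with {NEEDED}+ votes: {win_share:.3f}")
print(f"mean total: {mean(totals):.1f}; most common total: {most_common_total}")
```

The mean and the most common total generally differ, because the totals move in lumps of whole states. That is the same reason the quoted average (313) and mode (330) are not the same number.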
And here is where I am pushing the boundaries of my knowledge: to some extent, the traditional ideas of uncertainty and confidence don't quite apply, because the random potential outcomes are being drawn from the uncertainty, which is why Nate's major warnings were about systematic bias in the polls (which would mean the data flowing into the final step did not correctly model the real world). Other people's concerns were that Nate's private adjustments at earlier stages might have been biasing the outcomes.
As I write this, I've thought of another way of phrasing it in more traditional statistical terminology: given (from the cumulative analysis of polls) a mean electoral college vote for Obama of 313, and an amount of uncertainty (let's call it the standard deviation) that we haven't been told on the 538 site, then 92% of the time Obama is going to get 270 or more electoral college votes (a critical threshold within the range of uncertainty). So in some respects, the 92% is doing a similar job to expressing a confidence interval, but it is a different thing at the finer level of detail.
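As a back-of-envelope check on that phrasing, one can ask what standard deviation would be implied if the vote total were roughly normal. That is a big assumption on my part (the real distribution is lumpy, since votes arrive in whole states, and its mode of 330 differs from its mean), so treat the answer as indicative only. The sketch uses only the figures quoted above: mean 313, threshold 270, probability 92%.

```python
from statistics import NormalDist

MEAN_VOTES = 313        # mean electoral college vote quoted above
THRESHOLD = 270         # votes needed to win
WIN_PROBABILITY = 0.92  # share of mock elections at or above the threshold

# How many standard deviations below the mean must 270 sit
# for 92% of a normal curve to lie above it?
z = NormalDist().inv_cdf(WIN_PROBABILITY)
implied_sd = (MEAN_VOTES - THRESHOLD) / z

# Sanity check: plug the implied sd back in and recover the probability.
recovered = 1 - NormalDist(MEAN_VOTES, implied_sd).cdf(THRESHOLD)

print(f"z = {z:.3f}, implied sd = {implied_sd:.1f} electoral votes")
print(f"recovered probability = {recovered:.3f}")
```

Under that normal assumption the unstated uncertainty works out to roughly 30 electoral votes either side of the mean.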
David Hood, in reply to
But people only have so much time and energy to absorb information. With so much media content focusing on poll results, it surely must be less likely that the electorate is well informed about policy
I saw an echo of this concern in a BBC article on poll analysts I was reading. Writing of Sam Wang, a molecular biologist who runs one of the poll aggregation analysis sites:
Wang originally started his site in the hopes of calming some of the polling mania by providing a clear look at what the polls really said. The time spent trying to read the tea leaves, he hoped, would be better spent discussing the issues.
