Speaker by Various Artists


‘Kiwimeter’ is a methodological car crash and I still can’t look away

by Tze Ming Mok

At the level of survey methodology, it just looks worse the more I find out. Nerd rage to follow, but briefly up front: increasingly, I think my original disagreement with the Human Rights Commission is more semantic than substantive.

I still believe it’s very important to test responses to that specific bellwether statement about Maori receiving so-called ‘special treatment’ in surveys, even if the ‘agree/disagree’ answer options don’t work properly in this one. (I’ve noted elsewhere that a better scale would be ‘how acceptable or unacceptable do you find this statement?’)

But because the overall effect of ‘Kiwimeter’ is to cause emotional distress and feelings of marginalisation for Maori, even if through incompetence, it’s ultimately a racist effect. And as we know from our Human Rights Act harassment definitions, intent doesn’t matter; effect does. I’d like to thank folks on Twitter for sharing their experiences with me on this.

Marama Fox is also right: in terms of research ethics, there was an ethical requirement to assess whether the survey would do ‘harm’ to respondents. I think there is good evidence that it has caused emotional distress and harm (one Twitter commenter described seeing that question as a “punch to the puku”). Those responsible seemingly did NOT do their due diligence to assess whether there was a risk of this. The academics who were involved need to take a close look at the part they played.

How do you assess that risk? You test the survey. Did they do this?  Not properly.  This is what I have figured out so far:

Some people have reported taking part in the original survey that developed the archetypes, and we now know it was not a representative random population sample survey. It was a weighted selection from the self-selected group of people who had previously filled out ‘Vote Compass’. From appearances, Vox Labs seems sort of confident that their approach to non-random selection is basically awesome, as if it’s as good as a YouGov panel, and that upweighting or downweighting certain demographics to match their population proportions is, indeed, magic.

[SPOILER: WEIGHTING IS NOT MAGIC AND VOX LABS IS NOT YOUGOV].
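For the non-nerds, here’s the problem in miniature – a toy sketch of demographic weighting, with made-up numbers that are not Vox Labs’ actual figures or method:

```python
# Demographic weighting in miniature -- hypothetical numbers only,
# not Vox Labs' actual figures or method.

# Share of each age group in a self-selected sample vs. the census.
sample_share = {"18-29": 0.10, "30-49": 0.45, "50+": 0.45}
population_share = {"18-29": 0.22, "30-49": 0.35, "50+": 0.43}

# Everyone in a group gets weight = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}
print(weights)  # approx: {'18-29': 2.2, '30-49': 0.78, '50+': 0.96}

# The catch: weights can only correct for the demographics you measured.
# If the people who opted in differ in attitude from the people who
# didn't -- within the same age group -- no reweighting can fix that.
```

Upweighting the young people you did get doesn’t conjure up the opinions of the young people you didn’t.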

So yes, absolutely no part of this survey even originated as a representative ‘probability sample’ survey. For the moment, let’s leave behind speculation about whether Vox Labs is as good at constructing a representative-ish online panel as YouGov, as we have nothing to go on other than my own mean instincts.

Instead, let’s look at the failure of the survey testing stage and questionnaire development.  These are very big nerd-problems.

Essentially, the original, more carefully-but-not-randomly selected survey was the pilot for Kiwimeter. From folks like Stephen Judd and Stephanie Rogers who remember taking part in that pilot, the questions in the original survey weren’t substantially different from those in the current one, and the question that has attracted all the controversy was exactly the same. So if they got negative feedback at that stage, they didn’t care enough to change it.

The Founder/Director of Vox Labs, Cliff Van Der Linden, has been tweeting somewhat piteously from Canada in defence of the methodology, including a couple of tweets to me so far. I asked him whether there had been cognitive testing carried out on the survey, as this would have prevented the main problems I wrote about earlier.

As you can see from the screen grab, his completely off-topic response about factor analysis (a data-crunching method applied to results) seemed to indicate that he did not understand my question, possibly because he did not know what cognitive testing was.
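For the record, factor analysis looks something like this – a toy sketch on fake data (scikit-learn here, not whatever Vox actually ran) – and note what it operates on: answers you have already collected.

```python
# Toy factor analysis on fake Likert data -- illustrative only,
# not Vox Labs' actual model or data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# 500 hypothetical respondents x 8 survey items, answered 1-5.
responses = rng.integers(1, 6, size=(500, 8)).astype(float)

# Extract two latent dimensions underlying the item correlations.
fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(responses)  # (500, 2): respondent scores
print(fa.components_.shape)           # (2, 8): item loadings

# This can tell you which items cluster together after fielding.
# It cannot tell you that an item reads as racist before you field
# it. That's what cognitive testing is for.
```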

NEEERRRRRDRRAAAAAAAGE

I thought I was pretty restrained though, right?

What is cognitive testing? It’s a kind of interviewing technique where people talk about what is going through their minds as they fill out a survey – it would have picked up the ‘this seems racist’ problem immediately.  It’s a standard step that credible survey research organisations build into the development phase. And it is not complicated or expensive shit to do.

Most data nerds, programmers and political scientists don’t need to know what cognitive testing is, and this seems to be Van Der Linden’s background. But any nerd who works in survey research damn well better know. I look at the culture of an online ‘engagement’ outfit like Vox Labs, and I don’t see a depth of knowledge about traditional survey research and its implementation. That’s a problem for credibility on a project whose credibility is already compromised because it’s being carried out by, well, TVNZ.

On his Twitter stream, however, Van Der Linden pleads that Vox was not responsible for questionnaire development, only technical delivery and analysis. I have some sympathy. He states that the questionnaire was developed by a panel of New Zealanders that included academics and Maori – why would Vox have doubted their expertise? Fair enough! What happened here? I have no freakin’ idea.

But not all academics are necessarily going to have a professional survey research background in the nuts and bolts of delivering a questionnaire that works, even if they are great at analysing psychological constructs from data.

If the New Zealand panel did any cognitive piloting, they obviously didn’t sample widely enough. It’s possible that they viewed ‘piloting’ as Vox’s area. But Vox was in Canada: how could it carry out decent qualitative research with New Zealanders? The blame does not lie with the Canadians. This is a very disappointing day for New Zealand academia. The comments so far from those involved have not been illuminating.

When the jobs are portioned out like this – questions here, implementation there – a meaningful on-the-ground pilot to test whether the questions actually worked gets lost. This whole project looks like a classic failure of research expertise and oversight of the whole enterprise, from development to delivery, start to finish. Instead of nose-to-tail dining, we’ve got something half-assed.

I hope at least it was as cheap as it looks.

Tze Ming Mok apologises for an entire blog about survey methodology 
