ersby: The Milton and Wiseman meta-analysis of Ganzfeld experiments 1999

Disclaimer: While I am extremely well-read on the ganzfeld controversy, I have no training in statistics, although I have some understanding of the issues involved and how to interpret certain statistical measures. For the figures quoted in this blog post, I used an Excel spreadsheet kindly sent to me by Patrizio Tressoldi to calculate z-scores and p-values for the individual studies, and then I used the meta-analysis software Meta Analysis 5.3 by Ralf Schwarzer.

Perhaps one of the most controversial parapsychological papers of the last twenty years was this statistical paper, which reviewed and collated the results from a particular type of ESP experiment for the period 1991 to 1997. The experiment in question was the ″Ganzfeld″ protocol.

It was inspired by the 1994 paper ″Does Psi Exist?″ by Daryl Bem and Charles Honorton which described the results from one laboratory between 1982 and 1989 (z=2.89, p=0.002, or odds of 1 in 500). These results were considered a successful replication of a previous meta-analysis by Honorton in 1985. Milton & Wiseman's paper was meant to see if Honorton's results had been replicated by other researchers working in the field since 1985.

When the reported result was negative (unweighted z=0.7, p=0.24, odds of around 1 in 4), parapsychologists quickly began to discuss this paper and why it had failed to find an effect for what had, until then, been parapsychology's most reliable protocol.
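The z-scores, p-values and odds quoted above are three ways of stating the same result. As a rough sketch (this is not the spreadsheet or software used for this post), a one-tailed p-value can be recovered from a z-score using the normal distribution, and the "1 in N" odds are simply 1 divided by p. The function names are made up for illustration.

```python
import math

def one_tailed_p(z):
    """Probability that a standard normal variate exceeds z (one-tailed)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def odds_against(p):
    """Express a p-value as '1 in N' odds; returns N."""
    return 1.0 / p

# The two z-scores quoted above: Bem and Honorton (1994), Milton and Wiseman (1999)
for z in (2.89, 0.7):
    p = one_tailed_p(z)
    print(f"z={z}: p={p:.3f}, odds of about 1 in {odds_against(p):.0f}")
```

The results should land close to the rounded figures given in the text, although small differences are to be expected because the quoted z-scores are themselves rounded.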

Rushed into publication

The paper was initially presented in 1997 at the 40th Annual Parapsychological Association Convention, but it had already been submitted to the Psychological Bulletin in July 1997. Questions have been raised concerning the wisdom of submitting the paper before it could be peer-reviewed at the conference (Zingrone, 2002) and whether the timing had been influenced by a forthcoming successful ganzfeld experiment due to be presented by Dalton at the same Parapsychological Association Convention.

″One issue that might need further comment regards the timing of meta-analyses. Analysts' timing may be driven by many factors, and it is hard to remain blind as to how things are likely to be going, if one is at all active in the field. If we consciously or unconsciously do analyses when we think things are "going our way," then we are more likely to be selecting occasions when the results are strong in one direction or the other.″ [anonymous contributor to Schmeidler & Edge, note #95, 1999]

Some of the more conspiratorially minded have made a great deal of the absence of Dalton's work (Carter, 2012). Whether the deadline for Milton and Wiseman's meta-analysis was set with Dalton's work in mind is difficult to say, but would it have made a huge difference? Using Milton and Wiseman's methods, the database including Dalton is still statistically non-significant, albeit marginally so (unweighted z=1.65, p=0.051, odds of about 1 in 20). Of course, there are several other statistical methods they could have used to conduct their meta-analysis, but more about that later.

Non-standard experiments included

The next, and most common, criticism was that the Milton and Wiseman database included experiments that deviated from the most standard features of a ganzfeld experiment. It was surmised that this deviation could affect the success rate of psi, most notably in two experiments which used musical targets instead of visual targets, as had been the norm up until then (only one other experiment had used audio targets, Roney-Dougal's "A Comparison Of Psi And Subliminal Perception" (1979), which used the spoken word as targets with significantly high results, p=0.016).

To this end, a new meta-analysis was done by Bem, Palmer and Broughton taking both the previous criticisms into account. It was published in 2001 in the Journal of Parapsychology. In this, they introduced data which had been released since the February 1997 deadline of Milton and Wiseman, such as Dalton's work. They also used some judges who were previously unaware of the ganzfeld field of research to score the methods of each experiment according to how closely they adhered to a method given to them as an example of a typical ganzfeld procedure.

The danger with this is that it falls into the afore-mentioned trap of whether we consciously or unconsciously do analyses when we think things are "going our way," inasmuch as the inclusion of Dalton's experiment was going to make any new meta-analysis a successful one.

Also, despite the judges being blind to the results when grading the methods for standardness, the people who wrote the instructions and chose the example were not. It is worth noting that in the instructions given to the judges, it is specified that creative subjects should not be marked as non-standard (Palmer & Broughton, 2000), insuring against the unlikely event of Dalton's work not being marked as standard.

Bem, Palmer and Broughton acknowledge that the earlier ganzfeld work had not been given the same amount of scrutiny and that, if it were, this could alter the findings of Honorton's 1985 meta-analysis. "This possibility can only be assessed by a separate standardness analysis of the pre-autoganzfeld database," they wrote but, even to this day, no such analysis has been undertaken.

Statistical issues

Another criticism of the Milton and Wiseman database was that the data was heterogeneous. In other words, the results did not ″cluster″ around an average effect size as they would if all the experiments were measuring the same effect.

″The source of the heterogeneity is clear. Three studies are significantly negative (those labeled Kanthamani & Broughton, 1994, Series 5b; Kanthamani & Palmer, 1993; Williams et al., 1994). When these three studies are removed, the remaining 27 studies are now homogeneous (chi^2 = 32.4, p = 0.35), and the resulting Stouffer z of these 27 studies is z = 1.99, p = .02 (one-tail). Thus, upon removing three outlier studies from this meta-analysis, the overall result is a statistically significant replication.″ [anonymous contributor to Schmeidler & Edge, note #42, 1999]

While it is true that the meta-analysis is heterogeneous (chi^2=46.15, df=29, p=0.02), only one negative experiment needs to be removed for the database to become homogeneous (chi^2=40.06, df=28, p=0.06), and this does not render the overall result statistically significant (unweighted z=1.13, p=0.127, odds of around 1 in 8). I am unsure as to why this commentator felt the need to remove three. Either way, once the debate focused on the inclusion of the Dalton data (i.e., an extreme result in a positive direction), the outlier argument was dropped.
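For readers who want to check heterogeneity figures like these, the p-value is the upper tail of a chi-square distribution with the stated degrees of freedom. The sketch below is only an illustration: it uses a standard closed-form series for whole-number degrees of freedom rather than the meta-analysis software mentioned in the disclaimer, and the function name chi2_sf is made up. It does not show how the chi^2 values themselves were derived from the studies, only how a chi^2 and df pair converts to a p-value.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    # Standard normal density evaluated at chi
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    if df % 2:
        # Odd df: two-sided normal tail plus a finite series in odd powers of chi
        tail = math.erfc(chi / math.sqrt(2))
        term, series = chi, 0.0
        for r in range(1, (df - 1) // 2 + 1):
            series += term
            term *= x / (2 * r + 1)
        return tail + 2 * density * series
    # Even df: exp(-x/2) times a finite series in powers of x/2
    term, series = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        series += term
    return math.sqrt(2 * math.pi) * density * series

# The two heterogeneity results quoted above
print(round(chi2_sf(46.15, 29), 3))
print(round(chi2_sf(40.06, 28), 3))
```

Both values should come out close to the rounded p-values quoted above, with the first below the conventional 0.05 threshold and the second above it.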

Other criticisms have been levelled at the statistical methods that Milton and Wiseman used. In the Psychological Bulletin in 2001, Storm and Ertel wrote a response to Milton and Wiseman's meta-analysis. It's an interesting, if peculiar, paper. One of the problems that Storm and Ertel identify with the meta-analysis is an ″unwarranted questioning of the existence of psi.″ They also criticise Milton and Wiseman for ignoring pre-1986 work, even though the paper clearly states that the focus of the analysis was those experiments begun after 1987. It seems odd to complain that a meta-analysis sticks to its own criteria.

It continues by saying that Milton and Wiseman should have used two-tailed statistics. In other words, any deviation from chance, positive or negative, should have been measured. However, ever since the earliest ganzfeld experiment, one-tailed statistics have been used because only positive results are of interest. That Storm and Ertel were unaware of this is a little strange.

Perhaps their greatest mistake comes when they discuss Hyman and Honorton's Joint Communique (1986). This paper, in which Hyman and Honorton discuss past failings and state the desired protocols that future ganzfeld experiments should follow, is described by Storm and Ertel as ″a mere documentation of traditional and uncontroversial research rules.″ In fact, the Storm and Ertel paper reads as if it were written by someone with only a passing knowledge of the ganzfeld debate to that date.

Milton and Wiseman wrote a reply in that same issue, pointing out their mistakes, misquotes and misinterpretations, which led to a further paper from Storm and Ertel, this time published in the Journal of Parapsychology in 2002. In it, Storm and Ertel took issue with Milton and Wiseman, accusing them of selectively quoting to support their position, and insisting that two-tailed statistics were appropriate. Milton and Wiseman contributed one last, rather weary, reply noting that:

″[Storm and Ertel's] reply in this journal does not indicate that there is any point in repeating our arguments, and it includes a number of inaccurate descriptions of our statements and views.″

As you may have noticed, I've usually reported statistics using unweighted z-scores, since that is the method reported by Milton and Wiseman. But this has come in for criticism. The unweighted z-score is a combined statistic: the sum of each experiment's z-score divided by the square root of the number of experiments. But this treats each z-score equally, whether it came from an experiment with four trials or one hundred. There is a method by which each z-score is weighted according to the size of its experiment, but Milton and Wiseman did not use this. Instead they used the unweighted method because that was the same method that Honorton used in 1985.
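To make the difference concrete, here is a minimal sketch of both calculations. The data are invented, the function names are hypothetical, and using the number of trials as the weight is an assumption for illustration (other weighting schemes exist, and this is not necessarily the one used for the figures in this post).

```python
import math

def unweighted_z(z_scores):
    """Sum of the z-scores divided by the square root of the number of studies."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def weighted_z(z_scores, weights):
    """Weighted sum of the z-scores divided by the root of the summed squared weights."""
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))

# Invented example: one large study with a strong result, one tiny null study
z_scores = [2.0, 0.0]
trials = [100, 4]  # assumption: weight each study by its number of trials

print(unweighted_z(z_scores))        # both studies count equally
print(weighted_z(z_scores, trials))  # the large study dominates
```

With equal weights the two functions give the same answer, which is one way to see that the unweighted method is just the special case where every experiment is treated as the same size.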

There is another possible method, which would perhaps give an even more accurate picture: the exact binomial test. Count the number of successes, and work out the probability of that happening given what you would expect by chance. In meta-analyses, it is common to use something like the weighted z-score, since different results from different researchers often use different statistical measures. It is rare to have a collection of experiments that all present the data in the same way. The meta-analysis usually needs to convert all these into the same measure (e.g. a z-score) and then work from that. Plus, it should be mentioned that pooling subjects in this manner (i.e., treating the meta-analysis as if it were one big experiment) is directly against the advice of the Cochrane Collaboration and, as such, it could be argued that this "flaw" in Milton and Wiseman's work is, itself, flawed.
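As an illustration of the exact binomial test described above, the sketch below adds up the probability of getting at least the observed number of hits when each trial has a 25% chance of success. The function name and the pooled counts are invented for the example and are not taken from any of the databases discussed here.

```python
import math

def binomial_tail(hits, trials, p_chance=0.25):
    """One-tailed exact probability of at least `hits` successes in `trials` trials."""
    return sum(
        math.comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Invented pooled counts: 30 hits in 100 trials where 25 would be expected by chance
print(binomial_tail(30, 100))
```

Note that this treats every trial from every experiment as part of one large pool, which is exactly the practice the Cochrane advice mentioned above warns against.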

However, the experiments in Milton and Wiseman's paper did report the same statistical measure (the hit rate) for each experiment. Milton and Wiseman's method was to use each author's primary measure which, in three experiments, was not the hit rate. Two of these three experiments were very small (four trials and ten trials) and so they used a more sensitive method of scoring than the hit rate. In fact, the original papers did not report a hit rate at all (Kanthamani, Khilji, & Rustomji-Kerns, 1988), and the hit-rate data was only made public in a later paper that summarised the results from that particular lab (Kanthamani & Broughton, 1994).

So there is a logic behind Milton and Wiseman's choice: use each paper's primary scoring method, and use the same statistical measure that had previously been so successful. On the other hand, if each paper happens to include the results using the same method, and it's possible to take a more precise measurement using a secondary measure, then why not use it?

However, it should be noted that Milton and Wiseman's is the only meta-analysis where calculating the binomial is possible. All of the other ganzfeld meta-analyses contain experiments that don't report a hit rate with a 25% success rate expected by chance so, since they need to use a combined statistic like the unweighted or weighted z-score, it may be useful to have the same figures from Milton and Wiseman's database for the sake of comparison. By the way, the weighted z for their 1999 meta-analysis is 1.07, p=0.14 or odds of about 1 in 7.

I admit, this has been a long and somewhat boring discussion of a topic that is of supreme disinterest to almost everybody. But I was becoming increasingly concerned that recent books and discussions on this subject had become too superficial. In an attempt at keeping a talk lively or the book interesting, this episode of parapsychology has become shorter and more glib and is now almost mythologised as an example of shoddy skeptical work holding back the scientifically superior parapsychologists. The truth is somewhat less clear cut.

References
BEM, D.J., & HONORTON, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psychological Bulletin, 115, 4-18
BEM, D.J, PALMER, J, & BROUGHTON, R.S. (2001) ″Updating the Ganzfeld Database: A Victim of its own Success?″, The Journal of Parapsychology, Vol. 65, No. 3, 207-218
CARTER, C. (2012). "Science and Psychic Phenomena: The Fall of the House of Skeptics," Inner Traditions Bear & Company. Kindle Edition.
Cochrane Collaboration’s Open Learning Material for Cochrane reviewers.
DALTON, K. (1997). Exploring the links: Creativity and psi in the ganzfeld. Proceedings of Presented Papers: The Parapsychological Association 40th Annual Convention, 119–134.
HONORTON, C. (1985). "Meta-Analysis of Psi Ganzfeld Research: A Response to Hyman", Journal of Parapsychology 49, pp 51-91
KANTHAMANI, H., & BROUGHTON, R. S. (1994). ″Institute for Parapsychology ganzfeld–ESP experiments: The manual series″ The Parapsychological Association 37th Annual Convention: Proceedings of presented papers, 182–189.
KANTHAMANI, H., KHILJI, A., & RUSTOMJI-KERNS, R. (1988) "An experiment in ganzfeld and dreams with a clairvoyance technique". The Parapsychological Association 31st Annual Convention: Proceedings of Presented Papers, 412-423.
MILTON, J., & WISEMAN, R. (1997). "Ganzfeld at the crossroads: A meta-analysis of the new generation of studies." Proceedings of Presented Papers: Parapsychological Association 40th Annual Convention, 267-282.
MILTON, J., & WISEMAN, R. (1999). "Does Psi Exist? Lack of Replication of an Anomalous Process of Information Transfer." Psychological Bulletin, Vol. 125, No. 4, 387-391
MILTON, J., & WISEMAN, R. (2001). ″Does Psi Exist? Reply to Storm and Ertel (2001),″ Psychological Bulletin, Vol. 127, No. 3, 434-438
MILTON, J., & WISEMAN, R. (2002). ″A response to Storm and Ertel (2002),″ The Journal of Parapsychology, Vol. 66, No. 2, 183-187
PALMER, J., & BROUGHTON, R. S. (2000). An updated meta-analysis of post-PRL ESP ganzfeld experiments: The effect of standardness. Proceedings of Presented Papers: The Parapsychological Association 43rd Annual Convention, 224-240.
RONEY-DOUGAL, S.M., (1979). "A Comparison Of Psi And Subliminal Perception: A Confirmatory Study," W.G. Roll (ed.), Research in Parapsychology 1978, 98 - 100.
SCHMEIDLER, G. R., & EDGE, H. (1999). ″Should Ganzfeld research continue to be crucial in the search for a replicable psi effect? Part II Edited Ganzfeld debate.″ Journal of Parapsychology, 63, 335-388
STORM, L. & ERTEL, S. (2001). ″Does Psi Exist? Comments on Milton and Wiseman's (1999) Meta-Analysis of Ganzfeld Research,″ Psychological Bulletin, Vol. 127, No. 3, 424-433
STORM, L. & ERTEL, S. (2002). ″The Ganzfeld Debate Continued: A Response to Milton and Wiseman (2001),″ The Journal of Parapsychology, Vol. 66, No. 1, 73-82
ZINGRONE, N.L. (2002). ″Controversy and the problems of parapsychology.″ Journal of Parapsychology, 66, 3-30.

Friday 14 December 2012
