28 August 2014

Why we believe our article "A critical reanalysis of the relationship between genomics and well-being" is correct

Earlier this week (Monday 25 August 2014), our article was published in PNAS:
Brown, MacDonald, Samanta, Friedman, and Coyne (2014). A critical reanalysis of the relationship between genomics and well-being. doi:10.1073/pnas.1407057111

This was a critique of:
Fredrickson, Grewen, Coffey, Algoe, Firestine, Arevalo, Ma, and Cole (2013).  A functional genomic perspective on human well-being.  doi:10.1073/pnas.1305419110

There has been some discussion of our article on social media.  In particular, some people have noted that the principal authors of the original article (Cole and Fredrickson) have replied with a 550-word letter (which I guess is all they were allowed; they have more material, as shown below) claiming that our article contains multiple errors and is, hence, invalid.  So I'm writing this blog post to put our side of the story and explain why we feel that our article is correct, and that Cole and Fredrickson have not made a dent in it.  I was a little disappointed that Dr. Martin Seligman, on the APA Friends-of-PP mailing list, chose to describe our article as a "hatchet job". We believe that we have identified a number of serious scientific problems with Fredrickson et al.'s original article, which are not adequately addressed either by Cole and Fredrickson's published letter, or by their more extensive unpublished analysis.

First, some references, to save me linking to them repeatedly below.

Fredrickson et al.'s original article
Fredrickson et al.'s original supporting information (SI)
Coyne's letter to PNAS, criticising Fredrickson et al.'s factor structure
Cole and Fredrickson's letter to PNAS, replying to Coyne
Our article (in final draft form; I'm not sure whether I'm allowed to link to the PDF of the published article in full PNAS format; any differences will be cosmetic, e.g. the numbering of references)
Our supporting information (SI) - download the PDF file marked "Appendix"
Cole and Fredrickson's letter to PNAS, claiming our analysis has many errors
Cole and Fredrickson's additional analysis in support of their claim that our analysis has many errors
Neuroskeptic's blog post, which provides additional evidence for the deficiencies in Fredrickson et al.'s regression procedure.
Dale Barr's blog post, which approaches the regression issues in a different way, but also finds many problems and provides graphical demonstrations of how unlikely Fredrickson et al.'s results are.

I will address each of the principal points of Cole and Fredrickson's response to our article in turn (although not in the exact sequence in which they appeared in their letter). Before I begin, though, I want to apologise for the length and complexity of some of the points I will be making here; this is unfortunately necessary, as most of the issues under discussion are quite technical.  That is particularly true of the section entitled "Bitmapping?" below, which might appear, to a reader who has struggled through our article and SI, plus Cole and Fredrickson's letter and additional analysis, to be little more than a case of "he said/she said".  I note, however, that each of the major issues that we raised in our article is sufficient, on its own, to render Fredrickson et al.'s results meaningless.  These major issues are:
- The MHC-SF psychometric scale does not measure hedonic versus eudaimonic well-being
- Fredrickson et al.'s regression procedure produces mostly spurious correlations, even with random psychometric data
- The errors in Fredrickson et al.'s dataset directly invalidate their published numerical results

MHC-SF factor analysis

Cole and Fredrickson criticise us for attempting to perform factor analyses on the MHC-SF scale with such a small sample size. It is interesting to contrast this with Cole and Fredrickson's 2013 letter to PNAS, in reply to Coyne's criticism of the high degree of intercorrelation between their "hedonic" and "eudaimonic" factors, in which they describe how they themselves performed exploratory and confirmatory factor analyses on exactly the same data, apparently claiming to have found the hedonic/eudaimonic factor pair with a very good model fit. (A p-value of < .0001 is offered, but without sufficient context to establish exactly what it refers to; however, the message seems clear: we did EFA and CFA and obtained a fantastic model.)  We attempted to reproduce this, but were unable to do so; indeed, we noted in our article that the sample size was an issue here.  But the only reason we were doing this was in an attempt to replicate the factor analyses that Cole and Fredrickson claimed, in their 2013 letter, to have performed.  We look forward to seeing the results of those analyses, which have so far not been published.

Still on the factor analysis, Cole and Fredrickson claim that their assumption of a hedonic/eudaimonic split for the MHC-SF scale is supported by three references from Fredrickson et al.'s article. We have examined these references (for your convenience, they are here, here, and here) and have not found any point at which any of them supports this claim, or indeed makes any statements at all about the factor structure of the MHC-SF.  Please feel free to check this yourself, and if you find such a discussion, let me know the page number.  In the meantime, the claim of a two-factor, hedonic/eudaimonic split for the MHC-SF seems to be supported by no published evidence.  (However, there has been plenty of reporting in the literature of a clear three-factor structure, e.g. here and here and here.)

Again, we stand by our analysis: Fredrickson et al.'s claimed hedonic/eudaimonic factor split of the MHC-SF is supported neither by theory, nor by the data, nor by prior studies. The factor structure that emerges from Fredrickson et al.'s dataset is unclear, but of the possible two-factor structures, the one that we described in our article and SI (i.e., "personal well-being" and "evaluative perception of the social environment") fits the data considerably better, in every respect, than Fredrickson et al.'s claimed hedonic/eudaimonic split. The only structure that has been documented for the MHC-SF in prior published work is the three-factor structure corresponding to its three subscales, as designed by Keyes.
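For readers who would like to try this kind of comparison themselves, here is a minimal sketch of how two competing two-factor CFA models can be fitted and compared in R using the lavaan package. To be clear, this is not our published code: the data frame name and the item-to-factor assignments are illustrative placeholders, not the exact models from our article.

```r
# Minimal sketch, assuming a data frame `mhc` with items mhc1..mhc14.
# The item groupings below are placeholders for illustration only.
library(lavaan)

model_hed_eud <- '
  hedonic    =~ mhc1 + mhc2 + mhc3
  eudaimonic =~ mhc4 + mhc5 + mhc6 + mhc7 + mhc8 + mhc9 + mhc10 +
                mhc11 + mhc12 + mhc13 + mhc14
'

model_alternative <- '
  personal =~ mhc1 + mhc2 + mhc3 + mhc9 + mhc10 + mhc11 + mhc12 + mhc13 + mhc14
  social   =~ mhc4 + mhc5 + mhc6 + mhc7 + mhc8
'

fit_hed_eud <- cfa(model_hed_eud, data = mhc)
fit_alt     <- cfa(model_alternative, data = mhc)

# Higher CFI and lower RMSEA/BIC indicate better fit.
fitMeasures(fit_hed_eud, c("cfi", "rmsea", "bic"))
fitMeasures(fit_alt,     c("cfi", "rmsea", "bic"))
```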

Bitmapping?

The "bitmapping" operation to which Cole and Fredrickson devote part of their letter (and most of their additional analysis document) is merely an artifact of the way in which our R program loops over all possible combinations of the 14 MHC-SF psychometric items into two factors. There are many ways in which we could have done this that do not involve the programming technique of converting an integer into a bitmap. Indeed, the inclusion in our SI document of the brief mention of how our outer loop works (the inner loop does the regressions, using Fredrickson et al.'s exact parameters) is arguably slightly redundant, but we included it to facilitate the understanding of our code, should someone wish to undertake a reproduction of our results.

Cole and Fredrickson's analysis seems mainly aimed at demonstrating that our "bitmapping" technique is an inadequate way to resample from a dataset. We agree. We never suggested that it was a way to perform resampling; we are not even sure how it could be. We are not performing any resampling, bootstrapping, or any other form of Type 1 error control. Our program simply generates every possible two-factor combination of the psychometric data and determines whether or not each one appears to show an effect, using Fredrickson et al.'s own regression procedure. The results demonstrate that, no matter how the data are sliced and diced, Fredrickson et al.'s regression procedure will generate apparently statistically significant results in the majority of cases; indeed, in most of those cases, it will appear to show effect sizes larger than those reported by Fredrickson et al.
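Schematically, the whole procedure looks like the sketch below. Note that run_fredrickson_regression() is a hypothetical placeholder for our inner loop (which applies Fredrickson et al.'s exact regression specification to the 53 genes and averages the coefficients), and computing factor scores as simple item means is an illustrative simplification:

```r
# Sketch of the outer loop, assuming `items` is the n-by-14 matrix of
# MHC-SF responses; run_fredrickson_regression() is a hypothetical
# placeholder for the inner loop described above.
n_splits <- 2^14 - 2
effects <- matrix(NA, nrow = n_splits, ncol = 2)
for (split in 1:n_splits) {
  in_f1 <- bitwAnd(split, 2^(0:13)) > 0
  f1 <- rowMeans(items[, in_f1,  drop = FALSE])  # "factor 1" score
  f2 <- rowMeans(items[, !in_f1, drop = FALSE])  # "factor 2" score
  effects[split, ] <- run_fredrickson_regression(f1, f2)
}
# `effects` now contains, for every possible split, the two averaged
# regression coefficients that the procedure would have reported.
```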

The graphs in our SI document (Figures 7-11) plot the results obtained by iterating over all possible two-factor combinations of several forms of psychometric data: Fredrickson et al.'s actual data, assorted random numbers, etc.  We are not completely sure what Cole and Fredrickson think these graphs show.  To be clear: they show all of the possible "effects" (relationships between psychometric "factors" and gene expression values) that Fredrickson et al. could have obtained, had they chosen a different factor split of the MHC-SF data from the one that they actually chose.  Figure 7, in particular, uses the real psychometric data to show that most of the possible factor combinations would have produced effects greater in magnitude than the ones on the basis of which Fredrickson et al. claimed that their hedonic/eudaimonic split was (presumably uniquely) associated with differential gene expression.

Why, then, does this procedure continue to produce apparently significant results even when the psychometric data are replaced with uniformly-distributed random numbers (aka "white noise")?  We believe that this is due to strong correlations within the gene data.  Because the genes are strongly intercorrelated, the 53 per-gene regression coefficients are themselves strongly correlated, so treating them as independent data points drastically understates the true standard error.  As shown by Neuroskeptic, this leads to an enormous false-positive rate.  Thus, when Fredrickson et al. ran their regression procedure (which we called "RR53") and averaged the resulting regression coefficients, they were making the elementary mistake of running a t-test on a set of non-independent observations.
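The following self-contained simulation illustrates the point. This is not our published code, and the parameters are arbitrary: we generate 53 strongly intercorrelated "genes", regress each of them on a pure-noise predictor, and then run a one-sample t-test across the 53 coefficients, as the RR53 procedure does:

```r
# Illustration only (arbitrary parameters): correlated "genes" plus a
# pure-noise predictor still yield "significant" RR53-style results.
set.seed(1)
n_subjects <- 80
n_genes <- 53
n_sims <- 1000
false_pos <- 0
for (s in 1:n_sims) {
  shared <- rnorm(n_subjects)           # common factor -> correlated genes
  genes <- sapply(1:n_genes,
                  function(g) shared + rnorm(n_subjects, sd = 0.4))
  predictor <- runif(n_subjects)        # "white noise" psychometric score
  coefs <- apply(genes, 2,
                 function(y) coef(lm(y ~ predictor))["predictor"])
  # RR53-style step: t-test across the 53 (non-independent) coefficients
  if (t.test(coefs)$p.value < .05) false_pos <- false_pos + 1
}
false_pos / n_sims   # typically far above the nominal .05 rate
```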

Incidentally, there is an alternative way of doing the regression analysis.  Fredrickson et al. regressed each individual gene on Hed/Eud (and some control variables), collected the 53 coefficients per IV, and averaged them; this is what we called the "RR53" procedure.  The alternative is to average the gene expression values and regress this average on Hed/Eud.  We had noticed that this gave non-significant results. Then, just after the PNAS window for updating our supplementary information document closed, a colleague (who, I believe, wishes to remain anonymous) pointed out that with this alternative method, the apparent effect sizes are exactly the same as the ones "found" by RR53; only the p-values are different.  We believe this is because, when the RR53 procedure picks up the regression coefficients of the individual genes for further analysis, it conveniently discards the associated uncertainty (almost all of these coefficients come from non-significant t-tests or model ANOVAs) and re-inserts the point estimates into the mix as if they were perfect, fresh data from a measuring instrument, whereas in fact almost all of them carry an amount of "noise" that makes them highly unreliable.
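In fact, the equality of the two effect estimates is a mathematical necessity, not a coincidence: OLS coefficients are a linear function of the outcome variable, so the average of the 53 per-gene coefficients is identical to the coefficient obtained by regressing the 53-gene average. Here is a small demonstration with simulated data (not Fredrickson et al.'s dataset):

```r
# Demonstration with simulated data: averaging 53 per-gene OLS
# coefficients gives exactly the same estimate as regressing the
# average of the 53 genes, because OLS coefficients are linear in
# the outcome.
set.seed(2)
n <- 80
genes <- matrix(rnorm(n * 53), nrow = n)
hed <- rnorm(n)

rr53_style <- mean(apply(genes, 2,
                         function(y) coef(lm(y ~ hed))["hed"]))
avg_style <- coef(lm(rowMeans(genes) ~ hed))["hed"]

all.equal(rr53_style, as.numeric(avg_style))   # TRUE: identical estimates

# But the p-values differ: the single regression on the average keeps
# an honest standard error, whereas RR53's t-test across the 53
# coefficients discards each coefficient's own uncertainty.
summary(lm(rowMeans(genes) ~ hed))$coefficients["hed", "Pr(>|t|)"]
```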

We have made many of our materials available online, including our program's source code, and we are happy to share our other files (some of which are quite voluminous) on request, or to answer any specific questions that anybody might have about how our program works. (We could have reproduced this work with SPSS, but it would have taken an awfully long time.)

Thus, we stand by our analysis: Fredrickson et al.'s regression procedure is guaranteed to produce huge numbers of spurious "effects" which have no psychological or physiological meaning.

Issues with the dataset

Finally, Cole and Fredrickson claim that they have recently reproduced the same numerical results as their 2013 study with a new sample.  I will leave aside, for now, the question of how meaningful it is for a procedure that has been criticised (by us) for producing invalid results to be "shown" to be correct because it produces much the same results a second time; perhaps there is some validity in having two data points rather than one.  However, this question turns out to be irrelevant.  In their reply, Cole and Fredrickson notably fail to address the various errors in their original dataset, which we discuss quite extensively (and for good reason) in our supporting information. In particular, they do not address the coding error for participant SOBC1-1299, which we examine on pages 3 and 24 of our Supporting Information. Near the end of our Table 7, we show that this coding error can be (and should have been, in the original study) resolved in one of two ways, either of which reduces the magnitude of the reported effect for "hedonic" well-being by more than half (with a small change in the magnitude of the effect for eudaimonic well-being). In other words, had this coding error not existed in the 2013 dataset, Fredrickson et al.'s figures of +0.28 (hedonic) and -0.28 (eudaimonic) for the 2013 study should have been calculated and reported as approximately +0.13 (hedonic) and -0.27 (eudaimonic). Subsequently obtaining +0.25 (hedonic) and -0.21 (eudaimonic) with the new sample thus appears to be evidence against a successful reproduction of the original results (unless some new theory can explain why the effect of hedonic well-being has suddenly doubled).

Summary

In summary, we stand by our overall conclusions, namely that Fredrickson et al.'s article does not tell us anything about the relationship between different types of well-being and gene expression. We will be sending a summary of the above position to PNAS for peer review and possible publication as a Letter to the Editor.

I am sure that this will not be the final word on this matter. We trust that Cole and Fredrickson will go back and re-examine their study in the light of our response, and perhaps return with some additional information that might clarify matters further. We anticipate that our peers will also contribute to this debate.

Edit history
2014-08-27 22:08 UTC First version.
2014-08-28 11:50 UTC Removed the rather clunky reference to trying to factor analyse the gene data; added link to Neuroskeptic's blog post, and discussion of the problems with the RR53 procedure.
2014-09-15 23:27 UTC Added link to Dale Barr's blog post.
2014-10-16 22:03 UTC Fixed a couple of typos.

22 August 2014

(A few weeks ago I wrote a comment on Erik-Jan Wagenmakers' blog. Somebody contacted me to say that they would like to be able to link to my comment, but Disqus doesn't provide individual URLs for comments. So I am using my blog to repeat the comment here. The only change I have made is to italicise two words, which was not possible in the original comment format, or at least I didn't know how to do it. Please read the original post first to establish a little bit of context, and perhaps save yourself wondering what I'm rambling on about.)

The standard human problems of power/status and money seem to be all-pervading in psychology; why should we expect anything else?

I would like to advance the radical thesis that the *entire point* of a whole class of contemporary social-psychological research (i.e., not just a nice side-effect, but the PI's principal purpose in running the study) is to generate "Gladwellizable" results. Such results will, as a minimum, earn you considerable kudos among the less critical of your colleagues and grad students, and probably also keep your institution's director of communications very happy ("University of Madeupstuff research is featured in the Economist/NY Times again"). More advanced practitioners can leverage their research into their own mass-market publications/lectures/audiotape series, thus bypassing the Gladwell/Pink axis and turning the results of their grant-funded research into $$$ for themselves.

I'm with Kahneman: this will not stop until a train wreck occurs, quite probably involving some major public policy decision. The actual train wreck will be 10-15 years down the line, when the Government Accountability Office (etc.) catches up with things, by which time the damage will have been done (it will take a generation or more to undo some of the myths floating around out there) and the perpetrators will be lying in the sun, untouchable (they will, perhaps, mutter "science self-corrects", aka "heads I win, tails I get away with it"). The asymmetry is visible from space: find a gee-whiz result, speculate loudly on its implications for humanity, and make a pile of money/power/influence; have it refuted (which almost never happens anyway, since in psychology "A" and "not A" seem to be very happy co-existing for ever) and the worst that can happen is that you have to spin your idea as having been "refined" by the latest findings, which in fact "make my idea even stronger".

As EJ is finding out here, defiant denial seems to impose very little cost on those who engage in it. Until the industry [sic] decides to change that, this will continue. But, remind me again what Gladwell's advance was for his latest book? That's what you're up against.