Biggest barrier to reproducibility

My previous post discussed Keith Baggerly and his efforts as a “forensic bioinformatician.”

In that article, the reporter asks Keith to name the biggest problem he sees in trying to reproduce results.

It’s not sexy, it’s not higher mathematics. It’s bookkeeping … keeping track of the labels and keeping track of what goes where. The thing that we have found repeatedly in our analyses is that it actually is one of the most difficult steps in performing some of these analyses.

I’ve seen presentations where Keith discusses specific bookkeeping errors. Quite often columns get transposed in spreadsheets, so researchers are not analyzing the data they say they are analyzing.
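
To make this failure mode concrete, here is a minimal Python sketch. The sample labels, values, and outcomes are made up, and the helper align_by_label is hypothetical, not anything Keith or his collaborators use. It pairs two tables in two ways. The first pairs them by column position. The second pairs them by column label. When one spreadsheet lists its columns in a different order, only the positional pairing goes silently wrong.

```python
# Hypothetical data: sample labels, values, and outcomes are made up for illustration.

def align_by_label(values, outcomes):
    """Pair each sample's value with its outcome by matching labels, not positions.

    Raises ValueError if a label appears in only one table, instead of guessing.
    """
    unmatched = set(values) ^ set(outcomes)
    if unmatched:
        raise ValueError(f"labels found in only one table: {sorted(unmatched)}")
    return [(label, values[label], outcomes[label]) for label in sorted(values)]


# Two spreadsheets describing the same three samples, with columns in different orders.
value_header = ["S1", "S2", "S3"]
value_row = [2.5, 0.4, 1.9]
outcome_header = ["S2", "S1", "S3"]  # S1 and S2 are swapped relative to value_header
outcome_row = ["responder", "non-responder", "responder"]

# Positional pairing ignores outcome_header, so S1's value gets S2's outcome.
positional = list(zip(value_header, value_row, outcome_row))

# Label-based pairing reads both headers, so column order no longer matters.
by_label = align_by_label(
    dict(zip(value_header, value_row)),
    dict(zip(outcome_header, outcome_row)),
)

print(positional[0])  # ('S1', 2.5, 'responder')      <- wrong outcome for S1
print(by_label[0])    # ('S1', 2.5, 'non-responder')  <- correct outcome for S1
```

The design choice that matters here is joining on labels and failing loudly on mismatches. That turns a silent transposition into an error someone has to look at.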

2 thoughts on “Biggest barrier to reproducibility”

  1. Neil

    If Keith is using spreadsheets for his bookkeeping, that is perhaps where the problem lies.

    Databases are designed for storing data in an ordered manner, whilst statistical packages are designed for analysing data that is stored in the databases.

    Spreadsheets are poor tools that try to bridge the gap between the two and fail miserably on all fronts.

  2. John

    Keith isn’t using spreadsheets for data storage; the scientists whose work he is trying to reproduce are using spreadsheets.
