Announcing the OpenNeuro platform – Open and Reproducible Science as a Service

June 26, 2017

We are pleased to announce that the OpenNeuro platform is now available for users at http://www.openneuro.org.  This platform represents the culmination of more than two years of work by the members of our center working closely with Squishymedia and supported by the Laura and John Arnold Foundation.  Here we would like to provide an overview of the platform and highlight some of its coolest features – we hope that you will all get on, kick the tires, and let us know what you think.


The vision behind OpenNeuro is to provide the field of neuroimaging with a platform for the reproducible analysis and open sharing of data.  It represents the next step beyond the OpenfMRI data sharing platform. The name change reflects the fact that we are no longer focused exclusively on fMRI: the new platform will accept any imaging dataset that follows the BIDS standard, and we ultimately hope to have workflows that can be applied to many different types of neuroimaging data. OpenfMRI has been remarkably successful, particularly with the rapid growth of the database in recent years, and we intend to build on this success in the new project.

OpenNeuro is an example of “science as a service,” meaning that it provides a set of web-accessible services to perform data analysis and sharing.  Our particular focus is on providing services that make data analysis both reproducible and open.  Perhaps the best way to illustrate this is to walk through some of the features of the platform — if you prefer a video walkthrough, Chris has made one here.

Public datasets

After logging in using their Google credentials, the user can access a large number of publicly available datasets, including all of the current OpenfMRI datasets.  These can be downloaded (in BIDS format, of course!), or the user can analyze them using any of a number of existing workflows (see below).  In fact, most of the OpenfMRI datasets have already been processed using fMRIprep and FreeSurfer, and the results of those can be downloaded immediately.  All of the data on the site are deidentified and are released under a Creative Commons CC0 public domain license, meaning that there are no restrictions on their use or redistribution. Of course, to use the data without citing the OpenNeuro project and the authors of the data would be seriously uncool.

Uploading new datasets

Any dataset that is in the BIDS format can be easily uploaded to the database.  It will be run through the BIDS validator, and any errors or warnings will be flagged.  After validation the data are uploaded to the database; the uploader is fairly smart and should be able to continue if an upload is interrupted.  The user’s own datasets are shown in the main dashboard.  Datasets are initially private, but can be easily shared with collaborators or made publicly available.
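For readers who want a rough local sanity check before uploading, here is a minimal sketch in Python. It is not the BIDS validator and covers only a tiny fraction of what the validator checks: it looks for a dataset_description.json file containing the Name and BIDSVersion fields, and for at least one sub-<label> directory. The function name quick_bids_check is made up for this illustration.

```python
import json
import re
from pathlib import Path

# Subject folders in a BIDS dataset are named sub-<label> (alphanumeric label).
SUBJECT_DIR = re.compile(r"^sub-[a-zA-Z0-9]+$")


def quick_bids_check(root):
    """Return a list of problems found; an empty list means these basic checks passed.

    Illustrative helper only: the real BIDS validator checks far more than this.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []

    # dataset_description.json must exist at the top level of the dataset.
    desc = root / "dataset_description.json"
    if not desc.is_file():
        problems.append("missing dataset_description.json")
    else:
        try:
            meta = json.loads(desc.read_text())
        except json.JSONDecodeError:
            problems.append("dataset_description.json is not valid JSON")
        else:
            for field in ("Name", "BIDSVersion"):
                if field not in meta:
                    problems.append(
                        f"dataset_description.json lacks required field {field!r}"
                    )

    # There should be at least one subject directory.
    subjects = [p for p in root.iterdir() if p.is_dir() and SUBJECT_DIR.match(p.name)]
    if not subjects:
        problems.append("no sub-<label> directories found")

    return problems
```

Calling quick_bids_check("/path/to/dataset") returns an empty list when these few checks pass. A dataset that passes may still be flagged by the full validator on upload.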

Analyzing data

The main new feature of the platform is the ability to process imaging data using a number of workflows. Currently, the workflows that have been implemented are:

The reproducibility of analyses in the OpenNeuro platform comes primarily from two features.  The first is the idea of a data “snapshot,” which is basically a frozen version of a dataset.  All analyses are run on specific snapshots, so that any results can be tracked back to a specific version of the data. The second is the use of containerized analysis apps (via the BIDS-Apps project).  Each container holds all of the code needed to perform the analyses, including all of the libraries and dependencies of the code.  These apps are versioned, such that any analysis is linked to a specific version of the container that is available via DockerHub. This means that in principle anyone should be able to download the data snapshot and the container and replicate the analysis results exactly.
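To make the replication idea concrete, here is a hedged sketch of how someone might rerun an analysis locally once they have downloaded a snapshot. It assumes the general BIDS-Apps command-line convention (input dataset, output directory, analysis level, optional participant labels) and only builds the docker command rather than running it. The image name, version tag, and paths in the usage note are illustrative placeholders, not the identifiers OpenNeuro itself records, and bids_app_command is a helper invented for this example.

```python
from pathlib import Path


def bids_app_command(image, tag, snapshot_dir, output_dir, participant_labels=()):
    """Build a `docker run` command for a BIDS-App pinned to one version tag.

    Illustrative helper: pinning both the data snapshot (a fixed directory)
    and the container tag is what makes the rerun comparable to the original.
    """
    snapshot_dir = Path(snapshot_dir).resolve()
    output_dir = Path(output_dir).resolve()

    cmd = [
        "docker", "run", "--rm",
        # Mount the frozen snapshot read-only so the analysis cannot alter it.
        "-v", f"{snapshot_dir}:/bids_dataset:ro",
        "-v", f"{output_dir}:/outputs",
        # Always name an explicit tag; an unpinned image can change over time.
        f"{image}:{tag}",
        # BIDS-Apps positional arguments: input, output, analysis level.
        "/bids_dataset", "/outputs", "participant",
    ]
    if participant_labels:
        cmd += ["--participant_label", *participant_labels]
    return cmd
```

For example, bids_app_command("bids/example", "0.0.0", "ds_snapshot", "results", ["01"]) returns an argument list that could be passed to subprocess.run. Individual apps may accept or require additional options, so check each app's own documentation before relying on this.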

When a user submits an analysis job it is run using cloud computing (via Amazon Web Services) but is free to the user thanks to the generous support of the Arnold Foundation.  The quid pro quo for this free computing is that the user agrees that all datasets processed on the platform will be made publicly available within 18 months.  Single participants from each dataset can be processed without this requirement, so that users can try out the platform on a limited scale before agreeing to share their data.

Once the analysis is completed, the user can view the results (including NIfTI images, web pages, and text files) in the web browser, or download them to their own computer.

What’s missing?

The OpenNeuro site is still in beta, and users may run into snags along the way.  One major missing link at this point is the ability to perform statistical analyses on task fMRI data.  This is an area that we are working on, and we hope to support it later in the year.  Currently, if you want to analyze task fMRI data using a standard GLM analysis you will need to download the data and run that analysis on your own machine.  Nonetheless, we hope that the availability of free and reproducible preprocessing pipelines will be enough to entice many users to test out the platform.

Future plans

We have big plans for the platform in the coming months, and the features described below are currently under development.

First, we plan to add many more workflows. TRACULA, OPPNI, MAGET, QAP, and aa are in advanced test stages and should be deployed soon. We are always on the lookout for more high-quality processing pipelines, so if you are developing one please get in touch with us. Making your software available on OpenNeuro will lower the access barriers for new users. Additionally, each workflow is accompanied by clear instructions on how users should cite it in order to give developers appropriate credit.

Second, we plan to provide the ability to run a single dataset through multiple workflows, in order to assess the effects of different workflow choices on statistical results (a la Josh Carp’s 2012 paper). This will allow researchers to assess what John Ioannidis has called “vibration effects” and focus on results that are robust to specific processing choices.