Response to Next-Generation Data Science Challenges in Health and Biomedicine RFI

November 1, 2017
Text of the Request for Information call: https://grants.nih.gov/grants/guide/notice-files/NOT-LM-17-006.html

Recent years have shown that two components are essential for artificial intelligence breakthroughs: large, rich datasets and advanced algorithms. Advances in machine vision (especially object recognition) are a good example. The substantial increase in the accuracy of those systems was only possible due to the availability of a large annotated dataset (ImageNet) and improvements to powerful analysis methods (deep neural networks). Biomedical data is heading in the right direction, but progress is stifled by the following major roadblocks:

  1. Goal: Share more data.
    Roadblock: NIH data sharing policies are not enforced, leading to wasted taxpayer money. Even though data management plans are part of the grant review process, they are rarely taken seriously by review panels, which focus on the scientific aspects of the proposal. Researchers with strong track records of sharing data are not rewarded appropriately in the grant review process.
  2. Goal: Improve data curation.
    Roadblock: Building and maintaining biomedical data repositories is difficult given current NIH funding opportunities, which focus on the development of new resources rather than long-term support for existing ones. A major concern of data submitters is long-term preservation, which is hard to guarantee with short-term grants. Similarly, maintaining a repository (even without developing new features) requires computational resources (storage, web servers, etc.) that must be paid for over a period that is often longer than the typical duration of an R01 grant.
  3. Goal: Encourage innovative data reuse.
    Roadblock: Even though publicly available datasets are often reused by biomedical researchers (Gorgolewski, Wheeler, Halchenko, Poline, & Poldrack, 2015; Milham et al., 2017), they have made little headway in the broader machine learning community, which tends to use non-medical datasets as benchmarks. Biomedical datasets concerning important questions are often poorly advertised and available only in raw form, in file formats that are not commonly used in data science and machine learning.

In an attempt to improve this situation, we recommend the following interventions:

Recommendation A1: Include data sharing history as a compulsory part of the biosketch. This will highlight scientists’ commitment to data sharing and cement data sharing efforts as a first-class citizen among other academic outputs.

Recommendation A2: Make data management plans publicly available. This will lead to more transparency and public accountability. Researchers will be more likely to abide by their data sharing promises once those promises are public.

Recommendation A3: Add an explicit “Data and Materials Sharing” criterion score to the grant scoring protocol. This additional dimension should take into account the applicant’s data sharing history (see Recommendation A1) adjusted for seniority. This mechanism will incentivize researchers to put more effort into providing realistic data sharing plans in their grants.

Recommendation B1: Intramural support for long-term backup of publicly available data. Many existing field-specific data repositories are struggling to guarantee long-term preservation of their records. NLM could help by providing a free service that allows affiliated repositories to deposit backup copies of their records. This would increase the chances of preserving those datasets in the long term. Such a service would be distinct from NIH-supported archives such as NDA, since it would be provided for public data and without any data curation (assuming that deposited datasets are already curated).

Recommendation B2: Long-term grants providing cloud credits for community-run repositories and services. Provide a funding mechanism that would subsidize cloud computing costs for public data repositories. This mechanism could be targeted at established repositories with the goal of maintaining their operations in the long term. The grants could come in the form of cloud computing credits or discounts for these services.

Recommendation C1: Creation of benchmark biomedical datasets curated for ease of use in the context of deep neural network applications. One of the most commonly used benchmark datasets for adversarial neural networks is a collection of photographs of celebrities. It is easy to access and work with, and thus is the go-to dataset for validating new techniques. The same cannot be said about many publicly available biomedical datasets. There is great potential in directing the machine learning community towards important biomedical problems, but work needs to be put into curating those datasets so that they are easier to use for computational scientists who are not necessarily biomedical experts. NIH should issue a set of special calls for grants aimed at the creation of widely accessible benchmark datasets or competitions in the space of important biomedical problems.

We believe that implementing this set of practical recommendations will set the NIH on a track towards more efficient, cheaper, and more interdisciplinary science. We are happy to discuss these ideas at greater length.

 

Krzysztof J. Gorgolewski (krzysztof.gorgolewski@gmail.com) and Russell A. Poldrack (russpold@stanford.edu)

References

Gorgolewski, K. J., Wheeler, K., Halchenko, Y. O., Poline, J.-B., & Poldrack, R. A. (2015). The impact of shared data in neuroimaging: the case of OpenfMRI.org. F1000Research. https://doi.org/10.7490/f1000research.1110040.1

Milham, M., Craddock, C., Fleischmann, M., Son, J., Clucas, J., Xu, H., … Klein, A. (2017, September 4). Assessment of the impact of shared data on the scientific literature. bioRxiv. https://doi.org/10.1101/183814

 
