Copy-pasting your methods section is good, actually.

October 28, 2020

Methods disclosure is a central component of scientific communication, critical to the evaluation, verification and replication of a study. It is in the interest of transparency and reproducibility to allow methods to be described by the verbatim reproduction of text automatically generated by the software that implements those methods.

The chain of processing from measurement to inference can be long, with many consequential choices involving the selection of processes and the ordering of those processes. A researcher may choose to use a process that is itself a pipeline of sub-processes, and the selection of those sub-processes may itself be dynamic. Complex tools can automatically generate a textual account of their processing that is appropriate for inclusion in a scientific publication. This is increasingly common with the uptake of containerized pipelines that provide records of data provenance.

One tool that provides this functionality is fMRIPrep, a functional MRI preprocessing pipeline. In addition to preprocessed data, fMRIPrep generates boilerplate text that describes the specific processing steps that are performed, based on the input data, user-specified options and the versions of fMRIPrep and the sub-processes that it coordinates. This text, while variable depending on the exact choices of analysis parameters, is substantially the same from study to study. Other tools are being actively developed to summarize data acquisition parameters or model specifications, to take two examples. In each of these cases, the automatic generation of text promotes precise and transparent reporting of methods, reducing the burden on researchers to provide, and on reviewers to demand, an adequate description of the study.
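
To make this concrete, the generated boilerplate is saved alongside the preprocessed outputs, so authors can copy it into a manuscript verbatim. Below is a minimal Python sketch of that step; the derivatives path and the methods_draft.md file are hypothetical, and the logs/CITATION.md location, while used by recent fMRIPrep releases, may vary by version.

    from pathlib import Path

    # Hypothetical path to an fMRIPrep derivatives directory; adjust to your layout.
    fmriprep_dir = Path("derivatives/fmriprep")

    # Recent fMRIPrep releases write the boilerplate to logs/CITATION.md,
    # with .tex, .html and .bib variants alongside; the exact location may
    # differ in other versions.
    boilerplate = (fmriprep_dir / "logs" / "CITATION.md").read_text()

    # Append the generated text verbatim to a methods-section draft.
    with open("methods_draft.md", "a") as draft:
        draft.write("\n## Preprocessing\n\n")
        draft.write(boilerplate)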

It is an unfortunate fact that these software-generated texts will sometimes trigger the plagiarism detection software that is increasingly used by the publishers of scientific journals. This technology, which reduces the load on journal editors and reviewers by automatically flagging potentially plagiarized text in submitted papers, has already begun to identify software-generated methods sections as plagiarism. While the consequences of a paper being flagged by this technology are unclear, it is reasonable to suppose they may range from a clarifying conversation or a request to reword to an outright rejection or even a damaged reputation.

It is therefore critical that we address this issue now to provide clarity to authors and editors. We believe that to subject software-generated methods descriptions to the scrutiny of plagiarism detection is to make a category error. When reporting methods, clarity and precision must be the objective, while novelty for its own sake can only serve to obscure the methods used. If two studies use identical methods for some or all of their processing, it is appropriate, indeed clarifying, for them to use identical text to describe that processing.

The Committee on Publication Ethics (COPE) has issued guidance encouraging authors’ reuse of their own text across publications where the methods described are substantially the same. Specifically, they note that

text recycling may be unavoidable when using a technique that the author has described before and it may actually be of value when a technique that is common to a number of papers is described.

We agree that reusing text to describe the same methods is valuable, and we extend this argument to include text that was generated automatically. In keeping with COPE guidelines, the text generated by fMRIPrep is clearly labeled with regard to its purpose. We have further explicitly licensed any fMRIPrep-generated text as CC0 (public domain dedication), so that there are no copyright concerns regarding its reuse.

We call on scientific publishers to explicitly recognize the validity of software-generated methods descriptions and to provide guidance for researchers and editors on the use of such descriptions. Software-generated text should be clearly labeled, held to scrutiny for clarity and validity but not novelty, and exempted from plagiarism detection software. Otherwise, researchers will be forced to needlessly modify this automatically generated text in ways that are likely to reduce its accuracy and clarity, and editors will find themselves spending extra time resolving the resulting issues.


The preceding text is a work-in-progress. The impetus for this statement was a pair of reports (0, 1) of editorial resistance to using fMRIPrep’s boilerplate text in the methods section, due to its being flagged by plagiarism detection software. However, this problem is not unique to fMRIPrep. Other tools, such as NiBetaSeries and eCobidas, generate text that is intended to be used in methods sections, and we believe that this practice will spread if researchers are permitted to take advantage of it.

We gratefully acknowledge initial feedback from Peer Herholz, Remi Gau, Tom Johnstone and Elizabeth DuPre. Thanks also to Vince Calhoun for the iThenticate Q&A reference (below). Please join the discussion on the Brainhack Mattermost!


For further reading on copyright and reuse of text in methods sections, see also:

  1. Is reproducible reporting of research methods in conflict with copyright law? – protocols.io

According to the U.S. Copyright Office, recipes are not subject to copyright as they are statements of fact.

[…]

Science requires precision. Artificially changing method details to avoid detection by anti-plagiarism software leads to errors, confusion, and lack of precision that makes it hard to compare methods and detect changes in procedure. Reproducibility is challenging enough to begin with; we do not need any additional obfuscation.

  2. Self-plagiarism Q&A forum – iThenticate

Q6: “If a scientist is describing a method that is used in different papers, can they use that same description?”

A: (Bob) Anecdotal feedback from CrossCheck members indicates that editors are largely unconcerned with plagiarism in method sections. In fact, it has been requested that iThenticate includes a feature that excludes methods from originality check.

(Rachael) I’d agree with Bob. An Editor reading the paper as a subject specialist will understand that there will necessarily be a degree of overlap/the same methods section if the same method has been used.
