Introduction
I am against open peer review because if I’m supposed to reveal my identity to the authors, I will have to do a much better job when I review, and I don’t have time for that…
These words came recently from a PhD student at a seminar on the topic of ‘open science’, hosted by the Faculty of Health at the University of Southern Denmark. The room went completely silent after her statement. On the one hand, participants recognized their own dilemma when peer reviewing; on the other, they realized that perhaps it is time to revisit our current peer-review practices. Questions such as ‘Are the current procedures optimal for both reviewer and reviewed?’ and ‘Do current procedures ensure best-quality assessment of research?’ arose from her remark.
The purpose of peer review is to assess the quality of research. It can be done in a variety of ways, but the most prevalent types are single- or double-blind. Single-blind peer review is when the reviewers are aware of the authors’ identities, but the authors are not aware of the reviewers’ identities. In double-blind peer review, neither the authors nor the reviewers are aware of the others’ identities (Shema 2014).
The double-blind procedure is generally considered less biased and consequently seen as being of higher quality than the single-blind. However, given the PhD student’s remark above, there are reasons to doubt this, and several researchers have questioned whether it is in fact possible to mask author identities from colleagues working in narrow research fields (Lee et al. 2013).
The peer review process was first introduced to scholarly publications in 1731 by the Royal Society of Edinburgh, where a procedure was established that resembled those used in modern scholarly publishing. Materials sent to the Society for publication would be inspected by a group of society members, who were presumed to be knowledgeable about the matter, and whose recommendation to the editor was influential for the future progress of the manuscript (Spier 2002).
Adoption of the peer review practice was slow and only gained momentum around the middle of the twentieth century. A famous story tells how Albert Einstein was ‘incredibly offended’ in 1936 when a manuscript he had submitted to Physical Review was sent out to be refereed; he withdrew it, protesting that he had not authorized the editor to do so (Al-Mousawi 2020). Several of the last century’s greatest publications, such as Einstein’s four famous papers in Annalen der Physik and Watson and Crick’s 1953 paper describing the double-helical structure of DNA, were never peer reviewed (Spicer and Roulet 2014).
The modern peer review process found its current form after the Second World War, apace with a gradual and steady increase in scientific research, the specialization of articles, and competition for journal space (Al-Mousawi 2020). Spier (2002) also notes that an important driver in this respect was the commercial availability of the Xerox photocopier from 1959, making replication of manuscripts much easier.
Journals such as Science and the Journal of the American Medical Association (JAMA) started performing peer review in the 1950s and 1960s, Nature in 1973, and The Lancet in 1976. However, it was not until the middle of the 1990s that peer review became commonplace (Al-Mousawi 2020).
Many authors in countless publications have discussed the double-blind peer-review process. Those in favour of the current system argue that peer review is perhaps not perfect, but it is the best we have for now (e.g., Anderson and Ledford 2021), while other authors are critical of the procedure, although they do not present alternatives (e.g., Kern-Goldberger et al. 2022). I will take a slightly different approach, arguing that in the light of the current changes in the scholarly communication ecosystem, the constantly increasing publication pressure on researchers, and technological developments, perhaps it is time to reconsider this procedure, to review the workflows and seek means of improving our current system. That is the purpose of this article.
This article is structured as follows. A section will raise the question of whether peer review is an act of communication and, if so, what the implications of this are. Then follows a section that summarizes the literature on the costs of peer review in terms of researcher work hours. I will then move on to a discussion of the consequences of the phenomenon known as the peer reviewer crisis: it is becoming increasingly difficult for editors to find qualified and willing peer reviewers. The section before the conclusion of the paper will summarize the discussions on biases in peer review. Finally, the conclusion will address to what extent new technologies and practices of the research community offer a potential for improving our current peer review procedures.
Peer Review as Communication
Any transaction where a message travels between a sender and a receiver is, in its classical sense, an act of communication (see, for example, Burnett and Dollar 1989). Shannon and Weaver developed one of the first communication theories describing this in 1948 (Shannon 1948; Shannon and Weaver 1949).
Although their focus was technical, the model rapidly gained momentum due to the introduction of the concept of ‘noise’. The model acknowledges that ‘noise’ can result in distortion of the message at any point of its travel from sender to receiver. Although the model has been criticized for being too simplistic and has had several more advanced incarnations since 1949, it remains a solid frame of reference in communication research and is useful in this context, since it may be used to draw attention to the following point: at any time in any communication process, when the message travels between a sender and a receiver, there is a chance that the message will become distorted, leading to misunderstandings and/or unintended actions.
The peer review process can be considered a communication process, since it involves a (series of) message(s) travelling between sender and receiver. Furthermore, there can be no doubt that peer review is a rather complicated process. For those who want to ascertain the truth of this statement, an image search for ‘peer review process flow chart’ gives a hint of how complex the process is from the publishers’ and the authors’ points of view. While Shannon and Weaver’s model contained three elements (sender–message–receiver), some of the flowcharts retrieved by this search contain more than 25 elements, including author, editor, editorial assistant, editorial board, verification and plagiarism checks, the actual review process, the resubmission process, and so on. With such complicated procedures, it is highly likely that the messages (i.e., the manuscript, the peer-review report, the rebuttal) will be exposed to ‘noise’, leading to misunderstandings between author and reviewer.
It is well known that the most efficient way to reduce noise-based distortion is to allow the receiver to give the sender feedback: to ask for clarification, to indicate a lack of understanding of the message, and so on. Communication research since the days of Shannon and Weaver has consistently shown that one-way communication is not nearly as effective as two- or multi-way communication. Moreover, the more complex the message, the more imperative it is that the communication be two- or multi-way, to minimize misunderstandings (e.g., McQuail 2008).
Due to the blindness of the peer review process, the reviewed author’s possibilities of asking the reviewer for clarification are limited, and consequently peer review is to a large extent one-way communication. One could argue that, since the reviewed author is often encouraged to write a so-called rebuttal, an opportunity for feedback exists. But this is only the case if one can assume, first, that the author’s message was correctly understood by the reviewer to start with; second, that the review report was correctly understood by the author; and, third, that the rebuttal letter was correctly understood by the reviewer. Bearing in mind Shannon and Weaver’s model, with all its potential for the infusion of noise, it is hard to imagine that this is an efficient way of conveying a message. In other words, we have allowed the cornerstone of academic quality assessment – the peer review – to rely on a communication process in which the chances of misunderstandings and misinterpretations are quite large.
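The compounding of noise across a many-step workflow can be made concrete with a toy calculation (the per-step distortion probability of 5% is an illustrative assumption, not a figure from the literature):

```python
# Toy illustration of Shannon and Weaver's point: if each hand-off in a
# review workflow independently distorts the message with probability p,
# the chance that the message survives an n-step workflow intact falls
# off geometrically. The value p = 0.05 is an assumed figure.

def p_intact(p_noise: float, n_steps: int) -> float:
    """Probability that a message passes n steps without any distortion."""
    return (1.0 - p_noise) ** n_steps

for n in (3, 10, 25):  # 3 = Shannon-Weaver's model; 25 = a typical journal flowchart
    print(f"{n:2d} steps: {p_intact(0.05, n):.2f}")
```

With a 5% per-step noise probability, three steps leave an 86% chance of an intact message, ten steps 60%, and twenty-five steps only 28%.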
What may be even more puzzling is that, while the Mertonian CUDOS norms – Communalism, Universalism, Disinterestedness, and Organized Scepticism (Merton 1973 [1942]) – are generally accepted as institutional imperatives comprising the ethos of modern science, one could argue that the double-blind peer review process does not truly encourage organized scepticism. Reviewers may be biased against certain theories and approaches, and, due to the anonymity, reviewed authors cannot defend themselves against such potential prejudices.
Similarly, when it comes to the principle of communalism, the double-blind nature of the process hinders collaboration and communication between the reviewers, who cannot build relations or engage in constructive dialogue due to the enforced blindness. Still, we use blind peer review for almost all processes where research or researcher quality is assessed: hiring, tenure, promotion, institutional assessment, funding, and publishing (e.g., Moher et al. 2018). Since the peer review practices of scientific journals are the most thoroughly studied and those researchers are subjected to most frequently, the remainder of this article focuses on them.
The Price of Peer Review
It is well documented that a lot of time is spent peer reviewing scientific journal articles. A study from 2021 conducted by a group of Hungarian psychologists made the following simple calculations based on publicly available data (Aczel et al. 2021): using a reference database that indexes approximately 87,000 scholarly journals, they found that in 2020 at least 4.7 million articles were published. Even though the database covers 87,000 scholarly journals, it does not cover all scientific journals; consequently, the calculations that follow are based on a conservative estimate.
Their research also revealed that, on average across all journals in the database, only 55% of submitted manuscripts are published. Consequently, they assumed that an additional 3.8 million articles are submitted, peer reviewed, and rejected. To assess how many peer reviews are needed to process this number of articles, they made another conservative estimate: each published article needed on average three peer review reports, and each rejected article two. This gives the following calculation: (4.7 million × 3) + (3.8 million × 2) = 21.8 million. In other words, in 2020 a total of 21.8 million peer reviews were carried out to publish the 4.7 million articles.
The next question is how much time this takes. Figures vary across disciplines but, again based on data from Publons (2018), a conservative estimate is 6 hours per review.
As a result, the number of hours spent peer reviewing journal articles can be calculated as 130.8 million hours annually, or just about 15,000 years of working 24 hours a day, 365 days a year. Assuming instead that peer review is done only during an 8-hour working day, the peer reviewing done in 2020 for these articles equals the annual workload of 45,000 researchers. Based on these calculations, it is fair to say that peer review processes are quite time-consuming. That is not a problem in itself: research is by nature time-consuming. The problem is that, due to the blind nature of much peer review work, many institutions fail to acknowledge peer review as part of the researcher workload (Bernstein 2013).
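The back-of-the-envelope arithmetic above can be reproduced in a few lines (a sketch using the rounded figures quoted in the text; the article’s own totals of 21.8 and 130.8 million come out fractionally higher, presumably because the original calculation used unrounded submission counts):

```python
# Re-running the Aczel et al. (2021) estimate with the rounded figures
# quoted in the text. With these inputs the totals land slightly below
# the article's 21.8 / 130.8 million.
published = 4.7e6                       # articles published in 2020
rejected = 3.8e6                        # submissions reviewed but rejected
reviews = published * 3 + rejected * 2  # 3 reviews per published article, 2 per rejected
hours = reviews * 6                     # a conservative 6 hours per review

print(f"reviews: {reviews / 1e6:.1f} million")   # reviews: 21.7 million
print(f"hours:   {hours / 1e6:.1f} million")     # hours:   130.2 million
print(f"round-the-clock years:   {hours / (24 * 365):,.0f}")
print(f"8-hour researcher-years: {hours / (8 * 365):,.0f}")
```

The last two lines recover the article’s orders of magnitude: roughly 15,000 years of around-the-clock work, or the annual output of about 45,000 researchers.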
The Gap in Demand and Supply of Peer Review
To this it should be added that the growth rate of scientific articles has been around 4% annually (Bornmann et al. 2021). The growth rate of the number of researchers far exceeds this growth in research output: the number of researchers has grown by between 10% and 15% annually over the last few decades (Naujokaitytė 2021). However, this growth in the number of researchers is not equally distributed around the globe; it is highest in countries such as India and China.
Furthermore, a recent study found that only 10% of active peer reviewers are responsible for 50% of all peer reviews. The same study found that in 70% of the cases where researchers decline to carry out a peer review, the reason given is that they consider the article to be outside their area of expertise (Petrescu and Krishen 2022). This high percentage is probably a sign of editors’ difficulties in finding qualified peer reviewers.
So, to identify reviewers, editors must search further and further from the centre of the disciplines and move down the academic ladder. For my own part, I am regularly asked to peer review on topics that I have not worked on for the last 20 years, or even on topics, such as biology, that my co-authors from cross-disciplinary work have been working on. In a survey of the qualifications of peer reviewers, 40% admitted that they had never received any training in peer reviewing (Petrescu and Krishen 2022).
To sum up, there is clearly a gap between the demand and the supply of quality peer review, often referred to as the ‘Peer Review Crisis’.
The editor-in-chief of the Journal of Psychiatric Research described in an editorial how she receives more than 50 manuscripts weekly and must send out at least ten invitations per manuscript to get two acceptances (DeLisi 2022), which amounts to sending 500 invitations to review weekly. Of the two reviewers who accept, on average only one returns a timely peer review. So, on top of the 500 weekly invitations, she must send an additional 250 emails to chase peer reviewers who do not deliver or to identify others who will.
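DeLisi’s weekly numbers can be laid out as simple arithmetic (treating the 250 follow-up emails as half the invitation count is my reading of the passage, not an explicit formula from the editorial):

```python
# The weekly editorial arithmetic behind DeLisi's (2022) account, as quoted above.
manuscripts_per_week = 50
invitations_per_manuscript = 10       # needed to secure two acceptances

invitations = manuscripts_per_week * invitations_per_manuscript
print(invitations)                    # 500 invitations per week

# Of each pair of acceptances, on average only one review arrives on time,
# which generates the additional 250 weekly follow-up emails mentioned above.
follow_ups = invitations // 2
print(invitations + follow_ups)       # 750 reviewer emails per week
```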
On one day, this editor decided to record all the reasons reviewers gave for declining (DeLisi 2022):
too many deadlines… I decline due to illness… illness in the family… I am on leave… about to have a child… on maternity leave… on paternity leave… on sabbatical for 6 months… not available at this time; try me another time… sorry, really have no time… due to my work schedule I am unable to do it at this time… not enough time right now… outside my area… don’t have the expertise… current work load does not allow me to do this… this is a busy time… on holidays… won’t review for your journal anymore because you took too long to get my own paper reviewed…
All these are very familiar and understandable excuses. Some editors experiment with APC vouchers or even ‘best peer reviewer’ awards, but none of these measures has so far bridged the gap. Senior researchers are not in it ‘for the money’, APC vouchers only have value if the reviewer plans to submit to the journal he/she is reviewing for, and peer reviewer awards obviously lose their value if they are given to all peer reviewers. Studies on whether monetary rewards are effective have also been discouraging so far (Zaharie and Seeber 2018).
Biases of Peer Review
In a frequently cited essay from 2006, Richard Smith, a former editor of a highly ranked medical journal, summarized the flaws of the peer-review process (Smith 2006). In an entertainingly eloquent style, he accounts for many of the experiments done to find out whether peer review serves its purpose. He concludes as follows:
peer review is a flawed process, full of easily identified defects with little evidence that it works. Nevertheless, it is likely to remain central to science and journals because there is no obvious alternative, and scientists and editors have a continuing belief in peer review. How odd that science should be rooted in belief.
In one of the experiments, the editors of the journal inserted major errors into a series of manuscripts and sent them to regular peer reviewers. None of the peer reviewers identified all errors; some spotted none and most only about a quarter of them.
Dealing with the inconsistencies of peer review, he also presents a series of examples of the subjectivity of the process. The following example is among the most grotesque (Smith 2006).
Reviewer A: I found this paper an extremely muddled paper with a large number of deficits.
Reviewer B: It is written in a clear style and would be understood by any reader.
On top of the subjectivity of the process, several studies have documented that peer review is often biased (Haffar et al. 2019), for example against gender (Kern-Goldberger et al. 2022) or institutions. In one study, 12 already-published articles by famous authors from famous institutions were selected, and the institution names were changed to ones such as ‘The Tri-Valley Center for Human Potential’. In only three instances did the journal editor and peer reviewers realize that the articles had already been published in the journal; the remaining nine were rejected (Smith 2006).
On a regular basis, social media are awash with stories about how peer review processes have been used to steal other people’s ideas, to delay competitors’ research, or to suppress interpretations of data or theories with which the reviewers disagree for one reason or another. For those who find such stories entertaining there is a Facebook group with the title ‘Reviewer 2 must be stopped’ dedicated to sharing horror stories on peer review.
Quo Vadis?
One could optimistically think that all, or at least some, of these failures could be overcome by adequate education and training of peer reviewers. However, when the editors of the above-mentioned highly ranked medical journal undertook a randomized test, the result was disappointing. They divided a group of peer reviewers into three subgroups: one that received no training, one that underwent face-to-face training combined with a digital learning programme, and one that received only the digital training. The conclusion of the experiment was that there was no difference in performance across the three groups (Smith 2006). A former editor of another highly ranked medical journal used to joke that he was not sure anyone would notice if he swapped the pile of rejected manuscripts with the accepted ones (Smith 2006).
However, keeping in mind that there is currently no obvious alternative to peer review for assessing the quality of manuscripts for scientific journals, we should seek means of improving our current practices. One obvious question to ask is: could opening up the peer review process be part of the answer? Not even such a simple question is easy to answer. First, we need to agree on what we mean by ‘open peer review’.
A study from 2017 identified 122 different definitions of open peer review in a systematic review of 137 articles (Ross-Hellauer 2017). The analysis revealed that each of the definitions contained one or more of the following elements:
- Open identities: authors and reviewers are aware of each other’s identity.
- Open reports: review reports are published with the article.
- Open participation: the wider community may contribute to the review process.
- Open interaction: reciprocal discussion between author(s) and reviewers, and/or between reviewers, is allowed and encouraged.
- Open pre-review manuscripts: manuscripts are made immediately available via preprint servers such as arXiv ahead of any formal peer review procedures.
- Open final-version commenting: review, or rather commenting, is done on the final version of the publication.
- Open platforms (‘decoupled review’): reviews are facilitated by a different organizational entity than the venue of publication.
However, the core traits of the 122 different definitions were easily identified:
- Open identity is part of 90% of the definitions;
- Open reports are part of 60% of them; and
- Open participation of 30%.
If the three are combined with a Boolean ‘or’ then 99% of the definitions are covered. Consequently, the discussion should revolve around these three core traits and to what extent they will solve the problems discussed above.
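The ‘Boolean or’ coverage logic can be sketched as a small set computation (the four example definitions below are hypothetical, since the per-definition data behind Ross-Hellauer’s 99% figure is not reproduced here):

```python
# A sketch of the coverage logic in Ross-Hellauer (2017). Each definition is
# modelled as the set of traits it mentions; a definition is "covered" if it
# contains at least one of the three core traits (a Boolean OR).
# The mini-corpus below is invented for illustration only.
core = {"open_identity", "open_reports", "open_participation"}

definitions = [
    {"open_identity"},
    {"open_identity", "open_reports"},
    {"open_reports", "open_participation"},
    {"open_platforms"},   # a rare definition the three core traits do not cover
]

# Non-empty set intersection means the definition mentions a core trait.
covered = sum(1 for d in definitions if d & core)
print(f"covered: {covered}/{len(definitions)}")   # covered: 3/4
```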
There is little evidence on this matter so far (Ross-Hellauer and Görögh 2019). One might assume that open identity would lead to better-quality peer review and create better incentives for researchers to do peer review work, since it would enable consistent registration of peer review activities as part of the researcher workload.
However, it could also hinder peer review. Peer reviewers may decline reviewing assignments for fear of unprofessional behaviour by the reviewed authors, and vice versa. A study from 2019 reported that, when given the opportunity to reveal their identity to authors, only 8% of 18,000 reviewers chose to do so (Bravo et al. 2019). The study also showed that rejection rates were lower among those who chose to have both their name and their review open than among those who remained anonymous.
The researchers found no significant negative effects of open peer review on referees’ willingness to review, their recommendations, or turnaround time. On the contrary, under open peer review reviewers were keener to accept invitations, more objective in their reports, and less demanding regarding the quality of submissions; the tone of the reports was less negative and less subjective. Still, since only 8% of reviewers agreed to reveal their identity, we have yet to understand the appropriate level of openness in peer review: more than 90% of the reviewers still preferred the blind procedure (Bravo et al. 2019).
Since the study only measured the effects on those who volunteered to do open peer review, it cannot be used to predict the consequences of making open peer review the default. There is a real danger that the gap in demand and supply of peer review may widen even more and that the increase in the quality of the peer reviews seen among volunteers would not apply in a full-scale setting.
In summary, opening peer review (defined as open identity and open reports) could be part of the answer, but could also have other unforeseen consequences: there is no guarantee that it will solve the deficit gap between supply and demand for peer review, and there is no evidence suggesting that it could solve bias problems.
Recent developments within the realm of Artificial Intelligence (AI) suggest that in the not-too-distant future the role played by AI in peer review may become more significant (Checco et al. 2021). No fully automated AI peer reviewer has yet been developed. Nevertheless, editors and reviewers are already getting computer-based assistance with specific tasks relating to peer review, such as screening references, plagiarism detection, and checking compliance with journal policies (Hosseini and Horbach 2023). AI systems may also help identify reviewers, support reviewers in writing constructive and respectful reports, assist with formatting so that reviewers can focus more on content than form, and assist editors with desk rejection of manuscripts (Hosseini and Horbach 2023).
Currently, discussions about using AI for peer review mostly centre on how AI can assist reviewers and authors rather than replace human decision making, or on the extent to which AI can be used to model human reviewer decision making and expose possible biases. However, research has shown that AI-based systems can successfully predict peer review outcomes (Checco et al. 2021). This is not surprising: AI has so far mimicked human intelligence, and AI techniques are trained on data from the past. Any such system will therefore inherit the same biases that human reviewers have (Hosseini and Horbach 2023). Hence, while AI may be capable of performing full peer review in the future, it may have exactly the same weaknesses as current practice.
Concluding Remarks
In summary, neither opening up the peer review processes nor using AI for peer review can solve the current peer review crisis here and now. However, both may provide elements of a future solution. While opening the peer review process through open identity and open reports may modernize the communication process, reduce bias, increase the quality of peer review and create better incentives for researchers to perform peer review, it may not narrow the gap between demand and supply since opening the process may also mean that performing peer review will become more time-consuming. This problem, in turn, may be solved by developing advanced AI to assist human decision making in peer review processes and thus save time and money. However, with current technology, such tools may reinforce bias due to the inherent conservatism built into the learning processes of AI-based systems. So, apparently, there are no quick fixes, and the pressure on the scholarly communication ecosystem remains.
In this article, I have focused on peer review for scientific journals only. I have shown how the scholarly communication ecosystem is under significant pressure in relation to peer review in the publication process. In other areas of the ecosystem, the demand for peer review is also increasing. The European Researcher Assessment Reform under the Coalition for Advancing Research Assessment (CoARA) encourages the use of qualitative assessment of research and researchers for funding, promotion, and tenure. The intention of this is to reduce the irresponsible use of quantitative measures such as h-indexes, journal impact factors and citation rates for hiring and firing purposes. If this is successful and implemented in larger parts of the world, one must expect the demand for peer review to increase in the years to come. The question of how to manage this in the future still needs to be addressed.
About the Author
Charlotte Wien is a full professor of scholarly communication at UiT The Arctic University of Norway in Tromsø. She has recently joined Elsevier as Vice President of Library Relations. Charlotte has worked as both a scholar and a librarian throughout her career. Her MA is in the humanities and library science, and she holds a PhD in information retrieval.