IOP Publishing’s discovery that researchers are split down the middle on the merits of using AI in peer review is not surprising given the complexity of the issue.
The publisher’s August survey of just under 350 physics researchers found that 41 per cent were positive about the use of AI in peer review and 37 per cent were negative.
The case for using AI is obvious. With more than 30,000 journals in existence, the process is highly human-intensive. To illustrate the point: if the average journal published 50 papers a year, with each submitted paper reviewed by two independent referees and each review requiring four hours of effort, that translates into 12 million hours of peer review work annually.
Moreover, that figure does not include the work of the editorial boards that oversee the review process or the editorial staff who process the papers – not to mention the time editors spend trying to find suitable reviewers willing to take on a manuscript. It also ignores the fact that many rejected papers are resubmitted to, and re-reviewed by, other journals, creating a multiplier effect that may increase the annual reviewing burden by a factor of five or more.
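As a quick check of that back-of-envelope arithmetic (the journal count of 30,000 is inferred from the stated 12 million-hour total rather than quoted from the survey):

\[
30\,000 \text{ journals} \times 50 \tfrac{\text{papers}}{\text{journal}} \times 2 \tfrac{\text{reviews}}{\text{paper}} \times 4 \tfrac{\text{hours}}{\text{review}} = 12\,000\,000 \text{ hours per year},
\]
\[
12\,000\,000 \times 5 = 60\,000\,000 \text{ hours or more once resubmissions are counted}.
\]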
Replacing human referees with generative AI would therefore ease a reviewing burden that is commonly agreed to be verging on unsustainable.
Then there is the issue of speed. Many journals promise rapid review, yet speed must be balanced against the quality and depth of the reviews provided. With AI-generated reviews, time would no longer be an issue.
Furthermore, if AI peer reviewing did become widely available, researchers could use it to evaluate their manuscripts before submitting them to a journal. Incorporating such a system into preprint repositories would facilitate this, making peer review part of the research process itself by offering suggestions that shape the final product.
But, of course, there are also challenges to adopting AI. The purpose of peer review is to answer three questions about the manuscript. Is the research new? Are the research results correct? And do the results add intellectual value in a field or provide benefits in a discipline or beyond?
Most research builds incrementally on existing results, and AI systems are well suited to making such evaluations. They can respond to an informal checklist of measures that captures what a manuscript builds on and how well it has used the scientific method to achieve its objectives. This is no different from how a human peer reviewer would proceed.
However, the answers to the last two of the three questions above are highly dependent on the field of study and the type of research conducted. Although the scientific method likely lies at the foundation of most research and discovery in STEM disciplines and some of the social sciences, variations in theoretical, experimental and data-analytic research make a one-size-fits-all approach to AI peer review problematic.
In addition, if the research breaks new ground, providing a quantum shift in thinking, such evaluations become more difficult because the existing literature offers no foundation against which to assess the new ideas.
Perhaps the most difficult role for AI would be to assess the third question, pertaining to the value and benefits of the research. Although such evaluation is highly subjective, it is often what provides the insight that is at the core of peer review’s value.
Donald Trump’s recent executive order calls for the adoption of “unbiased peer review” to improve the research process, including how research is disseminated and evaluated. That could be read as an implicit call for the adoption of AI peer review. But, of course, bias is always in the eye of the beholder. While Trump and his MAGA allies might see research on gender or climate change as being of little value, others will disagree. AIs are no more “unbiased” than humans in that sense – as the behaviour of Elon Musk’s Grok AI aptly demonstrates.
Moreover, AI systems must be trained with data, which itself may – depending on your opinion – be biased or contaminated with information that is demonstrably false. Although AI systems look smart, they are doing nothing more than regurgitating what they learned when trained. As the data modelling adage goes: “garbage in, garbage out”.
To return to the issue of recognising the value of groundbreaking research, it is possible that an AI’s training data could inadvertently create a “groupthink” assessment, one that uprates research building methodically on existing knowledge but fails to recognise the benefits of “out-of-the-box” ideas, potentially disincentivising research creativity.
In my view, humans retain the edge when it comes to assessing the value and significance of research. But we should not rely on hunches. To test this possible limitation, an AI peer-review process should be run in parallel with human peer review, with the human reviewers allowed to see the AI review only after they complete their own assessments. It may well turn out that humans and AIs agree with each other much more frequently in some fields than in others, with those high-agreement fields more suited to a switch to AI reviewing.
Ultimately, AI’s most appropriate role might be to support human peer review, rather than replace it, picking up more perfunctory issues while the human reviewer connects the dots and has the final say. But we don’t know. And the bottom line is that we must proceed with caution until we do.
Let’s hold off implementing AI peer review until – no pun intended – it can itself be peer-reviewed to ensure it meets the very standards that authors and editors rightly expect.
The author is founder professor in computer science at the University of Illinois Urbana-Champaign.