# AI/LLM use in evaluations

*This is an evolving policy; we have consulted with our Management Team but have not formally confirmed all aspects.*

## Framework

Our goal is relevant, insightful, and correct evaluations that fairly assess research credibility and value, help researchers improve their work, and help practitioners and other researchers use it appropriately. We want evaluators to have access to all tools that facilitate this.

At the same time, we are mindful of other concerns:

* continuing to cultivate and demonstrate the value of our evaluation process;
* encouraging people to continue to work with us;
* capturing credible and independent sources of judgment and reasoning;
* being transparent with readers of our evaluations.

## What we provide to evaluators

After evaluators have submitted their initial evaluations, we aim to provide (and later share publicly):

1. Flagship model / deep-research LLM evaluations of the paper
2. LLM consistency checks of the evaluation against the paper

Evaluators are encouraged to provide feedback on these. They may adjust their evaluations if they are confident the LLM has identified an important misunderstanding, and should note any such adjustments.

We will also recommend and try to facilitate other 'bug-checking' tools, such as RegCheck (a preregistration checker).

## What evaluators may use

Evaluators may use AI language model tools selectively. Appropriate uses include looking up or clarifying content in the paper (e.g. using NotebookLM as a reference tool) and running extensive checks that are not feasible to do by hand. In the latter case, the AI-assisted material should appear separately (in a dedicated section, link, or footnote) and be accompanied by at least a hand-checked sample.

Evaluators are encouraged not to use AI tools for overall evaluations or for the ratings and predictions components, except as described above under "What we provide to evaluators".

## Transparency requirements

Evaluators must report how AI tools were used, ideally by:

1. Providing links to their AI chats or explanations
2. Explicitly identifying any sections of text that were generated by LLMs

## Human work requirement

Evaluators are expected to put in at least 8 hours of human work, over and above the processing time of any LLM tools.

## Standing by your evaluation

Evaluators must stand by all content and language in their report as their own judgment. They must independently verify and carefully consider any points raised by AI tools.

## Related evidence and context

Emerging evidence points toward a complementary role for AI in peer review processes like ours. Biswas et al. (2025), [AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot](https://arxiv.org/abs/2604.13940) (arXiv:2604.13940), report on a large-scale survey of AAAI-26 authors and program committee members, finding that "participants not only found AI reviews useful, but actually preferred them to human reviews on key dimensions such as technical accuracy and research suggestions." The pilot's system processed all 22,977 full-paper submissions in under a day using frontier models with tool use and safeguards.

This offers promising evidence for AI-assisted peer review at scale. But we are being cautious: we use AI checks as a complement and supplement to human evaluation, preserving independent expert judgment while adding a consistency check and a source of suggestions.

*Also relevant: Johnny Coates'* [*"Best Practices for preprint peer review services in the use of AI"*](https://docs.google.com/document/d/17Fz4opzft7mfvVMbnU8bnwXNabWnuw8F8YWS3whLMqM/edit?tab=t.0)*.*

