Guidelines for evaluators
This page describes The Unjournal's evaluation guidelines, considering our priorities and criteria, the metrics we ask for, and how these are considered.
These guidelines apply to the evaluation forms in Coda here (academic stream) and here (applied stream).
Please see For prospective evaluators for an overview of the evaluation process, as well as details on compensation, public recognition, and more.
What we'd like you to do
Write an evaluation of the target, similar to a standard, high-quality referee report. Please identify the paper's main claims and carefully assess their validity, leveraging your own background and expertise.
Provide quantitative metrics and predictions, as described below.
Answer a short questionnaire about your background and our processes.
Writing the evaluation (aka 'the review')
In writing your evaluation and providing ratings, please consider the following.
The Unjournal's expectations and criteria
In many ways, the written part of the evaluation should be similar to a report an academic would write for a traditional high-prestige journal (e.g., see some 'conventional guidelines' here). Most fundamentally, we want you to use your expertise to critically assess the main claims made by the authors. Are the claims well-supported? Are the assumptions believable? Are the methods appropriate and well-executed? Explain why or why not.
However, we'd also like you to pay some consideration to our priorities, including:
Advancing our knowledge and supporting practitioners
Justification, reasonableness, validity, and robustness of methods
Logic and communication, intellectual modesty, transparent reasoning
Open, communicative, replicable science
See our guidelines below for more details on each of these. Please don't structure your review according to these metrics; just pay some attention to them.
If you have questions about the authors’ work, you can ask them anonymously: we will facilitate this.
We want you to evaluate the most recent/relevant version of the paper/project that you can access. If you see a more recent version than the one we shared with you, please let us know.
Target audiences
We designed this process to balance three considerations with three target audiences. Please consider each of these:
Crafting evaluations and ratings that help researchers and policymakers judge when and how to rely on this research. For Research Users.
Ensuring these evaluations of the papers are comparable to current journal tier metrics, to enable them to be used to determine career advancement and research funding. For Departments, Research Managers, and Funders.
Providing constructive feedback to Authors.
We discuss this, and how it relates to our impact and "theory of change", here.
Quantitative metrics
We ask for a set of nine quantitative metrics. For each metric, we ask for a score and a 90% credible interval. We describe these in detail below. (We explain why we ask for these metrics here.)
Percentile rankings
For some questions, we ask for a percentile ranking from 0-100%. This represents "what proportion of papers in the reference group are worse than this paper, by this criterion". A score of 100% means this is essentially the best paper in the reference group, and a score of 0% means it is the worst. A score of 50% means this is the median paper; i.e., half of all papers in the reference group do this better, and half do this worse.
Here, the population of papers should be all serious research in the same area that you have encountered in the last three years.
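As a rough illustration of the definition above, the percentile ranking can be read as the share of reference-group papers that fall below the target paper on a given criterion. The sketch below is only that: the numeric quality scores and the function name `percentile_rank` are hypothetical, and in practice we expect you to make this judgment heuristically rather than by scoring each paper you have read.

```python
def percentile_rank(reference_scores, paper_score):
    """Percentage of reference-group papers scoring strictly below paper_score."""
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    worse = sum(1 for score in reference_scores if score < paper_score)
    return 100.0 * worse / len(reference_scores)


# Hypothetical quality scores for ten papers in the same area
# (illustrative numbers only, not real data).
reference_group = [2.0, 3.5, 4.0, 5.5, 6.0, 7.0, 7.5, 8.0, 9.0, 9.5]

# Six of the ten reference papers score below 7.2, so this prints 60.0.
print(percentile_rank(reference_group, 7.2))
```

A paper better than every paper in the reference group gets 100, and one worse than all of them gets 0, matching the endpoints described above.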
Midpoint rating and credible intervals
For each metric, we ask you to provide a 'midpoint rating' and a 90% credible interval as a measure of your uncertainty. Our interface provides slider bars to express your chosen intervals.
See below for more guidance on uncertainty, credible intervals, and the midpoint rating as the 'median of your belief distribution'.
The table below summarizes the percentile rankings.
Overall assessment
Percentile ranking (0-100%)
Judge the quality of the research heuristically. Consider all aspects of quality, credibility, importance to future impactful applied research, and practical relevance and usefulness.
Claims, strength and characterization of evidence
Do the authors do a good job of (i) stating their main questions and claims, (ii) providing strong evidence and powerful approaches to inform these, and (iii) correctly characterizing the nature of their evidence?
Methods: Justification, reasonableness, validity, robustness
Percentile ranking (0-100%)
Are the methods used well-justified and explained; are they a reasonable approach to answering the question(s) in this context? Are the underlying assumptions reasonable?
Are the results and methods likely to be robust to reasonable changes in the underlying assumptions?
Avoiding bias and questionable research practices (QRP): Did the authors take steps to reduce bias from opportunistic reporting? For example, did they do a strong pre-registration and pre-analysis plan, incorporate multiple hypothesis testing corrections, and report flexible specifications?
Advancing our knowledge and practice
Percentile ranking (0-100%)
To what extent does the project contribute to the field or to practice, particularly in ways that are relevant to global priorities and impactful interventions?
(Applied stream: please focus on ‘improvements that are actually helpful’.)
Do the paper's insights inform our beliefs about important parameters and about the effectiveness of interventions?
Does the project add useful value to other impactful research?
Logic and communication
Percentile ranking (0-100%)
Are the goals and questions of the paper clearly expressed? Are concepts clearly defined and referenced?
Is the reasoning transparent? Are assumptions made explicit? Are all logical steps clear and correct? Does the writing make the argument easy to follow?
Are the conclusions consistent with the evidence (or formal proofs) presented? Do the authors accurately state the nature of their evidence, and the extent it supports their main claims?
Are the data and/or analysis presented relevant to the arguments made? Are the tables, graphs, and diagrams easy to understand in the context of the narrative (e.g., no major errors in labeling)?
Open, collaborative, replicable research
Percentile ranking (0-100%)
This covers several considerations:
Replicability, reproducibility, data integrity
Would another researcher be able to perform the same analysis and get the same results? Are the methods explained clearly and in enough detail to enable easy and credible replication? For example, are all analyses and statistical tests explained, and is code provided?
Is the source of the data clear?
Is the data made as available as is reasonably possible? If so, is it clearly labeled and explained?
Consistency
Do the numbers in the paper and/or code output make sense? Are they internally consistent throughout the paper?
Useful building blocks
Do the authors provide tools, resources, data, and outputs that might enable or enhance future work and meta-analysis?
Relevance to global priorities, usefulness for practitioners
Are the paper’s chosen topic and approach relevant to global priorities, cause prioritization, and high-impact interventions?
Does the paper consider real-world relevance and deal with policy and implementation questions? Are the setup, assumptions, and focus realistic?
Do the authors report results that are relevant to practitioners? Do they provide useful quantified estimates (costs, benefits, etc.) enabling practical impact quantification and prioritization?
Do they communicate (at least in the abstract or introduction) in ways policymakers and decision-makers can understand, without misleading or oversimplifying?
Journal ranking tiers
To help universities and policymakers make sense of our evaluations, we want to benchmark them against how research is currently judged. So, we would like you to assess the paper in terms of journal rankings. We ask for two assessments:
a normative judgment about 'how well the research should publish';
a prediction about where the research will be published.
Journal ranking tiers are on a 0-5 scale, as follows:
0/5: Little to no value. Unlikely to be cited by credible researchers
1/5: OK/Somewhat valuable journal
2/5: Marginal B-journal/Decent field journal
3/5: Top B-journal/Strong field journal
4/5: Marginal A-Journal/Top field journal
5/5: A-journal/Top journal
We give some example journal rankings here, based on SJR and ABS ratings.
We encourage you to give non-integer responses, e.g. 4.6 or 2.2.
As before, we ask for a 90% credible interval.
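To make the shape of a journal tier response concrete, here is a minimal sketch of how one rating could be represented and sanity-checked. The class name `TierRating` and its fields are hypothetical and do not describe how our forms store responses; the only constraints encoded are the ones stated above (a 0-5 scale, non-integer values allowed, and a 90% credible interval around the midpoint).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRating:
    """One journal-tier response: a midpoint plus a 90% credible interval."""
    midpoint: float
    lower: float  # lower bound of the 90% credible interval
    upper: float  # upper bound of the 90% credible interval

    def __post_init__(self):
        # All three values must lie on the 0-5 scale, with the midpoint
        # inside the interval.
        if not (0.0 <= self.lower <= self.midpoint <= self.upper <= 5.0):
            raise ValueError(
                "expected 0 <= lower <= midpoint <= upper <= 5, got "
                f"{self.lower}, {self.midpoint}, {self.upper}"
            )


# A non-integer response such as 4.6 is valid.
rating = TierRating(midpoint=4.6, lower=4.0, upper=5.0)
print(rating)
```

A response whose interval does not contain the midpoint, or that leaves the 0-5 scale, raises a `ValueError` in this sketch.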
PubPub note: as of 14 March 2024, the PubPub form does not allow non-integer responses. Until this is fixed, you can use the Coda form instead.
What journal ranking tier should this work be published in?
Journal ranking tier (0.0-5.0)
Assess this paper on the journal ranking scale described above, considering only its merit, giving some weight to the category metrics we discussed above.
Equivalently, where would this paper be published if:
the journal process were fair, unbiased, and free of noise, and status, social connections, and lobbying to get the paper published didn’t matter; and
journals assessed research according to the category metrics we discussed above?
What journal ranking tier will this work be published in?
Journal ranking tier (0.0-5.0)
The midpoint and 'credible intervals': expressing uncertainty
What are we looking for and why?
We want policymakers, researchers, funders, and managers to be able to use The Unjournal's evaluations to update their beliefs and make better decisions. To do this well, they need to weigh multiple evaluations against each other and other sources of information. Evaluators may feel confident about their rating for one category, but less confident in another area. How much weight should readers give to each? In this context, it is useful to quantify the uncertainty.
But it's hard to quantify statements like "very certain" or "somewhat uncertain": different people may use the same phrases to mean different things. That's why we're asking you for a more precise measure, your credible intervals. These metrics are particularly useful for meta-science and meta-analysis.
You are asked to give a 'midpoint' and a 90% credible interval. Consider this as an interval that you believe is 90% likely to contain the true value.
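One way to think about the midpoint and interval is as summaries of your belief distribution: the midpoint is its median, and the 90% credible interval runs from its 5th to its 95th percentile. The sketch below illustrates this under loud assumptions: the belief is represented by simulated draws (a normal distribution centered at 65 with standard deviation 10, clipped to 0-100), and the function name `summarize_belief` is hypothetical. This is an equal-tailed interval, which is one common choice and not the only valid one, and we do not expect you to run a simulation when choosing your slider positions.

```python
import random
import statistics


def summarize_belief(samples):
    """Median and equal-tailed 90% interval (5th to 95th percentile) of samples."""
    # n=20 splits the data into 20 equal-probability groups, giving 19 cut
    # points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(samples, n=20, method="inclusive")
    return {
        "midpoint": statistics.median(samples),
        "lower": cuts[0],    # 5th percentile
        "upper": cuts[-1],   # 95th percentile
    }


# Assumed belief about a percentile rating (illustrative only).
random.seed(0)
belief = [min(100.0, max(0.0, random.gauss(65, 10))) for _ in range(10_000)]

summary = summarize_belief(belief)
print(summary)
```

A wider interval signals more uncertainty about the rating, and a narrower one signals more confidence.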
Claim identification, assessment, and implications
We are now asking evaluators for “claim identification and assessment” where relevant. This is meant to help practitioners use this research to inform their funding, policymaking, and other decisions. It is not intended as a metric to judge the research quality per se. This is not required but we will reward this work.
See guidelines and examples here.
Survey questions
Lastly, we ask evaluators about their background, and for feedback about the process.
Other guidelines and notes
Length/time spent: This is up to you. We welcome detail, elaboration, and technical discussion.
If you still have questions, please contact us, or see our FAQ on Evaluation ('refereeing').
Our data protection statement is linked here.