Thanks for your interest in evaluating research for The Unjournal!
The Unjournal is a nonprofit organization started in mid-2022. We commission experts to publicly evaluate and rate research. Read more about us here.
Write an evaluation of a specific research paper: essentially a standard, high-quality referee report.
Give quantitative metrics and predictions for the research by filling in a structured form.
Answer a short questionnaire about your background and our processes.
See Guidelines for Evaluators for further details and guidance.
Why use your valuable time writing an Unjournal evaluation? There are several reasons: helping high-impact research users, supporting open science and open access, and getting recognition and financial compensation.
The Unjournal's goal is to make impactful research more rigorous, and rigorous research more impactful, while supporting open access and open science. We encourage better research by making it easier for researchers to get feedback and credible ratings. We evaluate research in high-impact areas that make a difference to global welfare. Your evaluation will:
Help authors improve their research, by giving early, high-quality feedback.
Help improve science by providing open-access, prompt, structured, public evaluations of impactful research.
Inform funding bodies and meta-scientists as we build a database of research quality, strengths and weaknesses in different dimensions. Help research users learn what research to trust, when, and how.
For more on our scientific mission, see here.
Your evaluation will be made public and given a DOI. You have the option to be identified as the author of this evaluation or to remain anonymous, as you prefer.
We offer compensation for providing a careful and complete evaluation and feedback ($100-$300 base + $100 'promptness bonus') in line with our expected standards.
Note, Aug. 2024: we're adjusting the base compensation to reward strong work and experience.
$100 + $100 for first-time evaluators
$300 + $100 for return Unjournal evaluators and those with previous strong public review experience. We will be integrating other incentives and prizes into this, and are committed to $450 in average compensation per evaluation, including prizes.
You will also be eligible for monetary prizes for "most useful and informative evaluation," plus other bonuses. We currently (Feb. 2024) set aside an additional $150 per evaluation for incentives, bonuses, and prizes.
See also "submitting claims and expenses"
If you have been invited to be an evaluator and want to proceed, simply respond to the email invitation that we have sent you. You will then be sent a link to our evaluation form.
To sign up for our evaluator pool, see 'how to get involved'
To learn more about our evaluation process, see Guidelines for evaluators. If you are doing an evaluation, we highly recommend you read these guidelines carefully.
See the sections below:
For prospective evaluators: An overview of what we are asking; payment and recognition details
Guidelines for evaluators: The Unjournal's evaluation guidelines, considering our priorities and criteria, the metrics we ask for, and how these are considered.
Other sections and subsections provide further resources, consider future initiatives, and discuss our rationales.
We are considering asking evaluators, with compensation, to assist and engage in the process of "robustness replication." This may lead to some interesting follow-on possibilities as we build our potential collaboration with the Institute for Replication and others in this space.
We might ask evaluators discussion questions like these:
What is the most important, interesting, or relevant substantive claim made by the authors (particularly considering global priorities and potential interventions and responses)?
What statistical test or evidence does this claim depend on, according to the authors?
How confident are you in the substantive claim made?
"Robustness checks": What specific statistical test(s) or piece(s) of evidence would make you substantially more confident in the substantive claim made?
If a robustness replication "passed" these checks, how confident would you be then in the substantive claim? (You can also express this as a continuous function of some statistic rather than as a binary; please explain your approach.)
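For the last question above, here is one purely illustrative way an evaluator might express "confidence as a continuous function of some statistic": a logistic mapping from a hypothetical replication t-statistic to a stated confidence level. The functional form and parameter values are assumptions made for this sketch, not something The Unjournal prescribes.

```python
import math

def confidence_in_claim(t_stat: float, midpoint: float = 2.0, slope: float = 1.5) -> float:
    """Illustrative only: map a replication t-statistic to a confidence level (0-1).

    midpoint: the t-statistic at which you would be 50% confident in the claim.
    slope: how quickly confidence rises around that midpoint.
    Both parameters are hypothetical; an evaluator would choose and justify their own mapping.
    """
    return 1.0 / (1.0 + math.exp(-slope * (t_stat - midpoint)))

# Example: stated confidence if the robustness replication yields t = 1.0, 2.0, or 3.5
for t in (1.0, 2.0, 3.5):
    print(f"t = {t:.1f} -> confidence ~ {confidence_in_claim(t):.2f}")
```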
Background:
The Institute for Replication is planning to hire experts to do "robustness replications" of work published in top journals in economics and political science. Code and data sharing is now being enforced at many or all of these journals and other important outlets. We want to support their efforts and are exploring collaboration possibilities. We are also considering how best to guide potential future robustness-replication work.
We're happy for you to use whichever process and structure you feel comfortable with when writing your evaluation content.
Remember: The Unjournal doesn't "publish" and doesn't "accept or reject." So don't give an 'Accept', 'Revise-and-Resubmit', or 'Reject'-type recommendation. We ask for quantitative metrics, written feedback, and expert discussion of the validity of the paper's main claims, methods, and assumptions.
Semi-relevant: Econometric Society: Guidelines for referees
Report: Improving Peer Review in Economics: Stocktaking and Proposal (Charness et al 2022)
Open Science
PLOS (Conventional but open access; simple and brief)
Peer Community In... Questionnaire (Open-science-aligned; perhaps less detail-oriented than we are aiming for)
Open Reviewers Reviewer Guide (Journal-independent “PREreview”; detailed; targets ECRs)
General, other fields
The Wiley Online Library (Conventional; general)
"Peer review in the life sciences (Fraser)" (extensive resources; only some of this is applicable to economics and social science)
Collaborative template: RRR assessment peer review
Introducing Structured PREreviews on PREreview.org
‘the 4 validities’ and seaboat
31 Aug 2023: Our present approach is a "working solution" involving some ad-hoc and intuitive choices. We are re-evaluating the metrics we are asking for as well as the interface and framing. We are gathering some discussion in this linked Gdoc, incorporating feedback from our pilot evaluators and authors. We're also talking to people with expertise as well as considering past practice and other ongoing initiatives. We plan to consolidate that discussion and our consensus and/or conclusions into the present (Gitbook) site.
Ultimately, we're trying to replace the question "what tier of journal did a paper get into?" with "how highly was the paper rated?" We believe this is a more valuable metric. It can be more fine-grained. It should be less prone to gaming. And it aims to reduce the randomness that comes from things like the availability of journal space in a particular field. See our discussion of Reshaping academic evaluation: beyond the binary... .
To get to this point, we need to have academia and stakeholders see our evaluations as meaningful. We want the evaluations to begin to have some value that is measurable in the way “publication in the AER” is seen to have value.
While there are some ongoing efforts towards journal-independent evaluation, these have limitations for our purposes. Typically, they either have simple tick-boxes (like "this paper used correct statistical methods: yes/no") or they enable descriptive evaluation without an overall rating. As we are not a journal and we don't accept or reject research, we need another way of assigning value. We are working to determine the best way of doing this through quantitative ratings. We hope to be able to benchmark our evaluations against "traditional" publication outcomes. Thus, we think it is important to ask for both an overall quality rating and a journal ranking tier prediction.
In addition to the overall assessment, we think it will be valuable to have the papers rated according to several categories. This could be particularly helpful to practitioners who may care about some concerns more than others. It also can be useful to future researchers who might want to focus on reading papers with particular strengths. It could be useful in meta-analyses, as certain characteristics of papers could be weighed more heavily. We think the use of categories might also be useful to authors and evaluators themselves. It can help them get a sense of what we think research priorities should be, and thus help them consider an overall rating.
However, these ideas have been largely ad-hoc and based on the impressions of our management team (a particular set of mainly economists and psychologists). The process is still being developed. Any feedback you have is welcome. For example, are we overemphasizing certain aspects? Are we excluding some important categories?
We are also researching other frameworks, templates, and past practice; we hope to draw from validated, theoretically grounded projects such as RepliCATS.
In eliciting expert judgment, it is helpful to differentiate the level of confidence in predictions and recommendations. We want to know not only what you believe, but how strongly held your beliefs are. If you are less certain in one area, we should weigh the information you provide less heavily in updating our beliefs. This may also be particularly useful for practitioners. Obviously, there are challenges to any approach. Even experts in a quantitative field may struggle to convey their own uncertainty. They may also be inherently "poorly calibrated" (see discussions and tools for calibration training). Some people may often be "confidently wrong": they might state very narrow "credible intervals", when the truth (where measurable) routinely falls outside these boundaries. People with greater discrimination may sometimes be underconfident. One would want to consider and adjust for this. As a side benefit, this may be interesting for research on expert judgment itself, particularly as The Unjournal grows. We see 'quantifying one's own uncertainty' as a good exercise for academics (and everyone) to engage in.
We had included the note:
We give the previous weighting scheme in a fold below for reference, particularly for those reading evaluations done before October 2023.
As well as:
Suggested weighting: 0.
Elsewhere in that page we had noted:
As noted above, we give suggested weights (0–5) to suggest the importance of each category rating to your overall assessment, given The Unjournal's priorities.
The weightings were presented once again along with each description in the section "Category explanations: what you are rating".
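As an illustration of how such suggested weights could feed into an overall assessment, here is a minimal sketch that takes a weighted average of hypothetical category ratings. The category names, ratings, and weights below are invented for the example; the previous guidelines offered weights only as a rough guide and did not prescribe any particular formula.

```python
# Hypothetical illustration: combining category ratings (0-100) with
# suggested importance weights (0-5) into a single weighted average.
# All numbers here are invented for the example.

category_ratings = {"Methods": 72, "Advancing knowledge": 65, "Logic and communication": 80,
                    "Open science": 55, "Real-world relevance": 60}
suggested_weights = {"Methods": 5, "Advancing knowledge": 5, "Logic and communication": 4,
                     "Open science": 3, "Real-world relevance": 2}

weighted_overall = (sum(category_ratings[c] * suggested_weights[c] for c in category_ratings)
                    / sum(suggested_weights.values()))
print(f"Weighted overall rating: {weighted_overall:.1f} / 100")
```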
[FROM PREVIOUS GUIDELINES:]
You may feel comfortable giving your "90% confidence interval," or you may prefer to give a "descriptive rating" of your confidence (from "extremely confident" to "not confident").
Quantify how certain you are about this rating, either giving a 90% confidence/credibility interval or using our scale described below.
[Previous...] Remember, we would like you to give a 90% CI or a confidence rating (1–5 dots), but not both.
And, for the 'journal tier' scale:
[Previous guidelines]: The description folded below focuses on the "Overall Assessment." Please try to use a similar scale when evaluating the category metrics.
More reliable, precise, and useful metrics: This page explains the value of the metrics we are seeking from evaluators.
Unjournal Evaluator Guidelines and Metrics - Discussion space
This page describes The Unjournal's evaluation guidelines, considering our priorities and criteria, the metrics we ask for, and how these are considered.
These guidelines apply to the evaluation forms in Coda and PubPub.
Please see 'For prospective evaluators' for an overview of the evaluation process, as well as details on compensation, public recognition, and more.
Write an evaluation of the target research, similar to a standard, high-quality referee report. Please identify the paper's main claims and carefully assess their validity, leveraging your own background and expertise.
Give quantitative metrics and predictions by filling in a structured form.
Answer a short questionnaire about your background and our processes.
In writing your evaluation and providing ratings, please consider the following.
In many ways, the written part of the evaluation should be similar to a report an academic would write for a traditional high-prestige journal (e.g., see some 'conventional guidelines'). Most fundamentally, we want you to use your expertise to critically assess the main claims made by the authors. Are the claims well-supported? Are the assumptions believable? Are the methods appropriate and well-executed? Explain why or why not.
However, we'd also like you to pay some consideration to our priorities, including
Advancing our knowledge and supporting practitioners
Justification, reasonableness, validity, and robustness of methods
Logic and communication, intellectual modesty, transparent reasoning
Open, communicative, replicable science
If you have questions about the authors’ work, you can ask them anonymously: we will facilitate this.
We want you to evaluate the most recent/relevant version of the paper/project that you can access. If you see a more recent version than the one we shared with you, please let us know.
We designed this process to balance three considerations with three target audiences. Please consider each of these:
Crafting evaluations and ratings that help researchers and policymakers judge when and how to rely on this research. For Research Users.
Ensuring these evaluations of the papers are comparable to current journal tier metrics, to enable them to be used to determine career advancement and research funding. For Departments, Research Managers, and Funders.
Providing constructive feedback to Authors.
For some questions, we ask for a percentile ranking from 0-100%. This represents "what proportion of papers in the reference group are worse than this paper, by this criterion?" A score of 100% means this is essentially the best paper in the reference group; 0% means it is the worst. A score of 50% means this is the median paper, i.e., half of all papers in the reference group do this better and half do it worse.
Here, the reference group should be all serious research in the same area that you have encountered in the last three years.
For each metric, we ask you to provide a 'midpoint rating' and a 90% credible interval as a measure of your uncertainty. Our interface provides slider bars to express your chosen intervals:
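To make the percentile idea concrete, here is a minimal sketch (with invented numbers) of how a percentile ranking corresponds to "the share of reference-group papers that are worse on this criterion". In practice you judge this directly rather than compute it; the scores below are purely hypothetical.

```python
# Illustrative only: a percentile ranking as "share of reference papers that are worse".
# The quality scores are invented for the example.

reference_scores = [42, 55, 61, 63, 70, 74, 78, 81, 88, 93]  # hypothetical reference group
this_paper_score = 75

percentile = 100 * sum(s < this_paper_score for s in reference_scores) / len(reference_scores)
print(f"Percentile ranking: {percentile:.0f}%")  # 60% -> better than 60% of the reference group
```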
The categories below are each scored as a percentile ranking (0-100%).

Claims, strength and characterization of evidence (percentile ranking, 0-100%)
Do the authors do a good job of (i) stating their main questions and claims, (ii) providing strong evidence and powerful approaches to inform these, and (iii) correctly characterizing the nature of their evidence?

Methods: Justification, reasonableness, validity, robustness (percentile ranking, 0-100%)
(See the fuller description of this category below.)

Advancing knowledge and practice (percentile ranking, 0-100%)
(Applied stream: please focus on 'improvements that are actually helpful'.)
Do the paper's insights inform our beliefs about important parameters and about the effectiveness of interventions?
Does the project add useful value to other impactful research?

Logic and communication (percentile ranking, 0-100%)
Are the goals and questions of the paper clearly expressed? Are concepts clearly defined and referenced?
Are the conclusions consistent with the evidence (or formal proofs) presented? Do the authors accurately state the nature of their evidence, and the extent to which it supports their main claims?
Are the data and/or analysis presented relevant to the arguments made? Are the tables, graphs, and diagrams easy to understand in the context of the narrative (e.g., no major errors in labeling)?

Open, collaborative, replicable science (percentile ranking, 0-100%)
This covers several considerations:
Replicability: Would another researcher be able to perform the same analysis and get the same results? Are the methods explained clearly and in enough detail to enable easy and credible replication? For example, are all analyses and statistical tests explained, and is code provided?
Is the source of the data clear? Is the data made as available as is reasonably possible? If so, is it clearly labeled and explained?
Consistency: Do the numbers in the paper and/or code output make sense? Are they internally consistent throughout the paper?
Useful building blocks: Do the authors provide tools, resources, data, and outputs that might enable or enhance future work and meta-analysis?

Relevance to global priorities, usefulness for practitioners (percentile ranking, 0-100%)
Does the paper consider real-world relevance and deal with policy and implementation questions? Are the setup, assumptions, and focus realistic?
Do the authors report results that are relevant to practitioners? Do they provide useful quantified estimates (costs, benefits, etc.) enabling practical impact quantification and prioritization?
Do they communicate (at least in the abstract or introduction) in ways policymakers and decision-makers can understand, without misleading or oversimplifying?
To help universities and policymakers make sense of our evaluations, we want to benchmark them against how research is currently judged. So, we would like you to assess the paper in terms of journal rankings. We ask for two assessments:
a normative judgment about 'how well the research should publish';
a prediction about where the research will be published.
Journal ranking tiers are on a 0-5 scale, as follows:
1/5: OK/Somewhat valuable journal
2/5: Marginal B-journal/Decent field journal
3/5: Top B-journal/Strong field journal
4/5: Marginal A-Journal/Top field journal
5/5: A-journal/Top journal
As before, we ask for a 90% credible interval.
Journal ranking tier (0.0-5.0)
What journal ranking tier should this work be published in? Assess this paper on the journal ranking scale described above, considering only its merit and giving some weight to the category metrics we discussed above. Equivalently, imagine that:
the journal process was fair, unbiased, and free of noise, and that status, social connections, and lobbying to get the paper published didn't matter; and that
journals assessed research according to the category metrics we discussed above.
Journal ranking tier (0.0-5.0)
What journal ranking tier will this work be published in? Predict the tier of journal in which you expect this research will actually be published.
We want policymakers, researchers, funders, and managers to be able to use The Unjournal's evaluations to update their beliefs and make better decisions. To do this well, they need to weigh multiple evaluations against each other and other sources of information. Evaluators may feel confident about their rating for one category, but less confident in another area. How much weight should readers give to each? In this context, it is useful to quantify the uncertainty.
But it's hard to quantify statements like "very certain" or "somewhat uncertain": different people may use the same phrases to mean different things. That's why we're asking you for a more precise measure: your credible intervals. These metrics are particularly useful for meta-science and meta-analysis.
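As one illustration of why quantified intervals help: a reader who chose to combine several evaluators' ratings by precision weighting (a standard meta-analytic approach, not something The Unjournal prescribes) would automatically give more weight to narrower credible intervals. The numbers below are invented, and the rough normal approximation is an assumption made only for this sketch.

```python
# Hypothetical precision-weighted average of several evaluators' ratings.
# Each 90% credible interval is converted to an approximate standard error
# assuming rough normality (90% CI width ~ 2 * 1.645 * SE). All numbers invented.

ratings = [(70, (60, 80)),   # (midpoint, 90% credible interval) for evaluator 1
           (55, (30, 80)),   # evaluator 2: much less certain
           (62, (58, 66))]   # evaluator 3: quite certain

weights = []
for mid, (lo, hi) in ratings:
    se = (hi - lo) / (2 * 1.645)   # approximate standard error implied by the interval
    weights.append(1 / se ** 2)    # precision weight: more certain -> more weight

combined = sum(w * mid for w, (mid, _) in zip(weights, ratings)) / sum(weights)
print(f"Precision-weighted combined rating: {combined:.1f}")
```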
We are now asking evaluators for “claim identification and assessment” where relevant. This is meant to help practitioners use this research to inform their funding, policymaking, and other decisions. It is not intended as a metric to judge the research quality per se. This is not required but we will reward this work.
Lastly, we ask evaluators about their background, and for feedback about the process.
Length/time spent: This is up to you. We welcome detail, elaboration, and technical discussion.
It's a norm in academia that people do reviewing work for free. So why is The Unjournal paying evaluators?
From a recent report on peer review in economics (Charness et al., 2022):
We estimate that the average (median) respondent spends 12 (9) working days per year on refereeing. The top 10% of the distribution dedicates 25 working days or more, which is quite substantial considering refereeing is usually unpaid.
The peer-review process in economics is widely argued to be too slow and lengthy. But there is evidence that payments may help improve this.
In the same report, they note that few economics journals currently pay reviewers, and that these payments tend to be small (e.g., the JPE and AER paid $100 at the time). However, they also note, citing several papers:
The existing evidence summarized in Table 5 suggests that offering financial incentives could be an effective way of reducing turnaround time.
The report also notes that the work of reviewing is not distributed equally. To the extent that agreeing to write a report is based on individual goodwill, the unpaid volunteer model could be seen to unfairly penalize more generous and sympathetic academics. Writing a certain number of referee reports per year is generally considered part of "academic service": academics put this on their CVs, and it may lead to a (somewhat valued) position on a journal's editorial board. However, this is much less attractive for researchers who are not tenured university professors. Paying for this work would do a better job of including them in the process.
'Payment for good evaluation work' may also lead to fairer and more useful evaluations.
In the current system, academics may take on this work in large part to try to impress journal editors and get favorable treatment from them when they submit their own work. They may also write reviews in particular ways to impress these editors.
For less high-prestige journals, to get reviewers, editors often need to lean on their personal networks, including those they have power relationships with.
Reviewers are also known to strategically try to get authors to cite and praise the reviewer's own work. They may be especially critical of authors they see as rivals.
To the extent that reviewers are doing this as a service they are being paid for, these other motivations will be comparatively somewhat less important. The incentives will be more in line with doing evaluations that are seen as valuable by the managers of the process, in order to get chosen for further paid work. (And, if evaluations are public, the managers can consider the public feedback on these reports as well.)
We are not ‘just another journal.’ We need to give incentives for people to put effort into a new system and help us break out of the old inferior equilibrium.
In some senses, we are asking for more than a typical journal. In particular, our evaluations will be made public and thus need to be better communicated.
We cannot rely on 'reviewers taking on work to get better treatment from editors in the future.' This does not apply to our model, as we don't have editors making any sort of 'final accept/reject decision'.
Paying evaluators brings in a wider set of evaluators, including non-academics. This is particularly relevant to our impact-focused goals.
Unjournal evaluators have the option of remaining anonymous (see below). Where evaluators choose this, we will carefully protect their anonymity, aiming at a standard of protection as good as or better than that of traditional journals. We will give evaluators the option to take extra steps to safeguard this further. We offer anonymity in perpetuity to those who request it, as well as anonymity on other explicitly and mutually agreed terms.
If they choose to stay anonymous, there should be no way for authors to 'guess' who has reviewed their work.
We will take steps to keep private any information that could connect the identity of an anonymous evaluator and their evaluation/the work they are evaluating.
We will take extra steps to make the possibility of accidental disclosure extremely small (this is never impossible of course, even in the case of conventional journal reviews). In particular, we will use pseudonyms or ID codes for these evaluators in any discussion or database that is shared among our management team that connects individual evaluators to research work.
If we ever share a list of Unjournal’s evaluators this will not include anyone who wished to remain anonymous (unless they explicitly ask us to be on such a list).
We will do our best to warn anonymous evaluators of ways that they might inadvertently be identifying themselves in the evaluation content they provide.
We will provide platforms to enable anonymous and secure discussion between anonymous evaluators and others (authors, editors, etc.). Where an anonymous evaluator is involved, we will encourage these platforms to be used as much as possible.
Aside: In future, we may consider , and these tools will also be
The Calibrate Your Judgment app from Clearer Thinking is fairly helpful and fun for practicing and checking how good you are at expressing your uncertainty. It requires creating an account, but that doesn't take long. The 'Confidence Intervals' training seems particularly relevant for our purposes.
See our 'Category explanations' section for more details on each of these. Please don't structure your review according to these metrics; just pay some attention to them.
We discuss this, and how it relates to our impact and "theory of change", elsewhere on this site.
We ask for a set of nine quantitative metrics. For each metric, we ask for a score and a 90% credible interval. We describe these in detail below. (We explain our reasoning for these choices elsewhere.)
See below for more guidance on uncertainty, credible intervals, and the midpoint rating as the 'median of your belief distribution'.
Quantitative metric | Scale |
---|---|
Overall assessment | 0 - 100% |
Claims, strength and characterization of evidence | 0 - 100% |
Methods: Justification, reasonableness, validity, robustness | 0 - 100% |
Advancing knowledge and practice | 0 - 100% |
Logic and communication | 0 - 100% |
Open, collaborative, replicable science | 0 - 100% |
Relevance to global priorities, usefulness for practitioners | 0 - 100% |
Overall assessment: Judge the quality of the research heuristically. Consider all aspects of quality, credibility, importance to future impactful applied research, and practical relevance and usefulness.
Methods (justification, reasonableness, validity, robustness): Are the methods used well-justified and explained? Are they a reasonable approach to answering the question(s) in this context? Are the underlying assumptions reasonable?
Are the results and methods likely to be robust to reasonable changes in the underlying assumptions?
Avoiding bias and questionable research practices (QRPs): Did the authors take steps to reduce bias from opportunistic reporting? For example, did they do a strong pre-registration and pre-analysis plan, incorporate multiple hypothesis testing corrections, and report flexible specifications?
Advancing knowledge and practice: To what extent does the project contribute to the field or to practice, particularly in ways that are relevant to global priorities and impactful interventions?
Logic and communication: Is the reasoning "transparent"? Are assumptions made explicit? Are all logical steps clear and correct? Does the writing make the argument easy to follow?
Relevance to global priorities, usefulness for practitioners: Are the paper's chosen topic and approach relevant to the real world and to global priorities?
Are the assumptions and setup realistic and relevant to the real world?
Could the paper's topic and approach help inform global priorities, cause prioritization, and high-impact interventions?
Most work in our applied stream will not be targeting academic journals. Still, in some cases it might make sense to make this comparison; e.g., if particular aspects of the work might be rewritten and submitted to academic journals, or if the work uses certain techniques that might be directly compared to academic work. If you believe a comparison makes sense, please consider giving an assessment below, making reference to our guidelines and how you are interpreting them in this case.
0/5: "Won't publish/little to no value". Unlikely to be cited by credible researchers.
We give some example journal rankings, based on SJR and ABS ratings.
We encourage you to give non-integer responses, e.g., 4.6 or 2.2.
Journal ranking tiers | Scale | 90% CI |
---|---|---|
What journal ranking tier should this work be published in? | 0.0-5.0 | lower, upper |
What journal ranking tier will this work be published in? | 0.0-5.0 | lower, upper |
PubPub note: as of 14 March 2024, the PubPub form is not allowing you to give non-integer responses. Until this is fixed, please give your best integer response there. (Or use the Coda form.)
You are asked to give a 'midpoint' and a 90% credible interval. Consider this interval as the range of values that you believe is 90% likely to contain the true value. See the fold below for further guidance.
For more information on credible intervals, the guidance below may be helpful.
If you are "well-calibrated", your 90% credible intervals should contain the true value 90% of the time.
If you are "well-calibrated", your 90% credible intervals should contain the true value 90% of the time. To understand this better, assess your own ability, and then practice to get better at estimating your confidence in results. The Clearer Thinking website will help you get practice at calibrating your judgments. We suggest you choose the "Calibrate Your Judgment" tool and select the "confidence intervals" exercise, choosing 90% confidence. Even a 10 or 20 minute practice session can help, and it's pretty fun.
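For a rough sense of what "well-calibrated" means operationally, here is a minimal sketch: over many past judgments, the share of your 90% intervals that contained the true value should be close to 90%. The judgment data below are invented for the example.

```python
# Illustrative calibration check: what share of my 90% intervals contained the truth?
# Triples of (interval_low, interval_high, true_value); all numbers are invented.

past_judgments = [
    (10, 30, 25), (40, 70, 75), (5, 15, 12), (55, 95, 60),
    (20, 35, 33), (60, 80, 82), (15, 45, 30), (70, 90, 88),
]

hits = sum(lo <= truth <= hi for lo, hi, truth in past_judgments)
coverage = hits / len(past_judgments)
print(f"Coverage: {coverage:.0%} (well-calibrated 90% intervals -> about 90%)")
```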
For the two questions below, we will publish your answers unless you specifically ask for them to be kept anonymous.
Answers to the questions
12 Feb 2024: We are moving to a hosted form/interface in PubPub. That form is still somewhat of a work in progress and may need some further guidance; we try to provide this below, but please contact us with any questions. If you prefer, you can also submit your response in a Google Doc and share it back with us. Click here to make a new copy of that directly.
Some guidelines recommend a 2–3 page referee report; others suggest this is relatively short but confirm that brevity is desirable. In surveys, economists report spending (median and mean) about one day per report, with substantial shares reporting "half a day" and "two days." We expect that reviewers tend to spend more time on papers for high-status journals, and when reviewing work that is closely tied to their own agenda.
We have made some adjustments to this page and to our guidelines and processes; this is particularly relevant when considering earlier evaluations.
If you still have questions, please contact us or see our FAQ.
Our data protection statement is linked here.