Why these guidelines/metrics?
Ultimately, we're trying to replace the question of "what tier of journal did a paper get into?" with "how highly was the paper rated?" We believe this is a more valuable metric: it can be more fine-grained, and it should be less prone to gaming. It also aims to reduce the randomness that enters the process through things like 'the availability of journal space in a particular field'. See our discussion of .
To get to this point, we need to have academia and stakeholders see our evaluations as meaningful. We want the evaluations to begin to have some value that is measurable in the way “publication in the AER” is seen to have value.
While there are some ongoing efforts towards journal-independent evaluation, these . Typically, they either have simple tick-boxes (like "this paper used correct statistical methods: yes/no") or they enable descriptive evaluation without an overall rating. As we are not a journal, and we don’t accept or reject research, we need another way of assigning value. We are working to determine the best way of doing this through quantitative ratings. We hope to be able to benchmark our evaluations to "traditional" publication outcomes. Thus, we think it is important to ask for both an overall quality rating and a journal ranking tier prediction.
In addition to the overall assessment, we think it will be valuable to have the papers rated according to several categories. This could be particularly helpful to practitioners who may care about some concerns more than others. It can also be useful to future researchers who might want to focus on reading papers with particular strengths. It could be useful in meta-analyses, as certain characteristics of papers could be weighted more heavily. We think the use of categories might also be useful to authors and evaluators themselves. It can help them get a sense of what we think research priorities should be, and thus help them consider an overall rating.
However, these ideas have been largely ad hoc and based on the impressions of our management team (a particular set of mainly economists and psychologists). The process is still being developed. Any feedback you have is welcome. For example, are we overemphasizing certain aspects? Are we excluding some important categories?
We are also researching other frameworks, templates, and past practice; we hope to draw from validated, theoretically grounded projects such as .
In eliciting expert judgment, it is helpful to differentiate the level of confidence in predictions and recommendations. We want to know not only what you believe, but how strongly held your beliefs are. If you are less certain in one area, we should weigh the information you provide less heavily in updating our beliefs. This may also be particularly useful for practitioners. Obviously, there are challenges to any approach. Even experts in a quantitative field may struggle to convey their own uncertainty. They may also be inherently "poorly calibrated" (see discussions and tools for ). Some people may often be "confidently wrong." They might state very narrow "credible intervals", when the truth—where measurable—routinely falls outside these boundaries. People with greater discrimination may sometimes be underconfident. One would want to consider and As a side benefit, this may be interesting for research , particularly as The Unjournal grows. We see 'quantifying one's own uncertainty' as a good exercise for academics (and everyone) to engage in.
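The "confidently wrong" pattern described above can be checked roughly by counting how often the truth lands inside a person's stated 90% intervals. The following is a minimal sketch under stated assumptions, not a procedure The Unjournal has described: the function name `interval_coverage` and all the numbers are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def interval_coverage(
    intervals: Sequence[Tuple[float, float]],
    outcomes: Sequence[float],
) -> float:
    """Return the share of outcomes that fall inside their stated interval.

    Bounds are treated as inclusive. For well-calibrated 90% intervals,
    this share should be close to 0.90 over many items.
    """
    if len(intervals) != len(outcomes):
        raise ValueError("need exactly one outcome per interval")
    if not intervals:
        raise ValueError("need at least one interval")
    hits = sum(1 for (low, high), x in zip(intervals, outcomes) if low <= x <= high)
    return hits / len(intervals)


# Hypothetical stated 90% intervals (lower, upper) and the values later observed.
stated = [(40, 60), (10, 30), (55, 65), (20, 80), (45, 50)]
observed = [52, 35, 58, 70, 41]

coverage = interval_coverage(stated, observed)
print(f"share of outcomes inside the stated intervals: {coverage:.2f}")
```

Coverage well below 0.90 across many items would suggest intervals that are too narrow (overconfidence), while coverage well above 0.90 would suggest intervals that are too wide (underconfidence). With only a handful of items, as in this toy example, the estimate is too noisy to support a firm conclusion.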
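[Ratings table with suggested weightings, flattened; category labels are missing. Values in page order: 39, 52 · 5 · 47, 54 · 5 · 45, 55 · 4 · 10, 35 · 3 · 40, 70 · 2 · 30, 46 · 0** · 21, 65]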
We had included the note:
We give the previous weighting scheme in a fold below for reference, particularly for those reading evaluations done before October 2023.
As well as:
Suggested weighting: 0.
Elsewhere in that page we had noted:
As noted above, we give suggested weights (0–5) to suggest the importance of each category rating to your overall assessment, given The Unjournal's priorities.
[FROM PREVIOUS GUIDELINES:]
You may feel comfortable giving your "90% confidence interval," or you may prefer to give a "descriptive rating" of your confidence (from "extremely confident" to "not confident").
[Previous...] Remember, we would like you to give a 90% CI or a confidence rating (1–5 dots), but not both.
And, for the 'journal tier' scale:
[Previous guidelines]: The description folded below focuses on the "Overall Assessment." Please try to use a similar scale when evaluating the category metrics.
We have removed suggested weightings for each of these categories. We discuss the rationale at some length .
Evaluators working before October 2023 saw a previous version of the table, which you can see .
The previous guidelines ; these may be useful in considering evaluations provided pre-2024.
(holistic, most important!)
The weightings were presented once again along with each description in the section .
Quantify how certain you are about this rating, either giving a 90% / interval or using our .
This page explains the value of the metrics we are seeking from evaluators.
The from Clearer Thinking is fairly helpful and fun for practicing and checking how good you are at expressing your uncertainty. It requires creating an account, but that doesn't take long. The 'Confidence Intervals' training seems particularly relevant for our purposes.