What research to target?

(for pilot and beyond)

Our initial focus is quantitative work that informs global priorities (see linked discussion), especially in economics, policy, and social science. We want to see better research leading to better outcomes in the real world (see our 'Theory of Change').

See the (earlier) discussion in our public call/EA Forum discussion.

To reach these goals, we need to select "the right research" for evaluation. We want to choose papers and projects that are highly relevant, methodologically promising, and that will benefit substantially from our evaluation work. We need to optimize how we select research so that our efforts remain mission-focused and useful. We also want to make our process transparent and fair. To do this, we are building a coherent set of criteria and goals, and a specific approach to guide this process. We explore several dimensions of these criteria below.

Management access only: general discussion of prioritization in Gdoc; private discussion of specific papers in Airtable and links. We incorporate some of this discussion below.

High-level considerations for prioritizing research

When considering a piece of research to decide whether to commission it to be evaluated, we can start by looking at its general relevance as well as the value of evaluating and rating it.

Our prioritization of a paper for evaluation should not be seen as an assessment of its quality, nor of its 'vulnerability'. Furthermore, 'the prioritization is not the evaluation': it is less specific and less intensive.

  1. Why is it relevant and worth engaging with?

We consider (and prioritize) the importance of the research to global priorities; its relevance to crucial decisions; the attention it is getting and the influence it is having; its direct relevance to the real world; and the potential value of the research for advancing other impactful work. We de-prioritize work that has already been credibly (and publicly) evaluated. We also consider the fit of the research with our scope (social science, etc.) and the likelihood that we can commission experts to meaningfully evaluate it. As noted below, some 'instrumental goals' (sustainability, building credibility, driving change, ...) also play a role in our choices.

Some features we value, and that might raise the probability that we consider a paper or project, include a commitment and contribution to open science, the authors' engagement with our process, and the logic, communication, and transparent reasoning of the work. However, if a prominent research paper is within our scope and seems to have strong potential for impact, we will prioritize it highly whether or not it has these qualities.

2. Why does it need (more) evaluation, and what are some key issues and claims to vet?

We ask the people who suggest particular research, and experts in the field:

  • What are (some of) the authors’ key/important claims that are worth evaluating?

  • What aspects of the evidence, argumentation, methods, and interpretation are you unsure about?

  • What particular data, code, proofs, and arguments would you like to see vetted? If it has already been peer-reviewed in some way, why do you think more review is needed?

Ultimate goals: what are we trying to optimize?

Put broadly, we need to consider how this research allows us to achieve our own goals in line with our Global Priorities Theory of Change flowchart, targeting "ultimate outcomes." The research we select and evaluate should meaningfully drive positive change. One way we might see this process: “better research & more informative evaluation” → “better decision-making” → “better outcomes” for humanity and for non-human animals (i.e., the survival and flourishing of life, human civilization, and values).

Prioritizing research to achieve these goals

As we weigh research to prioritize for evaluation, we need to balance directly having a positive impact against building our ability to have an impact in the future.

A. Direct impact (‘score goals now’)

Below, we adapt the "ITN" cause prioritization framework (popular in effective altruism circles) to assess the direct impact of our evaluations.

Importance

What is the direct impact potential of the research?

This is a massive question many have tried to address (see sketches and links below). We respond to uncertainty around this question in several ways, including:

  • Consulting a range of sources, not only EA-linked sources.

    • EA and more or less adjacent: agendas and overviews, syllabi.

    • Non-EA, e.g., https://globalchallenges.org/.

Neglectedness

Where is the current journal system failing GP-relevant work the most... in ways we can address?

Tractability

  1. “Evaluability” of research: Where does the UJ approach yield the most insight or value of information?

  2. Existing expertise: Where do we have field expertise on the UJ team? This will help us commission stronger evaluations.

  3. "Feedback loops": Could this research influence concrete intervention choices? Does it predict near-term outcomes? If so, observing these choices and outcomes and getting feedback on the research and our evaluation can yield strong benefits.

Consideration/discussion: How much should we include research with indirect impact potential (theoretical, methodological, etc.)?

B. Sustainability: funding, support, participation

Moreover, we need to consider how the research evaluation might support the sustainability of The Unjournal and the broader project of open evaluation. We may need to strike a balance between work informing the priorities of various audiences, including:

  • Relevance to stakeholders and potential supporters

  • Clear connections to impact; measurability

  • Support from relevant academic communities

Consideration/discussion: What will drive further interest and funding?

C. Credibility, visibility, driving positive institutional change

Finally, we consider how our choices will increase the visibility and solidify the credibility of The Unjournal and open evaluations. We consider how our work may help drive positive institutional change. We aim to:

  • Interest and involve academics—and build the status of the project.

  • Commission evaluations that will be visibly useful and credible.

  • ‘Benchmark traditional publication outcomes’; track our predictiveness and impact.

But some of these concerns may have trade-offs

We are aware of possible pitfalls of some elements of our vision.

We are pursuing a second "high-impact policy and applied research" track for evaluation. This will consider work that is not targeted at academic audiences. This may have direct impact and please SFF funders, but, if not done carefully, this may distract us from changing academic systems, and may cost us status in academia.

A focus on topics perceived as niche (e.g., the economics and game theory of AI governance and AI safety) may bring a similar tradeoff.

On the other hand, perhaps a focus on behavioral and experimental economics would generate lots of academic interest and participants; this could help us benchmark our evaluations, etc.; but this may also be less directly impactful.

We hope we have identified the important considerations (above), but we may be missing key points. We continue to engage in discussion and seek feedback to hone and improve our processes and approaches.

Data: what are we evaluating/considering?

We present and analyze the specifics surrounding our current evaluation data in this interactive notebook/dashboard.

Below: an earlier template for considering and discussing the relevance of research. This was/is provided both for our own consideration and for sharing (in part?) with evaluators, to give them some guidance. Think of these as bespoke evaluation notes for a "research overview, prioritization, and suggestions" document.

Proposed template

Title

Evaluation

See sections below

For prospective evaluators: An overview of what we are asking; payment and recognition details.

Guidelines for evaluators: The Unjournal's evaluation guidelines, considering our priorities and criteria, the metrics we ask for, and how these are considered.

Other sections and subsections provide further resources, consider future initiatives, and discuss our rationales.

Scoping which other sorts of work are representative inputs to GP-relevant work:

  • Get a selection of seminal GP publications; look back to see what they are citing and categorize by journal/field/keywords/etc.

Support from open science

Have strong leverage over research "outcomes and rewards."

  • Increase public visibility and raise public interest.

  • Bring in supporters and participants.

  • Achieve substantial output in a reasonable time frame and with reasonable expense.

  • Maintain goodwill and a justified reputation for being fair and impartial.

  • Giving managers autonomy and pushing forward quickly may bring the risk of perceived favoritism; a rule-based systematic approach to choosing papers to evaluate might be slower and less interesting for managers. However, it might be seen as fairer (and it might enable better measurement of our impact).

    "research overview, prioritization, and suggestions" document
    .
    One-click-link to paper
  • Link to any private hosted comments on the paper/project

    Summary; why is this research relevant and worth engaging with?

    As mentioned under High-level considerations, consider factors including importance to global priorities, relevance to the field, the commitment and contribution to open science, the authors’ engagement, and the transparency of data and reasoning. You may consider the ITN framework explicitly, but not too rigidly.

    Why does it need (more) review, and what are some key issues and claims to vet?

    What are (some of) the authors’ main important claims that are worth carefully evaluating? What aspects of the evidence, argumentation, methods, interpretation, etc., are you unsure about? What particular data, code, proof, etc., would you like to see vetted? If it has already been peer-reviewed in some way, why do you think more review is needed?

    What sort of reviewers should be sought, and what should they be asked?

    What types of expertise and background would be most appropriate for the evaluation? Who would be interested? Please try to make specific suggestions.

    How well has the author engaged with the process?

    Do they need particular convincing? Do they need help making their engagement with The Unjournal successful?


    "Applied and Policy" Track: trial

    David Reinstein, 28 Mar 2024: I am proposing the following policies and approaches for our “Applied & Policy Stream”. We will move forward with these for now on a trial basis, but they may be adjusted. Please offer comments and ask questions in this Google doc, flagging the email contact@unjournal.org.

    Why have an “Applied & Policy Stream”?

    Much of the most impactful research is not aimed at academic audiences and may never be submitted to academic journals. It is written in formats that are very different from traditional academic outputs, and cannot be easily judged by academics using the same standards. Nonetheless, this work may use technical approaches developed in academia, making it important to gain expert feedback and evaluation.

    The Unjournal can help here. However, to avoid confusion, we want to make this clearly distinct from our main agenda, which targets impactful academically-oriented research.

    Thus, we are trialing an “Applied & Policy Stream,” which will be clearly labeled as separate from our main stream. This may constitute roughly 10–15% of the work that we cover. Below, we refer to this as the “policy stream” for brevity.

    What should be included in the Policy stream?

    Our considerations for prioritizing this work are generally the same as for our academic stream: is it in the fields that we are focused on, using approaches that enable meaningful evaluation and rating? Is it already having an impact (e.g., influencing grant funding in globally important areas)? Does it have the potential for impact, and if so, is it high-quality enough that we should consider boosting its signal?

    We will particularly prioritize policy and applied work that uses technical methods that need evaluation by research experts, often academics.

    This could include the strongest work published on the EA Forum, as well as a range of further applied research from EA/GP/LT linked organizations such as GPI, Rethink Priorities, Open Philanthropy, FLI, HLI, Faunalytics, etc., as well as EA-adjacent organizations and relevant government white papers.

    How should our (evaluation etc.) policies differ here?

    Ratings/metrics: As in the academic stream, this work will be evaluated for its credibility, usefulness, communication/logic, etc. However, we are not seeking to have this work assessed by the standards of academia in a way that yields a comparison to traditional journal tiers. Evaluators: please ignore these parts of our interface; if you are unsure whether a part is relevant, feel free to ask.

    Evaluator selection, number, pay: Generally, we want to continue to select academic research experts or non-academic researchers with strong academic and methodological backgrounds to do these evaluations. A key purpose of this policy stream is to bring research expertise, particularly from academia, to work that is not normally scrutinized by such experts.

    The compensation may be flexible as well; in some cases the work may be more involved than for the academic stream, and in some cases less. As a starting point, we will offer the same compensation as for the academic stream.

    Careful flagging and signposting: To preserve the reputation of our academic-stream evaluations we need to make it clear, wherever people might see this work, that it is not being evaluated by the same standards as the academic stream and doesn't “count” towards those metrics.

    What specific areas do we cover?

    This discussion is a work-in-progress

    1. We are targeting global priorities-relevant research...

    2. With the potential for impact, and with the potential for Unjournal evaluations to have an impact (see our high-level considerations and our prioritization ratings discussions).

    3. Our initial focus is quantitative work that informs global priorities (see linked discussion), especially in economics, policy, and social science, informing our Theory of Change.

    4. We give a data presentation of the work we have already covered and the work we are prioritizing here; this will be continually updated.

    But what does this mean in practice? What specific research fields, topics, and approaches are we likely to classify as 'relevant to evaluate'?

    We give some lists and annotated examples below.

    Fields, methods, and approaches

    As of January 2024, The Unjournal focuses on...

    1. Research where the fundamental question being investigated involves human behavior and beliefs and the consequences of these. This may involve markets, production processes, economic constraints, social interactions, technology, the 'market of ideas', individual psychology, government processes, and more. However, the main research question should not revolve around issues outside of human behavior, such as physical science, biology, or computer science and engineering. These areas are out of our scope (at least for now).

    2. Research that is fundamentally quantitative and uses scientific methods. It will generally involve or consider measurable inputs, choices, and outcomes; specific categorical or quantitative questions; analytical and mathematical reasoning; hypothesis testing and/or belief updating, etc.

    3. Research that targets and addresses a single specific question or goal, or a small cluster of these. It should not mainly be a broad discussion and overview of other research or conceptual issues.

    This generally involves the academic fields of:

    • Economics

    • Applied Statistics (and some other applied math)

    • Psychology

    • Political Science

    • Other quantitative social science fields (perhaps Sociology)

    • Applied "business school" fields: finance, accounting, operations, etc.

    • Applied "policy and impact evaluation" fields

    • Life science/medicine where it targets human behavior/social science

    These discipline/field boundaries are not strict; they may adapt as we grow

    Why this field/method focus?

    These were chosen in light of two main factors:

    1. Our founder and our team are most comfortable assessing and managing the consideration of research in these areas.

    2. These fields seem to be particularly amenable to, and able to benefit from, our journal-independent evaluation approach. Other fields, such as biology, are already being 'served' by strong initiatives like Peer Communities In.

    Ex.: work we included/excluded based on field/method

    To do: We will give and explain some examples here

    Outcomes, focus areas, and causes

    The Unjournal's mission is to prioritize

    • research with the strongest potential for a positive impact on global welfare

    • where public evaluation of this research will have the greatest impact

    Given this broad goal, we consider research into any cause, topic, or outcome, as long as the research involves fields, methods, and approaches within our domain (see above), and as long as the work meets our other requirements (e.g., research must be publicly shared without a paywall).

    While we don't have rigid boundaries, we are nonetheless focusing on certain areas:

    Fields

    (As of Jan. 2024) we have mainly commissioned evaluations of work involving development economics and health-related outcomes and interventions in low- and middle-income countries.

    As well as research involving

    • Environmental economics, conservation, harm to human health

    • The social impact of AI and emerging technologies

    • Economics, welfare, and governance

    We are currently prioritizing further work involving

    • Psychology, behavioral science, and attitudes: the spread of misinformation; other-regarding preferences and behavior; moral circles

    • Animal welfare: markets, attitudes

    • Methodological work informing high-impact research (e.g., methods for impact evaluation)

    We are also considering prioritizing work involving

    • AI governance and safety

    • Quantitative political science (voting, lobbying, attitudes)

    • Political risks (including authoritarian governments and war and conflict)

    • Catastrophic risks; predicting and responding to these risks

    • The economics of innovation; scientific progress and meta-science

    • The economics of health, happiness, and wellbeing

    • Institutional decisionmaking and policymaking

    • Long-term growth and trends; the long-term future of civilization; forecasting

    Examples of work we chose to prioritize or de-prioritize based on focus area

    To do: We will give and explain some examples here

    'Conditional embargos' & exceptions

    You can request a conditional embargo by emailing us at contact@unjournal.org, or via the submission form. Please explain what sort of embargo you are asking for, and why. By default, we'd like Unjournal evaluations to be made public promptly. However, we may make exceptions in special circumstances, particularly for very early-career researchers.

    If there is an early-career researcher on the authorship team, we may allow authors to "embargo" the publication of the evaluation until a later date. Evaluators (referees) will be informed of this. This date can be contingent, but it should not be indefinite.

    For example, we might grant an embargo that lasts until after a PhD/postdoc’s upcoming job market or until after publication in a mainstream journal, with a hard maximum of 14 months. (Of course, embargoes can be ended early at the request of the authors.)

    In exceptional circumstances we may consider granting a "conditional indefinite embargo."

    Some examples of possible embargos (need approval)

    Extended time to revise and respond
    1. We will invite 2 or 3 relevant experts to evaluate and rate this work, letting them know about the following embargo

    2. When the evaluations come back, we will ask if you want to respond/revise. If you commit to responding (please let us know your plan within 1 week):

    Rating-dependent embargo, allowing for revision
    1. We will invite 2 or 3 relevant experts to evaluate and rate this work, letting them know about the following embargo

    2. When the evaluations come back..., we will ask if you want to respond.

    'Job market embargo': Time, rating and outcome-dependent
    1. We will invite 2 or 3 relevant experts to evaluate and rate this work, letting them know about the following embargo

    2. When the evaluations come back: if all evaluators gave a 4.5 rating or higher as their middle rating on the "Journal rank tier, normative" rating (basically suggesting they think it's at the level meriting publication in a top-5+ journal), we will give you 3 weeks to respond before posting the package. (This is roughly our usual policy.)


    Note: the above are all exceptions to our regular rules, examples of embargos we might or might not agree to.

    Suggesting research (forms, guidance)

    Paths to suggest research

    Research can be "submitted" by authors (here) or "suggested" by others. For a walk-through on suggesting research, see this video for an example.

    There are two main paths for making suggestions: through our survey form or through Airtable.

    For prospective evaluators

    Thanks for your interest in evaluating research for The Unjournal!

    Who we are

    The Unjournal is a nonprofit organization started in mid-2022. We commission experts to publicly evaluate and rate research. Read more about us here.

    we will make it public that the evaluations are complete, and you have committed to revise and respond.

  • We will give you 8 weeks to revise the paper and to write a response noting how you have revised it.

  • We will give the evaluators additional time to adjust their evaluations and ratings in response to your revision/response.

  • After this, we will publish the evaluation package.

  • If you do not commit to responding, we will post the evaluation package.

  • If you are happy with the evaluations, we can post them at any time, by your request.

  • If all evaluators gave a 4.5 rating or higher as their middle rating on the "Journal rank tier, normative" rating (basically suggesting they think it's at the level meriting publication in a top-5+ journal), we will give you 3 weeks to respond before posting the package. (This is roughly our usual policy.)

  • Otherwise (if any rate it below 4.5 but none rate it below 3.25), we will give you 8 weeks to revise the paper in response and to write a note explaining how you have responded. We will give the evaluators further time to adjust their evaluations and ratings in turn, before posting the evaluation package.

  • If any evaluators rate the paper 'fairly negatively' (below 3.25) on this measure, we will grant a six-month embargo from this point before posting the package. During this time you will also have the opportunity to revise and respond, as in the previous case (case 2.2).

  • If you are happy with the evaluations, we can post them at any time, by your request.

  • Otherwise, we will wait to post the evaluations until June 15, or until all PhD student or postdoc authors have found a new job (as reported on social media, LinkedIn, etc.).

    1. During the intervening time, you have the opportunity to revise and respond, and if you do we give the evaluators time to update their evaluations and ratings in turn.

  • If you are happy with the evaluations, we can post them at any time, by your request.
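To make the rating-dependent timelines above concrete, here is a minimal sketch of the decision rule described in those bullets, using the 4.5 and 3.25 thresholds on evaluators' midpoint "Journal rank tier, normative" ratings. This is our illustration only, not official Unjournal tooling, and the function and variable names are hypothetical.

```python
# Minimal illustrative sketch (not official Unjournal tooling) of the rating-dependent
# response windows described above, based on evaluators' midpoint ratings on the
# "Journal rank tier, normative" scale.

def response_window(midpoint_ratings: list[float]) -> str:
    """Map evaluators' midpoint ratings to the authors' response window."""
    if all(r >= 4.5 for r in midpoint_ratings):
        # All evaluators at 4.5+ (roughly "top-5+ journal" level): usual policy.
        return "3 weeks to respond before the evaluation package is posted"
    if any(r < 3.25 for r in midpoint_ratings):
        # Any 'fairly negative' rating: six-month embargo, with revise-and-respond option.
        return "six-month embargo, with the opportunity to revise and respond"
    # Otherwise (some below 4.5, none below 3.25): extended revise-and-respond window.
    return "8 weeks to revise and respond; evaluators may then adjust their ratings"

print(response_window([4.7, 4.5, 5.0]))   # -> 3 weeks
print(response_window([4.0, 3.5, 4.6]))   # -> 8 weeks
print(response_window([3.0, 4.2, 4.8]))   # -> six-month embargo
```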

    1. Through our survey form

    Anyone can suggest research using the survey form at https://bit.ly/ujsuggestr. (Note: if you want to submit your own research, go to bit.ly/ujsubmitr.) Please follow the steps below:

    Review The Unjournal's Guidelines

    Begin by reviewing The Unjournal's guidelines on What research to target to get a sense of the research we cover and our priorities. Look for high-quality research that 1) falls within our focus areas and 2) would benefit from (further) evaluation.

    When in doubt, we encourage you to suggest the research anyway.

    Fill out the Suggestion Form

    Navigate to The Unjournal's Suggest Research Survey Form. Most of the fields here are optional. The fields ask for the following information:

    • Who you are: Let us know who is making the suggestion (you can also choose to stay anonymous).

      • If you leave your contact information, you will be eligible for financial "bounties" for strong suggestions.

      • If you are already a member of The Unjournal's team, additional fields will appear for you to link your suggestion to your profile in the Unjournal's database.

    • Research Label: Provide a short, descriptive label for the research you are suggesting. This helps The Unjournal quickly identify the topic at a glance.

    • Research Importance: Explain why the research is important, its potential impact, and any specific areas that require thorough evaluation.

    • Research Link: Include a direct URL to the research paper. The Unjournal prefers research that is publicly hosted, such as in a working paper archive or on a personal website.

    • Peer Review Status: Inform about the peer review status of the research, whether it's unpublished, published without clear peer review, or published in a peer-reviewed journal.

    • "Rate the relevance": This represents your best-guess at how relevant this work is for The Unjournal to evaluate, as a percentile relative to other work we are considering.

    • Research Classification: Choose categories that best describe the research. This helps The Unjournal sort and prioritize suggestions.

    • Field of Interest: Select the outcome or field of interest that the research addresses, such as global health in low-income countries.

    Complete all the required fields and submit your suggestion. The Unjournal team will review your submission and consider it for future evaluation. You can reach out to us at contact@unjournal.org with any questions or concerns.

    2. For Field Specialists and managers: via Airtable

    People on our team may find it more useful to suggest research to The Unjournal directly via Airtable. See this document for a guide. (Please request document permission to access this explanation.)

    Further guidance

    Aside on setting the prioritization ratings: In making your subjective prioritization rating, please consider “What percentile do you think this paper (or project) is relative to the others in our database, in terms of ‘relevance for The UJ to evaluate’?” (Note this is a redefinition; we previously considered these as probabilities.) We roughly plan to commission the evaluation of about 1 in 5 papers in the database, the ‘top 20%’ according to these percentiles. Please don’t consider the “publication status” or the “author's propensity to engage” in this rating. We will consider those as separate criteria.

    Notes for field specialists/Unjournal Team

    Please don’t enter only the papers you think are ‘very relevant’; please enter all research that you have spent any substantial time considering (more than a couple of minutes). If we all do this, our percentile ratings should be approximately uniformly distributed, i.e., evenly spread over the 1-100% range.
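As a rough self-check on this guidance, a field specialist could tally how their own percentile ratings spread across deciles; heavy bunching near the top suggests only 'very relevant' papers are being entered. The sketch below is illustrative only (not an Unjournal tool), and the ratings listed are made up.

```python
# Illustrative self-check (not an Unjournal tool): count how one specialist's
# prioritization percentiles fall across deciles. The ratings below are made up.

from collections import Counter

my_percentile_ratings = [12, 35, 88, 55, 61, 97, 23, 74, 49, 8, 66, 91]  # hypothetical

decile_counts = Counter(min(r // 10, 9) for r in my_percentile_ratings)
for decile in range(10):
    low, high = decile * 10, decile * 10 + 9
    print(f"{low:2d}-{high:2d}%: {decile_counts[decile]} ratings")
# Heavy bunching in the top deciles suggests you are only entering papers
# you already consider 'very relevant', rather than all papers you assessed.
```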


    Proposed curating robustness replication

    We are considering asking evaluators, with compensation, to assist and engage in the process of "robustness replication." This may lead to some interesting follow-on possibilities as we build our potential collaboration with the Institute for Replication and others in this space.

    We might ask evaluators discussion questions like these:

    • What is the most important, interesting, or relevant substantive claim made by the authors, (particularly considering global priorities and potential interventions and responses)?

    • What statistical test or evidence does this claim depend on, according to the authors?

    • How confident are you in the substantive claim made?

    • "Robustness checks": What specific statistical test(s) or piece(s) of evidence would make you substantially more confident in the substantive claim made?

    • If a robustness replication "passed" these checks, how confident would you be then in the substantive claim? (You can also express this as a continuous function of some statistic rather than as a binary; please explain your approach.)

    Background:

    The Institute for Replication is planning to hire experts to do "robustness replications" of work published in top journals in economics and political science. Code and data sharing is now being enforced in many or all of these journals and other important outlets. We want to support their efforts and are exploring collaboration possibilities. We are also considering how best to guide potential future robustness replication work.

    What we are asking you to do
    1. Write an evaluation of a specific research paper or project: essentially a standard, high-quality referee report.

    2. Give quantitative ratings and predictions about the research by filling in a structured form.

    3. Answer a short questionnaire about your background and our processes.

    See Guidelines for Evaluators for further details and guidance.

    Why be an evaluator?

    Why use your valuable time writing an Unjournal evaluation? There are several reasons: helping high-impact research users, supporting open science and open access, and getting recognition and financial compensation.

    Helping research users, helping science

    The Unjournal's goal is to make impactful research more rigorous, and rigorous research more impactful, while supporting open access and open science. We encourage better research by making it easier for researchers to get feedback and credible ratings. We evaluate research in high-impact areas that make a difference to global welfare. Your evaluation will:

    1. Help authors improve their research, by giving early, high-quality feedback.

    2. Help improve science by providing open-access, prompt, structured, public evaluations of impactful research.

    3. Inform funding bodies and meta-scientists as we build a database of research quality, strengths and weaknesses in different dimensions. Help research users learn what research to trust, when, and how.

    For more on our scientific mission, see here.

    Public recognition

    Your evaluation will be made public and given a DOI. You have the option to be identified as the author of this evaluation or to remain anonymous, as you prefer.

    Financial compensation

    You will be given a $200-$400 honorarium for providing a prompt and complete evaluation and feedback ($100-$300 base + $100 'promptness bonus') in line with our expected standards.

    Note, Aug. 2024: we're adjusting the base compensation to reward strong work and experience. Minimum base compensation:

    • $100 + $100 for first-time evaluators

    • $300 + $100 for return Unjournal evaluators and those with previous strong public review experience. We will be integrating other incentives and prizes into this, and we are committed to a minimum average compensation per evaluation, including prizes.

    You will also be eligible for monetary prizes for "most useful and informative evaluation," plus other bonuses. We currently (Feb. 2024) set aside an additional $150 per evaluation for incentives, bonuses, and prizes.

    See also "submitting claims and expenses"

    Additional rewards and incentives

    We may occasionally offer additional payments for specifically requested evaluation tasks, or raise the base payments for particularly hard-to-source expertise.

    July 2023: The above is our current policy; we are working to build an effective, fair, transparent, and straightforward system of honorariums, incentives, and awards for evaluators.

    Feb. 2024: Note that we currently set aside an additional $150 per evaluation (i.e., per evaluator) for evaluator incentives, bonuses, and prizes. This may be revised upwards or downwards in the future (and any change will be announced and noted).

    What do I do next?

    • If you have been invited to be an evaluator and want to proceed, simply respond to the email invitation that we have sent you. You will then be sent a link to our evaluation form.

    • To sign up for our evaluator pool, see 'how to get involved'

    To learn more about our evaluation process, see Guidelines for evaluators. If you are doing an evaluation, we highly recommend you read these guidelines carefully.

    Note on the evaluation platform (13 Feb 2024)

    12 Feb 2024: We are moving to a hosted form/interface in PubPub. That form is still somewhat of a work in progress and may need some further guidance; we try to provide this below, but please contact us with any questions. If you prefer, you can also submit your response in a Google Doc and share it back with us. Click here to make a new copy of that directly.


    Why these guidelines/metrics?

    31 Aug 2023: Our present approach is a "working solution" involving some ad hoc and intuitive choices. We are re-evaluating the metrics we are asking for, as well as the interface and framing. We are gathering some discussion in this linked Gdoc, incorporating feedback from our pilot evaluators and authors. We're also talking to people with relevant expertise, as well as considering past practice and other ongoing initiatives. We plan to consolidate that discussion and our consensus and/or conclusions into the present (GitBook) site.

    Why numerical ratings?

    Ultimately, we're trying to replace the question of "what tier of journal did a paper get into?" with "how highly was the paper rated?" We believe this is a more valuable metric. It can be more fine-grained. It should be less prone to gaming. It aims to reduce randomness in the process stemming from things like 'the availability of journal space in a particular field'. See our discussion of this issue.

    To get to this point, we need to have academia and stakeholders see our evaluations as meaningful. We want the evaluations to begin to have some value that is measurable in the way “publication in the AER” is seen to have value.

    While there are some ongoing efforts towards journal-independent evaluation, these tend not to use comparable metrics. Typically, they either have simple tick-boxes (like "this paper used correct statistical methods: yes/no") or they enable descriptive evaluation without an overall rating. As we are not a journal, and we don’t accept or reject research, we need another way of assigning value. We are working to determine the best way of doing this through quantitative ratings. We hope to be able to benchmark our evaluations against "traditional" publication outcomes. Thus, we think it is important to ask for both an overall quality rating and a journal ranking tier prediction.

    Why these categories?

    In addition to the overall assessment, we think it will be valuable to have the papers rated according to several categories. This could be particularly helpful to practitioners who may care about some concerns more than others. It also can be useful to future researchers who might want to focus on reading papers with particular strengths. It could be useful in meta-analyses, as certain characteristics of papers could be weighed more heavily. We think the use of categories might also be useful to authors and evaluators themselves. It can help them get a sense of what we think research priorities should be, and thus help them consider an overall rating.

    However, these ideas have been largely ad-hoc and based on the impressions of our management team (a particular set of mainly economists and psychologists). The process is still being developed. Any feedback you have is welcome. For example, are we overemphasizing certain aspects? Are we excluding some important categories?

    We are also researching other frameworks, templates, and past practice; we hope to draw from validated, theoretically grounded projects.

    Why ask for credible intervals?

    In eliciting expert judgment, it is helpful to differentiate the level of confidence in predictions and recommendations. We want to know not only what you believe, but how strongly held your beliefs are. If you are less certain in one area, we should weigh the information you provide less heavily in updating our beliefs. This may also be particularly useful for practitioners.

    Obviously, there are challenges to any approach. Even experts in a quantitative field may struggle to convey their own uncertainty. They may also be inherently "poorly calibrated" (see the calibration training tools discussed below). Some people may often be "confidently wrong": they might state very narrow "credible intervals" when the truth—where measurable—routinely falls outside these boundaries. People with greater discrimination may sometimes be underconfident. One would want to consider and potentially correct for poor calibration. As a side benefit, this may be interesting for research in and of itself, particularly as The Unjournal grows. We see 'quantifying one's own uncertainty' as a good exercise for academics (and everyone) to engage in.
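As a concrete illustration of the calibration idea, the sketch below checks what fraction of later-observed outcomes fall inside a set of stated 90% credible intervals; well-calibrated intervals should cover roughly 90% of outcomes over many cases. This is our own illustrative sketch with made-up numbers, not part of The Unjournal's process.

```python
# Minimal sketch (not part of The Unjournal's tooling): checking how well-calibrated
# a set of stated 90% credible intervals is, given later-observed "true" values.
# The numbers below are made up for illustration.

def coverage_rate(intervals, outcomes):
    """Fraction of outcomes that fall inside the stated [low, high] intervals."""
    hits = sum(low <= x <= high for (low, high), x in zip(intervals, outcomes))
    return hits / len(outcomes)

stated_90_cis = [(40, 70), (55, 65), (20, 80), (60, 90)]   # hypothetical 90% CIs
realized      = [65, 72, 50, 85]                           # hypothetical outcomes

rate = coverage_rate(stated_90_cis, realized)
print(f"Coverage: {rate:.0%}")  # well-calibrated 90% CIs should cover ~90% over many cases
# Coverage far below 90% suggests overconfidence (intervals too narrow);
# far above suggests underconfidence (intervals too wide).
```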

    "Weightings" for each rating category (removed for now)

    Weightings for each ratings category (removed for now)

    2 Oct 2023: We previously suggested 'weightings' for individual ratings, along with the following note:

    We give "suggested weights" as an indication of our priorities and a suggestion for how you might average these together into an overall assessment; but please use your own judgment.

    We included these weightings for several reasons:

    Adjustments to metrics and guidelines/previous presentations

    Oct 2023 update: removed "weightings"

    We have removed the suggested weightings for each of these categories. We discuss the rationale at some length elsewhere.

    Evaluators working before October 2023 saw a previous version of the table, which you can see below.

    Dec. 2023: Hiding/de-emphasizing 'confidence Likerts'

    We previously gave evaluators two options for expressing their confidence in each rating:

    Either:

    1. The 90% Confidence/Credible Interval (CI) input you see below (now a 'slider' in PubPub V7), or

    2. A 1–5 'confidence dots' Likert rating (described under the previous guidelines below).

    Pre-October 2023 'ratings with weights' table, provided for reference (no longer in use)

    Table columns: Category (importance) | Sugg. Wgt.* | Rating (0-100) | 90% CI | Confidence (alternative to CI)

    We had included the note:

    We give the previous weighting scheme in a fold below for reference, particularly for those reading evaluations done before October 2023.

    As well as:

    Suggested weighting: 0. Why 0?

    Elsewhere in that page we had noted:

    As noted above, we give suggested weights (0–5) to suggest the importance of each category rating to your overall assessment, given The Unjournal's priorities. But you don't need, and may not want to use these weightings precisely.

    The weightings were presented once again, along with each category description, in the corresponding section.

    Pre-2024 ratings and uncertainty elicitation, provided for reference (no longer in use)

    Table columns: Category (importance) | Rating (0-100) | 90% CI | Confidence (alternative to CI)

    [FROM PREVIOUS GUIDELINES:]

    You may feel comfortable giving your "90% confidence interval," or you may prefer to give a "descriptive rating" of your confidence (from "extremely confident" to "not confident").

    Quantify how certain you are about this rating, either by giving a 90% credible interval or by using our 'confidence dots' scale. (We prefer the 90% CI. Please don't give both.)

    [Previous guidelines] "1–5 dots": Explanation and relation to CIs

    5 = Extremely confident, i.e., 90% confidence interval spans +/- 4 points or less

    4 = Very confident: 90% confidence interval +/- 8 points or less

    3 = Somewhat confident: 90% confidence interval +/- 15 points or less

    2 = Not very confident: 90% confidence interval, +/- 25 points or less

    1 = Not confident: 90% confidence interval spans more than +/- 25 points

    [Previous...] Remember, we would like you to give a 90% CI or a confidence rating (1–5 dots), but not both.
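For reference, the correspondence above can be summarized as a simple mapping from a stated 90% CI (on the 0-100 rating scale) to the closest 'confidence dots' level. The sketch below is our illustration of this now-retired guidance, not an official tool; the cutoffs are those listed above.

```python
# Illustrative sketch of the (previous-guidelines) correspondence between
# "confidence dots" and the half-width of a 90% CI, using the cutoffs listed above.
# Not an official Unjournal tool.

DOTS_TO_MAX_HALF_WIDTH = {5: 4, 4: 8, 3: 15, 2: 25}  # points on the 0-100 scale

def dots_for_interval(low: float, high: float) -> int:
    """Map a stated 90% CI (on the 0-100 rating scale) to the closest 'dots' level."""
    half_width = (high - low) / 2
    for dots in (5, 4, 3, 2):
        if half_width <= DOTS_TO_MAX_HALF_WIDTH[dots]:
            return dots
    return 1  # wider than +/- 25 points: least confident

print(dots_for_interval(60, 80))  # half-width 10 -> 3 ("somewhat confident")
```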

    [Previous guidelines] Example of confidence dots vs CI

    The example in the diagram illustrates the proposed correspondence.

    And, for the 'journal tier' scale:

    [Previous guidelines]: Reprising the confidence intervals for this new metric

    From "five dots" to "one dot":

    5 = Extremely confident, i.e., 90% confidence interval spans +/– 4 points or less*

    4 = Very confident: 90% confidence interval +/– 8 points or less

    3 = Somewhat confident: 90% confidence interval +/– 15 points or less

    Previous 'descriptions of ratings intervals'

    [Previous guidelines]: The description folded below focuses on the "Overall Assessment." Please try to use a similar scale when evaluating the category metrics.

    Top ratings (90–100)

    95–100: Among the highest quality and most important work you have ever read.

    90–100: This work represents a major achievement, making substantial contributions to the field and practice. Such work would/should be weighed very heavily by tenure and promotion committees, and grantmakers.

    For example:

    Near-top (75–89) (*)

    This work represents a strong and substantial achievement. It is highly rigorous, relevant, and well-communicated, up to the standards of the strongest work in this area (say, the standards of the top 5% of committed researchers in this field). Such work would/should not be decisive in a tenure/promotion/grant decision alone, but it should make a very solid contribution to such a case.

    Middle ratings (40–59, 60–74) (*)

    60–74.9: A very strong, solid, and relevant piece of work. It may have minor flaws or limitations, but overall it is very high-quality, meeting the standards of well-respected research professionals in this field.

    40–59.9: A useful contribution, with major strengths, but also some important flaws or limitations.

    Low ratings (5–19, 20–39) (*)

    20–39.9: Some interesting and useful points and some reasonable approaches, but only marginally so. Important flaws and limitations. Would need substantial refocus or changes of direction and/or methods in order to be a useful part of the research and policy discussion.

    5–19.9: Among the lowest quality papers; not making any substantial contribution and containing fatal flaws. The paper may fundamentally address an issue that is not defined or obviously not relevant, or the content may be substantially outside of the authors’ field of expertise.

    0–4: Illegible, fraudulent, or plagiarized. Please flag fraud, and notify us and the relevant authorities.

    (*) 20 Mar 2023: We adjusted these ratings to avoid overlap

    The previous categories were 0–5, 5–20, 20–40, 40–60, 60–75, 75–90, and 90–100. Some evaluators found the overlap in this definition confusing.
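For readers comparing ratings against these bands, the non-overlapping cutoffs can be expressed as a simple lookup. The sketch below is our illustration only, with abbreviated band descriptions, and is not an Unjournal tool.

```python
# A small illustrative helper (not an official Unjournal tool) mapping a 0-100
# "overall assessment" rating to abbreviated versions of the descriptive bands above.

BANDS = [
    (90, "Top rating: major achievement, substantial contribution"),
    (75, "Near-top: strong and substantial achievement"),
    (60, "Very strong, solid, and relevant work"),
    (40, "Useful contribution, but with important flaws or limitations"),
    (20, "Marginally useful; needs substantial refocus or changes"),
    (5,  "Lowest-quality range: no substantial contribution, fatal flaws"),
    (0,  "Illegible, fraudulent, or plagiarized"),
]

def describe_rating(score: float) -> str:
    """Return the descriptive band for a 0-100 rating, using the non-overlapping cutoffs."""
    for cutoff, description in BANDS:
        if score >= cutoff:
            return description
    raise ValueError("score must be between 0 and 100")

print(describe_rating(67.5))  # -> "Very strong, solid, and relevant work"
```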

    See also

    This page explains the value of the metrics we are seeking from evaluators.

    Calibration training tools

    The calibration training tool from Clearer Thinking is fairly helpful and fun for practicing and checking how good you are at expressing your uncertainty. It requires creating an account, but that doesn't take long. The 'Confidence Intervals' training seems particularly relevant for our purposes.

    Guidelines for evaluators

    This page describes The Unjournal's evaluation guidelines, considering our priorities and criteria, the metrics we ask for, and how these are considered.

    30 July 2024: The guidelines below apply to the evaluation form currently hosted on PubPub. We're adjusting this form somewhat; the new form is temporarily hosted in Coda here (academic stream) and here (applied stream). If you prefer, you are welcome to use the Coda form instead (just let us know).

    If you have any doubts about which form to complete or about what we are looking for, please ask the evaluation manager or email contact@unjournal.org.

    You can download a PDF version of these guidelines here (updated March 2024).

    Please see For prospective evaluators for an overview of the evaluation process, as well as details on compensation, public recognition, and more.

    What we'd like you to do

    1. Write an evaluation of the target paper or project, similar to a standard, high-quality referee report. Please identify the paper's main claims and carefully assess their validity, leveraging your own background and expertise.

    Writing the evaluation (aka 'the review')

    In writing your evaluation and providing ratings, please consider the following.

    The Unjournal's expectations and criteria

    In many ways, the written part of the evaluation should be similar to a report an academic would write for a traditional high-prestige journal (e.g., see some 'conventional guidelines'). Most fundamentally, we want you to use your expertise to critically assess the main claims made by the authors. Are the claims well-supported? Are the assumptions believable? Are the methods appropriate and well-executed? Explain why or why not.

    However, we'd also like you to pay some consideration to our priorities:

    1. Advancing our knowledge and practice

    2. Justification, reasonableness, validity, and robustness of methods

    3. Logic and communication

    4. Open, communicative, replicable science

    See the category descriptions below for more details on each of these. Please don't structure your review according to these metrics; just pay some attention to them.

    Specific requests for focus or feedback

    Please pay attention to anything our managers and editors specifically asked you to focus on. We may ask you to focus on specific areas of expertise. We may also forward specific feedback requests from authors.

    The evaluation will be made public

    Unless you were advised otherwise, this evaluation, including the review and quantitative metrics, will be given a DOI and, hopefully, will enter the public research conversation. Authors will be given two weeks to respond to reviews before the evaluations, ratings, and responses are made public. You can choose whether you want to be identified publicly as an author of the evaluation.

    If you have questions about the authors’ work, you can ask them anonymously: we will facilitate this.

    We want you to evaluate the most recent/relevant version of the paper/project that you can access. If you see a more recent version than the one we shared with you, please let us know.

    Publishing evaluations: considerations and exceptions

    We may give early-career researchers the right to veto the publication of very negative evaluations or to embargo the release of these for a defined period. We will inform you in advance if this will be the case for the work you are evaluating.

    You can reserve some "sensitive" content in your report to be shared with only The Unjournal management or only the authors, but we hope to keep this limited.

    Target audiences

    We designed this process to balance three considerations with three target audiences. Please consider each of these:

    1. Crafting evaluations and ratings that help researchers and policymakers judge when and how to rely on this research. For Research Users.

    2. Ensuring these evaluations of the papers are comparable to current journal tier metrics, to enable them to be used to determine career advancement and research funding. For Departments, Research Managers, and Funders.

    3. Providing constructive feedback to Authors

    We discuss this, and how it relates to our impact and "theory of change," elsewhere in this documentation.

    chevron-right"But isn't The Unjournal mainly just about feedback to authors"?hashtag

    We accept that in the near-term an Unjournal evaluation may not be seen to have substantial career value.

    Furthermore, the work we are considering may tend to be at an earlier stage: authors may submit work to us, thinking of this as a "pre-journal" step. The papers we select (e.g., from NBER) may also have been posted long before the authors planned to submit them to journals.

    This may make the 'feedback for authors' and 'assessment for research users' aspects more important, relative to traditional journals' role. However, in the medium-term, a positive Unjournal evaluation should gain credibility and career value. This should make our evaluations an "endpoint" for a research paper.

    Quantitative metrics

    We ask for a set of nine quantitative metrics. For each metric, we ask for a score and a 90% credible interval. We describe these in detail below. (We explain the reasoning behind these metrics in "Why these guidelines/metrics?" above.)

    Percentile rankings

    For some questions, we ask for a percentile ranking from 0-100%. This represents "what proportion of papers in the reference group are worse than this paper, by this criterion". A score of 100% means this is essentially the best paper in the reference group. 0% is the worst paper. A score of 50% means this is the median paper; i.e., half of all papers in the reference group do this better, and half do this worse, and so on.

    Here* the population of papers should be all serious research in the same area that you have encountered in the last three years.

    *Unless this work is in our 'applied and policy stream', in which case...

    For the applied and policy stream the reference group should be "all applied and policy research you have read that is aiming at a similar audience, and that has similar goals".

    chevron-right"Serious" research? Academic research? hashtag

    Here, we are mainly considering research done by professional researchers with high levels of training, experience, and familiarity with recent practice, who have time and resources to devote months or years to each such research project or paper. These will typically be written as 'working papers' and presented at academic seminars before being submitted to standard academic journals. Although no credential is required, this typically includes people with PhD degrees (or upper-level PhD students). Most of this sort of research is done by full-time academics (professors, post-docs, academic staff, etc.) with a substantial research remit, as well as research staff at think tanks and research institutions (but there may be important exceptions).

    What counts as the "same area"?

    This is a judgment call. Here are some criteria to consider: first, does the work come from the same academic field and research subfield, and does it address questions that might be addressed using similar methods? Secondly, does it deal with the same substantive research question, or a closely related one? If the research you are evaluating is in a very niche topic, the comparison reference group should be expanded to consider work in other areas.

    chevron-right"Research that you have encountered"hashtag

    We are aiming for comparability across evaluators. If you suspect you are particularly exposed to higher-quality work in this category, compared to other likely evaluators, you may want to adjust your reference group downwards. (And of course vice-versa, if you suspect you are particularly exposed to lower-quality work.)
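To make the percentile-ranking definition concrete: a paper's percentile is the share of reference-group papers that are worse than it on the relevant criterion. The sketch below is a minimal illustration with hypothetical scores, not an Unjournal tool.

```python
# Minimal sketch (our illustration, not an Unjournal tool) of the percentile-ranking
# idea described above: "what proportion of papers in the reference group are worse
# than this paper, by this criterion?" The reference scores below are hypothetical.

def percentile_rank(paper_score: float, reference_scores: list[float]) -> float:
    """Share of reference papers strictly worse than this paper, as a percentage."""
    worse = sum(s < paper_score for s in reference_scores)
    return 100 * worse / len(reference_scores)

reference = [3.1, 4.5, 5.0, 5.8, 6.2, 6.9, 7.4, 8.0, 8.8, 9.5]  # made-up quality scores
print(percentile_rank(7.0, reference))  # 60.0 -> better than 60% of the reference group
```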

    Midpoint rating and credible intervals

    For each metric, we ask you to provide a 'midpoint rating' and a 90% credible interval as a measure of your uncertainty. Our interface provides slider bars for expressing your chosen intervals.

    See "Why ask for credible intervals?" above for more guidance on uncertainty, credible intervals, and the midpoint rating as the 'median of your belief distribution'.

    The sections below summarize each quantitative metric and its scale.

    Overall assessment

    Percentile ranking (0-100%)

    Judge the quality of the research heuristically. Consider all aspects of quality, credibility, importance to knowledge production, and importance to practice.

    Claims, strength and characterization of evidence **

    Do the authors do a good job of (i) stating their main questions and claims, (ii) providing strong evidence and powerful approaches to inform these, and (iii) correctly characterizing the nature of their evidence?

    Methods: Justification, reasonableness, validity, robustness

    Percentile ranking (0-100%)

    Are the methods used well-justified and explained; are they a reasonable approach to answering the question(s) in this context? Are the underlying assumptions reasonable?

    Are the results and methods likely to be robust to reasonable changes in the underlying assumptions? Does the author demonstrate this?

    Avoiding bias and questionable research practices (QRP): Did the authors take steps to reduce bias from opportunistic reporting and QRP? For example, did they do a strong pre-registration and pre-analysis plan, incorporate multiple hypothesis testing corrections, and report flexible specifications?

    Advancing our knowledge and practice

    Percentile ranking (0-100%)

    To what extent does the project contribute to the field or to practice, particularly in ways that are relevant to global priorities and impactful interventions?

    (Applied stream: please focus on ‘improvements that are actually helpful’.)

    Less weight to "originality and cleverness"

    Originality and cleverness should be weighted less than the typical journal, because The Unjournal focuses on impact. Papers that apply existing techniques and frameworks more rigorously than previous work or apply them to new areas in ways that provide practical insights for GP (global priorities) and interventions should be highly valued. More weight should be placed on 'contribution to GP' than on 'contribution to the academic field'.

    Do the paper's insights inform our beliefs about important parameters and about the effectiveness of interventions?

    Does the project add useful value to other impactful research?

    We don't require surprising results; sound and well-presented null results can also be valuable.

    Logic and communication

    Percentile ranking (0-100%)

    Are the goals and questions of the paper clearly expressed? Are concepts clearly defined and referenced?

    Is the reasoning "transparent"? Are assumptions made explicit? Are all logical steps clear and correct? Does the writing make the argument easy to follow?

    Are the conclusions consistent with the evidence (or formal proofs) presented? Do the authors accurately state the nature of their evidence, and the extent it supports their main claims?

    Are the data and/or analysis presented relevant to the arguments made? Are the tables, graphs, and diagrams easy to understand in the context of the narrative (e.g., no major errors in labeling)?

    hashtag
    Open, collaborative, replicable research

    Percentile ranking (0-100%)

    This covers several considerations:

    hashtag
    Replicability, reproducibility, data integrity

    Would another researcher be able to perform the same analysis and get the same results? Are the methods explained clearly and in enough detail to enable easy and credible replication? For example, are all analyses and statistical tests explained, and is code provided?

    Is the source of the data clear?

    Is the data made as available as reasonably possible? If so, is it clearly labeled and explained?

    Consistency

    Do the numbers in the paper and/or code output make sense? Are they internally consistent throughout the paper?

    Useful building blocks

    Do the authors provide tools, resources, data, and outputs that might enable or enhance future work and meta-analysis?

    hashtag
    Relevance to global priorities, usefulness for practitioners**

    Are the paper’s chosen topic and approach likely to be useful to global priorities, cause prioritization, and high-impact interventions?

    Does the paper consider real-world relevance and deal with policy and implementation questions? Are the setup, assumptions, and focus realistic?

    Do the authors report results that are relevant to practitioners? Do they provide useful quantified estimates (costs, benefits, etc.) enabling practical impact quantification and prioritization?

    Do they communicate (at least in the abstract or introduction) in ways policymakers and decision-makers can understand, without misleading or oversimplifying?

    chevron-rightEarlier category: "Real-world relevance"hashtag

    hashtag
    Real-world relevance

    Percentile ranking (0-100%)

    Are the assumptions and setup realistic and relevant to the real world?

    chevron-rightEarlier category: Relevance to global prioritieshashtag

    Percentile ranking (0-100%)

    Could the paper's topic and approach potentially help inform global priorities, cause prioritization, and high-impact interventions?

    hashtag
    Journal ranking tiers

    chevron-rightNote: this is less relevant for work in our Applied Stream hashtag

    Most work in our applied stream will not be targeting academic journals. Still, in some cases it might make sense to make this comparison; e.g., if particular aspects of the work might be rewritten and submitted to academic journals, or if the work uses certain techniques that might be directly compared to academic work. If you believe a comparison makes sense, please consider giving an assessment below, making reference to our guidelines and how you are interpreting them in this case.

    To help universities and policymakers make sense of our evaluations, we want to benchmark them against how research is currently judged. So, we would like you to assess the paper in terms of journal rankings. We ask for two assessments:

    1. a normative judgment about 'how well the research should publish';

    2. a prediction about where the research will be published.

    Journal ranking tiers are on a 0-5 scale, as follows:

    • 0/5: "Won't publish/little to no value". Unlikely to be cited by credible researchers

    • 1/5: OK/Somewhat valuable journal

    • 2/5: Marginal B-journal/Decent field journal

    • 3/5: Top B-journal/Strong field journal

    • 4/5: Marginal A-journal/Top field journal

    • 5/5: A-journal/Top journal

    circle-info

    We give some example journal rankings, based on SJR and ABS ratings.

    We encourage you to consider a non-integer score, e.g. 4.6 or 2.2.

    As before, we ask for a 90% credible interval.

    Journal ranking tiers | Scale | 90% CI

    What journal ranking tier should this work be published in? | 0.0-5.0 | lower, upper
    What journal ranking tier will this work be published in? | 0.0-5.0 | lower, upper
    circle-info

    PubPub note: as of 14 March 2024, the PubPub form does not allow non-integer responses. Until this is fixed, please multiply your rating by 10 and enter it using the 0-50 slider (e.g., enter 46 for a rating of 4.6). (Or use the Coda form.)

    hashtag
    What journal ranking tier should this work be published in?

    Journal ranking tier (0.0-5.0)

    Assess this paper on the journal ranking scale described above, considering only its merit and giving some weight to the category metrics we discussed above.

    Equivalently, where would this paper be published if:

    1. the journal process were fair, unbiased, and free of noise, and status, social connections, and lobbying to get the paper published didn’t matter; and

    2. journals assessed research according to the category metrics we discussed above.

    hashtag
    What journal ranking tier will this work be published in?

    Journal ranking tier (0.0-5.0)

    chevron-rightWhat if this work has already been peer reviewed and published?hashtag

    If this work has already been published, and you know where, please report the prediction you would have given absent that knowledge.

    hashtag
    The midpoint and 'credible intervals': expressing uncertainty

    hashtag
    What are we looking for and why?

    We want policymakers, researchers, funders, and managers to be able to use The Unjournal's evaluations to update their beliefs and make better decisions. To do this well, they need to weigh multiple evaluations against each other and other sources of information. Evaluators may feel confident about their rating for one category, but less confident in another area. How much weight should readers give to each? In this context, it is useful to quantify the uncertainty.

    But it's hard to quantify statements like "very certain" or "somewhat uncertain"; different people may use the same phrases to mean different things. That's why we're asking you for a more precise measure: your credible intervals. These metrics are particularly useful for meta-science and meta-analysis.

    You are asked to give a 'midpoint' and a 90% credible interval. Consider this as the smallest interval that you believe is 90% likely to contain the true value. See the fold below for further guidance.

    chevron-rightHow do I come up with these intervals? (Discussion and guidance)hashtag

    You may understand the concepts of uncertainty and credible intervals, but you might be unfamiliar with applying them in a situation like this one.

    You may have a certain best guess for the "Methods..." criterion. Still, even an expert can never be certain. E.g., you may misunderstand some aspect of the paper, there may be a method you are not familiar with, etc.

    Your uncertainty over this could be described by some distribution, representing your beliefs about the true value of this criterion. Your "best guess" should be the central mass point of this distribution.

    You are also asked to give a 90% credible interval. Consider this as the smallest interval that you believe is 90% likely to contain the true value.

    For some questions, the "true value" refers to something objective, e.g., will this work be published in a top-ranked journal? In other cases, like the percentile rankings, the true value means "if you had complete evidence, knowledge, and wisdom, what value would you choose?"

    For more information on credible intervals, this Wikipedia entryarrow-up-right may be helpful.

    chevron-rightConsider the midpoint as the 'median of your belief distribution'hashtag

    We also ask for the 'midpoint', the center dot on that slider. Essentially, we are asking for the median of your belief distribution. By this we mean the percentile ranking such that you believe "there's a 50% chance that the paper's true rank is higher than this, and a 50% chance that it actually ranks lower than this."
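    To make this concrete, here is a minimal sketch (in Python; the belief distribution is invented purely for illustration and is not part of our process) of how the midpoint and the 90% credible interval relate to a belief distribution: the midpoint is the 50th percentile (median) of your beliefs, and the interval runs from the 5th to the 95th percentile.

```python
import numpy as np

# Invented example: samples representing an evaluator's beliefs about a paper's
# true percentile ranking (0-100). In practice this distribution is subjective.
rng = np.random.default_rng(seed=0)
belief_samples = rng.normal(loc=62, scale=8, size=10_000).clip(0, 100)

# Midpoint = median of the belief distribution; 90% CI = 5th to 95th percentile.
lower, midpoint, upper = np.percentile(belief_samples, [5, 50, 95])

print(f"Midpoint (median of beliefs): {midpoint:.0f}")
print(f"90% credible interval: [{lower:.0f}, {upper:.0f}]")
```

    A wider interval simply expresses more uncertainty about the true value.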

    chevron-rightGet better at this by 'calibrating your judgment'hashtag

    If you are "", your 90% credible intervals should contain the true value 90% of the time. To understand this better, assess your ability, and then practice to get better at estimating your confidence in results. will help you get practice at calibrating your judgments. We suggest you choose the "Calibrate your Judgment" tool, and select the "confidence intervals" exercise, choosing 90% confidence. Even a 10 or 20 minute practice session can help, and it's pretty fun.

    hashtag
    Claim identification, assessment, and implications

    We are now asking evaluators for “claim identification and assessment” where relevant. This is meant to help practitioners use this research to inform their funding, policymaking, and other decisions. It is not intended as a metric to judge the research quality per se. This is not required, but we will reward this work.

    See guidelines and examples herearrow-up-right.

    hashtag
    Survey questions

    Lastly, we ask evaluators about their background, and for feedback about the process.

    chevron-rightSurvey questions for evaluators: detailshashtag

    For the two questions below, we will publish your responses unless you specifically ask for them to be kept anonymous.

    1. How long have you been in this field?

    2. How many proposals and papers have you evaluated? (For journals, grants, and other peer review.)

    Answers to the questions below will not be made public:

    1. How would you rate this template and process?

    2. Do you have any suggestions or questions about this process or The Unjournal? (We will try to respond to your suggestions, and incorporate them in our practice.) [Open response]

    3. Would you be willing to consider evaluating a revised version of this project?

    hashtag
    Other guidelines and notes

    chevron-rightNote on the evaluation platform (13 Feb 2024)hashtag

    12 Feb 2024: We are moving to a hosted form/interface in PubPub. That form is still somewhat of a work in progress and may need some further guidance; we try to provide this below, but please contact us with any questions. If you prefer, you can also submit your response in a Google Doc and share it back with us. Click herearrow-up-right to make a new copy of that document directly.

    Length/time spent: This is up to you. We welcome detail, elaboration, and technical discussion.

    chevron-rightLength and time: possible benchmarkshashtag

    The Econometric Societyarrow-up-right recommends a 2–3 page referee report; Berk et al.arrow-up-right suggest this is relatively short, but confirm that brevity is desirable. In a recent survey (Charness et al., 2022)arrow-up-right, economists report spending (median and mean) about one day per report, with substantial shares reporting "half a day" and "two days." We expect that reviewers tend to spend more time on papers for high-status journals, and when reviewing work that is closely tied to their own agenda.

    chevron-rightAdjustments to earlier metrics; earlier evaluation formshashtag

    We have made some adjustments to this page and to our guidelines and processes; this is particularly relevant when considering earlier evaluations.

    We previously suggested weightings for the rating categories. People are found [reference needed] to do a more careful job at prediction (and thus perhaps at overall rating too) if the outcome of interest is built up from components that are each judged separately. We suggested weightings because:

    • We wanted to make the overall rating better defined, and thus more useful to outsiders and comparable across raters.

    • We wanted to emphasize what we think is important (in particular, methodological reliability).

    • We didn't want evaluators to think we wanted them to weigh each category equally; some are clearly more important.

    However, we decided to remove these weightings because:

    1. It reduces clutter in an already overwhelming form and guidance doc; 'more numbers' can be particularly overwhelming.

    2. These weights were ad hoc, and they may suggest we have a more grounded 'model of value' than we actually do. (There is also some overlap in our categories, something we are working on addressing.)

    3. Some people interpreted what we intended incorrectly (e.g., they thought we were saying 'relevance to global priorities' is not an important thing).

    We also previously asked for a five-point 'Likert style' measure of confidence, which we described qualitatively, explaining how we would convert it into CIs when reporting aggregations. To make this process less confusing, to encourage careful quantification of uncertainty, and to enable better-justified aggregation of expert judgment, we are de-emphasizing that measure.

    Still, to accommodate those who may not be familiar with or comfortable stating "90% CIs on their own beliefs," we offer further explanations, and we are providing tools to help evaluators construct these. As a fallback, we will still allow evaluators to give the 1-5 confidence measure, noting the correspondence to CIs, but we discourage this somewhat.

    The previous guidelines can be seen here; these may be useful in considering evaluations provided pre-2024.

    circle-info

    If you still have questions, please contact us, or see our FAQ.

    Our data protection statement is linked here.

    Conventional guidelines for referee reports

    hashtag
    How to write a good review (general conventional guidelines)

    chevron-rightSome key pointshashtag

    Protecting anonymity

    The Unjournal's evaluators have the option of remaining anonymous (see the guidelines below). Where evaluators choose this, we will carefully protect this anonymity, aiming at a high standard of protection, as good as or better than that of traditional journals. We will give evaluators the option to take extra steps to safeguard this further. We are offering anonymity in perpetuity to those who request it, as well as anonymity on other explicitly and mutually agreed terms.

    If they choose to stay anonymous, there should be no way for authors to 'guess' who has reviewed their work.

    • We will take steps to keep private any information that could connect the identity of an anonymous evaluator and their evaluation/the work they are evaluating.

    • We will take extra steps to make the possibility of accidental disclosure extremely small (this is never impossible, of course, even in the case of conventional journal reviews). In particular, we will use pseudonyms or ID codes for these evaluators in any discussion or database shared among our management team that connects individual evaluators to research work.

    • If we ever share a list of The Unjournal's evaluators, this will not include anyone who wished to remain anonymous (unless they explicitly ask us to be on such a list).

    • We will do our best to warn anonymous evaluators of ways that they might inadvertently be identifying themselves in the evaluation content they provide.

    • We will provide platforms to enable anonymous and secure discussion between anonymous evaluators and others (authors, editors, etc.). Where an anonymous evaluator is involved, we will encourage these platforms to be used as much as possible. In particular, see our (proposed) use of Cryptpadarrow-up-right.

    • Aside: In the future, we may consider allowing Evaluation Managers (formerly 'managing editors') to remain anonymous; these tools will also be relevant there.

    hashtag
    Some key principles/rules



    Cite evidence and reference specific parts of the research when giving feedback.

  • Justify your critiques and claims in a reasoning-transparent way, rather than merely "passing judgment." Avoid comments like "this does not pass the smell test".

  • Provide specific, actionable feedback to the author where possible.

  • Try to restate the authors’ arguments, clearly presenting the most reasonable interpretation of what they have written. See steelmanningarrow-up-right.

  • Be collegial and encouraging, but also rigorous. Criticize and question specific parts of the research without suggesting criticism of the researchers themselves.

  • We're happy for you to use whichever process and structure you feel comfortable with when writing your evaluation content.

    chevron-rightOne possible structurehashtag

    Core

    1. Briefly summarize the work in context

    2. Highlight positive aspects of the paper and its strengths and contributions, considered in the context of existing research.

    3. Most importantly: Identify and assess the paper's most important and impactful claim(s). Are these supported by the evidence provided? Are the assumptions reasonable? Are the authors using appropriate methods?

    4. Note major limitations and potential ways the work could be improved; where possible, reference methodological literature and discussion and work that models what you are suggesting.

    Optional/desirable

    • Offer suggestions for increasing the impact of the work, for incorporating the work into global priorities research and impact evaluations, and for supporting and enhancing future work.

    • Discuss minor flaws and their potential revisions.

    • Desirable: formal 'claim identification and assessment'arrow-up-right, as discussed above.

    Please don't spend time copyediting the work. If you like, you can give a few specific suggestions and then suggest that the author make other changes along these lines.

    circle-info

    Remember: The Unjournal doesn’t “publish” and doesn’t “accept or reject.” So don’t give an "Accept," "Revise-and-Resubmit," or "Reject"-type recommendation. We ask for quantitative metrics, written feedback, and expert discussion of the validity of the paper's main claims, methods, and assumptions.

    hashtag
    Writing referee reports: resources and benchmarks

    Economics

    How to Write an Effective Referee Report and Improve the Scientific Review Process (Berk et al., 2017)arrow-up-right

    Semi-relevant: Econometric Society: Guidelines for refereesarrow-up-right

    Report: Improving Peer Review in Economics: Stocktaking and Proposal (Charness et al 2022)arrow-up-right

    Open Science

    PLOSarrow-up-right (Conventional but open access; simple and brief)

    Peer Community In... Questionnaire arrow-up-right(Open-science-aligned; perhaps less detail-oriented than we are aiming for)

    Open Reviewers Reviewer Guide arrow-up-right(Journal-independent “PREreview”; detailed; targets ECRs)

    General, other fields

    The Wiley Online Libraryarrow-up-right (Conventional; general)

    "Peer review in the life sciences (Fraser)"arrow-up-right (extensive resources; only some of this is applicable to economics and social science)

    hashtag
    Other templates and tools

    Collaborative template: RRR assessment peer reviewarrow-up-right

    Introducing Structured PREreviews on PREreview.orgarrow-up-right

    ‘the 4 validities’ and seaboatarrow-up-right


    Why pay evaluators (reviewers)?

    It's a norm in academia that people do reviewing work for free. So why is The Unjournal paying evaluators?

    From a recent survey of economists:arrow-up-right

    We estimate that the average (median) respondent spends 12 (9) working days per year on refereeing. The top 10% of the distribution dedicates 25 working days or more, which is quite substantial considering refereeing is usually unpaid.

    hashtag
    General reasons to pay reviewers

    hashtag
    Economics, turnaround times

    The peer-review process in economics is widely argued to be too slow and lengthy, but there is evidence that payments may help improve this.

    In Charness et al.'s full reportarrow-up-right, they note that few economics journals currently pay reviewers, and that these payments tend to be small (e.g., the JPE and AER paid $100 at the time). However, they also note, citing several papers:

    The existing evidence summarized in Table 5 suggests that offering financial incentives could be an effective way of reducing turnaround time.

    hashtag
    Equity and inclusivity

    The report cited abovearrow-up-right notes that the work of reviewing is not distributed equally. To the extent that agreeing to write a report is based on individual goodwill, the unpaid volunteer model could be seen to unfairly penalize more generous and sympathetic academics. Writing a certain number of referee reports per year is generally considered part of "academic service": academics put this on their CVs, and it may lead to a (somewhat valued) place on a journal's board. However, this service is much less attractive for researchers who are not tenured university professors. Paying for this work would do a better job of including them in the process.

    hashtag
    Incentivizing useful, unbiased evaluations

    'Payment for good evaluation work' may also lead to fairer and more useful evaluations.

    In the current system academics may take on this work in large part to try to impress journal editors and get favorable treatment from them when they submit their own work. They may also write reviews in particular ways to impress these editors.

    For less high-prestige journals, to get reviewers, editors often need to lean on their personal networks, including those they have power relationships with.

    Reviewers are also known to strategically try to get authors to cite and praise the reviewer's own work. They may be especially critical toward authors they see as rivals.

    To the extent that reviewers are doing this as a service they are being paid for, these other motivations will be comparatively somewhat less important. The incentives will be more in line with doing evaluations that are seen as valuable by the managers of the process, in order to get chosen for further paid work. (And, if evaluations are public, the managers can consider the public feedback on these reports as well.)

    hashtag
    Reasons for The Unjournal to pay evaluators

    1. We are not ‘just another journal.’ We need to give incentives for people to put effort into a new system and help us break out of the old inferior equilibrium.

    2. In some senses, we are asking for more than a typical journal. In particular, our evaluations will be made public and thus need to be better communicated.

    3. We cannot rely on 'reviewers taking on work to get better treatment from editors in the future.' This does not apply to our model, as we don't have editors making any sort of ‘final accept/reject decision’.

    4. Paying evaluators brings in a wider set of evaluators, including non-academics. This is particularly relevant to our impact-focused goals.

    Prioritization ratings: discussion

    As noted in Process: prioritizing research, we ask people who suggest research to provide a numerical 0-100 rating.

    We also ask people within our team to act as 'assessors', giving second and third opinions on this. This 'prioritization rating' is one of the criteria we will use to determine whether to commission research to be evaluated (along with author engagement, publication status, our capacity and expertise, etc.). Again, see the previous page for the current process.

    hashtag
    So what goes into this "prioritization rating"; what does it mean?

    We are working on a set of notes on this, fleshing it out and giving specific examples. At the moment this is available to members of our team only (ask for access to "Guidelines for prioritization ratings (internal)"). We aim to share a version of this publicly once it converges, and once we have removed potentially sensitive examples.

    hashtag
    Some key points

    I. This is not the evaluation itself. It is not an evaluation of the paper's merit per se:

    • Influential work, and prestigious work in influential areas, may be highly prioritized regardless of its rigor and quality.

    • For work that seems potentially impactful but not particularly prestigious or influential, the prioritization rating might also consider quality: aspects like writing clarity and methodological rigor might put it 'over the bar'. However, even here these will tend to be rapid and shallow assessments, and should not be seen as meaningful evaluations of research merit.

    II. These ratings will be considered along with the discussion by the field team and the management. Thus, it is helpful if you give a justification and explanation for your stated rating.

    hashtag
    One possible way of considering the rating criteria

    hashtag
    Key attributes/factors

    Define/consider the following ‘attributes’ of a piece of research:

    1. Global decision-relevance/VOI: Is this research decision-relevant to high-value choices and considerations that are important for global priorities and global welfare?

    2. Prestige/prominence: Is the research already prominent/valued (esp. in academia), highly cited, reported on, etc?

    3. Influence: Is the work already influencing important real-world decisions and considerations?

    Obviously, these are not binary factors; there is a continuum for each. But for the sake of illustration, consider the following flowcharts.

    circle-info

    If the flowcharts do not render, please refresh your browser. You may have to refresh twice.

    hashtag
    Prestigious work

    "Fully baked": Sometimes prominent researchers release work (e.g., on NBER) that is not particularly rigorous or involved, which may have been put together quickly. This might be research that links to a conference they are presenting at, to their teaching, or to specific funding or consulting. It may be survey/summary work, perhaps meant for less technical audiences. The Unjournal tends not to prioritize such work, or at least not consider it in the same "prestigious" basket (although there will be exceptions). In the flowchart above, we contrast this with their "fully-baked" work.

    Decision-relevant, prestigious work: Suppose the research is both ‘globally decision-relevant’ and prominent. Here, if the research is in our domain, we probably want to have it publicly evaluated. This is basically the case regardless of its apparent methodological strength. This is particularly true if it has been recently made public (as a working paper), if it has not yet been published in a highly-respected peer-reviewed journal, and if there are non-straightforward methodological issues involved.

    Prestigious work that seems less globally relevant: We generally will not prioritize this work unless it adds to our mission in other ways (see, e.g., our ‘sustainability’ and ‘credibility’ goals). In particular, we will prioritize such research more if:

    • It is presented in innovative, transparent formats (e.g., dynamic documents/open notebooks, sharing code and data)

    • The research indirectly supports more globally-relevant research, e.g., through…

      • Providing methodological tools that are relevant to that ‘higher-value’ work

    hashtag
    Less prestigious work

    (If the flowchart below does not render, please refresh your browser; you may have to refresh twice.)

    Decision-relevant, influential (but less prestigious) work: E.g., suppose this research might be cited by a major philanthropic organization as guiding its decision-making, but the researchers may not have strong academic credentials or a track record. Again, if this research is in our domain, we probably want to have it publicly evaluated. However, depending on the rigor of the work and the way it is written, we may want to explicitly class this in our ‘non-academic/policy’ stream.

    Decision-relevant, less prestigious, less-influential work: What about for less-prominent work with fewer academic accolades that is not yet having an influence, but nonetheless seems to be globally decision-relevant? Here, our evaluations seem less likely to have an influence unless the work seems potentially strong, implying that our evaluations, rating, and feedback could boost potentially valuable neglected work. Here, our prioritization rating might focus more on our initial impressions of things like …

    • Methodological strength (this is a big one!)

    • Rigorous logic and communication

    • Open science and robust approaches

    Again: the prioritization process is not meant to be an evaluation of the work in itself. It’s OK to do this in a fairly shallow way.

    In future, we may want to put together a loose set of methodological ‘suggestive guidelines’ for work in different fields and areas, without being too rigid or prescriptive. (To do: we can draw from some existing frameworks for this [ref].)

    Communicating results

    hashtag
    Curating and publishing evaluations, linked to research

    • Unjournal PubPub pagearrow-up-right

      • Previous/less emphasized: Society

    • Evaluations and author responses are given DOIs and enter the bibliometric record

      • Future consideration:

        • "publication tier" of authors' responses as a workaround to encode aggregated evaluation

    • Sharing evaluation data on a public GitHub repo (see data reporting herearrow-up-right)

    • Hypothes.is annotation of hosted and linked papers and projects (aiming to integrate; see hypothes.is for collab. annotation)

    hashtag
    Aggregating evaluators' ratings and predictions

    We aim to elicit expert judgment from Unjournal evaluators efficiently and precisely. We aim to communicate this quantitative information concisely and usefully, in ways that will inform policymakers, philanthropists, and future researchers.

    In the short run (in our pilot phase), we will attempt to present simple but reasonable aggregations, such as simple averages of midpoints and confidence-interval bounds. However, going forward, we are consulting and incorporating the burgeoning academic literature on "aggregating expert opinion." (See, e.g., Hemming et al., 2017arrow-up-right; Hanea et al., 2021arrow-up-right; McAndrew et al., 2020arrow-up-right; Marcoci et al., 2022arrow-up-right.)

    We are working on this in our public data presentation (Quarto notebook) herearrow-up-right.
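    As a minimal illustration of the 'simple but reasonable aggregation' mentioned above (a sketch only; the numbers are invented and this is not our final aggregation method), one could average evaluators' midpoints and interval bounds for a given metric:

```python
# Invented example: midpoints and 90% credible intervals from three evaluators
# for one metric (a percentile ranking, 0-100).
evaluations = [
    {"midpoint": 55, "lower": 40, "upper": 70},
    {"midpoint": 62, "lower": 50, "upper": 78},
    {"midpoint": 48, "lower": 30, "upper": 65},
]

n = len(evaluations)
aggregate = {
    key: sum(e[key] for e in evaluations) / n
    for key in ("midpoint", "lower", "upper")
}

print(f"Aggregated midpoint: {aggregate['midpoint']:.1f}")
print(f"Aggregated 90% interval: [{aggregate['lower']:.1f}, {aggregate['upper']:.1f}]")
```

    The expert-aggregation literature cited above offers more principled ways to pool such judgments, which we are exploring.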

    hashtag
    Other communication

    We are considering...

    • Syntheses of evaluations and author feedback

    • Input to prediction markets, replication projects, etc.

    • Less technical summaries and policy-relevant summaries, e.g., for the EA Forumarrow-up-right, Asterisk magazinearrow-up-right, or mainstream long-form outlets

    Project submission, selection and prioritization

    hashtag
    Submission/evaluation funnel

    As we are paying evaluators and have limited funding, we cannot evaluate every paper and project. Papers enter our database through

    1. submission by authors;

    2. our own searches (e.g., searching syllabi, forums, working paper archives, and white papers); and

    3. suggestions from other researchers, practitioners, and members of the public, and recommendations from high-impact research-users. We have posted more detailed instructions for how to suggest research for evaluationarrow-up-right.

    Our management team rates the suitability of each paper according to the criteria discussed below and in the aforementioned linked postarrow-up-right.

    hashtag
    Our procedures for identification and prioritization

    We have followed a few procedures for finding and prioritizing papers and projects. In all cases, we require more than one member of our research-involved team (field specialist, managers, etc.) to support a paper before prioritizing it.

    We are building a grounded systematic procedure with criteria and benchmarks. We also aim to give managers and field specialists some autonomy in prioritizing key papers and projects. As noted elsewhere, we are considering targets for particular research areas and sources.

    See our basic process (as of Dec. 2023) for prioritizing work: Process: prioritizing research

    See also (internal discussion):

    • Manager autonomy trackarrow-up-right

    • Airtable: columns for "crucial_research", "considering" view, "confidence," and "discussion"

    • Airtable: see "sources" (public view link herearrow-up-right)

    hashtag
    Authors' permission: sometimes required

    Through October 2022: For the papers or projects at the top of our list, we contacted the authors and asked if they wanted to engage, only pursuing the evaluation if they agreed.

    As of November 2022, we have a second track where, under certain conditions, we inform authors but do not request permission. For this track, we first focused on particularly relevant NBERarrow-up-right working papers.

    July 2023: We expanded this process to some other sources, with some discretion.

    See Direct evaluation track.

    hashtag
    Communicating: "editors'" process

    In deciding which papers or projects to send out to paid evaluators, we have considered the following issues. We aim to communicate the team's answers for each paper or project to evaluators before they write their evaluations.

    hashtag
    Summary: why is it relevant and worth engaging with?

    Consider: global priority importance, field relevance, open science, authors’ engagement, data and reasoning transparency. In gauging this relevance, the team may consider the ITN frameworkarrow-up-right, but not too rigidly.

    hashtag
    Why does it need (more) review? What are some key issues or claims to vet?

    What are (some of) the authors’ main claims that are worth carefully evaluating? What aspects of the evidence, argumentation, methods, interpretation, etc., is the team unsure about? What particular data, code, proof, etc., would they like to see vetted? If it has already been peer-reviewed in some way, why do they think more review is needed?

    hashtag
    To what extent is there author engagement?

    How well has the author engaged with the process? Do they need particular convincing? Do they need help making their engagement with The Unjournal successful?

    See What research to target? for further discussion of prioritization, scope, and strategic and sustainability concerns.


    Our policies: evaluation & workflow

    See the links below for the current policies of The Unjournal, accompanied by discussion and including templates for managers and editors.

    hashtag
    1. Project submission, selection and prioritization

    People and organizations submit their own research or suggest research they believe may be high-impact. The Unjournal also directly monitors key sources of research and research agendas. Our team then systematically prioritizes this research for evaluation. See the link below for further details.

    Project submission, selection and prioritizationchevron-right

    hashtag
    2. Evaluation

    1. We choose an evaluation manager for each research paper or project. They commission and compensate expert evaluators to rate and discuss the research, following our evaluation template and guidelines. The original research authors are given a chance to publicly respond before we post these evaluations. See the link below for further details.

    hashtag
    3. Communicating results

    We make all of this evaluation work public on our PubPub pagearrow-up-right, along with an evaluation summary. We create DOIs for each element and submit this work to scholarly search engines. We also present a summary and analysis of our evaluation ratings dataarrow-up-right.

    We outline some further details in the link below.

    hashtag
    Flowchart

    See the link below for a full 'flowchart' map of our evaluation workflow

    Process: prioritizing research

    This page is a work-in-progress

    15 Dec 2023: Our main current process involves

    • Submitted and (internally/externally) suggested research

    • Prioritization ratings and discussion by Unjournal field specialists

    • Feedback from field specialist area teams

    • A final decision by the management team, guided by the above

    See here (also embedded below) for more details of the proposed process.

    "Direct evaluation" track

    hashtag
    Second track: Direct evaluation of prominent work

    In addition to soliciting research submissionsarrow-up-right by authors, we directly prioritize unsubmitted research for evaluation, with a specific process and set of rules, outlined below.

    1. Choose a set of "top-tier working paper series" and medium-to-top-tier journals.

      This program started with the NBER working paper seriesarrow-up-right. We expanded this beyond NBER to research posted in other exclusive working paper archives, and to work where all authors seem to be prominent, secure, and established. See Direct evaluation: eligibility rules and guidelines.

    2. Identify relevant papers in this series, following our stated criteria (i.e., relevance, strength, and need for further review). For NBER this tends to include

      • recently released work in the early stages of the journal peer-review process, particularly if it addresses a timely subject; as well as

      • work that has been around for many years, is widely cited and influential, yet has never been published in a peer-reviewed journal.

    We do this systematically and transparently; authors shouldn't feel singled out nor left out.

    3. Notify the work's authors that The Unjournal plans to commission evaluations. We're not asking for permission, but

      • making them aware of The Unjournal, the process, the benefits to authors, and the authors' opportunities to engage with the evaluation and publicly respond to the evaluation before it is made public;

      • asking them to let us know if we have the most recent version of the paper, and if updates are coming soon;

      • letting the authors complete our forms if they wish, giving further information about the paper or, e.g., adding a "permalink" to updated versions;

      • asking if there are authors in sensitive career positions, justifying a possible embargo; and

      • asking the authors if there is specific feedback they would like to receive.

    4. Reaching out to and commissioning evaluators, as in our regular process. Considerations:

      • Evaluators should be made aware that the authors have not directly requested this review, but have been informed it is happening.

      • As this will allow us to consider a larger set of papers more quickly, we can reach out to multiple evaluators more efficiently.

    chevron-rightThe case for this "direct evaluation"hashtag
    1. Public benefit: Working papers (especially NBER) are already influencing policy and debate, yet they have not been peer-reviewed and may take years to go through this process, if ever (e.g., many NBER papers are never published in peer-reviewed journalsarrow-up-right). However, it is difficult to understand the papers' limitations unless you happen to have attended an academic seminar where they were presented. Evaluating these publicly will provide a service.

    2. Specifically for NBER: This working paper series is highly influential and relied upon by policymakers and policy journalists. It is an elite outlet: only members of NBER are able to post working papers here.

    3. Fear of public evaluation (safety in numbers): There may be some shyness or reluctance to participate in The Unjournal evaluation process (for reasons to do so, see our discussion). It is scary to be a first mover, and it may feel unfair to be among the few people to have an evaluation of your work out there in public (in spite of the Bayesian arguments presented in the previous link). There should be "safety" in numbers: having a substantial number of prominent papers publicly evaluated by The Unjournal will ease this concern.

    4. Passive evaluation may be preferred to active consent: Academics (especially early-career) may also worry that they will seem weird or rebellious by submitting to The Unjournal, as this may be taken as "rejecting mainstream system norms." Again, this will be less of a problem if a substantial number of public evaluations of prominent papers are posted. You will be in good company. Furthermore, if we are simply identifying papers for evaluation, the authors of these papers cannot be seen as rejecting the mainstream path (as they did not choose to submit).

    5. Piloting and building a track record or demonstration: The Unjournal needs a reasonably large set of high-quality, relevant work to evaluate in order to help us build our system and improve our processes. Putting out a body of curated evaluation work will also allow us to demonstrate the reasonableness and reliability of this process.

    6. Public engagement in prominent and influential work is fair and healthy: It is good to promote public intellectual debate. Of course, this process needs to allow constructive criticism as well as informative praise.

    chevron-rightDiscussion: possible downsides and risks from this, responseshashtag

    1. Negative backlash: Some authors may dislike having their work publicly evaluated, particularly when there is substantial criticism. Academics complain a lot about unfair peer reviews, but the difference is that here the evaluations will be made public. This might lead The Unjournal to be the target of some criticism.

    Responses:

    • We will work to ensure that the evaluations we publish involve constructive dialogue, avoid unnecessary harshness, and provide reasons for their critiques. We also give authors the opportunity to respond.

    • We are focusing on more prominent papers, with authors in more secure positions. Additionally, we offer a potential "embargo" for sensitive career situations, e.g., those that might face early-career researchers.

    2. Less author engagement: If authors do not specifically choose to have their work evaluated, they are less likely to engage fully with the process.

    Response: This is something we will keep an eye on, weighing the benefits and costs.

    3. Evaluator/referee reluctance: As noted above, evaluators may be more reluctant to provide ratings and feedback on work where the author has not instigated the process.

    Response: This should largely be addressed by the fact that we allow evaluators to remain anonymous. A potential cost here is discouraging signed evaluations, which themselves have some benefits (as well as possible costs).

    4. Slippery slope towards "unfairly reviewing work too early": In some fields, working papers are released at a point where the author does not wish them to be evaluated, and where the author is not implicitly making strong claims about the validity of this work. In economics, working papers tend to be released when they are fairly polished, and the authors typically seek feedback and citations. The NBER series is a particularly prominent example. However, we don't want to extend the scope of direct evaluation too far.

    Response: We will be careful with this. Initially, we are extending this evaluation process only to the NBER series. Next, we may consider direct evaluation of fairly prestigious publications in "actual" peer-reviewed journals, particularly in fields (such as psychology) where the peer-review process is much faster than in economics. As NBER is basically "USA-only," we have extended this to other series such as CEPR, while being sensitive to the prestige/vulnerability tradeoffs.

    circle-info

    Aside: in the future, we hope to work directly with working paper series, associations, and research groups to get their approval and engagement with Unjournal evaluations. We hope that having a large share of papers in your series evaluated will serve as a measure of confidence in your research quality. If you are involved in such a group and are interested in this, please reach out to us.

    hashtag
    Direct evaluation: eligibility rules and guidelines

    hashtag
    NBER

    All NBER working papers are generally eligible, but watch for exceptions where authors seem vulnerable in their career. (And remember, we contact authors, so they can plead their case.)

    hashtag
    CEPR

    We treat these on a case-by-case basis and use discretion. All CEPR members are reasonably secure and successful, but their co-authors might not be, especially if these co-authors are PhD students they are supervising.

    hashtag
    Journal-published papers (i.e., 'post-publication evaluation')

    In some areas and fields (e.g., psychology, animal product markets) the publication process is relatively rapid or it may fail to engage general expertise. In general, all papers that are already published in peer-reviewed journals are eligible for our direct track.

    hashtag
    Papers or projects posted in any other working paper (pre-print) series

    These are eligible (without author permission) if all authors

    • have tenured or ‘long term’ positions at well-known, respected universities or other research institutions, or

    • have tenure-track positions at top universities (e.g., top-20 globally by some credible rankings), or

    • are clearly not pursuing an academic career (e.g., the "partner at the aid agency running the trial").

    On the other hand, if one or more authors is a PhD student close to graduation or an untenured academic outside a "top global program," then we will ask for permission and potentially offer an embargo.

    • A possible exception to this exception: If the PhD student or untenured academic is otherwise clearly extremely high-performing by conventional metrics; e.g., an REStud "tourist" or someone with multiple published papers in top-5 journals. In such cases the paper might be considered eligible for direct evaluation.

    Mapping evaluation workflow

    The flowchart below focuses on the evaluation part of our process.

    hashtag
    Describing key steps in the flowchart

    (Section updated 1 August 2023)

    1. Submission/selection (multiple routes)

      1. Author (A) submits work (W), creates new submission (submits a URL and DOI), through our platform or informally.

        • Author (or someone on their behalf) can complete a submission form; this includes a potential "request for embargo" or other special treatment.

      2. Managers and field specialists select work (or the project is submitted independently of authors) and the management team agrees to prioritize it.

    2. Prioritization

      • Following author submission ...

        • Manager(s) (M) and Field Specialists (FS) prioritize work for review (see Process: prioritizing research).

        • For either of these cases (1 or 2), authors are asked for permission.

      • Alternate track: "Work enters prestige archive" (NBER, CEPR, and some other cases).

        • Managers inform and consult the authors, but permission is not needed. (Particularly relevant: we confirm with the author that we have the latest updated version of the research.)

      • Following direct evaluation selection ...

        • M or FS may add additional (fn1) "evaluation suggestions" explaining why it's relevant, what to evaluate, etc., to be shared later with evaluators.

      • If requested (in either case), M decides whether to grant an embargo or other special treatment, notes this, and informs authors.

    3. M assigns an Evaluation Manager (EM – typically part of our team) to the selected project.

    4. EM invites evaluators (aka "reviewers") and shares the paper to be evaluated along with (optionally) a brief summary of why The Unjournal thinks it's relevant, and what we are asking.

      • Potential evaluators are given full access to (almost) all information submitted by the author and M, and notified of any embargo or special treatment granted.

      • EM may make special requests to the evaluator as part of a management policy (e.g., "signed/unsigned evaluation only," short deadlines, extra incentives as part of an agreed policy, etc.).

    5. Evaluator accepts or declines the invitation to review, and if the former, agrees on a deadline (or asks for an extension).

      • If the evaluator accepts, the EM shares full guidelines/evaluation template and specific suggestions with the evaluator.

    6. Evaluator completes an evaluation form.

    7. Evaluator submits evaluation including numeric ratings and predictions, plus "CI's" for these.

      • Possible addition (future plan): Reviewer asks for minor revisions and corrections; see "How revisions might be folded in..." in the fold below.

    8. EM collates all evaluations/reviews, shares these with Author(s).

      • The EM must be very careful not to share evaluators' identities at this point.

        • This includes caution to avoid accidentally-identifying information, especially where evaluators have chosen to remain anonymous.

    9. Author(s) read(s) the evaluations and are given two working weeks to submit responses.

      • If there is an embargo, there is more time to do this, of course.

    10. EM creates evaluation summary and "EM comments."

    11. EM or UJ team publishes each element on our PubPub space as a separate "pub" with a DOI for each (unless embargoed):

      1. Summary and EM comments

        • With a prominent section for the "ratings data tables"

    12. Authors and evaluators are informed once elements are on PubPub; next steps include promotion, checking bibliometrics, etc.

    13. ("Ratings and predictions data" to enter an additional public database.)

    Note that we intend to automate and integrate much of this process into an editorial-management-like system in PubPub.

    hashtag
    Consideration for the future: enabling "minor revisions"

    In our current (8 Feb 2023 pilot) phase, we have the evaluators consider the paper "as is," frozen at a certain date, with no room for revisions. The authors can, of course, revise the paper on their own and even pursue an updated Unjournal review; we would like to include links to the "permanently updated version" in the Unjournal evaluation space.

    After the pilot, we may consider making minor revisions part of the evaluation process. This may add substantial value to the papers and process, especially where evaluators identify straightforward and easily-implementable improvements.

    chevron-rightHow revisions might be folded into the above flowhashtag

    If "minor revisions" are requested:

    • ... the author has four (4) weeks (strict) to make revisions if they want to, submit a new linked manuscript, and also submit their response to the evaluation.

    hashtag
    Why would we (potentially) consider only minor revisions?

    We don't want to replicate the slow and inefficient processes of the traditional system. Essentially, we want evaluators to give a report and rating as the paper stands.

    We also want to encourage treating papers as projects: the authors can improve the work, if they like, and resubmit it for a new evaluation.

    Recap: submissions

    Text to accompany the Impactful Research Prize discussion

    hashtag
    Details of submissions to The Unjournal

    Note: This section largely repeats content in our guide for researchers/authorsarrow-up-right, especially our FAQ on "why engage."

    Jan. 2024: We have lightly updated this page to reflect our current systems.

    hashtag
    What we are looking for

    We describe the nature of the work we are looking to evaluate, along with examples, in this forum postarrow-up-right. Update 2024: This is now better characterized under "What research to target?" and "What specific areas do we cover?".

    If you are interested in submitting your work for public evaluation, we are looking for research which is relevant to global priorities—especially quantitative social sciences—and impact evaluations. Work that would benefit from further feedback and evaluation is also of interest.

    Your work will be evaluated using our evaluation guidelines and metrics. You can read these before submitting.

    Important Note: We are not a journal. By having your work evaluated, you will not be giving up the opportunity to have your work published in a journal. We simply operate a system that allows you to have your work independently evaluated.

    If you think your work fits our criteria and would like it to be publicly evaluated, please submit your work through this formarrow-up-right.

    If you would like to submit more than one of your papers, you will need to complete a new form for each paper you submit.

    hashtag
    Conditional embargo on the publishing of evaluations

    By default, we would like Unjournal evaluations to be made public. We think public evaluations are generally good for authors, as explained here. However, in special circumstances and particularly for very early-career researchers, we may make exceptions.

    If there is an early-career researcher on the author team, we will allow authors to "embargo" the publication of the evaluation until a later date. This date is contingent, but not indefinite. The embargo lasts until after a PhD/postdoc’s upcoming job search or until it has been published in a mainstream journal, unless:

    • the author(s) give(s) earlier permission for release; or

    • a fixed upper limit of 14 months is reached.

    If you would like to request an exception to a public evaluation, you will have the opportunity to explain your reasoning in the submission form.

    See "" for more detail, and examples.

    hashtag
    Why might an author want to engage with The Unjournal?

    1. The Unjournal presents an additional opportunity for evaluation of your work with an emphasis on impact.

    2. Substantive feedback will help you improve your work—especially useful for young scholars.

    3. Ratings can be seen as markers of credibility for your work that could help your career advancement, at least at the margin, and hopefully help a great deal in the future. You also gain the opportunity to publicly respond to critiques and correct misunderstandings.

    4. You will gain visibility and a connection to the EA/Global Priorities communities and the Open Science movement.

    5. You can take advantage of this opportunity to gain a reputation as an ‘early adopter and innovator’ in open science.

    6. You can win prizes: You may win a “best project prize,” which could be financial as well as reputational.

    7. Entering into our process will make you more likely to be hired as a paid reviewer or editorial manager.

    8. We will encourage media coverage.

    hashtag
    What we might ask of authors

    If we consider your work for public evaluation, we may ask for some of the items below, although most are optional. We will aim to make this a very light touch for authors.

    1. A link to a non-paywalled, hosted version of your work (in any format—PDFs are not necessary) that can be given a Digital Object Identifier (DOI). Again, we will not be "publishing" this work, just evaluating it.

    2. A link to data and code, if possible. We will work to help you to make it accessible.

    3. Assignment of two evaluators who will be paid to assess your work. We will likely keep their identities confidential, although this is flexible depending on the reviewer. Where it seems particularly helpful, we will facilitate a confidential channel to enable a dialogue with the authors. One person on our managing team will handle this process.

    4. Having evaluators publicly post their evaluations (i.e., 'reviews') of your work on our platform. As noted above, we will ask them to provide feedback, thoughts, suggestions, and some quantitative ratings for the paper.

    • By completing the submission form, you are providing your permission for us to post the evaluations publicly unless you request an embargo.

    • You will have a two-week window to respond through our platform before anything is posted publicly. Your responses can also be posted publicly.

    For more information on why authors may want to engage and what we may ask authors to do, please see For researchers/authors.

    Here again is the link to submit your work on our platform.arrow-up-right

    hashtag

    guidelines
    Evaluationchevron-right
    our PubPub pagearrow-up-right
    evaluation ratings dataarrow-up-right
    Communicating results
    Mapping evaluation workflowchevron-right
    publicly respond to critiques and correct misunderstandings.
  • You will gain visibility and a connection to the EA/Global Priorities communities and the Open Science movement.

  • You can take advantage of this opportunity to gain a reputation as an ‘early adopter and innovator’ in open science.

  • You can win prizes: You may win a “best project prize,” which could be financial as well as reputational.

  • Entering into our process will make you more likely to be hired as a paid reviewer or editorial manager.

  • We will encourage media coverage.

  • Have evaluators publicly post their evaluations (i.e., 'reviews') of your work on our platform. As noted above, we will ask them to provide feedback, thoughts, suggestions, and some quantitative ratings for the paper.

  • this forum postarrow-up-right
    "What research to target?"
    "What specific areas do we cover?"
    here
    this formarrow-up-right
    here
    Conditional embargos & exceptions
    For researchers/authors
    Here again is the link to submit your work on our platform. arrow-up-right
  • letting the authors complete our forms if they wish, giving further information about the paper or e.g. adding a "permalink" to updated versions;

  • asking if there are authors in sensitive career positions justifying a; and

  • asking the authors if there is specific feedback they would like to receive.

  • Reaching out to and commissioning evaluators, as in our regular process. Considerations:

    • Evaluators should be made aware that the authors have not directly requested this review, but have been informed it is happening.

    • As this will allow us to consider a larger set of papers more quickly, we can reach out to multiple evaluators more efficiently.

  • Specifically for NBER: This working paper series is highly influential and relied upon by policy makers and policy journalists. It'd an elite outlet: only members of NBER are able to post working papers here.

  • Fear of public evaluation (safety in numbers): There may be some shyness or reluctance to participate in The Unjournal evaluation process (for reasons to do so, see our discussion). It is scary to be a first mover, and it may feel unfair to be among the few people to have an evaluation of your work out there in public (in spite of the Bayesian arguments presented in the previous link). There should be "safety" in numbers: having a substantial number of prominent papers publicly evaluated by The Unjournal will ease this concern.

  • Passive evaluation may be preferred to active consent: Academics (especially early-career) may also worry that they will seem weird or rebellious by submitting to The Unjournal, as this may be taken as "rejecting mainstream system norms." Again, this will be less of a problem if a substantial number of public evaluations of prominent papers are posted. You will be in good company. Furthermore, if we are simply identifying papers for evaluation, the authors of these papers cannot be seen as rejecting the mainstream path (as they did not choose to submit).

  • Piloting and building a track record or demonstration: The Unjournal needs a reasonably large set of high-quality, relevant work to evaluate in order to help us build our system and improve our processes. Putting out a body of curated evaluation work will also allow us to demonstrate the reasonableness and reliability of this process.

  • Public engagement in prominent and influential work is fair and healthy. It is good to promote public intellectual debate. Of course, this process needs to allow constructive criticism as well as informative praise.
  • We will work to ensure that the evaluations we publish involve constructive dialogue, avoid unnecessary harshness, and provide reasons for their critiques. We also give authors the opportunity to respond.

  • We are focusing on more prominent papers, with authors in more secure positions. Additionally, we offer a potential "embargo" for sensitive career situations, e.g., those that early-career researchers might face.

  • 2. Less author engagement: If authors do not specifically choose to have their work evaluated, they are less likely to engage fully with the process.

    Response: This is something we will keep an eye on, weighing the benefits and costs.

    3. Evaluator/referee reluctance: As noted above, evaluators may be more reluctant to provide ratings and feedback on work where the author has not instigated the process.

    Response: This should largely be addressed by the fact that we allow evaluators to remain anonymous. A potential cost here is discouraging signed evaluations, which themselves have some benefits (as well as possible costs).

    4. Slippery-slope towards "unfairly reviewing work too early": In some fields, working papers are released at a point where the author does not wish them to be evaluated, and where the author is not implicitly making strong claims about the validity of this work. In economics, working papers tend to be released when they are fairly polished and the authors typically seek feedback and citations. The NBER series is a particularly prominent example. However, we don't want to extend the scope of direct evaluation too far.

    Response: We will be careful with this. Initially, we are extending this evaluation process only to the NBER series. Next, we may consider direct evaluation of fairly prestigious publications in "actual" peer-reviewed journals, particularly in fields (such as psychology) where the peer-review process is much faster than in economics. As NBER is basically "USA-only", we have extended this to other series such as CEPR, while being sensitive to the prestige/vulnerability tradeoffs.

    Direct evaluation: eligibility rules and guidelines
    • For either of these cases (1 or 2), authors are asked for permission.

  • Alternate track: "Work enters prestige archive" (NBER, CEPR, and some other cases).

    • Managers inform and consult the authors but permission is not needed. (Particularly relevant: we confirm with the authors that we have the latest updated version of the research.)

  • Following direct evaluation selection...

    • M or FS may add additional "evaluation suggestions" explaining why the work is relevant, what to evaluate, etc., to be shared later with evaluators.

  • If requested (in either case), M decides whether to grant an embargo or other special treatment, notes this, and informs the authors.

  • EM (also, optionally) may add "evaluation suggestions" to share with the evaluators.

  • Evaluators' identities are not disclosed to the authors where the evaluators chose anonymity.
  • Even if evaluators chose to "sign their evaluation," their identity should not be disclosed to authors at this point. However, evaluators are told they can reach out to the authors if they desire.

  • Evaluations are shared with the authors as a separate doc, set of docs, file, or space, to which the evaluators do not have automatic access. (Going forward, this will be automated.)

  • It is made clear to authors that their responses will be published (and given a DOI, when possible).

  • Each evaluation, with summarized ratings at the top

  • The author response

    • All of the above are linked in a particular way, with particular settings;

  • Optional: Evaluators can comment on any minor revisions and adjust their ratings.