1 of 9

Project submission, selection and prioritization

Submission/evaluation funnel

As we are paying evaluators and have limited funding, we cannot evaluate every paper and project. Papers enter our database through

submission by authors;
our own searches (e.g., searching syllabi, forums, working paper archives, and white papers); and
s from other researchers, practitioners, and members of the public, and recommendations from . We have posted more detailed instructions for .

Our management team rates the suitability of each paper according to the criteria discussed below and .

Our procedures for identification and prioritization

We have followed a few procedures for finding and prioritizing papers and projects. In all cases, we require more than one member of our research-involved team (field specialist, managers, etc.) to support a paper before prioritizing it.

We are building a grounded systematic procedure with criteria and benchmarks. We also aim to give managers and field specialists some autonomy in prioritizing key papers and projects. As noted elsewhere, we are considering targets for particular research areas and sources.

See our basic process (as of Dec. 2023) for prioritizing work:

See also (internal discussion):

Airtable: columns for "crucial_research", "considering" view, "confidence," and "discussion"
Airtable: see "sources" ()

Authors' permission: sometimes required

Through October 2022: For the papers or projects at the top of our list, we contacted the authors and asked if they wanted to engage, only pursuing evaluation if agreed.

July 2023: We expanded this process to some other sources, with some discretion.

Communicating: "editors'" process

Summary: why is it relevant and worth engaging with?

Why does it need (more) review? What are some key issues or claims to vet?

What are (some of) the authors’ main claims that are worth carefully evaluating? What aspects of the evidence, argumentation, methods, interpretation, etc., is the team unsure about? What particular data, code, proof, etc., would they like to see vetted? If it has already been peer-reviewed in some way, why do they think more review is needed?

To what extent is there author engagement?

How well has the author engaged with the process? Do they need particular convincing? Do they need help making their engagement with The Unjournal successful?

What research to target?

(for pilot and beyond)

Our is quantitative work that informs global priorities (see linked discussion), especially in . We want to see better research leading to better outcomes in the real world (see our 'Theory of Change').

See (earlier) discussion in public call/EA forum discussion HERE.

To reach these goals, we need to select "the right research" for evaluation. We want to choose papers and projects that are highly relevant, methodologically promising, and that will benefit substantially from our evaluation work. We need to optimize how we select research so that our efforts remain mission-focused and useful. We also want to make our process transparent and fair. To do this, we are building a coherent set of criteria and goals, and a specific approach to guide this process. We explore several dimensions of these criteria below.

Management access only: General discussion of prioritization in Gdoc HERE. Private discussion of specific papers in Airtable and links (e.g., HERE). We incorporate some of this discussion below.

High-level considerations for prioritizing research

When considering a piece of research to decide whether to commission it to be evaluated, we can start by looking at its general relevance as well as the value of evaluating and rating it.

Our prioritization of a paper for evaluation should not be seen as an assessment of its quality, nor of its 'vulnerability'. Furthermore, specific and less intensive.

Why is it relevant and worth engaging with?

We consider (and prioritize) the importance of the research to global priorities; its relevance to crucial decisions; the attention it is getting, the influence it is having; its direct relevance to the real world; and the potential value of the research for advancing other impactful work. We de-prioritize work that has already been credibly (publicly) . We also consider the fit of the research with our scope (social science, etc.), and the likelihood that we can commission experts to meaningfully evaluate it. As noted below, some 'instrumental goals' (sustainability, building credibility, driving change, ...) also play a role in our choices.

Some features we value, that might raise the probability we consider a paper or project include the commitment and contribution to open science, the authors' engagement with our process, and the logic, communication, and transparent reasoning of the work. However, if a prominent research paper is within our scope and seems to have a strong potential for impact, we will prioritize it highly, whether or not it has these qualities.

2. Why does it need (more) evaluation, and what are some key issues and claims to vet?

We ask the people who suggest particular research, and experts in the field:

What are (some of) the authors’ key/important claims that are worth evaluating?
What aspects of the evidence, argumentation, methods, and interpretation, are you unsure about?
What particular data, code, proofs, and arguments would you like to see vetted? If it has already been peer-reviewed in some way, why do you think more review is needed?

Ultimate goals: what are we trying to optimize?

Put broadly, we need to consider how this research allows us to achieve our own goals in line with our Global Priorities Theory of Change flowchart, The research we select and evaluate should meaningfully drive positive change. One way we might see this process: “better research & more informative evaluation” → “better decision-making” → “better outcomes” for humanity and for non-human animals (i.e., the survival and flourishing of life and human civilization and values).

Prioritizing research to achieve these goals

As we weigh research to prioritize for evaluation, we need to balance directly having a positive impact against building our ability to have an impact in the future.

A. Direct impact (‘score goals now’)

Below, we adapt the "ITN" cause prioritization framework (popular in effective altruism circles) to assess the direct impact of our evaluations.

Importance

What is the direct impact potential of the research?

This is a massive question many have tried to address (see sketches and links below). We respond to uncertainty around this question in several ways, including:

Consulting a range of sources, not only EA-linked sources.
- EA and more or less adjacent: Agendas and overviews, Syllabi.
- Non-EA, e.g., https://globalchallenges.org/.
Scoping what other sorts of work are representative inputs to GP-relevant work.
- Get a selection of seminal GP publications; look back to see what they are citing and categorize by journal/field/keywords/etc.

Neglectedness

Where is the current journal system failing GP-relevant work the most . . . in ways we can address?

Tractability

“Evaluability” of research: Where does the UJ approach yield the most insight or value of information?
Existing expertise: Where do we have field expertise on the UJ team? This will help us commission stronger evaluations.
"Feedback loops": Could this research influence concrete intervention choices? Does it predict near-term outcomes? If so, observing these choices and outcomes and getting feedback on the research and our evaluation can yield strong benefits.

Consideration/discussion: How much should we include research with indirect impact potential (theoretical, methodological, etc.)?

B. Sustainability: funding, support, participation

Moreover, we need to consider how the research evaluation might support the sustainability of The Unjournal and the broader general project of open evaluation. We may need to strike a balance between work informing the priorities of various audences, including:

Relevance to stakeholders and potential supporters
Clear connections to impact; measurability
Support from relevant academic communities
Support from open science

Consideration/discussion: What will drive further interest and funding?

C. Credibility, visibility, driving positive institutional change

Finally, we consider how our choices will increase the visibility and solidify the credibility of The Unjournal and open evaluations. We consider how our work may help drive positive institutional change. We aim to:

Interest and involve academics—and build the status of the project.
Commission evaluations that will be visibly useful and credible.
‘Benchmark traditional publication outcomes’, track our predictiveness and impact.
Have strong leverage over research "outcomes and rewards."
Increase public visibility and raise public interest.
Bring in supporters and participants.
Achieve substantial output in a reasonable time frame and with reasonable expense.
Maintain goodwill and a justified reputation for being fair and impartial.

But some of these concerns may have trade offs

We are aware of possible pitfalls of some elements of our vision.

We are pursuing a second "high-impact policy and applied research" track for evaluation. This will consider work that is not targeted at academic audiences. This may have direct impact and please SFF funders, but, if not done carefully, this may distract us from changing academic systems, and may cost us status in academia.

A focus on topics perceived as niche (e.g., the economics and game theory of AI governance and AI safety) may bring a similar tradeoff.

On the other hand, perhaps a focus on behavioral and experimental economics would generate lots of academic interest and participants; this could help us benchmark our evaluations, etc.; but this may also be less directly impactful.

Giving managers autonomy and pushing forward quickly may bring the risk of perceived favoritism; a rule-based systematic approach to choosing papers to evaluate might be slower and less interesting for managers. However, it might be seen as fairer (and it might enable better measurement of our impact).

We hope we have identified the important considerations (above); but we may be missing key points. We continue to engage discussion and seek feedback, to hone and improve our processes and approaches.

Data: what are we evaluating/considering?

We present and analyze the specifics surrounding our current evaluation data in this interactive notebook/dashboard here.

Below: An earlier template for considering and discussing the relevance of research. This was/is provided both for our own consideration and for sharing (in part?) with evaluators, to give . Think of these as bespoke evaluation notes for a .

Proposed template

Title

One-click-link to paper
Link to any private hosted comments on the paper/project

Summary; why is this research relevant and worth engaging with?

As mentioned under High level considerations, consider factors including importance to global priorities, relevance to the field, the commitment and contribution to open science, the authors’ engagement, and the transparency of data and reasoning. You may consider the ITN framework explicitly, but not too rigidly.

Why does it need (more) review, and what are some key issues and claims to vet?

What are (some of) the authors’ main important claims that are worth carefully evaluating? What aspects of the evidence, argumentation, methods, interpretation, etc., are you unsure about? What particular data, code, proof, etc., would you like to see vetted? If it has already been peer-reviewed in some way, why do you think more review is needed?

What sort of reviewers should be sought, and what should they be asked?

What types of expertise and background would be most appropriate for the evaluation? Who would be interested? Please try to make specific suggestions.

How well has the author engaged with the process?

Do they need particular convincing? Do they need help making their engagement with The Unjournal successful?

What specific areas do we cover?

This discussion is a work-in-progress

We are targeting global priorities-relevant research...
With the potential for impact, and with the potential for Unjournal evaluations to have an impact (see our high-level considerations and our prioritization ratings discussions).
Our is quantitative work that informs global priorities (see linked discussion), especially in , informing our Theory of Change.
We give a data presentation of the work we have already covered and the work we are prioritizing here, which will be continually updated.

But what does this mean in practice? What specific research fields, topics, and approaches are we likely to classify as 'relevant to evaluate'?

We give some lists and annotated examples below.

Fields, methods, and approaches

As of January 2024 The Unjournal focuses on ...

Research where the fundamental question being investigated involves human behavior and beliefs and the consequences of these. This may involve markets, production processes, economic constraints, social interactions, technology, the 'market of ideas', individual psychology, government processes, and more. However, the main research question should not revolve around issues outside of human behavior, such as physical science, biology, or computer science and engineering. These areas are out of our scope (at least for now).
Research that is fundamentally quantitative and uses . It will generally involve or consider measurable inputs, choices, and outcomes; specific categorical or quantitative questions; analytical and mathematical reasoning; hypothesis testing and/or belief updating, etc.
Research that targets and addresses a single specific question or goals, or a small cluster. It should not mainly be a broad discussion and overview of other research or conceptual issues.

This to generally involves the academic fields:

Economics
Applied Statistics (and some other applied math)
Psychology
Political Science
Other quantitative social science fields (perhaps Sociology)
Applied "business school" fields: finance, accounting, operations, etc.
Applied "policy and impact evaluation" fields
Life science/medicine where it targets human behavior/social science

These discipline/field boundaries are not strict; they may adapt as we grow

Why this field/method focus?

These were chosen in light of two main factors:

Our founder and our team is most comfortable assessing and managing the consideration of research in these areas.
These fields seem to be particularly amenable to, and able to benefit from our journal-independent evaluation approach. Other fields, such as biology, are already being 'served' by strong initiatives like Peer Communities In.

Ex.: work we included/excluded based on field/method

To do: We will give and explain some examples here

Outcomes, focus areas, and causes

The Unjournal's mission is to prioritize

research with the strongest potential for a positive impact on global welfare
where public evaluation of this research will have the greatest impact

Given this broad goal, we consider research into any cause, topic, or outcome, as long as the research involves fields, methods, and approaches within our domain (see above), and as long as the work meets our other requirements (e.g., research must be publicly shared without a paywall).

While we don't have rigid boundaries, we are nonetheless focusing on certain areas:

Fields

(As of Jan. 2024) we have mainly commissioned evaluations of work involving development economics and health-related outcomes and interventions in low-and middle-income countries.

As well as research involving

Environmental economics, conservation, harm to human health
The social impact of AI and emerging technologies
Economics, welfare, and governance
Catastrophic risks; predicting and responding to these risks
The economics of innovation; scientific progress and meta-science
The economics of health, happiness, and wellbeing

We are currently prioritizing further work involving

Psychology, behavioral science, and attitudes: the spread of misinformation; other-regarding preferences and behavior; moral circles
Animal welfare: markets, attitudes
Methodological work informing high-impact research (e.g., methods for impact evaluation)

We are also considering prioritizing work involving

AI governance and safety
Quantitative political science (voting, lobbying, attitudes)
Political risks (including authoritarian governments and war and conflict)
Institutional decisionmaking and policymaking
Long-term growth and trends; the long-term future of civilization; forecasting

Examples of work we chose to prioritize or de-prioritize based on focus area

To do: We will give and explain some examples here

Process: prioritizing research

This page is a work-in-progress

15 Dec 2023: Our main current process involves

Submitted and (internally/externally) suggested research
Prioritization ratings and discussion by Unjournal field specialists
Feedback from field specialist area teams
A final decision by the management team, guided by the above

See this doc (also embedded below) for more details of the proposed process.

Prioritization ratings: discussion

As noted in Process: prioritizing research, we ask people who suggest research to provide a numerical 0-100 rating:

We also ask people within our team to act as 'assessors' to give as second and third opinions on this. This 'prioritization rating' is one of the criteria we will use to determine whether to commission research to be evaluated (along with author engagement, publication status, our capacity and expertise, etc.) Again, see the previous page for the current process.

So what goes into this "prioritization rating"; what does it mean?

We are working on a set of notes on this, fleshing this out and giving specific examples. At the moment this is available to members of our team only (ask for access to "Guidelines for prioritization ratings (internal)"). We aim to share a version of this publicly once it converges, and once we can get rid of arbitrary sensitive examples.

Some key points

I. This is not the evaluation itself. It is not an evaluation of the paper's merit per se:

Influential work, and prestigious work in influential areas may be highly prioritized regardless of its rigor and quality
The prioritization rating might consider quality for work that seems potentially impactful, which does not seem particularly prestigious or influential. Here aspects like writing clarity, methodological rigor, etc., might put it 'over the bar'. However, even here these will tend to be based on rapid and shallow assessments, and should not be seen as meaningful evaluations of research merit.

II. These ratings will be considered along with the discussion by the field team and the management. Thus is helpful if you give a justification and explanation for your stated rating.

One possible way of considering the rating criteria

Key attributes/factors

Define/consider the following ‘attributes’ of a piece of research:

Global decision-relevance/VOI: Is this research decision-relevant to high-value choices and considerations that are important for global priorities and global welfare?
Prestige/prominence: Is the research already prominent/valued (esp. in academia), highly cited, reported on, etc?
Influence: Is the work already influencing important real-world decisions and considerations?

Obviously, these are not binary factors; there is a continuum for each. But for the sake of illustration, consider the following flowcharts.

If the flowcharts do not render, please refresh your browser. You may have to refresh twice.

Prestigious work

"Fully baked": Sometimes prominent researchers release work (e.g., on NBER) that is not particularly rigorous or involved, which may have been put together quickly. This might be research that links to a conference they are presenting at, to their teaching, or to specific funding or consulting. It may be survey/summary work, perhaps meant for less technical audiences. The Unjournal tends not to prioritize such work, or at least not consider it in the same "prestigious" basket (although there will be exceptions). In the flowchart above, we contrast this with their "fully-baked" work.

Decision-relevant, prestigious work: Suppose the research is both ‘globally decision-relevant’ and prominent. Here, if the research is in our domain, we probably want to have it publicly evaluated. This is basically the case regardless of its apparent methodological strength. This is particularly true if it has been recently made public (as a working paper), if it has not yet been published in a highly-respected peer-reviewed journal, and if there are non-straightforward methodological issues involved.

Prestigious work that seems less globally-relevant: We generally will not prioritize this work unless it adds to our mission in other ways (see, e.g., our ‘sustainability’ and ‘credibility’ goals here). In particular we will prioritize such research more if:

It is presented in innovative, transparent formats (e.g., dynamic documents/open notebooks, sharing code and data)
The research indirectly supports more globally-relevant research, e.g., through…
- Providing methodological tools that are relevant to that ‘higher-value’ work
- Drawing attention to neglected high-priority research fields (e.g., animal welfare)

Less prestigious work

(If the flowchart below does not render, please refresh your browser; you may have to refresh twice.)

Decision-relevant, influential (but less prestigious) work: E.g., suppose this research might be cited by a major philanthropic organization as guiding its decision-making, but the researchers may not have strong academic credentials or a track record. Again, if this research is in our domain, we probably want to have it publicly evaluated. However, depending on the rigor of the work and the way it is written, we may want to explicitly class this in our ‘non-academic/policy’ stream.

Decision-relevant, less prestigious, less-influential work: What about for less-prominent work with fewer academic accolades that is not yet having an influence, but nonetheless seems to be globally decision-relevant? Here, our evaluations seem less likely to have an influence unless the work seems potentially strong, implying that our evaluations, rating, and feedback could boost potentially valuable neglected work. Here, our prioritization rating might focus more on our initial impressions of things like …

Methodological strength (this is a big one!)
Rigorous logic and communication
Open science and robust approaches
Engagement with real-world policy considerations

Again: the prioritization process is not meant to be an evaluation of the work in itself. It’s OK to do this in a fairly shallow way.

In future, we may want to put together a loose set of methodological ‘suggestive guidelines’ for work in different fields and areas, without being too rigid or prescriptive. (To do: we can draw from some existing frameworks for this [ref].)

Suggesting research (forms, guidance)

Paths to suggest research

Research can be "submitted" by authors () or "suggested" by others. For a walk-through on suggesting research, see example.

There are two main paths for making suggestions: or .

1. Through our

Anyone can suggest research using the survey form at . (Note, if you want to "submit your own research," go to .) Please include the following steps:

Review The Unjournal's Guidelines

Begin by reviewing to get a sense of the research we cover and our priorities. Look for high-quality research that 1) falls within our focus areas and 2) would benefit from (further) evaluation.

When in doubt, we encourage you to suggest the research anyway.

Fill out the Suggestion Form

Navigate to The Unjournal's . Most of the fields here are optional. The fields ask the following information:

Who you are: Let us know who is making the suggestion (you can also choose to stay anonymous).
- If you leave your contact information, you will be eligible for financial "bounties" for strong suggestions.
- If you are already a member of The Unjournal's team, additional fields will appear for you to link your suggestion to your profile in the Unjournal's database.
Research Label: Provide a short, descriptive label for the research you are suggesting. This helps The Unjournal quickly identify the topic at a glance.
Research Importance: Explain why the research is important, its potential impact, and any specific areas that require thorough evaluation.
Research Link: Include a direct URL to the research paper. The Unjournal prefers research that is publicly hosted, such as in a working paper archive or on a personal website.
Peer Review Status: Inform about the peer review status of the research, whether it's unpublished, published without clear peer review, or published in a peer-reviewed journal.
"Rate the relevance": This represents your best-guess at how relevant this work is for The Unjournal to evaluate, as a percentile relative to other work we are considering.
Research Classification: Choose categories that best describe the research. This helps The Unjournal sort and prioritize suggestions.
Field of Interest: Select the outcome or field of interest that the research addresses, such as global health in low-income countries.

2. For Field Specialists and managers: via Airtable

Further guidance

Aside on setting the prioritization ratings: In making your subjective prioritization rating, please consider “What percentile do you think this paper (or project) is relative to the others in our database, in terms of ‘relevance for The UJ to evaluate’?” (Note this is a redefinition; we previously considered these as probabilities.) We roughly plan to commission the evaluation of about 1 in 5 papers in the database, the ‘top 20%’ according to these percentiles. Please don’t consider the “publication status or the “author's propensity to engage” in this rating. We will consider those as separate criteria.

Notes for field specialists/Unjournal Team

Please don’t enter only the papers you think are ‘very relevant’; please enter in all research that you have spent any substantial time considering (more than a couple minutes). If we all do this, we should all aim for our percentile ratings to be approximately normally distributed; evenly spread over the 1-100% range.

"Direct evaluation" track

Second track: Direct evaluation of prominent work

In addition to soliciting research submissions by authors, we directly prioritize unsubmitted research for evaluation, with a specific process and set of rules, outlined below.

Choose a set of "top-tier working paper series" and medium-to-top-tier journals.
This program started with the NBER working paper series. We expanded this beyond NBER to research posted in other exclusive working paper archives and to work where all authors seem to be prominent, secure, and established. See #direct-evaluation-eligibility-rules-and-guidelines.
Identify relevant papers in this series, following our stated criteria (i.e., relevance, strength, need for further review). For NBER this tends to include
- recently released work in the early stages of the journal peer-review process, particularly if it addresses a timely subject; as well as
- work that has been around for many years, is widely cited and influential, yet has never been published in a peer-reviewed journal.

We do this systematically and transparently; authors shouldn't feel singled out nor left out.

Notify the work's authors that The Unjournal plans to commission evaluations. We're not asking for permission, but
- making them aware of The Unjournal, the process, the benefits to authors, and the authors' opportunities to engage with the evaluation and publicly respond to the evaluation before it is made public;
- letting us know if we have the most recent version of the paper, and if updates are coming soon;
- letting the authors complete our forms if they wish, giving further information about the paper or e.g. adding a "permalink" to updated versions;
- asking if there are authors in sensitive career positions justifying a temporary "embargo"; and
- asking the authors if there is specific feedback they would like to receive.
Reaching out to and commissioning evaluators, as in our regular process. Considerations:
- Evaluators should be made aware that the authors have not directly requested this review, but have been informed it is happening.
- As this will allow us to consider a larger set of papers more quickly, we can reach out to multiple evaluators more efficiently.

The case for this "direct evaluation"

Public benefit: Working papers (especially NBER) are already influencing policy and debate, yet they have not been peer-reviewed and may take years to go through this process, if ever (e.g., many NBER papers are never published in peer-reviewed journals). However, it is difficult to understand the papers' limitations unless you happen to have attended an academic seminar where they were presented. Evaluating these publicly will provide a service.
- Specifically for NBER: This working paper series is highly influential and relied upon by policy makers and policy journalists. It'd an elite outlet: only members of NBER are able to post working papers here. Membership is prestigious and available only by invitation.
Fear of public evaluation (safety in numbers): There may be some shyness or reluctance to participate in The Unjournal evaluation process (for reasons to do so, see our benefits to authors discussion). It is scary to be a first mover, and it may feel unfair to be among the few people to have an evaluation of your work out there in public (in spite of the Bayesian arguments presented in the previous link). There should be "safety" in numbers: having a substantial number of prominent papers publicly evaluated by The Unjournal will ease this concern.
Passive evaluation may be preferred to active consent: Academics (especially early-career) may also worry that they will seem weird or rebellious by submitting to The Unjournal, as this may be taken as "rejecting mainstream system norms." Again, this will be less of a problem if a substantial number of public evaluations of prominent papers are posted. You will be in good company. Furthermore, if we are simply identifying papers for evaluation, the authors of these papers cannot be seen as rejecting the mainstream path (as they did not choose to submit).
Piloting and building a track record or demonstration: The Unjournal needs a reasonably large set of high-quality, relevant work to evaluate in order to help us build our system and improve our processes. Putting out a body of curated evaluation work will also allow us to demonstrate the reasonableness and reliability of this process.

Discussion: possible downsides and risks from this, responses

1. Negative backlash: Some authors may dislike having their work publicly evaluated, particularly when there is substantial criticism. Academics complain a lot about unfair peer reviews, but the difference is that here the evaluations will be made public. This might lead The Unjournal to be the target of some criticism.

Responses:

Public engagement in prominent and influential work is fair and healthy. It is good to promote public intellectual debate. Of course, this process needs to allow constructive criticism as well as informative praise.
We will work to ensure that the evaluations we publish involve constructive dialogue, avoid unnecessary harshness, and provide reasons for their critiques. We also give authors the opportunity to respond.
We are focusing on more prominent papers, with authors in more secure positions. Additionally, we offer a potential "embargo" for sensitive career situations, e.g., those that might face early-career researchers.

2. Less author engagement: If authors do not specifically choose to have their work evaluated, they are less likely to engage fullly with the process.

Response: This is something we will keep an eye on, weighing the benefits and costs.

3. Evaluator/referee reluctance: As noted above, evaluators may be more reluctant to provide ratings and feedback on work where the author has not instigated the process.

Response: This should largely be addressed by the fact that we allow evaluators to remain anonymous. A potential cost here is discouraging signed evaluations, which themselves have some benefits (as well as possible costs).

4. Slippery-slope towards "unfairly reviewing work too early": In some fields, working papers are released at a point where the author does not wish them to be evaluated, and where the author is not implicitly making strong claims about the validity of this work. In economics, working papers tend to be released when they are fairly polished and the authors typically seek feedback and citations. The NBER series is a particularly prominent example. However, we don't want extend the scope of direct evaluation too far.

Response: We will be careful with this. Initially, we are extending this evaluation process only to the NBER series. Next, we may consider direct evaluation of fairly prestigious publications in "actual" peer-reviewed journals, particularly in fields (such as psychology) where the peer-review process is much faster than in economics. As NBER is basically "USA-only", we have extended this to other series such as CEPR, while being sensitive to the prestige/vulnerability tradeoffs.

Direct evaluation: eligibility rules and guidelines

NBER

All NBER working papers are generally eligible, but watch for exceptions where authors seem vulnerable in their career. (And remember, we contact authors, so they can plead their case.)

CEPR

We treat these on a case-by-case basis and use discretion. All CEPR members are reasonably secure and successful, but their co-authors might not be, especially if these co-authors are PhD students they are supervising.

Journal-published papers (i.e., 'post-publication evaluation')

In some areas and fields (e.g., psychology, animal product markets) the publication process is relatively rapid or it may fail to engage general expertise. In general, all papers that are already published in peer-reviewed journals are eligible for our direct track.

Papers or projects posted in any other working paper (pre-print) series

These are eligible (without author permission) if all authors

have tenured or ‘long term’ positions at well-known, respected universities or other research institutions, or
have tenure-track positions at top universities (e.g., top-20 globally by some credible rankings), or
are clearly not pursuing an academic career (e.g., the "partner at the aid agency running the trial").

On the other hand, if one or more authors is a PhD student close to graduation or an untenured academic outside a "top global program,’’ then we will ask for permission and potentially offer an embargo.

A possible exception to this exception: If the PhD student or untenured academic is otherwise clearly extremely high-performing by conventional metrics; e.g., an REStud "tourist" or someone with multiple published papers in top-5 journals. In such cases the paper might be considered eligible for direct evaluation.

"Applied and Policy" Track

David Reinstein, Nov 2024: Over the last six months we have considered and evaluated a small amount of work under this “Applied & Policy Stream”. We are planning to continue this stream for the forseeable future.

Why have an “”?

Much of the most impactful research is not aimed at academic audiences and may never be submitted to academic journals. It is written in formats that are very different from traditional academic outputs, and cannot be easily judged by academics using the same standards. Nonetheless, this work may use technical approaches developed in academia, making it important to gain expert feedback and evaluation.

The Unjournal can help here. However, to avoid confusion, we want to make this clearly distinct from our main agenda, which focuses on impactful academically-aimed research.

Our “Applied & Policy Stream” will be clearly labeled as separate from our main stream. This may constitute roughly 10 or 15% of the work that we cover. Below, we refer to this as the “applied stream” for brevity.

What should be included in the Policy stream?

Our considerations for prioritizing this work are generally the same as for our academic stream – is it in the fields that we are focused on, using approaches that enable meaningful evaluation and rating? Is it already having impact (e.g., influencing grant funding in globally-important areas)? Does it have the potential for impact, and if so, is it high-quality enough that we should consider boosting its signal?

We will particularly prioritize policy and applied work that uses technical methods that need evaluation by research experts, often academics.

a range of applied research from EA/GP/LT linked organizations such as GPI, Rethink Priorities, Open Philanthropy, FLI, HLI, Faunalytics, etc., as well as EA-adjacent organizations and relevant government white papers.

How should our (evaluation etc.) policies differ here?

Ratings/metrics: As in the academic stream, this work will be evaluated for its credibility, usefulness, communication/logic, etc. However, we are not seeking to have this work assessed by the standards of academia in a way that yields a comparison to traditional journal tiers. Evaluators: Please ignore these parts of our interface; if you are unsure if it is relevant feel free to ask.

Evaluator selection, number, pay: Generally we want to continue to select academic research experts or non-academic researchers with strong academic and methodological background to do these evaluations. , particularly from academia, to work that is not normally scrutinized by such experts.

The compensation may be flexible as well; in some cases the work may be more involved than for the academic stream and in some cases less involved. As a starting point we will begin by offering the same compensation as for the academic stream.

Careful flagging and signposting: To preserve the reputation of our academic-stream evaluations we need to make it clear, wherever people might see this work, that it is not being evaluated by the same standards as the academic stream and doesn't “count” towards those metrics.

'Conditional embargos' & exceptions

You can request a conditional embargo by emailing us at contact@unjournal.org, or via the submission form. Please explain what sort of embargo you are asking for, and why. By default, we'd like Unjournal evaluations to be made public promptly. However, we may make exceptions in special circumstances, particularly for very early-career researchers.

If there is an early-career researcher on the authorship team, we may allow authors to "embargo" the publication of the evaluation until a later date. Evaluators (referees) will be informed of this. This date can be contingent, but it should not be indefinite.

For example, we might grant an embargo that lasts until after a PhD/postdoc’s upcoming job market or until after publication in a mainstream journal, with a hard maximum of 14 months. (Of course, embargoes can be ended early at the request of the authors.)

In exceptional circumstances we may consider granting a ""

Some examples of possible embargos (need approval)

Extended time to revise and respond

We will invite 2 or 3 relevant experts to evaluate and rate this work, letting them know about the following embargo
When the evaluations come back, we will ask if you want to respond/revise. If you commit to responding (please let us know your plan within 1 week):
1. we will make it public that the evaluations are complete, and you have committed to revise and respond.
2. We will give you 8 weeks to revise the paper, to write a response note how you have revised,
3. We will give the evaluators additional time to adjust their evaluations and ratings in response to your revision/response
4. After this we will publish the evaluation package
If you do not commit to responding, we will post the evaluation package
If you are happy with the evaluations, we can post them at any time, by your request.

Rating-dependent embargo, allowing for revision

We will invite 2 or 3 relevant experts to evaluate and rate this work, letting them know about the following embargo
When the evaluations come back..., we will ask if you want to respond.
1. If all evaluators gave a 4.5 rating or higher as their middle rating on the "Journal rank tier, normative" rating (basically suggesting they think it's at the level meriting publication in a top-5+ journal) we will give you 3 weeks to respond before posting the package. (This is roughly our usual policy)
2. Otherwise (if any rate below 4.5 but none rate it below 3.25) we will give you 8 weeks to revise the paper in response to this, to write a response noting how you have responded. We will give the evaluators further time to adjust their evaluations and ratings in turn, before posting the evaluation package.
3. If any evaluators rate the paper 'fairly negatively' (below 3.25) on this measure, we will grant a six month embargo from this point, before posting the package. During this time you will also have the opportunity to revise and respond, as in the previous case (case 2.2).
If you are happy with the evaluations, we can post them at any time, by your request.

'Job market embargo': Time, rating and outcome-dependent

We will invite 2 or 3 relevant experts to evaluate and rate this work, letting them know about the following embargo
When the evaluations come back. If all evaluators gave a 4.5 rating or higher as their middle rating on the "Journal rank tier, normative" rating (basically suggesting they think it's at the level meriting publication in a top-5+ journal) we will give you 3 weeks to respond before posting the package. (This is roughly our usual policy)
Otherwise we will wait to post the evaluations until June 15, or until all PhD student or Post-doc authors have found a new job (as reported on social media, LinkedIn etc)
1. During the intervening time, you have the opportunity to revise and respond, and if you do we give the evaluators time to update their evaluations and ratings in turn.
If you are happy with the evaluations, we can post them at any time, by your request.

Note: the above are all exceptions to our regular rules, examples of embargos we might or might not agree to.

What research to target?

(for pilot and beyond)

See (earlier) discussion in public call/EA forum discussion HERE.

Management access only: General discussion of prioritization in Gdoc HERE. Private discussion of specific papers in Airtable and links (e.g., HERE). We incorporate some of this discussion below.

High-level considerations for prioritizing research

When considering a piece of research to decide whether to commission it to be evaluated, we can start by looking at its general relevance as well as the value of evaluating and rating it.

Our prioritization of a paper for evaluation should not be seen as an assessment of its quality, nor of its 'vulnerability'. Furthermore, specific and less intensive.

Why is it relevant and worth engaging with?

2. Why does it need (more) evaluation, and what are some key issues and claims to vet?

We ask the people who suggest particular research, and experts in the field:

What are (some of) the authors’ key/important claims that are worth evaluating?
What aspects of the evidence, argumentation, methods, and interpretation, are you unsure about?
What particular data, code, proofs, and arguments would you like to see vetted? If it has already been peer-reviewed in some way, why do you think more review is needed?

Ultimate goals: what are we trying to optimize?

Prioritizing research to achieve these goals

As we weigh research to prioritize for evaluation, we need to balance directly having a positive impact against building our ability to have an impact in the future.

A. Direct impact (‘score goals now’)

Below, we adapt the "ITN" cause prioritization framework (popular in effective altruism circles) to assess the direct impact of our evaluations.

Importance

What is the direct impact potential of the research?

This is a massive question many have tried to address (see sketches and links below). We respond to uncertainty around this question in several ways, including:

Consulting a range of sources, not only EA-linked sources.
- EA and more or less adjacent: Agendas and overviews, Syllabi.
- Non-EA, e.g., https://globalchallenges.org/.
Scoping what other sorts of work are representative inputs to GP-relevant work.
- Get a selection of seminal GP publications; look back to see what they are citing and categorize by journal/field/keywords/etc.

Neglectedness

Where is the current journal system failing GP-relevant work the most . . . in ways we can address?

Tractability

“Evaluability” of research: Where does the UJ approach yield the most insight or value of information?
Existing expertise: Where do we have field expertise on the UJ team? This will help us commission stronger evaluations.
"Feedback loops": Could this research influence concrete intervention choices? Does it predict near-term outcomes? If so, observing these choices and outcomes and getting feedback on the research and our evaluation can yield strong benefits.

Consideration/discussion: How much should we include research with indirect impact potential (theoretical, methodological, etc.)?

B. Sustainability: funding, support, participation

Relevance to stakeholders and potential supporters
Clear connections to impact; measurability
Support from relevant academic communities
Support from open science

Consideration/discussion: What will drive further interest and funding?

C. Credibility, visibility, driving positive institutional change

Interest and involve academics—and build the status of the project.
Commission evaluations that will be visibly useful and credible.
‘Benchmark traditional publication outcomes’, track our predictiveness and impact.
Have strong leverage over research "outcomes and rewards."
Increase public visibility and raise public interest.
Bring in supporters and participants.
Achieve substantial output in a reasonable time frame and with reasonable expense.
Maintain goodwill and a justified reputation for being fair and impartial.

But some of these concerns may have trade offs

We are aware of possible pitfalls of some elements of our vision.

A focus on topics perceived as niche (e.g., the economics and game theory of AI governance and AI safety) may bring a similar tradeoff.

Data: what are we evaluating/considering?

We present and analyze the specifics surrounding our current evaluation data in this interactive notebook/dashboard here.

Proposed template

Title

One-click-link to paper
Link to any private hosted comments on the paper/project

Summary; why is this research relevant and worth engaging with?

Why does it need (more) review, and what are some key issues and claims to vet?

What sort of reviewers should be sought, and what should they be asked?

What types of expertise and background would be most appropriate for the evaluation? Who would be interested? Please try to make specific suggestions.

How well has the author engaged with the process?

Do they need particular convincing? Do they need help making their engagement with The Unjournal successful?

"Direct evaluation" track

Second track: Direct evaluation of prominent work

In addition to soliciting research submissions by authors, we directly prioritize unsubmitted research for evaluation, with a specific process and set of rules, outlined below.

Choose a set of "top-tier working paper series" and medium-to-top-tier journals.
This program started with the NBER working paper series. We expanded this beyond NBER to research posted in other exclusive working paper archives and to work where all authors seem to be prominent, secure, and established. See #direct-evaluation-eligibility-rules-and-guidelines.
Identify relevant papers in this series, following our stated criteria (i.e., relevance, strength, need for further review). For NBER this tends to include
- recently released work in the early stages of the journal peer-review process, particularly if it addresses a timely subject; as well as
- work that has been around for many years, is widely cited and influential, yet has never been published in a peer-reviewed journal.

We do this systematically and transparently; authors shouldn't feel singled out nor left out.

Notify the work's authors that The Unjournal plans to commission evaluations. We're not asking for permission, but
- making them aware of The Unjournal, the process, the benefits to authors, and the authors' opportunities to engage with the evaluation and publicly respond to the evaluation before it is made public;
- letting us know if we have the most recent version of the paper, and if updates are coming soon;
- letting the authors complete our forms if they wish, giving further information about the paper or e.g. adding a "permalink" to updated versions;
- asking if there are authors in sensitive career positions justifying a temporary "embargo"; and
- asking the authors if there is specific feedback they would like to receive.
Reaching out to and commissioning evaluators, as in our regular process. Considerations:
- Evaluators should be made aware that the authors have not directly requested this review, but have been informed it is happening.
- As this will allow us to consider a larger set of papers more quickly, we can reach out to multiple evaluators more efficiently.

The case for this "direct evaluation"

Public benefit: Working papers (especially NBER) are already influencing policy and debate, yet they have not been peer-reviewed and may take years to go through this process, if ever (e.g., many NBER papers are never published in peer-reviewed journals). However, it is difficult to understand the papers' limitations unless you happen to have attended an academic seminar where they were presented. Evaluating these publicly will provide a service.
- Specifically for NBER: This working paper series is highly influential and relied upon by policy makers and policy journalists. It'd an elite outlet: only members of NBER are able to post working papers here. Membership is prestigious and available only by invitation.
Fear of public evaluation (safety in numbers): There may be some shyness or reluctance to participate in The Unjournal evaluation process (for reasons to do so, see our benefits to authors discussion). It is scary to be a first mover, and it may feel unfair to be among the few people to have an evaluation of your work out there in public (in spite of the Bayesian arguments presented in the previous link). There should be "safety" in numbers: having a substantial number of prominent papers publicly evaluated by The Unjournal will ease this concern.
Passive evaluation may be preferred to active consent: Academics (especially early-career) may also worry that they will seem weird or rebellious by submitting to The Unjournal, as this may be taken as "rejecting mainstream system norms." Again, this will be less of a problem if a substantial number of public evaluations of prominent papers are posted. You will be in good company. Furthermore, if we are simply identifying papers for evaluation, the authors of these papers cannot be seen as rejecting the mainstream path (as they did not choose to submit).
Piloting and building a track record or demonstration: The Unjournal needs a reasonably large set of high-quality, relevant work to evaluate in order to help us build our system and improve our processes. Putting out a body of curated evaluation work will also allow us to demonstrate the reasonableness and reliability of this process.

Discussion: possible downsides and risks from this, responses

Responses:

Public engagement in prominent and influential work is fair and healthy. It is good to promote public intellectual debate. Of course, this process needs to allow constructive criticism as well as informative praise.
We will work to ensure that the evaluations we publish involve constructive dialogue, avoid unnecessary harshness, and provide reasons for their critiques. We also give authors the opportunity to respond.
We are focusing on more prominent papers, with authors in more secure positions. Additionally, we offer a potential "embargo" for sensitive career situations, e.g., those that might face early-career researchers.

2. Less author engagement: If authors do not specifically choose to have their work evaluated, they are less likely to engage fullly with the process.

Response: This is something we will keep an eye on, weighing the benefits and costs.

3. Evaluator/referee reluctance: As noted above, evaluators may be more reluctant to provide ratings and feedback on work where the author has not instigated the process.

Aside: in the future, we hope to work directly with working paper series, associations, and research groups to get their approval and engagement with Unjournal evaluations. We hope that having a large share of papers in your series evaluated will serve as a measure of confidence in your research quality. If you are involved in such a group and are interested in this, please reach out to us ().

Direct evaluation: eligibility rules and guidelines

NBER

All NBER working papers are generally eligible, but watch for exceptions where authors seem vulnerable in their career. (And remember, we contact authors, so they can plead their case.)

CEPR

Journal-published papers (i.e., 'post-publication evaluation')

Papers or projects posted in any other working paper (pre-print) series

These are eligible (without author permission) if all authors

have tenured or ‘long term’ positions at well-known, respected universities or other research institutions, or
have tenure-track positions at top universities (e.g., top-20 globally by some credible rankings), or
are clearly not pursuing an academic career (e.g., the "partner at the aid agency running the trial").

A possible exception to this exception: If the PhD student or untenured academic is otherwise clearly extremely high-performing by conventional metrics; e.g., an REStud "tourist" or someone with multiple published papers in top-5 journals. In such cases the paper might be considered eligible for direct evaluation.

Project submission, selection and prioritization

Submission/evaluation funnel

Our procedures for identification and prioritization

Authors' permission: sometimes required

Communicating: "editors'" process

Summary: why is it relevant and worth engaging with?

Why does it need (more) review? What are some key issues or claims to vet?

To what extent is there author engagement?

What research to target?

High-level considerations for prioritizing research

Ultimate goals: what are we trying to optimize?

Prioritizing research to achieve these goals

A. Direct impact (‘score goals now’)

B. Sustainability: funding, support, participation

C. Credibility, visibility, driving positive institutional change

Data: what are we evaluating/considering?

Title

Summary; why is this research relevant and worth engaging with?

Why does it need (more) review, and what are some key issues and claims to vet?

What sort of reviewers should be sought, and what should they be asked?

How well has the author engaged with the process?

What specific areas do we cover?

Fields, methods, and approaches

Why this field/method focus?

Ex.: work we included/excluded based on field/method

Outcomes, focus areas, and causes

Fields

Examples of work we chose to prioritize or de-prioritize based on focus area

Process: prioritizing research

Prioritization ratings: discussion

So what goes into this "prioritization rating"; what does it mean?

Some key points

One possible way of considering the rating criteria

Key attributes/factors

Prestigious work

Less prestigious work

Suggesting research (forms, guidance)

Paths to suggest research

1. Through our

Review The Unjournal's Guidelines

Fill out the Suggestion Form

2. For Field Specialists and managers: via Airtable

Further guidance

Notes for field specialists/Unjournal Team

"Direct evaluation" track

Second track: Direct evaluation of prominent work

Direct evaluation: eligibility rules and guidelines

NBER

CEPR

Journal-published papers (i.e., 'post-publication evaluation')

Papers or projects posted in any other working paper (pre-print) series

"Applied and Policy" Track

Why have an “Applied & Policy Stream”?

What should be included in the Policy stream?

How should our (evaluation etc.) policies differ here?

'Conditional embargos' & exceptions

Some examples of possible embargos (need approval)

Prioritization ratings: discussion

So what goes into this "prioritization rating"; what does it mean?

Some key points

One possible way of considering the rating criteria

Key attributes/factors

Prestigious work

Less prestigious work

What research to target?

High-level considerations for prioritizing research

Ultimate goals: what are we trying to optimize?

Prioritizing research to achieve these goals

A. Direct impact (‘score goals now’)

B. Sustainability: funding, support, participation

C. Credibility, visibility, driving positive institutional change

Data: what are we evaluating/considering?

Title

Summary; why is this research relevant and worth engaging with?

Why does it need (more) review, and what are some key issues and claims to vet?

What sort of reviewers should be sought, and what should they be asked?

How well has the author engaged with the process?

What specific areas do we cover?

Fields, methods, and approaches

Why this field/method focus?

Why have an “”?

Why have an “”?