Why are we seeking these pivotal questions to be 'operationalizable'?
This is in line with our own focus on this type of research.
The Unjournal mainly focuses on evaluating (largely empirical) research that clearly poses and answers specific impactful questions, rather than research that seeks to define a question, survey a broad landscape of other research, open routes to further inquiry, etc.
I think this will help us focus on fully baked questions, where the answer is likely to provide actual value to the target organization and others (and avoid the old “…” trap).
This also offers potential for benchmarking and validation (e.g., using prediction markets), specific routes to measure our impact (updated beliefs, updated decisions), and a way to inform what we’re asking from evaluators (see footnote above).
However, as this initiative progresses we may allow a wider range of questions, e.g., more open-ended, multi-outcome, non-empirical (perhaps “normative”), and best-practice questions.
The Unjournal commissions public evaluations of impactful research in quantitative social sciences fields. We are seeking “pivotal questions” to guide our choice of research papers to commission for evaluation. We are reaching out to organizations that aim to use evidence to do the most good, and asking: Which open questions most affect your policies and funding recommendations? For which questions would research yield the highest “value of information”?
Our main approach has been to search for papers and then commission experts to publicly evaluate them. (For more about our process, see here.) Our field specialist teams search and monitor prominent research archives (like NBER) and consider agendas from impactful organizations, while keeping an eye on forums and social media. In other words, we have largely looked for research that seems relevant to impactful questions and crucial considerations. We're now exploring turning this on its head: identifying pivotal questions first, then evaluating a cluster of research that informs them. This could offer a more efficient and observable path to impact.
The Unjournal will ask impact-focused, research-driven organizations such as GiveWell, Open Philanthropy, and Charity Entrepreneurship to identify specific quantifiable questions that impact their funding, policy, and research-direction choices. For example, if an organization is considering whether to fund a psychotherapeutic intervention in an LMIC, they might ask “How much does a brief course of non-specialist psychotherapy increase happiness, compared to the same amount spent on direct cash transfers?” We’re looking for the questions with the highest value-of-information (VOI) for the organization’s work over the next few years. We have some requirements: the questions should relate to The Unjournal’s coverage areas and engage rigorous research in economics, social science, policy, or impact quantification. Ideally, organizations will identify at least one piece of publicly available research that relates to their question. But we are doing this mainly to help these organizations, so we will try to keep it simple and low-effort for them.
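To make the value-of-information idea concrete, here is a minimal sketch of an expected value of perfect information (EVPI) calculation for the cash-versus-psychotherapy example. The function name, the two-state setup, and all of the numbers are hypothetical assumptions for illustration; they are not estimates from The Unjournal or from any organization.

```python
def expected_value_of_perfect_information(states):
    """Return the EVPI for a one-off choice between funding options.

    `states` is a list of (probability, payoffs) pairs, where `payoffs`
    maps each option name to its payoff if that state is the true one.
    """
    total_probability = sum(probability for probability, _ in states)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")

    options = list(states[0][1])

    # Deciding now: pick the option with the best expected payoff.
    value_deciding_now = max(
        sum(probability * payoffs[option] for probability, payoffs in states)
        for option in options
    )

    # Deciding after learning the true state: pick the best option in each state.
    value_with_perfect_information = sum(
        probability * max(payoffs.values()) for probability, payoffs in states
    )

    return value_with_perfect_information - value_deciding_now


# Hypothetical beliefs: wellbeing gain per $1,000 spent, in arbitrary units.
example_states = [
    (0.6, {"cash transfers": 1.0, "psychotherapy": 0.5}),  # therapy effect is small
    (0.4, {"cash transfers": 1.0, "psychotherapy": 2.0}),  # therapy effect is large
]
evpi = expected_value_of_perfect_information(example_states)
print(f"EVPI: {evpi:.2f} units per $1,000")
```

With these made-up numbers, deciding now favors psychotherapy (expected payoff 1.1 versus 1.0), while knowing the true state would raise the expected payoff to 1.4, so the EVPI is about 0.3 units per $1,000. A question whose answer could not change the funding choice would have an EVPI of zero under this framing.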
The Unjournal team will then discuss the suggested questions, leveraging our field specialists’ expertise. We’ll rank these questions, prioritizing at least one for each organization. We’ll work with the organization to specify the priority question precisely and in a useful way. We want to be sure that (1) evaluators will interpret these questions as intended, and (2) the answers that come out are likely to be actually helpful. We’ll make these lists of questions public and solicit general feedback on the relevance of the questions, on their framing, on key sub-questions, and on pointers to relevant research.
Where practicable, we will operationalize the target questions as a claim on a prediction market (for example, Metaculus) to be resolved by the evaluations and synthesis below.
Where feasible, post these on public prediction markets (such as Metaculus)
If the question is well operationalized, and we have a clear approach to 'resolving it' after the evaluations and synthesis, we will post it on a reputation-based market such as Metaculus. Metaculus is offering 'minitaculus' platforms to enable these more flexible questions.
We will ask (and help) the organizations and interested parties to specify their own beliefs about these questions, aka their 'priors'. We may adapt the Metaculus interface for this.
Once we’ve converged on the target question, we’ll do a variation of our usual evaluation process.
For each question we will prioritize roughly two to five relevant research papers. These papers may be suggested by the organization that suggested the question, sourced by The Unjournal, or discovered through community feedback (see note).
As we normally do, we’ll have “evaluation managers” recruit expert evaluators to assess each paper. However, we’ll ask the evaluators to focus on the target question, and to consider the target organization’s priorities.
We’ll also enable phased deliberation and discussion among evaluators. This is inspired in part by some evidence suggesting that the (mechanistically aggregated) estimates of experts after deliberation perform better than their independent estimates (also mechanistically aggregated). We may also facilitate collaborative evaluations and “live reviews”, following examples set by others.
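As a rough illustration of what “mechanistically aggregated” could mean here, the sketch below pools evaluators' probability estimates with a fixed rule, once for independent estimates and once for estimates given after deliberation. The pooling rules, the function name, and the numbers are assumptions chosen for illustration; the text above does not specify which aggregation rule would be used.

```python
import math
from statistics import mean, median


def pool_probabilities(probabilities, method="mean"):
    """Combine several probability estimates into one using a fixed rule."""
    if not probabilities:
        raise ValueError("need at least one estimate")
    if any(p < 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("estimates must be probabilities between 0 and 1")

    if method == "mean":
        return mean(probabilities)
    if method == "median":
        return median(probabilities)
    if method == "geometric_mean_of_odds":
        # Average in log-odds space, then convert back to a probability.
        if any(p in (0.0, 1.0) for p in probabilities):
            raise ValueError("log-odds pooling needs estimates strictly between 0 and 1")
        log_odds = [math.log(p / (1.0 - p)) for p in probabilities]
        return 1.0 / (1.0 + math.exp(-mean(log_odds)))
    raise ValueError(f"unknown method: {method}")


# Hypothetical estimates from three evaluators for the same claim.
independent_estimates = [0.20, 0.55, 0.80]
post_deliberation_estimates = [0.35, 0.50, 0.60]

for label, estimates in [
    ("independent", independent_estimates),
    ("after deliberation", post_deliberation_estimates),
]:
    pooled = pool_probabilities(estimates, method="median")
    print(f"{label}: pooled estimate {pooled:.2f}")
```

Which round performs better can only be judged once the claim resolves, so a comparison like this would need to be paired with a scoring rule applied after resolution.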
We will contact both the research authors (as per our standard process) and the target organizations for their responses to the evaluations, and for follow-up questions. We’ll foster a productive discussion between them (while preserving anonymity as requested, and being careful not to overtax people’s time and generosity).
We’ll commission one or more evaluation managers to write a report as a summary of the research investigated.
These reports should synthesize “What do the research, evaluations, and responses say about the question/claim?” They should provide an overall metric relating to the truth value of the target question (or similar for the parameter of interest). If and when we integrate prediction markets, they should decisively resolve the market claim.
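If a claim is resolved as true or false, one possible way to quantify “updated beliefs” is to score the probabilities an organization stated before and after the evaluations against the resolution. The sketch below uses the Brier score, a standard scoring rule for probability forecasts. Its use here is an assumption for illustration, not a method described above, and the probabilities are hypothetical.

```python
def brier_score(probability, outcome):
    """Squared error of a probability forecast for a binary outcome (lower is better)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 (false) or 1 (true)")
    return (probability - outcome) ** 2


# Hypothetical beliefs about a claim that is later resolved as true.
stated_prior = 0.30      # belief before the evaluations
updated_belief = 0.70    # belief after reading the evaluations and synthesis
resolved_outcome = 1

prior_score = brier_score(stated_prior, resolved_outcome)
updated_score = brier_score(updated_belief, resolved_outcome)
improvement = prior_score - updated_score

print(f"Brier score before: {prior_score:.2f}, after: {updated_score:.2f}")
print(f"Improvement from updating: {improvement:.2f}")
```

A positive improvement means the updated belief was closer to the resolved outcome than the prior was. A single question says little on its own; a comparison like this becomes informative only across many resolved claims.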
Next, we will share these synthesis reports with authors and organizations for feedback.
We’ll put up each evaluation on our page, bringing them into academic search tools, databases, bibliometrics, etc. We’ll also curate them, linking them to the relevant target question and to the synthesis report.
We will produce, share, and promote further summaries of these packages. This could include forum and blog posts summarizing the results and insights, as well as interactive and visually appealing web pages. We might also produce less technical content, perhaps submitting work to other outlets.
At least initially, we’re planning to ask for questions that could be definitively answered and/or measured quantitatively, and we will help organizations and other suggesters refine their questions to make this the case. These should approximately resemble questions that could be posted on forecasting platforms such as Metaculus. These should also somewhat resemble what we currently request from evaluators.
We give detailed guidance with examples below:
We’re still refining this idea, and looking for your suggestions about what is unclear, what could go wrong, what might make this work better, what has been tried before, and where the biggest wins are likely to be. We’d appreciate your feedback! (Feel free to email us to make suggestions or arrange a discussion.)
If you work for an impact-focused research organization and you are interested in participating in our pilot, please reach out to us at contact@unjournal.org to flag your interest. We would like to see:
A brief description of what your organization does (your “about us” page is fine)
A specific, high-value claim or research question you would like to be evaluated, that is within our scope (~quantitative social science, economics, policy, and impact measurement)
A brief explanation of why this question is particularly high value for your organization or your work, and how you have tried to answer it
If possible, a link to at least one research paper that relates to this question
Optionally, your current beliefs about this question (your “priors”)
Please also let us know how you would like to engage with us on refining this question and addressing it. Do you want to follow up with a 1-1 meeting? How much time are you willing to put in? Who, if anyone, should we reach out to at your organization?
Remember that we plan to make all of this analysis and evaluation public.
If you don’t represent an organization, we still welcome your suggestions, and will try to give feedback.
(Note on 'bounties'.)
Please remember that we currently focus on quantitative ~social sciences fields, including economics, policy, and impact modeling. Questions surrounding (for example) technical AI safety, microbiology, or measuring animal sentience are less likely to be in our domain.
If you want to talk about this first, or if you have any questions, please email David Reinstein, our co-founder and director.
As noted above, at least initially we’re asking for questions that could be definitively answered and/or measured quantitatively, resembling questions that could be posted on forecasting platforms.
The “Clairvoyance Test” is particularly relevant:
if you handed your question to a genuine clairvoyant, could they see into the future and definitively tell you [the answer]? Some questions like “Will the US decline as a world power?” ... “Will an AI exhibit a goal not supplied by its human creators?” struggle to pass the Clairvoyance Test… How do you tell one type of AI goal from another, and how do you even define it? ... In the case of whether the US might decline as a world power, you’d want to get at the theme with multiple well-formed questions such as “Will the US lose its #1 position in the IMF’s annual GDP rankings before 2050?” ...
Metaculus and Manifold: claim resolution.
Some questions are important, but difficult to make specific, focused, and operationalizable. For example:
“What can economic models … tell us about recursive self improvement in advanced AI systems?”
“How likely would catastrophic long-term outcomes be if everyone in the future acts for their own self-interest alone?”
“How could AI transform domestic and mass politics?”
Other questions are easier to operationalize or break down into several specific sub-questions. For example:
Could advances in AI lead to risks of very bad outcomes? Is it the most likely source of such risks?
I rated this a 3/10 in terms of how operationalized it was. The word “could” is vague. “Could” might suggest some reasonable probability outcome (1%, 0.1%, 10%), or it might be interpreted as “can I think of any scenario in which this holds?” “Very bad outcomes” also needs a specific measure.
However, we can reframe this to be more operationalized. E.g., here are some fairly well-operationalized questions:
What is the risk of a catastrophic loss (defined as the death of at least 10% of the human population over any five-year period) occurring before the year 2100?
How does this vary depending on the total amount of money invested in computing power for building advanced AI capabilities over the same period?
Here are some highly operationalizable questions:
What percentage of plant-based meat alternative (PBMA) units/meals sold displace a unit/meal of meat?
What percentage of people will be [vegetarian or vegan] in 20, 50, or 100 years?
And a few more that have been posed and addressed elsewhere:
How much of global greenhouse gas emissions come from food?
What share of global CO₂ emissions come from aviation?
However, note that many of the above questions are descriptive or predictive. We are also very interested in causal questions such as:
What is the impact of an increase (decrease) in blood lead level by one “natural log unit” on children’s learning in the developing world (measured in standard deviation units)?
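To unpack the “natural log unit” in that last question: an increase of one natural log unit in blood lead level B is the same as multiplying it by e. The second line below is only one illustrative way such a question might be written as a model, with the outcome Y in standard deviation units; the specification and the symbols are assumptions, not taken from any particular study.

```latex
\[
\ln B_1 - \ln B_0 = 1
\quad\Longleftrightarrow\quad
B_1 = e \cdot B_0 \approx 2.72\, B_0
\]
\[
Y_i = \alpha + \beta \ln B_i + \varepsilon_i
\]
```

Under this illustrative specification, the question asks for β, the change in learning (in standard deviations) associated with a one-log-unit change in blood lead level. A doubling of blood lead level would then correspond to a change of β ln 2, or roughly 0.69β.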