At least initially, we're planning to ask for questions that could be definitively answered and/or measured quantitatively, and we will help organizations and other suggesters refine their questions to make this the case. These should roughly resemble questions that could be posted on forecasting platforms such as Manifold Markets or Metaculus, and should also somewhat resemble the 'claim identification' we currently request from evaluators.
Phil Tetlock's "Clairvoyance Test" is particularly relevant. As Metaculus explains it:
if you handed your question to a genuine clairvoyant, could they see into the future and definitively tell you [the answer]? Some questions like "Will the US decline as a world power?"... "Will an AI exhibit a goal not supplied by its human creators?" struggle to pass the Clairvoyance Test… How do you tell one type of AI goal from another, and how do you even define it?... In the case of whether the US might decline as a world power, you'd want to get at the theme with multiple well-formed questions such as "Will the US lose its #1 position in the IMF's annual GDP rankings before 2050?"
Metaculus and Manifold: claim resolution.
Some questions are important, but difficult to make specific, focused, and operationalizable. For example (from ):
"What can economic models … tell us about recursive self improvement in advanced AI systems?"
"How likely would catastrophic long-term outcomes be if everyone in the future acts for their own self-interest alone?"
"How could AI transform domestic and mass politics?"
Other questions are easier to operationalize or break down into several specific sub-questions. For example (again from ):
Could advances in AI lead to ? Is it the most likely source of such risks?
I rated this a 3/10 in terms of how operationalized it was. The word "could" is vague: it might suggest an outcome with some reasonable probability (1%, 0.1%, 10%), or it might be interpreted as "can I think of any scenario in which this holds?" "Very bad outcomes" also needs a specific measure.
However, we can reframe this to be more operationalized. E.g., here are some fairly well-operationalized questions:
What is the risk of a catastrophic loss (defined as the death of at least 10% of the human population over any five-year period) occurring before the year 2100?
How does this vary depending on the total amount of money invested in computing power for building advanced AI capabilities over the same period?
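One reason this pair of questions is well operationalized is that it pins down a threshold (10% of the population), a window (any five-year period), and a horizon (2100), so an answer can be computed mechanically from explicit assumptions. A minimal sketch of that computation, where the per-window probability and independence assumption are purely illustrative placeholders, not estimates:

```python
# Sketch: cumulative probability of at least one "catastrophic loss"
# (death of >=10% of the human population within a five-year window)
# before 2100, assuming, purely for illustration, a constant and
# independent probability per five-year window.
def cumulative_risk(per_window_prob: float, n_windows: int) -> float:
    """P(at least one catastrophe) = 1 - P(no catastrophe in any window)."""
    return 1 - (1 - per_window_prob) ** n_windows

# Roughly 15 five-year windows remain before 2100; 1% per window is a
# placeholder value, not a forecast.
risk_by_2100 = cumulative_risk(0.01, 15)
```

Changing the threshold, window, or horizon changes the computed answer, which is exactly why a framing like "could AI lead to very bad outcomes?" scores poorly on operationalization: it leaves all three unspecified.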
Here are some highly operationalizable questions developed by the :
What percentage of plant-based meat alternative (PBMA) units/meals sold displace a unit/meal of meat?
What percentage of people will be [vegetarian or vegan] in 20, 50, or 100 years?
And a few more posed and addressed by :
How much of global greenhouse gas emissions come from food? ()
What share of global CO₂ emissions come from aviation? ()
However, note that many of the above questions are descriptive or predictive. We are also very interested in causal questions, such as:
What is the impact of an increase (decrease) in blood lead level by one "natural log unit" on children's learning in the developing world (measured in standard deviation units)?
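Because this causal question specifies its units on both sides (natural log units of blood lead level; standard deviations of learning), an estimated effect size can be applied mechanically to any proposed intervention. A sketch, where the effect size and blood-lead figures are hypothetical placeholders, not estimates:

```python
import math

# Sketch: apply an assumed effect size (SD of learning gained per natural
# log unit of blood lead level reduced) to a hypothetical reduction in
# blood lead. The 0.12 effect size below is a placeholder, not an estimate.
def learning_effect_sd(bll_before: float, bll_after: float,
                       sd_per_log_unit: float) -> float:
    log_units_reduced = math.log(bll_before) - math.log(bll_after)
    return log_units_reduced * sd_per_log_unit

# Halving blood lead (e.g. 10 -> 5 ug/dL) is ln(2) ~= 0.69 natural log units.
effect = learning_effect_sd(10.0, 5.0, 0.12)
```

Note the convenience of the log-unit framing: any halving of blood lead, from whatever starting level, maps to the same ln(2) change, so a single effect size covers interventions of different absolute magnitudes.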