Promoting open and robust science

TLDR: The Unjournal promotes research replicability and robustness

Unjournal evaluations aim to support the "Reproducibility/Robustness-Checking" (RRC) agenda. We are directly engaging with the Institute for Replication (I4R) and the repliCATS project (RC), and building connections to Replication Lab/TRELiSS and Metaculus.

We will support this agenda by:

  1. Promoting data and code sharing: We ask preprint authors to share their code and data, and we reward them for this transparency.

  2. Promoting 'Dynamic Documents' and 'Living Research Projects': Breaking out of "PDF prisons" to achieve increased transparency.

  3. Encouraging detailed evaluations: Unjournal evaluators are asked to:

    • highlight the key/most relevant research claims, results, and tests;

    • propose possible robustness checks and tests (RRC work); and

    • make predictions for these tests.

  4. Implementing computational replication and robustness checking: We aim to work with I4R and other organizations to facilitate and evaluate computational replication and robustness checking.

  5. Advocating for open evaluation: We prioritize making the evaluation process transparent and accessible for all.

Research credibility

While the replication crisis in psychology is well known, economics is not immune. Some very prominent and influential work contains blatant errors, depends on dubious econometric choices or faulty data, is not robust to simple checks, or relies on likely-fraudulent data. Roughly 40% of experimental economics results fail to replicate. Prominent commentators have argued that the traditional journal peer-review system does a poor job of spotting major errors and identifying robust work.

Supporting the RRC agenda through Unjournal evaluations

My involvement with the SCORE replication market project shed light on a key challenge (see Twitter posts): the effectiveness of replication depends on which claims are chosen for reproduction and how they are approached. I observed that the chosen claim often missed the essence of the paper, or focused on a statistical result that, while likely to reproduce, did not truly convey the author's message.

At the same time, I noticed that many papers had methodological flaws (for instance, a lack of causal identification or important confounds in experiments), yet these studies, if repeated, would likely yield similar results. These insights emerged from only a quick review of hundreds of papers and claims, which suggests that a more thorough reading and analysis could identify the most impactful claims and clarify the RRC work needed.

Indeed, detailed, high-quality referee reports for economics journals frequently contain such suggestions. However, these valuable insights are often overlooked and rarely shared publicly. The Unjournal aims to change this by focusing on three main strategies:

  1. Identifying vital claims for replication:

    • We plan to have Unjournal evaluators help highlight key "claims to replicate," along with proposing replication goals and methodologies. We will flag papers that particularly need replication in specific areas.

    • Public evaluation and author responses will provide additional insight, giving future replicators more than just the original published paper to work with.

  2. Encouraging author-assisted replication:

    • The Unjournal's platform and metrics promote dynamic documents and transparency, simplifying reproduction and replication.

    • By emphasizing replicability and transparency at the working-paper stage (the current focus of Unjournal evaluations), we make authors more amenable to facilitating replication work at later stages, such as after traditional publication.

  3. Predicting replicability and recognizing success:

    • We aim to ask Unjournal evaluators to make predictions about replicability; when results are successfully replicated, we can offer recognition. The same holds for repliCATS aggregated/IDEA group evaluations: to know whether we are credibly assessing replicability, we need to compare these assessments to at least some actual replication outcomes.

    • Comparing these predictions to actual replication outcomes allows us to assess the credibility of our replicability evaluations (see the sketch following this list). It may also motivate people to become Unjournal evaluators, attracted by the possibility of influencing replication efforts.
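
To make this comparison concrete, one option (among several) is a proper scoring rule such as the Brier score. The Python sketch below is a purely hypothetical illustration: the claim identifiers, probabilities, and outcomes are invented, and it does not describe The Unjournal's or repliCATS's actual scoring procedure.

```python
# Hypothetical illustration: scoring evaluators' replication predictions
# against realized replication outcomes. All identifiers and numbers are
# invented; this is not The Unjournal's actual scoring procedure.

predictions = {  # evaluator's stated probability that each claim replicates
    "paper_A_claim_1": 0.80,
    "paper_B_claim_1": 0.35,
    "paper_C_claim_2": 0.60,
}

outcomes = {  # 1 = replicated, 0 = did not replicate (e.g., from an I4R-style exercise)
    "paper_A_claim_1": 1,
    "paper_B_claim_1": 0,
    "paper_C_claim_2": 0,
}

# Brier score: mean squared error of probabilistic forecasts (lower is better).
shared = predictions.keys() & outcomes.keys()
brier = sum((predictions[k] - outcomes[k]) ** 2 for k in shared) / len(shared)
print(f"Brier score over {len(shared)} claims: {brier:.3f}")
```

A lower Brier score indicates better-calibrated predictions; the same comparison could also feed a calibration plot or thresholds for recognizing accurate forecasters.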

By concentrating on NBER papers, we increase the likelihood of overlap with journals targeted by the Institute for Replication, thus enhancing the utility of our evaluations in aiding replication efforts.

Other mutual benefits/synergies

We can draw on and build a shared talent pool: Unjournal evaluators may be well suited, and keen, to become robustness-reproducers (of these or other papers) as well as repliCATS participants.

We see the potential for synergy and economies of scale and scope in other areas, e.g., through:

  • sharing IT/UX tools for capturing evaluator and replicator outcomes, and statistical or information-theoretic tools for aggregating these outcomes (see the sketch after this list);

  • sharing of protocols for data, code, and instrument availability (e.g., Data and Code Availability Standard);

  • communicating the synthesis of "evaluation and replication reports"; or

  • urging institutions, journals, funders, and working paper series to encourage or require engagement.
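
As a small example of the kind of aggregation tool mentioned above, the sketch below pools several evaluators' replication probabilities by averaging their log-odds. This is only an illustrative assumption about one possible approach; it is not the repliCATS/IDEA protocol or a method The Unjournal has adopted.

```python
# Hypothetical illustration of one simple aggregation tool: pooling several
# evaluators' replication probabilities by averaging their log-odds.
# This is not the repliCATS/IDEA protocol, just a minimal example.
import math


def pool_log_odds(probs):
    """Average the log-odds of individual probabilities and map back to [0, 1]."""
    logits = [math.log(p / (1 - p)) for p in probs]
    mean_logit = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-mean_logit))


evaluator_probs = [0.70, 0.55, 0.80]  # invented probabilities from three evaluators
print(f"Pooled probability of replication: {pool_log_odds(evaluator_probs):.2f}")
```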

More ambitiously, we may jointly interface with prediction markets. We may also jointly integrate into platforms like OSF as part of an ongoing process of preregistration, research, evaluation, replication, and synthesis.

Broader synergies in the medium term

As a "journal-independent evaluation" gains career value, as replication becomes more normalized, and as we scale up:

  • Incentive systems for academics will change, making it easier to reward replication and replicability than under the traditional journals' system of "accept/reject, then start again elsewhere."

  • The Unjournal could also evaluate I4R replications, giving them status.

  • Public communication of Unjournal evaluations and responses may encourage demand for replication work.

More generally, we see cultural spillovers in the willingness to try new systems of reward and credibility, and in gatekeepers rewarding this behavior and not just traditional "publication outcomes".
