TLDR: Unjournal promotes research replicability/robustness
Unjournal evaluations aim to support the "Reproducibility/Robustness-Checking" (RRC) agenda. We are directly engaging with the Institute for Replication (I4R) and the repliCATS project (RC), and building connections to Replication Lab/TRELiSS and Metaculus.
We will support this agenda by:
Promoting data and code sharing: We request pre-print authors to share their code and data, and reward them for their transparency.
Promoting 'Dynamic Documents' and 'Living Research Projects': Breaking out of "PDF prisons" to achieve increased transparency.
Encouraging detailed evaluations: Unjournal evaluators are asked to:
highlight the key/most relevant research claims, results, and tests;
propose possible robustness checks and tests (RRC work); and
make predictions for these tests.
Implementing computational replication and robustness checking: We aim to work with I4R and other organizations to facilitate and evaluate computational replication and robustness checking.
Advocating for open evaluation: We prioritize making the evaluation process transparent and accessible for all.
While the replication crisis in psychology is well known, economics is not immune. Some very prominent and influential work has blatant errors, depends on dubious econometric choices or faulty data, is not robust to simple checks, or uses likely-fraudulent data. Roughly 40% of experimental economics work fails to replicate. Prominent commentators have argued that the traditional journal peer-review system does a poor job of spotting major errors and identifying robust work.
My involvement with the SCORE replication market project shed light on a key challenge (see Twitter posts): The effectiveness of replication depends on the claims chosen for reproduction and how they are approached. I observed that it was common for the chosen claim to miss the essence of the paper, or to focus on a statistical result that, while likely to reproduce, didn't truly convey the author's message.
Simultaneously, I noticed that many papers had methodological flaws (for instance, lack of causal identification or the presence of important confounding factors in experiments). But I thought that these studies, if repeated, would likely yield similar results. These insights emerged from only a quick review of hundreds of papers and claims. This indicates that a more thorough reading and analysis could potentially identify the most impactful claims and elucidate the necessary RRC work.
Indeed, detailed, high-quality referee reports for economics journals frequently contain such suggestions. However, these valuable insights are often overlooked and rarely shared publicly. Unjournal aims to change this paradigm by focusing on three main strategies:
Identifying vital claims for replication:
We plan to have Unjournal evaluators help highlight key "claims to replicate," along with proposing replication goals and methodologies. We will flag papers that particularly need replication in specific areas.
Public evaluation and author responses will provide additional insight, giving future replicators more than just the original published paper to work with.
Encouraging author-assisted replication:
The Unjournal's platform and metrics, promoting dynamic documents and transparency, simplify the process of reproduction and replication.
By emphasizing replicability and transparency at the working-paper stage (the current focus of Unjournal evaluations), we make authors more willing to facilitate replication work at later stages, such as after traditional publication.
Predicting replicability and recognizing success:
We aim to ask Unjournal evaluators to make predictions about replicability, and to offer recognition when work is later successfully replicated. The same holds for repliCATS aggregated/IDEA group evaluations. To know whether we are credibly assessing replicability, we need to compare these predictions to at least some actual "replication outcomes" (see the scoring sketch below). The prospect of this comparison may also motivate individuals to become Unjournal evaluators, attracted by the possibility of influencing replication efforts.
By concentrating on NBER papers, we increase the likelihood of overlap with journals targeted by the Institute for Replication, thus enhancing the utility of our evaluations in aiding replication efforts.
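To make the "comparing predictions to outcomes" step concrete, here is a minimal, hypothetical sketch (in Python) of how evaluators' replication forecasts could be scored once replication outcomes are known. The claim names, probabilities, and outcomes are invented for illustration, and the Brier score is just one standard scoring rule we might use.

```python
# Hypothetical sketch: scoring evaluators' replication predictions against
# later replication outcomes. All names and numbers are invented.
predictions = {            # evaluator's stated P(claim replicates)
    "paper_A_claim_1": 0.80,
    "paper_B_claim_2": 0.35,
    "paper_C_claim_1": 0.60,
}
outcomes = {               # 1 = replicated, 0 = did not replicate
    "paper_A_claim_1": 1,
    "paper_B_claim_2": 0,
    "paper_C_claim_1": 0,
}

# Brier score: mean squared error of the probabilistic forecasts.
# 0 is perfect; an uninformative 50/50 forecast scores 0.25.
brier = sum((predictions[k] - outcomes[k]) ** 2 for k in predictions) / len(predictions)
print(f"Brier score: {brier:.3f}")
```

Lower scores indicate better-calibrated forecasters, giving a transparent basis for the recognition mentioned above.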
Should research projects be improved and updated 'in the same place', rather than with 'extension papers'?
Small changes and fixes: The current system makes it difficult to make minor updates – even obvious corrections – to published papers. This makes these papers less useful and less readable. If you find an error in your own published work, there is also little incentive to note it and ask for a correction, even if this were possible.
In contrast, a 'living project' could be corrected and updated in situ. If future and continued evaluations matter, authors will have an incentive to do so.
Lack of incentives for updates and extensions: If academic researchers see major ways to improve and build on their past work, these can be hard to get published and get credit for. The academic system rewards novelty and innovation, and top journals are reluctant to publish 'the second paper' on a topic. As this would count as 'a second publication' (for tenure etc.), authors may be accused of double-dipping, and journals and editors may punish them for this.
Clutter and confusion in the literature: Because of the above, researchers often try to spin an improvement to a previous paper as very new and different. They sometimes publish a range of papers addressing similar questions with similar methods across different journals. This makes it hard for other researchers and readers to know which paper they should read.
In contrast, a 'living project' can keep these in one place. The author can lay out different chapters and sections in ways that make the full work most useful.
But we recognize there may also be downsides to 'all extensions and updates in a single place'...
Some discussion follows. Note that the Unjournal enables this but does not require it.
By “Dynamic Documents” I mean papers/projects built with Quarto, R Markdown, or Jupyter notebooks (the most prominent tools), which perform and report the data analysis (as well as math/simulations) in the same space where the results and discussion are presented (with ‘code blocks’ hidden).
I consider some of the benefits of this format, particularly for EA-aligned organizations like Open Philanthropy, under 'Benefits of Dynamic Documents' below.
“Continually update a project” rather than start a “new extension paper” when you see what you could have done better.
The main idea is that each version is given a specific time stamp, and that is the object that is reviewed and cited. This is more or less already the case when we cite working papers/drafts/mimeos/preprints.
See #living-kaizend-research-projects, which further discusses the potential benefits.
'Dynamic Documents' are projects or papers developed using tools such as R Markdown or Jupyter notebooks (two of the most prominent options).
The salient features and benefits of this approach include:
Integrated data analysis and reporting means the data analysis (as well as math/simulations) is done and reported in the same space where the results and discussion are presented, with the 'code blocks' hidden by default.
Transparent reporting means you can track exactly what is being reported and how it was constructed:
Making the process a lot less error-prone
Helping readers understand it better (see 'explorable explanations')
Helping replicators and future researchers build on it
Other advantages of these formats (over PDFs for example) include:
Convenient ‘folding blocks’
Margin comments and links
Integrating interactive tools
Some quick examples from my own work in progress (but other people have done it much better)
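To make this concrete, here is a minimal sketch of the kind of code cell such a dynamic document contains, assuming a Quarto-style document with a Python (pandas + statsmodels) workflow; the data file and variable names are hypothetical. The `code-fold` option hides the code by default while keeping it one click away from the reader.

```python
#| code-fold: true
# This cell runs inside the document itself: when the document is rendered,
# the regression is re-estimated from the raw data and the coefficient table
# appears inline, next to the written discussion of the result.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data/study_data.csv")            # hypothetical data file
model = smf.ols("outcome ~ treatment + baseline",  # hypothetical variables
                data=df).fit()
print(model.summary().tables[1])                   # rendered as the results table
```

Because the numbers in the text are generated by the same code that readers (and replicators) can unfold and rerun, there is no gap between what the paper reports and what the analysis actually did.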
This section outlines the features of The Unjournal and what the project offers beyond traditional academic publication methods.
See sections below:
How The Unjournal's process reduces the traditionally high costs and 'games' associated with standard publication mechanisms.
How The Unjournal promotes research replicability/robustness in line with the RRC agenda.
How The Unjournal aims to enhance research reliability, accessibility, and usefulness through a robust evaluation system, fostering a productive bridge between mainstream and EA-focused researchers.
Possible information hazards in open research.
How The Unjournal's open evaluation model expedites and enhances research reviews by providing transparent, incentivized feedback and valuable, public metrics.
Our method of obtaining separate evaluations on various aspects of a research project (methodological, theoretical, and applied) from diverse expert groups, which leads to more comprehensive and insightful feedback.
The terms 'dynamic documents' and 'living projects' in relation to our model, and how they facilitate continuous growth in research projects.
Why open dynamic documents are better for research than 'PDF prisons', the conventional static PDF format that dominates research.
How 'living projects', under our approach, can continuously evolve, receive evaluations, and undergo improvements within the same environment.
Claim: Rating and feedback are better than an ‘all-or-nothing’ accept/reject process. Although people like to say “peer review is not binary”, the consequences are.
“Publication in a top journal” is used as a signal and a measuring tool for two major purposes. First, policymakers, journalists, and other researchers look at where a paper is published to assess whether the research is credible and reputable. Second, universities and other institutions use these publication outcomes to guide hiring, tenure, promotion, grants, and other ‘rewards for researchers.’
Did you know? More often than not, academic publishing is discussed in terms of the “supply of spaces in journals” and the “demand to publish in these journals”. Who is the consumer? Certainly not the perhaps-mythical creature known as the ‘reader’.
In the field of economics, years typically pass between the ‘first working paper’ that is publicly circulated and the final publication. During that time, the paper may be substantially improved, but it may not be known to nor accepted by practitioners. Meanwhile, it provides little or no career value to the authors.
As a result, we see three major downsides:
Time spent gaming the system:
Researchers and academics spend a tremendous amount of time 'gaming' this process, at the expense of actually doing research.
Randomness in outcomes, unnecessary uncertainty and stress
Wasted feedback, including reviewers' time
There is a lot of pressure, and even bullying, to achieve these “publication outcomes” at the expense of careful methodology.
The current system can sideline deserving work due to unpredictable outcomes. There's no guarantee that the cream will rise to the top, making research careers much more stressful—even driving out more risk-averse researchers—and sometimes encouraging approaches that are detrimental to good science.
However, researchers often have a very narrow focus on getting the paper published as quickly and in as high-prestige a journal as possible. Unless the review is part of a 'Revise and Resubmit' that the author wants to fulfill, they may not actually put the comments into practice or address them in any way.
Of course, the reviews may be misinformed, mistaken, or may misunderstand aspects of the research. However, if the paper is rejected (even if the reviewer was positive about the paper), the author has no opportunity or incentive to respond to the reviewer. Thus the misinformed reviewer may remain in the dark.
The other side of the coin: a lot of effort is spent trying to curry favor with reviewers, who are often seen as overly fussy; this effort is not always in the direction of good science.
Traditional peer review is a closed process, with reviewers' and editors' comments and recommendations hidden from the public.
In contrast, Unjournal evaluations (along with authors' responses and evaluation manager summaries) are made public and easily accessible. We give each of these a separate DOI and work to make sure each enters the literature and bibliometric databases. We aim further to curate these, making it easy to see the evaluators' comments in the context of the research project (e.g., with sidebar/hover annotation).
Open evaluation is more useful:
to other researchers and students (especially those early in their careers). Seeing the dialogue helps them digest the research itself and understand its relationship to the wider field. It helps them understand the strengths and weaknesses of the methods and approaches used, and how much agreement there is over these choices. It gives an inside perspective on how evaluation works.
to people using the research, providing further perspectives on its value, strengths and weaknesses, implications, and applications.
Publicly posting evaluations and responses may also lead to higher quality and more reliability. Evaluators can choose whether or not they wish to remain anonymous; there are arguments for each choice, but in either case, the fact that all the content is public may encourage evaluators to more fully and transparently express their reasoning and justifications. (And where they fail to do so, readers of the evaluation can take this into account.)
The fact that we are asking for evaluations and ratings of all the projects in our system—and not using "accept/reject"—should also drive more careful and comprehensive evaluation and feedback. At a traditional top-ranked journal, a reviewer may limit themselves to a few vague comments implying that the paper is "not interesting or strong enough to merit publication." This would not make sense within the context of The Unjournal.
We do not "accept or reject" papers; we are evaluating research, not "publishing" it. But then, how do other researchers and students know whether the research is worth reading? How can policymakers know whether to trust it? How can it help a researcher advance their career? How can grantmakers and organizations know whether to fund more of this research?
As an alternative to the traditional measure of worth—asking, "what tier did a paper get published in?"—The Unjournal provides metrics: We ask evaluators to provide a specific set of ratings and predictions about aspects of the research, as well as aggregate measures. We make these public. We aim to synthesize and analyze these ratings in useful ways, as well as make this quantitative data accessible to meta-science researchers, meta-analysts, and tool builders.
Feel free to check out our guidelines for evaluators and our rating metrics (these are pilot metrics; we aim to refine them).
These metrics are separated into different categories designed to help researchers, readers, and users understand things like:
How much can one believe the results stated by the authors (and why)?
How relevant are these results for particular real-world choices and considerations?
Is the paper written in a way that is clear and readable?
How much does it advance our current knowledge?
We also request overall ratings and predictions of the credibility, importance, and usefulness of the work, to help benchmark these evaluations against each other and against the current "journal tier" system.
However, even here, The Unjournal's metrics are precise in a sense that "journal publication tiers" are not. There is no agreed-upon metric of exactly how journals rank (e.g., within economics' "top-5" or "top field journals"). More importantly, there is no clear measure of the relative quality and trustworthiness of a paper within a particular journal.
In addition, there are issues of lobbying, career concerns, and timing, discussed elsewhere, which make the "tiers" system less reliable. An outsider doesn't know, for example:
Was a paper published in a top journal because of a special relationship and connections? Was an editor trying to push a particular agenda?
Was it published in a lower-ranked journal because the author needed to get some points quickly to fill their CV for an upcoming tenure decision?
In contrast, The Unjournal requires evaluators to give specific, precise, quantified ratings and predictions (along with an explicit metric of the evaluator's uncertainty over these appraisals).
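As an illustration only (the exact categories and scales are described in our guidelines for evaluators, and the numbers here are invented), a single evaluator's quantified ratings might be recorded roughly as follows, with each rating given as a midpoint plus a 90% credible interval expressing the evaluator's uncertainty:

```python
# Hypothetical sketch of one evaluator's quantified ratings.
# Category names, scales, and numbers are illustrative, not The Unjournal's
# exact schema; the key idea is a midpoint plus an explicit uncertainty interval.
evaluation = {
    "paper": "Example working paper (2023)",
    "ratings": {                              # 0-100 scale
        "overall_assessment":  {"midpoint": 78, "ci_90": (65, 88)},
        "claims_and_evidence": {"midpoint": 70, "ci_90": (55, 82)},
        "methods_robustness":  {"midpoint": 64, "ci_90": (45, 80)},
        "relevance_to_global_priorities": {"midpoint": 85, "ci_90": (72, 93)},
    },
    # Prediction used to benchmark against the traditional system
    "predicted_journal_tier": {"midpoint": 3.5, "ci_90": (2.5, 4.5)},  # 0-5 scale
}

for name, rating in evaluation["ratings"].items():
    lo, hi = rating["ci_90"]
    print(f"{name}: {rating['midpoint']} (90% CI {lo}-{hi})")
```

Because each number comes with an explicit uncertainty interval, ratings can be meaningfully aggregated across evaluators and later checked against outcomes, which "published in journal X" cannot.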
Of course, our systems will not solve all problems associated with reviews and evaluations: power dynamics, human weaknesses, and limited resources will remain. But we hope our approach moves in the right direction.
We want to reduce the time between when research is done (and a paper or other research format is released) and when other people (academics, policymakers, journalists, etc.) have a credible measure of "how much to believe the results" and "how useful this research is."
Here's how The Unjournal can do this.
Public evaluations and ratings: Rather than waiting years to see "what tier journal a paper lands in," the public can simply consult The Unjournal to find credible evaluations and ratings.
I (Reinstein) have been in academia for about 20 years. Around the departmental coffee pot and during research conference luncheons, you might expect us to talk about theories, methods, and results. But roughly half of what we talk about is “who got into which journal and how unfair it is”; “which journal should we be submitting our papers to?”; how long are their “turnaround times?”; “how highly rated are these journals?”; and so on. We even exchange tips on how to play this game.
A lot of ‘feedback’ is wasted, including much of the reviewers' effort. Some reviewers write ten-page reports critiquing the paper in great detail, even when they reject the paper. These reports are sometimes very informative and useful for the author and would also be very helpful for the wider public and research community to understand the nature of the debate and issues.
John List (on Twitter): "We are resubmitting a revision of our study to a journal and the letter to the editor and reporters is 101 pages, single-spaced. Does it have to be this way?"
Studies of the process and timings at top journals in economics report an average of over 24 months between initial submission and final acceptance (and nearly three years until publication).
Early evaluation: We will evaluate potentially impactful research soon after it is released (as a working paper, preprint, etc.). We will encourage authors to submit their work for our evaluation, and we will directly commission evaluations of work from the highest-prestige authors.
We will pay evaluators, with further incentives for timeliness (as well as carefulness, thoroughness, communication, and insight). There is evidence that these incentives for promptness and other qualities are likely to work.
Our theory of change is shown above as a series of possible paths; we indicate what is arguably the most "direct" path in yellow. All of these begin with our setting up, funding, communicating, and incentivizing participation in a strong, open, efficient research evaluation system (in green, at the top). These processes all lead to impactful research being more in-depth, more reliable, more accessible, and more useful, better informing decision-makers and leading to better decisions and outcomes (in green, at the bottom).
Highlighting some of the key paths:
(Yellow) Faster and better feedback on impactful research improves this work and better informs policymakers and philanthropists (yellow path).
(Blue) Our processes and incentives will foster ties between mainstream/prominent/academic/policy researchers and global-priorities or EA-focused researchers. This will improve the rigor, credibility, exposure, and influence of previously "EA niche" work while helping mainstream researchers better understand and incorporate ideas, principles, and methods from the EA and rationalist research communities (such as counterfactual impact, cause-neutrality, reasoning transparency, and so on.)
This process will also nudge mainstream academics towards focusing on impact and global priorities, and towards making their research and outputs more accessible and useable.
(Pink) The Unjournal’s more efficient, open, and flexible processes will become attractive to academics and stakeholders. As we become better at "predicting publication outcomes," we will become a replacement for traditional processes, improving research overall—some of which will be highly impactful research.
Rigorous quantitative and empirical research in economics, business, public policy, and social science has the potential to improve our decision-making and enable a flourishing future. This can be seen in the research frameworks proposed by 80,000 Hours, Open Philanthropy, and The Global Priorities Institute (see discussions here). This research is routinely used by effective altruists working on global priorities or existential risk mitigation. It informs both philanthropic decisions (e.g., those influenced by GiveWell's Cost-Effectiveness Analyses, whose inputs are largely based on academic research) and national public policy. Unfortunately, the academic publication process is notoriously slow; for example, in economics, it routinely takes 2–6 years between the first presentation of a research paper and the eventual publication in a peer-reviewed journal. Recent reforms have sped up parts of the process by encouraging researchers to put working papers and preprints online.
However, working papers and preprints often receive at most only a cursory check before publication, and it is up to the reader to judge quality for themselves. Decision-makers and other researchers rely on peer review to judge the work’s credibility. This part remains slow and inefficient. Furthermore, it provides very noisy signals: A paper is typically judged by the "prestige of the journal it lands in" (perhaps after an intricate odyssey across journals), but it is hard to know why it ended up there. Publication success is seen to depend on personal connections, cleverness, strategic submission strategies, good presentation skills, and relevance to the discipline’s methods and theory. These factors are largely irrelevant to whether and how philanthropists and policymakers should consider and act on a paper’s claimed findings. Reviews are kept secret; the public never learns why a paper was deemed worthy of a journal, nor what its strengths and weaknesses were.
We believe that disseminating research sooner—along with measures of its credibility—is better.
We also believe that publicly evaluating its quality before (and in addition to) journal publication will add substantial additional value to the research output, providing:
a quality assessment (by experts in the field) that decision-makers and other researchers can read alongside the preprint, helping these users weigh its strengths and weaknesses and interpret its implications; and
faster feedback to authors focused on improving the rigor and impact of the work.
Various initiatives in the life sciences have already begun reviewing preprints. While economics took the lead in sharing working papers, public evaluation of economics, business, and social science research is rare. The Unjournal is the first initiative to publicly evaluate rapidly-disseminated work from these fields. Our specific priority: research relevant to global priorities.
The Unjournal’s open feedback should also be valuable to the researchers themselves and their research community, catalyzing progress. As the Unjournal Evaluation becomes a valuable outcome in itself, researchers can spend less time "gaming the journal system." Shared public evaluation will provide an important window to other researchers, helping them better understand the relevant cutting-edge concerns. The Unjournal will permit research to be submitted in a wider variety of useful formats (e.g., dynamic documents and notebooks rather than "frozen pdfs"), enabling more useful, replicable content and less time spent formatting papers for particular journals. We will also allow researchers to improve their work in situ and gain updated evaluations, rather than having to spin off new papers. This will make the literature more clear and less cluttered.
We acknowledge the potential for "information hazards" when research methods, tools, and results become more accessible. This is of particular concern in the context of direct physical and biological science research, particularly in biosecurity (although there is a case that specific open science practices may be beneficial). ML/AI research may also fall into this category. Despite these potential risks, we believe that the fields we plan to cover—detailed above—do not primarily present such concerns.
In cases where our model might be extended to high-risk research—such as new methodologies contributing to terrorism, biological warfare, or uncontrolled AI—the issue of accessibility becomes more complex. We recognize that increasing accessibility in these areas might potentially pose risks.
While we don't expect these concerns to be raised frequently about The Unjournal's activities, we remain committed to supporting thoughtful discussions and risk assessments around these issues.
Journal-independent review allows work to be rated separately in different areas: theoretical rigor and innovation, empirical methods, policy relevance, and so on, with separate ratings in each category by experts in that area. As a researcher in the current system, I cannot submit my paper to both (for example) the Journal of Economic Theory (JET) and the Journal of Development Economics and get public evaluations from each, even for a paper engaging both areas.
The Unjournal, and journal-independent evaluation, can enable this through
commissioning a range of evaluators with expertise in distinct areas, and making this expertise known in the public evaluations;
asking specifically for multiple dimensions of quantitative (and descriptive) feedback and ratings (see especially #metrics-overall-assessment-categories under our Guidelines for evaluators); and
allowing authors to gain evaluation in particular areas in addition to the implicit value of publication in specific traditional field journals.