Research transparency

This journal believes in the importance of transparent and reproducible research. We require authors to follow best practices in reporting their methodology, for example describing details of study design, sources used and their provenance, and selection procedures used.

 

Authors are required to deposit their quantitative reproduction materials and other documentation related to their research process in the APSR Dataverse. In cases where authors are unable to provide such data (e.g., due to ethical or privacy concerns or legal restrictions by data providers), we require that authors explain why relevant data are not available for readers. In cases where data cannot be shared, we still require that authors share code and scripts used to produce their results and an explanation of how readers can secure access to the data directly from data providers. We use Data Availability Statements to describe where readers can find additional evidence.

 

When sharing materials, whether qualitative or quantitative, we require use of the journal’s dedicated repository. The APSR Dataverse is a repository that provides permanent identifiers and has robust preservation policies, which helps to ensure long-term integrity of published research.

 

We also expect authors to cite materials and data they have used in their research, alongside literature citations, to recognise the importance of all kinds of research outputs.

 

If you have any questions about this policy, please contact the editorial office at [email protected].

Reproducibility Guidelines

The APSR requires authors of conditionally accepted manuscripts to submit a reproducibility package to the APSR Dataverse. We review these packages to verify that we can use the submitted materials to reproduce the manuscript’s tables and figures, and to check that the authors have documented the research process well enough that future researchers will be able to benefit from it. This document provides requirements, advice, and instructions for authors to satisfy our pre-publication requirements.

Objectives

Preparing reproducibility packages requires valuable time. There are three reasons why the APSR requires authors of conditionally accepted manuscripts to deposit reproducibility packages prior to publication:

- Quality control: Papers published in the APSR should carry the assurance that the results are sound. When possible, we want to help authors catch errors before a paper is published.

- Comprehensibility: Readers should be able to understand exactly how a paper’s results were produced through some combination of reading the paper, reading the appendices, and working with the reproducibility package.

- Extensibility: Other scholars should be able to expand, in future work, on the published article.

Requirements for reproducibility packages

Overview of what makes a successful reproducibility package

A reproducibility package is successful if someone attempting to reproduce the results in the paper can do the following:

  • Open a README in the root folder of the package and find a summary overview of all materials in the reproducibility package
  • Follow a clear set of instructions in the README to run the code required to produce all tables and figures in the paper
  • For every table and figure in the paper,
    • Locate the table/figure in the output produced by running that code
    • Locate the place in the code that produces the table/figure

Scope of reproducibility

The reproducibility package must produce, starting from data in as raw a form as possible, all computations reported in the manuscript’s tables and figures, as well as any other material that the authors may have offered to make available at the time of submission.

 When a manuscript uses a secondary dataset (i.e., a dataset made available by others), the reproducibility package must include the raw dataset and include any code that was used to transform it (and/or describe in detail any manual transformations that were made). This way, other researchers can understand and assess the author’s transformations.

 When a paper includes original data collection, the reproducibility package must also include the instruments used to collect the data, e.g., the survey questionnaire (including experimental treatments, if any) or the webscraping/API code used to retrieve online data. These allow future scholars to reproduce the data collection process.

 For each dataset, authors must provide documentation that allows others to use the data for purposes other than simply reproducing the paper’s tables and figures. This means including

  • A codebook with a clear description of each variable, or
  • A reference (in the README) to publicly available documentation for the dataset
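A codebook can be started from the dataset itself. The following is a minimal sketch in Python, not a required format: the function name `codebook_skeleton` and the output fields are hypothetical, and the variable descriptions still have to be written by the author.

```python
import csv
import io


def _is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def codebook_skeleton(csv_text):
    """Return one entry per variable: name, guessed type, and missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        # Treat empty cells (and cells absent from short rows) as missing.
        present = [v for v in values if v not in ("", None)]
        is_numeric = bool(present) and all(_is_number(v) for v in present)
        entries.append({
            "variable": name,
            "type": "numeric" if is_numeric else "text",
            "n_missing": len(values) - len(present),
            "description": "TODO: describe this variable",
        })
    return entries


# Illustrative data only; the column names are made up.
sample = "county,turnout,region\nA,0.61,North\nB,,South\n"
for entry in codebook_skeleton(sample):
    print(entry)
```

The guessed types and missing counts are only a starting point; a usable codebook also needs units, coding schemes, and the source of each variable.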

  The reproducibility package should also produce any important computations that do not appear in tables and figures in the manuscript.

Data to be included

Whenever possible, reproducibility packages should provide datasets in “raw” form (i.e., before the data has been cleaned or transformed by the author’s code). We recognize that “raw data” may be tricky to define in some cases. 

 When the data pipeline is time consuming, uses uncommon software, or relies on very large files, authors must include (in addition to the raw data files) an “analysis dataset” that can be used to produce the paper’s tables and figures without running the whole data pipeline. This allows other researchers to assess the robustness of the main results without having to take time to generate the analysis dataset themselves.

 Authors are responsible for ensuring that they have permission to share the data they include in their reproducibility package, and that by sharing the data they do not violate legal or ethical rights of research subjects and/or the dataset’s creators.

 Data that authors are unable to share

If analysis relies on data that cannot be shared for ethical, legal, or other reasons, authors must

  • Provide instructions in the README on exactly how others can obtain the data, and
  • Unless other researchers can easily obtain the data by following those instructions, provide synthetic data, i.e., data that resembles the unavailable data but can be shared. Providing synthetic data allows other researchers to use the code in the reproducibility package to understand exactly how the analysis was produced, even if the output of the reproducibility package does not exactly match the published results.
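One simple way to build a synthetic numeric variable is to draw values from a distribution that matches summary statistics of the restricted data. The sketch below is an illustration under strong assumptions, not a recommended method: the function name `synthesize_numeric` is hypothetical, it matches only the mean and standard deviation of a single variable, it does not preserve relationships between variables, and whether such summary-based data may be shared still depends on the terms set by the data provider.

```python
import random
import statistics


def synthesize_numeric(values, n, seed=12345):
    """Draw n values from a normal distribution with the mean and SD of `values`."""
    rng = random.Random(seed)  # fixed seed so the synthetic file is reproducible
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # requires at least two observations
    return [rng.gauss(mean, sd) for _ in range(n)]


# Illustrative values standing in for a restricted variable.
restricted_income = [31.2, 45.0, 28.7, 52.3, 39.9, 41.1]
synthetic_income = synthesize_numeric(restricted_income, n=6)
print(synthetic_income)
```

Because the draws are independent of the original rows, code run on the synthetic file will execute end to end but will not reproduce the published estimates.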

Data citation

Authors must cite the datasets they use in their manuscript. The Social Science Data Editors website provides guidance on how to do this. When using multiple related datasets (e.g., several years of the American National Election Study), create a single composite citation for the bibliography and list individual datasets in an appendix.

README file

All reproducibility packages should include a file in the root directory named README in an open file format (e.g. TXT, PDF, markdown, HTML). Authors should assume this is the first file we will examine. The README file should include, at a minimum:

  • Table of contents: a brief description of every file in the replication folder
    • Documentation files: e.g. for a codebook, what dataset is being described?
    • Code files: What does the file do?
    • Data files: What data is contained in the file? How/where was the data acquired?
  • Instructions for running the code
  • Notes for each table and figure: a short list of where replicators will find the code needed to reproduce all parts of the publication.
  • Software dependencies: Instructions that will allow our team to reproduce your software environment and run the submitted code. To make it easier to diagnose issues that arise, these instructions should include the operating system (e.g. Windows 10, macOS 12.1) and version of the computing software (e.g. R 4.1.2, Stata 17) used to conduct the paper’s analysis, as well as a list of installed packages (with version numbers/dates).
    • R: Information on R version and loaded libraries can be found by typing sessionInfo().
    • Stata: A list of all add-ons installed on a system can be found by typing ado dir.
  • Estimated runtime for long-running computations: If any part of the data pipeline or analysis requires more than a few seconds to run on a typical laptop, include this information in the README.
  • Seed locations: If any of the analysis relies on (pseudo-)randomness (e.g. Monte Carlo simulations, bootstrapped standard errors), then authors should set seeds in their code and note in the README where seeds are set.
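As an illustration of the seed requirement, the sketch below computes a bootstrapped standard error in Python with the seed set in one clearly marked place. The seed value, the constant name `SEED`, and the function name `bootstrap_se` are all hypothetical.

```python
import random
import statistics

SEED = 20240101  # hypothetical value; the README would point to this line


def bootstrap_se(values, n_boot=1000, seed=SEED):
    """Bootstrap standard error of the mean, reproducible for a given seed."""
    rng = random.Random(seed)  # a local generator avoids hidden global state
    means = []
    for _ in range(n_boot):
        resample = rng.choices(values, k=len(values))  # sample with replacement
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


turnout = [0.52, 0.61, 0.48, 0.70, 0.55, 0.66, 0.59]  # illustrative data
print(bootstrap_se(turnout))
```

Calling the function twice with the same seed returns the same value, which is what allows a reviewer to match the reported standard errors.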

 For more detailed guidance, authors are encouraged to follow the README template provided by Social Science Data Editors: https://social-science-data-editors.github.io/template_README/. A README that follows those guidelines will satisfy our requirements.
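For projects written in Python, the software dependency information can be collected with the standard library, in the same spirit as the R and Stata commands listed above. This is a minimal sketch; the function name `environment_report` and the layout of the report are assumptions, not a format required by the journal.

```python
import platform
import sys
from importlib import metadata


def environment_report():
    """Return the OS, Python version, and installed packages as plain text."""
    lines = [
        f"Operating system: {platform.platform()}",
        f"Python version: {sys.version.split()[0]}",
        "Installed packages:",
    ]
    packages = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            continue  # skip installations whose metadata cannot be read
        if name:
            packages.add((name, version))
    for name, version in sorted(packages, key=lambda item: item[0].lower()):
        lines.append(f"  {name}=={version}")
    return "\n".join(lines)


print(environment_report())
```

The printed text can be pasted into the README or saved as a separate file that the README references.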

Suggestions to authors preparing reproducibility packages

Read our instructions carefully and use the checklist below, which is also the checklist we will use in assessing whether to accept your package or send it back for revision.

 If a reproducibility package includes multiple scripts, include a master script that runs each of these scripts in the appropriate order.
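A master script can be very short. The sketch below, written in Python, assumes that the scripts live in a single folder and carry zero-padded numeric prefixes (01_, 02_, ...) so that sorting by file name gives the intended order; the function names `ordered_scripts` and `run_all` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def ordered_scripts(code_dir):
    """Return the .py scripts in code_dir sorted by file name (01_ before 02_)."""
    return sorted(Path(code_dir).glob("*.py"))


def run_all(code_dir):
    """Run every script in order, stopping at the first failure."""
    for script in ordered_scripts(code_dir):
        print(f"Running {script.name} ...")
        # check=True raises CalledProcessError if a script exits with an error,
        # so later steps never run on incomplete intermediate files.
        subprocess.run([sys.executable, str(script)], check=True)
```

With this layout, a replicator only needs one call such as `run_all("code")` from the package root. The same pattern applies to a master script in R or Stata that calls each numbered script in turn.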

 Give names to all files that will be easy for a replicator to understand, e.g. 01_data_cleaning.R, figure2.pdf.

 Use comments in your code to make it easy for future scholars (including yourself!) to understand what the code is doing.

 Unless the reproducibility package is very simple (with e.g. one script and one dataset, few outputs), we encourage you to use a directory structure that separates code, data, and results. A common pattern is:

  • README.txt
  • master.R
  • data/
    • raw/
      • CSES.csv
      • county_level_covariates.csv
    • analysis/
      • for_regressions.csv
  • code/
    • 01_data_processing.R
    • 02_simulation.R
    • 03_analysis.R
  • results/
    • table1.tex
    • table2.tex
    • figure1.pdf
    • figure2.pdf
    • figure3.pdf

 To upload a package with a directory structure to the APSR Dataverse, select all files and subfolders in your directory on your computer and add them to a .zip file. Upload the zipped file to your dataset. Dataverse will automatically unzip the uploaded files.
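The zip file can also be created from a script instead of a file manager. The following Python sketch uses the standard library function `shutil.make_archive`; the wrapper name `zip_package` and the example paths are hypothetical. It archives the contents of the package folder, so the directory structure inside the zip starts at the package root.

```python
import shutil
from pathlib import Path


def zip_package(package_dir, archive_stem):
    """Zip the contents of package_dir into <archive_stem>.zip and return its path."""
    # root_dir makes stored paths relative to the package root,
    # e.g. "data/raw/CSES.csv" rather than an absolute path.
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(package_dir))


# Example call (hypothetical paths); write the archive outside the package
# folder so that the zip file does not end up inside itself:
# zip_package("my_package", Path("..") / "my_package_upload")
```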

 After uploading your file(s) to the APSR Dataverse and saving the result (but before submitting), download the replication package from Dataverse and make sure it runs and produces the expected output. (To download the package, click the “Access Dataset” button then “Download ZIP”.) Ideally, you should save the package to a location in your computer’s directory structure that is different from the location where you developed the replication package; that way, you will catch any absolute paths in your code. Better yet, download it to a different computer entirely and run it there.
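Absolute paths can also be searched for directly before uploading. The Python sketch below is a heuristic, not a complete check: the pattern, the list of file extensions, and the function name `find_absolute_paths` are assumptions, and it only flags quoted paths that begin with a drive letter, /Users/, /home/, or ~/.

```python
import re
from pathlib import Path

# A quote followed by a drive letter (C:\ or C:/), /Users/, /home/, or ~/.
ABSOLUTE_PATH = re.compile(r"""["'](?:[A-Za-z]:[\\/]|/(?:Users|home)/|~/)""")


def find_absolute_paths(code_dir, suffixes=(".R", ".py", ".do")):
    """Return (file name, line number, line) for lines that look like absolute paths."""
    hits = []
    for path in sorted(Path(code_dir).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if ABSOLUTE_PATH.search(line):
                    hits.append((path.name, number, line.strip()))
    return hits
```

An empty result does not guarantee that the package is portable (unquoted paths are not detected, for example), so running the downloaded package from a fresh location remains the more reliable test.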

 We recognize that with some computational approaches the same code can produce slightly different results in successive runs, even when random seeds and software are perfectly harmonized. If this is the case for your project, please explain this in the README so that we do not try to investigate small discrepancies.
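Where small run-to-run differences are expected, it can help to state in the README how close reproduced values should be to the published ones. The Python sketch below compares two sets of estimates within a relative tolerance; the tolerance of 0.001, the function name `compare_results`, and the example values are assumptions chosen for illustration, not a threshold set by the journal.

```python
import math


def compare_results(published, reproduced, rel_tol=1e-3):
    """Return the names of estimates that are missing or differ beyond rel_tol."""
    mismatches = []
    for key, expected in published.items():
        actual = reproduced.get(key)
        if actual is None or not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches.append(key)
    return mismatches


# Illustrative values only.
published = {"table1_beta": 0.412, "table1_se": 0.050}
reproduced = {"table1_beta": 0.4121, "table1_se": 0.060}
print(compare_results(published, reproduced))
```

In this example only the standard error is reported as a mismatch, because the coefficient differs by less than the stated tolerance.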

 We encourage authors to use resources designed to aid in reproducibility and durability of code (such as CodeOcean, Nuvolos, Whole Tale, Github Codespaces, conda, poetry, venv, docker). If you make use of one of these resources, it should be simple to submit a replication package to the APSR Dataverse that fulfills our requirements. It may also be easier for us (and for you) if we verify your submission by incorporating that resource into our workflow. If you think that might be the case, please contact us after receiving your conditional acceptance.

 More generally, if you have questions about how our requirements apply to your study, please contact us.

Reproducibility package checklist

  • README describes each file in the package
  • README contains instructions for running the code
  • README indicates where each table and figure can be found in the output
  • README lists base dependencies and additional dependencies
  • README contains estimated runtime for any long-running computations
  • If analysis requires randomness, README indicates where in the code seeds are set
  • Code runs and produces output
  • Using instructions in README, every table and figure in the paper can be found in output
  • Content of every table and figure matches what is in the paper
  • Every secondary dataset in the replication package is cited in the paper (or appendix for multiple related datasets)
  • Every secondary dataset is included in its raw form, i.e. without author transformations
  • Every dataset has a codebook or a reference in the README to publicly available documentation
  • Data collection instruments are included for any original dataset
  • If the data pipeline takes a long time, relies on large datasets, or requires downloading of unusual software, an analysis dataset is included
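The first checklist item can be partly checked by script. The Python sketch below lists files whose names never appear in the README. It assumes a plain-text or markdown README in the package root (a PDF README cannot be read this way), and the function name `undocumented_files` is hypothetical.

```python
from pathlib import Path


def undocumented_files(package_dir):
    """Return package files whose names are not mentioned anywhere in the README."""
    root = Path(package_dir)
    readmes = [p for p in root.iterdir() if p.is_file() and p.stem.upper() == "README"]
    if not readmes:
        raise FileNotFoundError("No README found in the package root")
    text = readmes[0].read_text(encoding="utf-8", errors="replace")
    missing = []
    for path in sorted(root.rglob("*")):
        # A file counts as documented if its file name occurs in the README text.
        if path.is_file() and path not in readmes and path.name not in text:
            missing.append(path.relative_to(root).as_posix())
    return missing
```

A file name appearing in the README does not mean the file is adequately described, so this only catches omissions; the remaining checklist items still require running the code and comparing the output with the paper.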

Instructions for submitting reproducibility packages

  • Sign in (after signing up, if necessary) to Dataverse.
  • Go to the APSR Dataverse. Click the "Add Data" button and select “New Dataset” in the dropdown menu. Important: Please make sure to add your dataset to the APSR Dataverse and not anywhere else in the Harvard Dataverse repository.
  • Fill in the form to describe your data file(s), such as title, author name(s), abstract, year, citation to article, etc. The minimum information should include (a) title ("Replication Data for: [paper title]"), (b) author name(s) and (c) contact information, (d) description (abstract of the paper and/or description of the replication package), (e) subject (Social Sciences), (f) related publication ("Forthcoming, American Political Science Review" for the initial submission).
  • Scroll down to the “Files” section and click on “Select Files to Add” to upload your replication package, either as separate files or (preferred) a zip file containing the directory substructure. Click “Save Dataset” when upload is complete. This creates your “dataset” on Dataverse, but the result is not yet published.
  • Recommended: Download your package and test that the code produces the desired output, as described in the “Suggestions” section above. 
  • When the replication package is ready, click "Submit for review" to submit the draft version of the dataset for replication.

 

Once the package has been submitted, we will review it. We will contact you if revisions are required. When the package has been approved, we can proceed with publication.

Reproducibility and Reliability

The Journal follows COPE guidelines scrupulously, which means that any errors discovered after publication may entail a retraction, corrigendum, or expression of concern. The Journal also publishes replications and encourages comments on each article's publication DOI on Cambridge Core, both of which are intended to encourage post-publication discussion of work in our pages.

The upshot is this: If your paper is published in the APSR, there is a very good chance it will be scrutinized in a high-profile fashion by the academic community. To save time and potential embarrassment, authors should carefully consider the reproducibility and reliability of their work prior to submission. Can the findings be reproduced? How robust are they to different choices in design (measurement, sample, specification, estimator)? Are weaknesses, assumptions, and limitations openly acknowledged?

We hope that post-publication scrutiny leads to better practices, and not to intellectual timidity. It is not our intention to discourage exploratory work. We also hope to normalize the process of post-publication debate and discussion, which means bringing honest mistakes to the fore without shame or recriminations. Intellectual activity is always risky, and sometimes errors offer the quickest path forward. Engagement is always preferred to withdrawal so long as it is in the service of truth and not undertaken in an ad hominem fashion.