Pledge page (options trial)


The presentation of options on GWWC's 'pledge page' was randomly varied at the individual browser level over a certain period, to see which presentation increased pledges.

A summary of this has been shared as a post on the EA Forum.

Notes on content/building this

This follows the Trial reporting template, edited slightly for public reading.

We intend to redo and augment much of this analysis in a more transparent way, directly importing the data and doing our own analyses rather than relying on Google's built-in tools. We intend to put this within the EAMT Analysis web-book.

Summary of trial and results

Giving What We Can (GWWC) has three giving pledge options, displayed in the 'Original presentation version' below.

[Image: Pledge page "Original"]

From April to July 2021, GWWC ran a trial presenting its 'pledge page' options in three slightly different ways. Considering 'clicks on any button' as the outcome, and applying a Bayesian 'preponderance of evidence' standard:

  • "Separate Bullets for Other Pledges" was the most successful presentation. It only showed a box for "The Pledge", with the other options given in less prominent bullet points below. This had about a 20% higher incidence rate than the Original presentation.

  • "Pledge before Try Giving" was the least successful presentation this was like the one displayed above, but with "Try Giving" in the central position. This had about a 23% lower incidence rate than the Original presentation.

Getting people to take the GWWC pledge may be seen as an important outcome on its own. It may also bear on getting people engaged in the Effective Altruism community and in other EA activities, such as EA career impact decisions.

General idea and main hypothesis

GWWC: How can we present pledge options to maximize positive outcomes (pledges, fulfillment)?

General: For those considering making substantial giving pledges (of a share of their income), how does the presentation of these 'pledge options' matter?

Theories and mechanisms to consider:

  • Too many options may lead to 'indecision paralysis'

  • The signaling power of choice; e.g., if there's a 'more virtuous choice' I may feel that my 'middle choice' looks less good by comparison

Background and context

1. "Try Giving" (1% of income),

2. "The Pledge" (10% of income)

3. The "Further Pledge" (donate all income above a living allowance).

Three versions of this page were randomly presented (between 19-21 April and 10 July 2021).

The content of the key 'choice button' section varied between these three versions:

  1. "Original:" A block of three (in the order of commitment) 'The Pledge' (10%) in the center and highlighted (see above)

  2. "Pledge before TryGiving": A block of 3 with "Try Giving" (1%) in the center and highlighted

  3. "Separate Bullets for Other Pledges": A single block for 'The Pledge' (10%), with the other pledges given as clickable bullet points below (as well as a bullet for the 'company pledge' ... which had a different presentation in other versions)

The version presented stayed constant for a given individual, based on IP/cookie tracking.

Points of contact, Timing of trial, Digital location of project/data, Environment

Points of contact

Julian Hazell (julian.hazell at givingwhatwecan.org), Luke Freeman

'Academic' contact: David Reinstein.

Timing of trial (when will it/did it start and end, if known)

Start: 19 April 2021 (or 21 April)? End: 10 July 2021 (Source: Google Analytics)

Digital location where project 'lives'

(Planning, material, data)

The present document is currently (11 May 2022) the only writeup.

Environment/context for trial

https://www.givingwhatwecan.org/pledge/ ... see above

Participant universe and sample size

  • 'Everyone going to the above page' within the above time duration.

  • People interested in GWWC pledges

Sample size: see below, from Google Analytics

Key treatment(s)

  1. "Original" (Block of 3 in order of commitment, Middle Pledge in Center)

2. "Pledge before TryGiving" ... as above but with Try Giving and The Pledge swapped, and Try Giving (in the center) highlighted

3. "Separate Bullets for Other Pledges" (see below)

Treatment assignment procedure

  • Three versions of this page were randomly presented

  • Equal likelihood of each version being presented

The non-exact balance below seems to be an imbalance in 'sessions', not in 'participants'.

Our analysis should focus on outcomes per participant; thus, the figures below may need some adjusting (although at first pass, the results go in the same direction). This doesn't seem to be adaptive assignment. In Google's help on 'create an A/B test' they state:

All variants are weighted equally by default in Optimize. A visitor who is included in your experiment has an equal chance of seeing any of your variants.

The version presented stayed constant for a given individual (see above).
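Google Optimize's internals are not public, but the behavior described (equal weighting, with the version held constant per visitor) can be mimicked by hash-based bucketing. A minimal sketch, where `visitor_id` is a hypothetical stand-in for whatever cookie/IP identifier is used:

```python
import hashlib

VARIANTS = ["Original", "Pledge before Try Giving",
            "Separate Bullets for Other Pledges"]

def assign_variant(visitor_id: str) -> str:
    """Hash the visitor ID into one of the variants.

    Each variant gets ~1/3 of visitors in expectation, and the same
    visitor is always bucketed into the same variant on repeat visits.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

print(assign_variant("cookie-abc123"))  # stable across calls
print(assign_variant("cookie-abc123"))  # same variant again
```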

Outcome data

Statistics on Google Analytics: This records only 'pressed any button' (any pledge) as the successful outcome.

Ideally, for future trials, this would include the following (see the schema sketch after this list):

One entry per page view over the interval, detailing

  • Whether pledged

  • Which pledge

  • Time and date of view

  • Time spent on page

  • Other clicks

  • Location of user

  • Any other information about the user

Most importantly:

  • Number of page views over the interval, by treatment

  • Number of pledges over the interval

    • by treatment

    • by type of pledge

  • Follow-up donations etc (if connectable)
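To make the wishlist above concrete, here is a minimal sketch of a per-pageview logging record. It is purely illustrative; all field names are hypothetical, not GWWC's or Google's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class PageViewRecord:
    """One row per page view, per the wishlist above (hypothetical fields)."""
    timestamp: datetime          # time and date of view
    visitor_id: str              # stable per-user identifier (not per-session)
    variant: str                 # treatment shown, e.g. "Original"
    seconds_on_page: float       # time spent on page
    other_clicks: int            # clicks on non-pledge page elements
    location: Optional[str]      # coarse user location, if available
    pledged: bool                # whether any pledge button was clicked
    pledge_type: Optional[str]   # "Try Giving", "The Pledge", "Further Pledge"
```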

Ex-post: Reporting results (brief)

Implementation and data collection

See the Google A/B, optimize interface page for details on extracting data from the interface.

  1. From the shared image from Google Analytics:

'Experiment sessions' (observations) by treatment (as labeled on Google Analytics shared image):

Original: 2588

Pledge before Try Giving: 2686

Separate Bullets for Other Pledges: 2718

Total: 7992 sessions (=2588+2686+2718)

2. Where is the data stored? ... [noted above]

Basic results/outcomes

[Image: performance of three versions, shared from Google Optimize]

Quick interpretation

The "separate bullets for other pledges" seems to have been the most successful, with an 0.49% higher (percentage point) incidence rate than the 'Original', i.e., a 22% higher rate of pledging (2.69 vs 2.20).

These differences seem unlikely to be statistically significant in a conventional sense. Still, Google Analytics' (presumably reasonable) Bayesian model states an 80% chance that this is the best treatment, and this seems useful and informative.

If anything, the results for 'separate bullets' seem potentially understated...

Note that GA is reporting conversions based on sessions (contiguous use periods) and not users. We can reasonably assume that a roughly equal number of users were assigned to each treatment (as per the design). As a result, we assume that roughly equal shares 'viewed the relevant page at least once' (because of the law of large numbers). However, the most successful treatment, the 'Separate block', is recording more sessions. Thus, the relative conversion rate, as a share of users, would be even higher than the one reported here, relative to the baseline.
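As a rough frequentist cross-check of the significance claim above: the conversion counts below are back-solved from the reported rates (2.20%, 1.71%, 2.69%) and the session totals above, so they are approximations rather than raw data. A minimal two-proportion z-test sketch:

```python
from math import sqrt
from scipy.stats import norm

# Sessions from the Google Analytics figures above; conversion counts are
# back-solved from the reported rates (2.20%, 1.71%, 2.69%) -- approximate.
data = {
    "Original":                 (57, 2588),
    "Pledge before Try Giving": (46, 2686),
    "Separate Bullets":         (73, 2718),
}

def two_prop_z(c1, n1, c2, n2):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

pairs = [("Separate Bullets", "Original"),
         ("Pledge before Try Giving", "Original"),
         ("Separate Bullets", "Pledge before Try Giving")]
for a, b in pairs:
    z, p = two_prop_z(*data[a], *data[b])
    print(f"{a} vs {b}: z = {z:+.2f}, two-sided p = {p:.3f}")
```

On these approximate counts, neither variant differs from the Original at conventional significance levels, while the gap between the two variants themselves is considerably larger; this appears consistent with the discussion below.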


Aside on statistics

Optimize uses Bayesian inference to generate its reports... Optimize chooses its priors to be quite uninformed.

DR: But this still doesn't tell us what these priors are. There's a lot of sensitivity to this choice, in my experience.

The "Pledge Before Try giving" treatment performed substantially worse than the original.


The poor performance of ‘pledge before try giving’ appears even more substantial than the strength of ‘Separate Block’. It even seems to border on conventional statistical significance … I expect that in a standard comparison of the latter two treatments, we’d find conventional statistical significance.

These differences are meaningful; consider the 'posteriors':

Downloading the 'Analytics data' behind the above graphs, we see:

Modeled improvement over Original, by posterior percentile:

Variant                               2.5th    25th    50th    75th    97.5th
Original                                 0%      0%      0%      0%       0%
Pledge Before Try Giving               -50%    -33%    -23%    -11%     +18%
Separate Bullets For Other Pledges     -18%     +4%    +20%    +36%     +76%

This suggests it is very reasonable to think that 'Separate Bullets' is substantially better.

Our 'posterior' probability distribution thus implies:

  • a 2.5% chance that 'Separate Bullets' (SB) has an 18% (or more) lower conversion rate than 'Original'

  • a 22.5% chance of SB being between 18% worse and 4% better

  • a 25% chance of SB being 4-20% better

  • a 25% chance of SB being 20-36% better

  • a 22.5% chance of SB being 36-76% better

  • a 2.5% chance of SB being more than 76% better

We can also combine intervals, to make statements like ...

  • a 50% chance of being 4-36% better

  • a 50% chance of being at least 20% better

For 'Pledge before...' (PB) we can state, e.g.,

  • PB has a 75% chance of being at least 11% worse than Original

  • and a 50% chance of being at least 23% worse than Original
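Optimize's exact model and priors are not disclosed (see the note above). Statements in the style above can be generated by a simple Beta-Binomial Monte Carlo sketch, using the approximate back-solved counts from earlier and a uniform Beta(1, 1) prior; both are assumptions, so the numbers will not exactly match Google's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate (successes, trials) per arm, back-solved from the report.
arms = {"Original": (57, 2588),
        "Pledge before Try Giving": (46, 2686),
        "Separate Bullets": (73, 2718)}

# Conjugate posterior for each arm: Beta(1 + successes, 1 + failures).
draws = {name: rng.beta(1 + c, 1 + n - c, size=200_000)
         for name, (c, n) in arms.items()}

# 'Modeled improvement' relative to Original, as in the table above.
for name in ("Pledge before Try Giving", "Separate Bullets"):
    rel = draws[name] / draws["Original"] - 1
    q = np.percentile(rel, [2.5, 25, 50, 75, 97.5])
    print(name, " ".join(f"{x:+.0%}" for x in q))

# Probability that each arm is the best, cf. Optimize's ~80% for SB.
stacked = np.column_stack(list(draws.values()))
share_best = np.bincount(stacked.argmax(axis=1), minlength=3) / len(stacked)
for name, p in zip(draws, share_best):
    print(f"P({name} is best) = {p:.2f}")
```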

Intuitive interpretation

Perhaps giving people more options makes them indecisive. They may be particularly reluctant to choose a “relatively ambitious giving pledge” if a less ambitious option is highlighted.

This could also involve issues of self- and social signaling. If the 'main thing' to do is a 10% pledge (as in "separate bullets"), then taking it may seem a straightforward way of conveying 'I am generous'. On the other hand, if the 'Further Pledge' is fairly prominent, perhaps the signal feels less positive. And if the '1% pledge' is made central, a 10% pledge might seem like more than the necessary signal.

The "pledge before try giving" may perform the worst because it makes the 'Try Giving' pledge a particularly salient alternative option. (In contrast, the "Original" at least makes 'The 10% Pledge' the central and the middle option.)

But in this case, why should the overall pledge rate (any button-press) be lower with more options (Original vs 'separate bullets'), and lower still when Try Giving is made central?

It's hard to say too much if we don't know the composition of the pledges people make.

Still, it might be that people mainly came in with the desire to take The Pledge (10%), as this is most heavily promoted. In that case, making other pledge possibilities prominent may (A) cause people to rethink their choices and delay a decision (perhaps never returning), and/or (B) make them feel less comfortable with the overall 'signal' their pledge will send. This doesn't mean that the 'multiple boxes' environment is worse overall, but it may perform worse for the people coming here, as these were the people particularly attracted by the '10% is the main thing' signaling environment.

Caveats

I am assuming that the 'outcome being measured here' is whether the person 'clicked on any giving pledge'; this is what Luke has conveyed to me.

I assume this is 'conversions ever from this IP', and 'sessions' represents 'how many different IPs came to the treatment'. If it's something else (e.g., each 'session' is a 'visit' from an individual), this could reflect these people converting in fewer sessions but not necessarily being more likely to convert overall. Even if this is 'by IP' the alternative interpretation 'not converting now but maybe later' may still have some weight if people are entering through multiple devices.
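If a per-pageview log like the one sketched earlier were available, the per-user (rather than per-session) conversion rate that this caveat calls for could be computed directly. A minimal sketch with invented rows:

```python
from collections import defaultdict

# Invented per-pageview rows: (visitor_id, variant, pledged) -- purely
# illustrative; see the PageViewRecord sketch earlier for the full schema.
log = [
    ("u1", "Original", False),
    ("u1", "Original", True),    # u1 converts on a second session
    ("u2", "Separate Bullets", False),
    ("u3", "Separate Bullets", True),
]

# Collapse sessions to users: each user counts once, and counts as converted
# if *any* of their sessions converted (contrast with GA's per-session rates).
user_state = {}
for visitor_id, variant, pledged in log:
    prev = user_state.get(visitor_id, (variant, False))
    user_state[visitor_id] = (variant, prev[1] or pledged)

per_variant = defaultdict(lambda: [0, 0])  # variant -> [users, converters]
for variant, converted in user_state.values():
    per_variant[variant][0] += 1
    per_variant[variant][1] += converted

for variant, (n_users, n_conv) in per_variant.items():
    print(f"{variant}: {n_conv}/{n_users} users converted ({n_conv / n_users:.0%})")
```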

We should try to focus more carefully on 'whether this is having any effect on ultimate pledge-taking and pledge-follow-through behavior'.

I would be surprised if a moderate difference in the framing of a particular page should have such a large ((2.69-1.71)/1.71 ≈ 57%) impact on the incidence of such a large life choice, involving at least tens of thousands of dollars. However, I still expect the incidence of 'click this button' to be related to that ultimate outcome; thus I suspect these results are still informative and useful as they stand.


These results may only apply narrowly to the GWWC pledge case, and even here, we have some caveats. However, the trial loosely suggests that when making a call to action, it may be most effective to present the most well-known and expected option most prominently, and not to emphasize the range of choices (see further below).

Tendency to choose 'middle options' (Simonson and Tversky 1992)

GWWC has three distinct pledge options, as shown above (link from October 2020).

Statistics are available on Google Analytics/Optimize. Reinstein has access to these and is planning to input the data into R for more detailed analysis, to be reported in the analysis web book.

Dillon: there is possibly a more sophisticated approach to this than what Google is doing ... the better prior would be an 'empirical Bayes' approach (but it may be controversial). See this guide to empirical Bayes.
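A sketch of what that empirical-Bayes alternative could look like: fit a Beta prior to conversion rates from comparable past pages or trials, then update it with an arm's observed data. The historical rates below are invented purely for illustration:

```python
import numpy as np

# Hypothetical conversion rates from comparable past pages/trials --
# invented numbers, purely for illustration.
historical_rates = np.array([0.018, 0.022, 0.025, 0.020, 0.027, 0.019])

# Method-of-moments fit of a Beta(a, b) prior to those rates.
m, v = historical_rates.mean(), historical_rates.var(ddof=1)
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common
print(f"Empirical-Bayes prior: Beta({a:.1f}, {b:.1f})")

# Updating one arm shrinks its observed rate toward the historical mean.
successes, trials = 73, 2718  # 'Separate Bullets', approximate counts
post_mean = (a + successes) / (a + b + trials)
print(f"Raw rate: {successes / trials:.4f}; posterior mean: {post_mean:.4f}")
```

Compared with a flat prior, this pulls each arm's estimate toward typical past performance, which is exactly the prior-sensitivity that DR flags above.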
