(In)effective Altruistic choices: Review of theory and evidence
Introduction & explanation
Late-2024 update: This project is on hiatus/moved
Note from David Reinstein: The EA Market Testing team has not been active since about August 2023. Some aspects of this project have been subsumed by Giving What We Can and their Effective Giving Global Coordination and Incubation (lead: Lucas Moore).
Nonetheless, you may find the resources and findings here useful. I'm happy to answer questions about this work.
I am now mainly focused on making The Unjournal a success. I hope to return to some aspects of the EAMT and effective giving research projects in the future. If you are interested in engaging with this, helping pursue the research and impact, or funding this agenda, please contact me at daaronr@gmail.com.
See also
Our public reports of trials and analysis in the living web book
What is the "EA Market Testing Team"?
This project is organized by David Reinstein, who maintains this wiki/Gitbook and other resources.
We aim to promote awareness and understanding of the core ideas of Effective Altruism (EA), and "to make giving effectively and significantly a cultural norm." We consider marketing campaigns, charitable appeals, events, and public communication, working both with our partner organizations and in independent surveys and trials. We want to improve the design and messaging of organizations like Giving What We Can, improving their outreach methods and maximizing their impact.
Measuring and testing 'what works and when': While helping these organizations do marketing and communication we are also testing and analyzing this rigorously. We help run and track careful data collection and rigorous controlled trials, as well as helping to organize the reporting of less rigorous trials. We robustly analyze the results to better understand which approaches tend to have a more positive impact.
How can we design and test marketing campaigns?
What works to promote EA?
What have we learned?
We strive to be transparent. We want to report and share our data, procedures, code, and evidence without overselling the results.
We believe this is the first organized collaboration of its kind. We aim to...
Coalesce our understanding and evidence on barriers and facilitators of effective altruism, effective giving, and effective action
Run a broad set of high-powered trials (large samples, high-stakes real-world contexts, substantial differences between conditions)
... to gather evidence on what works best to promote meaningful actions in specific
What is our mission? Updating...
As EAMT has progressed, we have encouraged others to do work and pursue initiatives in the 'space' of studying EA messaging, and marketing EA and effective giving. We hope that the resources we have provided, and the connections we have made have contributed to this. As the space changes, the EAMT mission, scope, and activities are adjusting as well.
We are moving towards a heightened focus on:
Advising, proposing, and helping to design and coordinate experiments, trials, and initiatives.
Rigorous statistical analysis and transparent reporting of the results
Synthesizing, sharing, and communicating this knowledge and skills base
This work provides substantial benefits, shared among the partner organizations and the broader EA community.
Other relevant/new organizations and initiatives
(including User Friendly, Good Impressions, and Altruistic Agency)
Giving What We Can's 'Effective Giving Global Coordination and Incubation' (a project we have encouraged and advised)
We believe the EA Market Testing Team is the first organized collaboration of its kind.
Goals and FAQ
What have we accomplished?
For an overview of our progress and ongoing work, see the resources we are building.
Note that we cannot publicly share details of ongoing and upcoming trials. We aim to share the results when possible, and to integrate the shareable aspects of this work.
For a data-driven dynamic document covering (some of) our trials and evidence see
"Testimonials"
Luke Freeman, Executive Director of Giving What We Can
"The EA Market Testing team has been very helpful in helping us to pursue our mission of creating a world where giving effectively and significantly is a cultural norm. They have helped us at each stage along the process of ideation through to analysis so that we can base our outreach activities on sound theory and strong evidence. This is at a particularly important time as we have been scaling up our marketing activities to reach and engage new audiences with effective giving and the ideas of effective altruism more broadly. We look forward to an ongoing collaboration with EAMT so that we can continue to iterate and increase our impact.”
Grace Adams, Head of Marketing, Giving What We Can:
It’s been extremely useful to hear what others in EA, individuals and orgs are doing and sharing learnings between us. I hope that we can develop a set of tactics that we know successfully convert people and get them more involved in EA. A reliable set of best practices for marketing EA would be a great outcome.
Greg Gianoupolis, Charity Elections
"As a quick testimonial relevant to this stage of the process, David [Reinstein]'s support has been critical to the Charity Elections team's development of plans for marketing and program evaluation. Our first ad campaign was particularly impactful, generating one click through to the program's page on the Giving What We Can website per $0.01 spent on the campaign. We will continue to incorporate his advice into our advertising to spread awareness of the Charity Elections program among high school students and teachers."
How to get involved?
If you are interested in getting involved with our project or have feedback for us, contact David Reinstein at daaronr AT gmail.com.
Next, check out the .
(For an explanation of this Gitbook's structure, content, and aims, see the next section.)
This quote comes from the 2022 .
However, we are also careful to be efficient, recognizing the tradeoffs between rigorous experimental design and practical marketing.
Including , guides and
Our regularly updated 'data analysis report' on all the trials and evidence, which you can download as a protected zip file (you will need to request the password; permission is granted with the consent of participating organizations)
Communication:
We track, organize, and share what we have learned with the EA community, building and organizing resources and a knowledge base. This will address questions such as:
Which types of people are most likely to be interested in effective altruism?
... in specific cases, while aiming at generalizable principles and approaches
Run activities and trials, building evidence on 'which types of people' are most responsive to effective giving messages and appeals
Share our results, data, and tools, with the relevant EA and research-interested communities. This will enable more and better outreach, promotion, testing, and insight.
Opinion: Digital marketing is under-utilized in EA: JS Winchell Post
Videos - Facebook
Youtube seminars
Email from JS:
The YouTube team holds quarterly workshops to explain how best to build and use your organic (not paid) YouTube channel. Based on previous discussions it sounds like this is something that might be of interest to your orgs.
Note: This is aimed at beauty and fashion brands but I'd imagine 80% of it would apply to GWWC/80k/1ftw
Agenda for the workshop:
Explain why YouTube is crucial for your brand identity
How to claim your narrative on this platform
Reach and engage new and existing audiences through content
Register your interest for future workshops.
A review of channel best practices
Enhance your channel's search and discovery potential
Develop an 'always-on' strategy
JS: My YouTube video best practices are here, but note YouTube is a very different platform than FB/IG (sound is on 98% of the time, no scrolling, you have at least 5 seconds to hook their attention, ads are much longer)
Below, we give one example from a relevant context, illustrating (with screenshots) what choices you might make, what it would look like, and how to implement it.
Updates/general advice (Sep 2022): To do any good tracking and optimization through Facebook, you should set up the Meta Pixel and Conversions API as soon as possible.
You may want to jump to the (WIP) section.
Getting started
"Meta Business Suite" is the starting point of your ad campaign. If you have a Facebook Business account, you should already have access to the Meta Business Suite:
Next, click on "Ads manager" (See the megaphone on the left).
Link a page?
You must link a "Facebook Page" or "Instagram Account" to your ad campaign; this provides the visible public face of your business that users associate with the ad. You can create a new page or manage access to an existing page or Instagram account:
The next step is to select "Create a campaign" and choose an "objective"... the interface gives you some idea of what these aim for:
Budget optimization
When creating a new "Traffic campaign" (a 'cold traffic campaign'), there are many options to help you optimize your delivery while minimizing your expenses.
You need to opt in to these tools by ticking "Create A/B test" and "Budget Optimization" on the first page of your "ad campaign manager." Since there is no downside (we would like to learn which ad design works best), we opt in to each of these.
Budget optimization is closely related to the choice of the target group. In general, the larger the target group, the cheaper it becomes to reach a certain amount of "link clicks".
Targeting the ad
Suppose we wish to create a targeted ad for a particular Facebook audience. For example, we might wish to put an ad...
in the 'feed' of US Americans who are interested in charity or volunteering or philosophy
giving them a link to a page encouraging them to learn about EA
Targeting example
Here, I chose "Get more website visitors" ... then "Edit Audience". Below, I chose people in the US over age 18 who are interested in any of a set of things related to charity, volunteering, or philosophy. This is a very broad audience, with about 80 million potential people.
Facebook estimates that spending $5 per day over 5 days will lead to 358-1,000 people seeing the ad and 72-208 clicks. That implies a cost of roughly $0.12 to $0.35 per click.
We can use the "schedule and duration" function not only to automate the timing of our campaign, but also to estimate its cost. For example, we assume that we need 800 participants to click-through to start the 20 fundraisers (i.e., a rate of 2.5%).
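The budgeting arithmetic above can be sketched in a few lines. The click target is the illustrative figure from this example, and the per-click cost range is Facebook's estimate from above, not a measured value:

```python
# Rough campaign budgeting (illustrative figures from the example above):
# we need ~800 click-throughs to start 20 fundraisers (a 2.5% rate),
# and Facebook's estimate implies roughly $0.12-$0.35 per click.
clicks_needed = 800
cpc_low, cpc_high = 0.12, 0.35

budget_low = clicks_needed * cpc_low    # optimistic cost per click
budget_high = clicks_needed * cpc_high  # pessimistic cost per click
print(f"Expected spend: ${budget_low:.0f}-${budget_high:.0f}")
# prints "Expected spend: $96-$280"
```

In practice, actual CPC often exceeds these estimates (see the benchmarking below), so the upper figure is the safer planning number.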
Below, we see that FB estimates 172-497 link clicks per day for 10 Euros per day for a different case.
Benchmarking these numbers
These numbers seem over-optimistic; in general, we've seen figures of $1-2 per click elsewhere. Some potentially reliable figures are below (source: re-reporting of WordStream data).
From a recent relevant experience in our group's context...
The last campaign based on clicks I ran got 461 clicks for $244 USD over 2 weeks with 113k impressions. [i.e., $0.50 per click]
Note that (maybe obviously) 'clicking on a Facebook ad' is a rare thing for people to do. In the quote above, that's about 4 clicks per 1000 impressions.
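As a sanity check, the cost-per-click and click-through rate implied by the campaign quoted above can be computed directly:

```python
# Figures quoted above: $244 spend, 461 clicks, 113k impressions.
spend_usd, clicks, impressions = 244, 461, 113_000

cpc = spend_usd / clicks    # cost per click, ~$0.53
ctr = clicks / impressions  # click-through rate, ~0.41% (about 4 per 1000)
print(f"CPC ≈ ${cpc:.2f}, CTR ≈ {ctr:.2%}")
```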
Narrower targeting in the 'ads manager'
It seems you can target more carefully in the "Ads Manager".
Create a saved audience
You can specify
Demographics
Interests
Behaviors
"Include" seems to be the default when specifying these ... it 'expands the audience'. You can click 'narrow further' to constrain the audience.
Don't forget to use the search tool within 'browse' to find ways to do careful targeting
During this process, you can see a concise statement of your choices, and the estimated audience size further up on the page:
How should we (EA, effective giving) target ads?
We have some evidence that narrower targeting helps. An obvious candidate is
Traffic choice
The next big choice is 'where do you want to drive traffic?'. You'll enter more details about the destination later.
Since we want people to click our web app, we chose "website".
Version testing
We may have several versions of the ad we want to try out, and we want Facebook to iterate towards the one that is more successful using their algorithm. Ideally, we would like to learn as much as we can about 'which ads perform better on which audiences'.
We can set up Facebook's ("Meta's") algorithm to dynamically optimize over which version will get the most clicks.
"Dynamic creative"
"Dynamic Creative" is an option to enhance this process. It takes multiple ad components (such as images, videos, text, and calls-to-action) and mixes and matches them in new combinations to improve your ad performance.
"Dynamic creative" can be either switched on or off. (Given that we want to optimize over several versions, I see no downside to this feature. Thus, we switch it on.)
Where do we actually specify, enter, and style our ad content?
Finally, we have to decide which delivery outcome to optimize for.
We may simply want the ad that drives the most traffic to our page; in that case, we choose the option "link clicks".
However, we might instead want FB to optimize the ad presentation for which version leads not just to the most 'clickthroughs', but to the most "conversions" or some other action taken on our page.
To do that, we need to set up a "Meta Pixel"; see the "Optimizing and pixels" section below.
Cost and cost controls
DR: In my past experience, you ended up paying Facebook based on the number of "clicks" you got, not simply on how long your ad was up. But it's probably a combination of these, and there are probably different pricing plans. You can tell Facebook to put a limit on either of these so you do not go over budget. Facebook will aim to spend your entire budget and get the most link clicks using the lowest-cost bid strategy.
Currently EUR 315 is the max for new users ... but for our present pilot we may want less than this. (Check: how much do we expect to pay for 800 clicks? Let's split this up into ... the first 100 clicks, the next 300 clicks, ... to see if it's going OK.)
Designing your ad
Finally, you enter the third and last page of the ad creation process. Here you have to verify your ID and Facebook page and choose the actual design of your ad versions. ["of which the most important one is whether you want to have a video or single image." (?) ]
The last step before publication is to specify the destination for your campaign.
We chose a website and simply copy the URL into the field, making sure the ad links people to the right destination.
Payment (and monitoring)
Optimizing and pixels
Setting up the pixel
The pixel includes content from Facebook that needs to be integrated into your website/page of interest. (To do: link instructions for this).
Adding pixel 'events' to your web page
One simple way of doing this: "Events setup tool"
Once you are in the ads manager for an ad, go to the 'Events Manager':
"Add events", choose "from the pixel"
"Events setup tool"
Put the URL for your site in and 'Open website'
As seen below, this opens our page and shows which elements have already been associated with a Pixel. Here, the "create fundraiser" button on this page has been associated with the "Initiate Checkout" event. (We use default event names Facebook is familiar with, even though there is no 'checkout' in this case.)
(The "Facebook Pixel Helper" extension in Chromium may be helping here, but I'm not sure how.)
"Track new button" lets you see what click options you could associate with a pixel. This highlights the clickable elements you can use. ('Create fundraiser' is not highlighted, probably because it has already been assigned.)
For example, I could click 'who are we' on a page and associate it with 'view content'
I could 'add a value' to this, if it makes sense.
Can I use this later to have FB optimize for 'net value' of a user generated on the page? This might be a useful way to assign greater importance to certain things, even if they aren't actually monetized.
After this 'finish setup' ... it gives you the chance to see what you have asked it to do and confirm or cancel it.
Using the pixel events for Facebook ad optimization
Once you have nice pixels set up, you can use this in helping Facebook decide which versions of ads to serve, which audiences to serve them to, etc. You set up your ad, define an objective etc...
Define your goal as 'conversion', and define what 'conversion' corresponds to in terms of pixels:
Here we're choosing 'initiate checkout', which we defined as clicking on a 'create fundraiser' button on the first page of our site (early in the funnel)
The warning below might not matter, as our page has not been up for long. But we have also been told elsewhere that before you can get the ad to optimize for conversions, you first need to have the pixel set up and the ad running, optimizing for views. So this might still be a concern.
Facebook tracks people for a while. So in optimizing, you can change 'what time period of outcomes it attributes to which (version of the ad)':
I assume that the same 'conversions' target defined above is used in optimizing the 'dynamic creative' if you turn that on.
If you don't have an existing contact list or comparison group, you may prefer to simply specify characteristics. That is "Create a Saved Audience".
For example, you can specify age groups and then 'detailed targeting' categories, including, e.g., Schools (including universities):
More detailed targeting
You can specify:
Demographics
Interests
Behaviors
"Include" seems to be the default when specifying these ... it 'expands the audience'. You can click 'narrow further' to constrain the audience. Don't forget to use the search tool within 'browse' to find ways to do careful targeting.
Example: "Hours spent promoting" vs "number of fellows"
Consider a study where
EA groups are asked to voluntarily participate (with no direct compensation)
to report the 'time spent on each recruiting activity',
and to ask their fellows/members 'how did you hear about our group?'
Suppose this finds
'Per hour spent by the organizers, far fewer people report "tabling" as the source, relative to "a direct email".'
Should we interpret this as
'direct emails are a more efficient use of time than tabling, thus groups should spend less time doing tabling and more time sending emails?'
Maybe, but we should be careful; there are other explanations and interpretations we should delve into. Some of these could be partially addressed through survey design, others through careful analysis. Other 'causality' issues may require an experiment/trial/test to get at.
Statistical inference: chance and selection/selectivity
Random variation: With a small sample of groups, these numbers may be particularly high or low (for tabling, for emails, etc) by chance; the averages for a 'typical group' may turn out to be very different.
This is the standard issue of statistical inference about a population from a sample.
The issue of 'misrepresenting the population' tends to be worse with smaller samples (here small number of groups, and small numbers of observed outcomes in each group; e.g., only a few fellows)
However, 'as Bayesians know' you can still draw valuable decision-relevant inferences from small samples. IMO (Reinstein) the "problem of small samples" tends to be overstated because we mainly learn about statistics designed for a particular scientific frequentist approach.
Selection/selectivity: The groups that 'opted in' to be part of this survey may not be a 'random draw' from the population of relevant groups. It may represent more careful or more enthusiastic groups, perhaps groups that are particularly analytical and not so good socially, etc. If some of the 'fellows' within the groups don't complete the survey, this could add another 'selection bias'.
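To illustrate the Bayesian point above: even with small counts, the posterior can still support a decision-relevant comparison of recruiting channels. The numbers below are invented for illustration (not EAMT data), and the flat-ish Gamma prior is an assumption:

```python
import random

# Hypothetical counts (assumptions, not real data):
# 40 organizer-hours of tabling yield 6 sign-ups,
# 10 hours of direct emails yield 5 sign-ups.
tabling_hours, tabling_signups = 40, 6
email_hours, email_signups = 10, 5

# Model sign-ups per hour with a Poisson likelihood; a vague Gamma prior
# gives a Gamma(signups + 1, scale = 1/hours) posterior for the rate.
random.seed(0)
draws = 100_000
email_better = sum(
    random.gammavariate(email_signups + 1, 1 / email_hours)
    > random.gammavariate(tabling_signups + 1, 1 / tabling_hours)
    for _ in range(draws)
)
print(f"P(email yields more sign-ups per hour) ≈ {email_better / draws:.2f}")
```

Even with only 11 sign-ups total, the posterior puts a high probability on emails outperforming tabling per hour; the point is that a small sample can still shift a decision, even if it would fail a conventional significance test.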
'Marketing causality' issues
Attribution with multiple sources: “How did you hear about this program?” This could be interpreted in several ways, probably “how did you first hear”. But in marketing sometimes people hear about something multiple times, and it’s hard to know which of these are pivotal in getting them to take action. (We could probably do something to make this question a bit more informative.)
“Lift”: some people might have signed up anyway, even without the activity they identify as ‘how they heard about it’. Other people may have been harder to reach, and for the latter, (e.g.) ‘spoke to us while we were tabling’ may be pivotal.
Decisionmaking implications
Costs vs. hours: the cost of these activities may not be fully proportional to the time spent ... e.g., writing to a professor may be mentally costly and may expend some social capital. On the other hand, tabling may be fun and social, and also generate interesting feedback (and other benefits that are harder to measure, like links with other groups also doing tabling).
Diminishing returns/hard limits on some activities ... e.g., there may be only so many professors (or students) to email; after a few hours of this, there may be no one left to contact.
Further approaches in progress
Models, theories, psych. norms
An overall characterization of widely-cited and 'conventional-wisdom' evidence on the background drivers of, and barriers to, effectiveness in charitable giving
Barriers/obstacles to effective giving: classification
We focus on the 'barriers' or 'hurdles to giving effectively' among individuals who already engage in some charitable giving and other-regarding acts. Loosely, a donor would need to "jump over all of these hurdles" and cross each of these barriers in order to be giving effectively.
Conceptual breakdown (I)
A conceptual breakdown of barriers:
Base values may be (non) utilitarian: People are optimising their own 'X', which does not coincide with impact --> no puzzle?
Avoiding information about effectiveness: Even if people want to optimise impact, they may specifically dislike and avoid gathering information about effectiveness in a charitable giving setting
Presenting effectiveness information may backfire: E.g., if it switches off the 'generous part of the brain', gets people to think in a more 'market' mode, or makes people indecisive
I present and discuss this breakdown, a more practical breakdown, and specific examples in each category in the synthesis. I go into further detail and (work to) present evidence on each of these in later sections of that site. (I plan to go through that work and extract only the key, most practical elements.)
These barriers are also mapped, and connected with tools and evidence in .
See 'barriers' View
Judgement/cognition failures, quantitative biases, information failure: People try but fail to optimize, and/or have persistent incorrect beliefs
Emotion overrides cognition: our brain serves two masters, and the resulting decisions are not consistent
Identity and signalling: Effectiveness in giving clashes with our self-image/self-beliefs, or with how we want to appear to others
Systemic factors (and inertia): social systems leading to pressure and incentives from others to give to local or less-effective causes. Even if impact is a goal these systems take a long time to adjust.
Giving What We Can's mission is to make giving effectively and significantly a cultural norm. GWWC has updated their 2022 strategy. They are looking to significantly increase their marketing activity by producing videos, funding ads, and conducting systematic and robust research. As such there will be a large crossover between our work and theirs. This section highlights our collaborative efforts.
Presentation: overview
Ideas and opportunities
We want to learn from existing work, run tests on the GWWC platform, and support research into this.
Stages of the funnel:
Awareness & Consideration
Increase casual visitors and raise curiosity
Conversion & Acquisition
Donate or pledge to donate
Some key questions
“What should the call to action be for the casual person in the funnel?”
Testing all parts of funnel/pledge journey; website, welcome messages/welcome packages, reminders and thank-you's
Completed studies: See sections below
Retention
Fulfill and report pledge
Advocacy
Promoting GWWC to others
Pledge page (options trial)
The presentation of options on GWWC's 'pledge page' was randomly varied at the individual browser level over a certain period, to see which presentation increased pledges.
A summary of this has been shared as a post on the EA Forum.
We intend to redo and augment much of this analysis in a more transparent way, directly importing the data and doing our own analyses rather than relying on Google's built-in tools. We intend to put this within our regularly updated data analysis report.
Summary of trial and results
Giving What We Can (GWWC) has three giving pledge options, displayed in the 'Original presentation version' below.
From April to July 2021, GWWC ran a trial presenting its 'pledge page' options in three slightly different ways. Considering 'clicks on any button' as the outcome, and using a Bayesian 'preponderance of evidence' standard...
"Separate Bullets for Other Pledges" was the most successful presentation. It only showed a box for "The Pledge", with the other options given in less prominent bullet points below. This had about a 20% higher incidence rate than the Original presentation.
"Pledge before Try Giving" was the least successful presentation; this was like the one displayed above, but with "Try Giving" in the central position. This had about a 23% lower incidence rate than the Original presentation.
These results may apply only narrowly to the GWWC pledge case, and even here we have some caveats. However, it loosely suggests that, when making a call to action, it may be most effective to present the most well-known and expected option most prominently, and not to emphasize the range of choices (see further below).
Getting people to take the GWWC pledge may be seen as an important outcome on its own. It may have a causal impact on getting people engaged in the Effective Altruism community and other EA activities, such as EA career impact decisions.
General idea and main hypothesis
GWWC: How can we present pledge options to maximize positive outcomes (pledges, fulfillment)?
General: For those considering making substantial giving pledges (of a share of their income), how does the presentation of these 'pledge options' matter?
Theories and mechanisms to consider:
Tendency to choose 'middle options'
Too many options may lead to 'indecision paralysis'
The signaling power of choice; e.g., if there's a 'more virtuous choice' I may feel that my 'middle choice' looks less good by comparison
Background and context
GWWC has three distinct pledge options, as shown below:
1. "Try Giving" (1% of income),
2. "The Pledge" (10% of income)
3. The "Further Pledge" (donate all income above a living allowance).
These can be seen on the 'pledge page' (link from October 2020).
Three versions of this page were randomly presented (between 19-21 April and 10 July 2021)
The content of the key 'choice button' part varied between these three versions
"Original:" A block of three (in the order of commitment) 'The Pledge' (10%) in the center and highlighted (see above)
"Pledge before TryGiving": A block of 3 with "Try Giving" (1%) in the center and highlighted
"Separate Bullets for Other Pledges": A single block for 'The Pledge' (10%), with the other pledges given as clickable bullet points below (as well as a bullet for the 'company pledge' ... which had a different presentation in other versions)
The version presented stayed constant for each individual, based on IP/cookie tracking.
Points of contact, Timing of trial, Digital location of project/data, Environment
Points of contact
Julian Hazell (julian.hazell at givingwhatwecan.org), Luke Freeman
Participant universe and sample size
'Everyone going to the above page' within the above time duration.
People interested in GWWC pledges'
Sample size: see below, from Google Analytics
Key treatment(s)
1. "Original" (block of 3 in order of commitment, with the middle pledge in the center)
2. "Pledge before TryGiving" ... as above, but with Try Giving and The Pledge swapped, and Try Giving (in the center) highlighted
3. "Separate Bullets for Other Pledges" (see below)
Treatment assignment procedure
Three versions of this page were randomly presented
Equal likelihood of assignment
The non-exact balance below seems to be an imbalance in 'sessions', not in 'participants'.
Our analysis should focus on outcomes per participant; thus, the figures below may need some adjusting (although at first pass, the results go in the same direction).
This doesn't seem to be adaptive assignment. In Google's help on 'create an A/B test' they state:
All variants are weighted equally by default in Optimize. A visitor who is included in your experiment has an equal chance of seeing any of your variants.
The version presented stayed constant for each individual across visits.
Outcome data
Statistics on Google Analytics: This records only 'pressed any button' (any pledge) as the successful outcome.
Ideally, for future trials, this would include...
One entry per page view over the interval, detailing
Whether pledged
Ex-post: Reporting results (brief)
Implementation and data collection
See for details on data extraction from the interface
From shared image from Google Analytics:
'Experiment sessions' (observations) by treatment (as labeled on Google Analytics shared image):
Original: 2588
Pledge before Try Giving: 2686
Separate Bullets for Other Pledges: 2718
Total: 7992 sessions (=2588+2686+2718)
3. Where is the data stored ... [noted above]
Basic results/outcomes
Quick interpretation
The "separate bullets for other pledges" version seems to have been the most successful, with a 0.49 percentage-point higher incidence rate than the 'Original', i.e., about a 22% higher rate of pledging (2.69% vs. 2.20%).
These differences seem unlikely to be statistically significant in the conventional sense. Still, Google Analytics' model (presumably a reasonable Bayesian one) states an 80% chance that this is the best treatment, which seems useful and informative.
If anything, the result for 'separate bullets' seems potentially understated...
Note that GA is reporting conversions based on sessions (contiguous use periods) and not users. We can reasonably assume that a roughly equal number of users were assigned to each treatment (as per the design). As a result, we assume that roughly equal shares 'viewed the relevant page at least once' (because of the law of large numbers). However, the most successful treatment, the 'Separate block', is recording more sessions. Thus, the relative conversion rate, as a share of users, would be even higher than the one reported here, relative to the baseline.
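A rough, transparent version of the comparison GA is making can be sketched with a beta-binomial simulation. The session counts are from the report above; the conversion counts are approximations back-computed from the reported rates (2.20%, 2.69%, and ~23% below Original for 'Pledge before Try Giving'), and the flat Beta(1, 1) prior is an assumption, so the 'probability best' will only roughly match Google's ~80% figure:

```python
import random

random.seed(1)

# Sessions per arm from the GA report; conversion counts are approximate,
# back-computed from the reported conversion rates.
arms = {
    "Original": (2588, 57),                  # ~2.20%
    "Pledge before Try Giving": (2686, 45),  # ~1.69%
    "Separate Bullets": (2718, 73),          # ~2.69%
}

draws = 50_000
wins = {name: 0 for name in arms}
for _ in range(draws):
    # Beta(1,1) prior + binomial likelihood -> Beta(s + 1, n - s + 1) posterior
    sample = {
        name: random.betavariate(s + 1, n - s + 1)
        for name, (n, s) in arms.items()
    }
    wins[max(sample, key=sample.get)] += 1

for name, w in wins.items():
    print(f"P({name!r} is best) ≈ {w / draws:.2f}")
```

Under these assumptions, 'Separate Bullets' comes out as the likely best arm with probability in the same ballpark as GA's figure; the exact number depends on the prior, which is one reason to prefer re-running the analysis on the raw data.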
Aside on statistics
Optimize uses Bayesian inference to generate its reports... Optimize chooses its priors to be quite uninformed.
DR: But this still doesn't tell us what these priors are. There's a lot of sensitivity to this choice, in my experience.
Dillon: there is possibly a more sophisticated approach to this than what Google is doing ... the better prior is an 'empirical Bayes' approach (but it may be controversial).
See an introduction to empirical Bayes.
The "Pledge Before Try Giving" treatment performed substantially worse than the original.
The poor performance of ‘pledge before try giving’ appears even more substantial than the strength of ‘Separate Block’. It even seems to border on conventional statistical significance … I expect that in a standard comparison of the latter two treatments, we’d find conventional statistical significance.
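This expectation can be checked with a standard two-proportion z-test. The conversion counts below are approximations back-computed from the reported rates, so treat the result as indicative rather than exact:

```python
import math

# 'Separate Bullets' vs 'Pledge before Try Giving'. Session counts are from
# the GA report; conversion counts are approximate, back-computed from the
# reported rates.
n_sb, x_sb = 2718, 73  # ~2.69%
n_pb, x_pb = 2686, 45  # ~1.69% (about 23% below Original's 2.20%)

p_sb, p_pb = x_sb / n_sb, x_pb / n_pb
p_pool = (x_sb + x_pb) / (n_sb + n_pb)          # pooled conversion rate
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_sb + 1 / n_pb))
z = (p_sb - p_pb) / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"z = {z:.2f}, two-sided p ≈ {p_two_sided:.3f}")
```

With these figures the test comes out well below the conventional 0.05 threshold, consistent with the expectation stated above.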
These differences are meaningful; consider the 'posteriors':
Downloading the 'Analytics data' behind the above graphs, we see:
[Table: modeled improvement by variant, with 2.5th, 25th, 75th, and 97.5th percentile bounds]
This suggests it is very reasonable to think that 'Separate Bullets' is substantially better. Based on this 'posterior' distribution, we should put:
a 2.5% chance that 'Separate Bullets' (SB) has an 18% (or more) lower conversion rate than 'Original'
a 22.5% chance on SB being between 18% worse and 4% better
a 25% chance of SB being 4-20% better
We can also combine intervals, to make statements like ...
a 50% chance of being 4-36% better
a 47.5% chance of being 20-76% better
For 'Pledge before...' (PB) we can state, e.g.,
PB has a 75% chance of being at least 11% worse than Original
and a 50% chance of being at least 23% worse than Original
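The 'posterior' statements above can be reproduced in miniature. Below is a minimal Monte Carlo sketch, assuming flat Beta(1,1) priors and hypothetical session counts chosen to match the 2.20% vs. 2.69% rates (not the actual GA data or GA's actual priors):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1,1) priors.

    The posterior for each conversion rate is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Hypothetical counts: 44/2000 = 2.20% ('Original') vs 54/2000 = 2.69%
# ('Separate bullets'). These are illustrative, not the trial's data.
p_better = prob_b_beats_a(conv_a=44, n_a=2000, conv_b=54, n_b=2000)
```

With counts of this size, the posterior probability that 'Separate bullets' beats 'Original' comes out well above one half but far from certainty, consistent with the 'suggestive but not conventionally significant' reading above.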
Intuitive interpretation
Perhaps giving people more options makes them indecisive. They may be particularly reluctant to choose a “relatively ambitious giving pledge” if a less ambitious option is highlighted.
This could also involve issues of self- and social signaling. If the 'main thing' to do is a 10% pledge (as in "separate bullets"), then this may seem a straightforward way of conveying 'I am generous'. On the other hand, if the 'Further pledge' is fairly prominent, perhaps the signal feels less positive. And if the '1% pledge' is made central, 10% might seem more than necessary as a signal.
The "pledge before try giving" may perform the worst because it makes the 'Try Giving' pledge a particularly salient alternative option. (In contrast, the "Original" at least makes 'The 10% Pledge' the central and the middle option.)
But in this case, why should the overall pledge rate (any button-press) be lower with more options (Original vs 'separate bullets'), and lower still when Try Giving is made central?
It's hard to say too much if we don't know the composition of the pledges people make.
Still, it might be that people mainly came in with the desire to take The Pledge (10%), as this is most heavily promoted. In that case, making other pledge possibilities prominent may (A) cause people to rethink their choices and delay a decision (perhaps never returning), and/or (B) make them feel less comfortable with the overall 'signal' their pledge will send. This doesn't mean that the 'multiple boxes' environment is worse overall, but it may perform worse for the people coming here, as these were the people particularly attracted by the '10% is the main thing' signaling environment.
Caveats
I am assuming that the 'outcome being measured here' is whether the person 'clicked on any giving pledge'; this is what Luke has conveyed to me.
I assume this is 'conversions ever from this IP', and 'sessions' represents 'how many different IPs came to the treatment'. If it's something else (e.g., each 'session' is a 'visit' from an individual), this could reflect these people converting in fewer sessions but not necessarily being more likely to convert overall. Even if this is 'by IP' the alternative interpretation 'not converting now but maybe later' may still have some weight if people are entering through multiple devices.
We should try to focus more carefully on 'whether this is having any effect on ultimate pledge-taking and pledge-follow-through behavior'.
I would be surprised if a moderate difference in the framing of a particular page should have such a large ((2.69 - 1.71)/1.71 ≈ 57%) impact on the incidence of such a large life choice, involving at least tens of thousands of dollars. However, I still expect the incidence of 'click this button' to be related to that ultimate outcome; thus I suspect these results are still informative and useful as they stand.
'Academic' contact: David Reinstein.
Timing of trial (when will it/did it start and end, if known)
Start: 19 April 2021 (or 21 April)? End: 10 July 2021 (Source: Google Analytics)
Digital location where project 'lives'
(Planning, material, data)
Statistics are available on Google Analytics/Optimizely. Reinstein has access to this and is planning to input it into R for more detailed analysis, to be reported in the analysis web book.
The present document is currently (11 May 2022) the only writeup.
Environment/context for trial
https://www.givingwhatwecan.org/pledge/ ... see above
Variation in the presentation of the pledge options
Which pledge
Time and date of view, Time spent on page, Other clicks, Location of user, Any other information about user
Most importantly:
Number of page views over the interval, by treatment
In this section, you will find reports of the trials we have run with organizations, including Giving What We Can and One For the World.
2) 🪧
Here we share tools to implement planned trials, as well as tips relevant to 'doing marketing'. We answer questions like how to set up campaigns and track outcomes on various platforms. See especially "..." and "..."
3) 🎨
We discuss qualitative and quantitative research design and methodology issues that are relevant to the trials we are running. Pages in this section will be linked in reports when relevant to a particular trial.
4) 🧮
Our profiling project aims to help better understand what sorts of people are amenable to EA-related ideas and to taking EA-favored actions.
5) 📋
We've done a review of existing literature: to inform the trials we are running, and to identify important research topics. This includes and .
You can find references, tech support, and other resources in the .
What is this 'Gitbook' meant for?
The three key aims of this are to:
Convey who we are, what we have accomplished, and the scope of our work to funders, people in the broader EA community, and people not yet involved in the project who would be interested in joining
Share tools and knowledge with people in the EA/global priorities community who will apply it to their work. We are building a knowledge base. Content in the public gitbook can inform and support a diverse set of projects (e.g., implementing marketing campaigns, fundraising initiatives, academic research)
Advisor signup (Portland)
TLYCS ran a campaign in a single city involving 'donation advice'
In December 2021, TLYCS ran a YouTube advertising campaign in Portland, Oregon, involving 'donation advice'. Households in the top 10% of household income were targeted with (one of) three categories of videos. One of the ultimate goals was to get households to sign up for a 'concierge' personal donor advising service.
Quick takeaways
There were very few signups for the concierge advising service (about 16 in December 2021, only 1 from Portland).
We consider a 'difference in difference', to compare the year-on-year changes in visits to TLYCS during this period for Portland vs other comparison cities.
This comparison yields a 'middle estimate cost' of $37.70 per additional visitor to the site. This seems relatively expensive. We could look into this further, to build a more careful model and consider statistical bounds, if such work were warranted.
General idea and main questions
Specific goal of TLYCS promotion: To get people to click on the ad and go to the 'landing page' of TLYCS. There, they will fill out a form to request an appointment with a donation advisor. We will simultaneously be raising awareness of TLYCS.
General questions:
Can we get people to sign up for donation advice using videos in YouTube Ads?
How many sign up, and what sorts of people?
Do these ads boost engagement with TLYCS in net? (E.g. donations, website activity, book downloads)
Background and context
Participant universe and sample size
Location: Portland, OR
Audience: Top 10% of household income
People living in Portland, Oregon in the top 10% of household income (approximated by Google) will get an in-stream ad (ad plays before video user intended to watch)
Key treatment(s)
Exposure to a sequence of nine versions of YouTube ad videos. Frequency cap: 6 per week
Three main 'theme/header' variations (similar, slightly different phrasings)
these variations were crossed with...
Three categories of videos within each theme:
"Bravery": Charlie Bresler explains how 'you can save lives without being brave' with small amounts of money for bednets, nutrient micro-doses, etc.
$10: Man giving out money to poverty-stricken people in Cape Town. Text narrative overlaid describes that $5 can buy a slice of pizza, or an intraocular lens to treat cataracts, etc. Leans towards 'identified victims/recipients'.
"I want to do good": Colorful puppets sing about giving and donating to save lives. Counters common arguments about 'breeding dependency', fear of administrative waste, etc.
These are organized and linked .
Note/limitation:
Unfortunately, we were not able to track 'which video got more clicks'.
Each video comes with a site-link extension with a Call to Action:
Treatment assignment procedure
We assigned the particular video treatments to audiences using a YouTube/Google optimization algorithm. This chose videos to maximize the probability that a user chose 'Speak to an Advisor' and filled out the linked form.
Outcome data
How long people watched the videos for
Whether they 'clicked through'
Whether they filled out the form for advising (Algorithm is serving to optimize this)
Results (simple analysis)
Note: we present some more in-depth analyses and graphs in the Quarto , along with a code and data pipeline
Cost per user (first-pass)
A first pass and upper bound on impact and (lower bound on) cost/session
Assumptions/data interpretations
The numbers used in our data come from meaningful sessions from unique users
The 'date range' is the relevant one for being affected by the advertisements of interest
The 'comparison cities' are approximately randomly selected
Most optimistic (unrealistic) bound
Guiding assumption: a counterfactual 0 visits from Portland in this season
306 Portland Users (389 Portland site visits) in relevant 2021 period.
If these were all driven by the advertisement (and counterfactual was 0 visits), this is +306 Users and +389 visits
Cost: $4k, i.e., roughly $13 per additional user at this most optimistic bound
Year-on-Year (maybe reasonable) optimistic bound
Guiding assumption: a counterfactual 'same as last year' in Portland
306 Portland Users (389 Portland site visits) in relevant 2021 period.
144 Portland Users (189 Portland site visits) in relevant 2020 period.
--> 306 - 144 = 162 additional users, implying roughly $4,000/162 ≈ $24.70 per additional user at this bound
Difference in Differences comparison to other cities
Guiding assumptions:
The cities used are fairly representative
'Uptick as a percentage' is unrelated to city size/visits last year
All the cities in the comparison group are 'informative to the counterfactual' in proportion to their total number of sessions
This yields
112.5% uptick in users (year on year) for Portland in 2021 (306 vs. 144)
For all North American cities other than Portland (with greater than 250 000 people):
The average is 46.5 users in the 2020 period and 64.5 users in the 2021 period, an uptick of about 38.8%. This is very similar to the result if we look at all cities, which show an uptick of 43.1%.
38.8% uptick multiplied by 144 users = 55.9 (‘counterfactual uptick’ in users for Portland)
162 - 55.9 = 106 (uptick relative to counterfactual)
$4,000 / 106 ≈ $37.70 cost per additional user through this ad
Note this is a midpoint estimate, we have not yet given statistical bounds.
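The calculation above can be written as a short function (all figures are from the text; this is a point estimate with no statistical uncertainty modeled):

```python
def did_cost_per_user(users_now, users_prior, comparison_uptick, spend):
    """Cost per incremental user under a year-on-year difference-in-differences.

    The counterfactual uptick applies the comparison cities' percentage
    growth to Portland's prior-year user count.
    """
    counterfactual_uptick = users_prior * comparison_uptick
    actual_uptick = users_now - users_prior
    incremental_users = actual_uptick - counterfactual_uptick
    return spend / incremental_users

# Portland: 306 users (2021) vs 144 (2020); comparison uptick 38.8%; $4,000 spend.
cost = did_cost_per_user(306, 144, 0.388, 4000)
print(round(cost, 1))  # ~37.7
```

Setting `comparison_uptick` to 0 recovers the simple year-on-year bound (about $24.70 per user), and treating the counterfactual as zero visits recovers the most optimistic bound.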
In the graph below (pasted from the Quarto ), we show these year-on-year upticks in context.
Other outcomes
There were very few signups for the concierge advising service. Only about 16 in December 2021 globally, only 1 of which was from Portland.
Notes
Other detailed notes are in our private Gitbook. More formal and detailed analysis could be done if it seems merited.
Giving guides - Facebook
Along with GWWC, we tested marketing and messaging themes on Facebook in their Effective Giving Guide Facebook Lead campaigns. Across four trials we compared the effectiveness of different types of (1) messages, (2) videos, and (3) targeted audiences.
A summary of this has been shared as a post on the EA Forum:
... [with text and rich content promoting effective giving and a "giving guide" -- links people to asking for their email in exchange for the guide]
Objective: Test distinct approaches to messaging, aiming to get people to download our Giving Guide. A key comparison: "Charity research facts" vs. "cause focus".
Also informative about costs and the 'value of targeting different groups' in this context.
Key findings:
The cost of an email address via a Facebook campaign during Giving Season was as low as $8.00 across campaigns.
“Only 3% of people give effectively” seems to be an effective message for generating link clicks and email addresses, relative to the other messages.
Key caveats
Specificity and interpretation: The comparisons are not between 'audiences of similar composition' but between 'the best audiences Facebook could find to show the ads to, within each group, according to its algorithm'. Thus, differences in performance may combine 'better targeting' with 'better performance on the targeted group'. See our . I.e., we can make statements about "what works better on Facebook in this context and maybe similar contexts", but not about "which audience, as defined, is more receptive", as the targeting within each audience may differ in unobserved ways.
The outcome is 'click to download the giving guide'.
Previous writeup and results
to the previous Gdoc report
Preregistration: OftW pre-GT
Academic-linked authors: David Reinstein, Josh Lewis, potentially others going forward
Implementation and management: Chloe Cudaback, Jack Lewars
1) Have any data been collected for this study already?
No, no data have been collected for this study yet.
2) What's the main question being asked or hypothesis being tested in this study?
Are effectiveness-minded (EA-adjacent) donors and pledgers more motivated to donate by
"A": A (non-quantitative) mention of impact and effectiveness (in line with the standard OftW pitch)
"B": Emotional appeals and 'identified victim' images
Framing this in terms of the psychology, social science, and philanthropy literature:
"Does the Identifiable Victims Effect (see e.g., meta-analysis by Lee and Feeley, 2016) also motivate the most analytical and committed donors?"
3) Describe the key dependent variable(s) specifying how they will be measured.
d_don_specific: Whether the person receiving the series of emails makes an additional 'one time gift' following the link at OftW, within the OftW interface, during the 'Giving Season', a time-period that (for this preregistration) we declare to begin on receipt of this first email and end on 15 January 2022.
don_specific: The total amount donated through the above
4) How many and which conditions will participants be assigned to?
Two conditions (treatments):
A. "Impact"
B. "Story/Emotion"
Assignment details
Participants (c. 4000 people at various points in the One for the World pledge process) will be split into groups (blocks) by previous donation behavior or point in the process. (OftW have mentioned: pledgers still in school, active donors, and lapsed donors.)
Within each group, they will be randomized (selection without replacement to ensure close-to-exact shares) into equal shares in treatments A and B.
A series of three emails will be sent, with participants remaining in the same treatment across all three emails.
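A minimal sketch of this blocked randomization (the block names and member IDs are made up for illustration):

```python
import random

def blocked_assignment(blocks, seed=2021):
    """Randomize within each block into near-equal A/B shares, without replacement."""
    rng = random.Random(seed)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for person in shuffled[:half]:
            assignment[person] = "A"  # "Impact"
        for person in shuffled[half:]:
            assignment[person] = "B"  # "Story/Emotion"
    return assignment

# Illustrative blocks following the segment types OftW mentioned:
blocks = {
    "pledgers_in_school": ["s%d" % i for i in range(10)],
    "active_donors": ["a%d" % i for i in range(8)],
    "lapsed_donors": ["l%d" % i for i in range(7)],
}
assignment = blocked_assignment(blocks)
```

Note that with an odd-sized block, group B receives the extra member; randomizing the remainder would remove that small asymmetry.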
See actual texts for design and timing
Example content differences, from email 1:
A. Impact version:
As of 2021, One for the World has had a tremendous impact on the lives of those that are helped by our charity Top Picks programs:
[IMPACT SINCE 2021 GRAPHIC]
B. Story/Emotion version:
Here’s our first story this season from Eunice of Kenya. When asked how her life changed when she received the first cash transfer from our partner organization, GiveDirectly, she responded:
“I have been able to make new goals and achieve them since I started receiving this money [from GiveDirectly]. I have been able to buy a piece of land that would have taken [me] many years to earn [enough to buy the land]. I was also able to buy livestock, like goats. I have even managed to dress my family properly by buying them decent clothing. Lastly, I have even been able to [pay my children’s] school fees without any strain.” (Source GiveDirectlyLive)
[PICTURE OF EUNICE]
5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.
We will report all of the following analyses, with our preferred method in bold:
Binary outcomes:
Fisher's exact test
Bayesian Test of Difference in Proportions (as in ), with an informative beta distribution for the prior over the incidence rate in each treatment, with a parameter based on the incidence rates for similar campaigns in the prior 2 years.
Continuous outcomes:
Standard rank-sum tests (Mann–Whitney U test)
Simulation/permutation-based tests for whether the mean (including 0's) is higher in group A or B
... the same for the median, though we anticipate medians will almost always be 0
All tests will be 2-sided.
We will also report Bayesian credible intervals and other Bayesian measures for the proportion tests. We may also explore Bayesian approaches for the continuous outcomes, e.g., Bayesian beta regression.
We also anticipate reporting multiple-hypothesis-test corrections, but we are not pre-registering a method. Our approach to this is likely to follow that of List et al (2017), which this paper applied to a similar domain (charitable giving experiments with multiple donation-related outcomes).
We will report confidence intervals on our results as well as Bayesian credible intervals under flat and weakly informative priors. Where we have a 'near-zero' result, we will try to put reasonable bounds on it to convey the extent of our certainty that the true effect or parameter was fairly small.
Where situations arise that have not been anticipated in our preregistration and pre-analysis plan, we will try to follow the Don Green lab unless there is a very strong reason to deviate from this, which we will specify.
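A minimal sketch of the simulation/permutation test named above, using only the standard library (the function name and details are illustrative, not the final analysis code):

```python
import random

def permutation_pvalue(a, b, n_perm=5000, seed=7):
    """Two-sided permutation test for a difference in means (zeros included)."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the pooled sample and recompute the mean difference
        diff = sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return at_least_as_extreme / n_perm
```

Applied to donation amounts (with the many zeros retained), this approximates the pre-registered simulation approach; the Fisher and rank-sum tests would come from a standard statistics library.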
6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
Included: All individuals who received this mailing.
We will not exclude any observations from the sample, unless they make it clear to us that they are aware of this trial.
We will not Winsorize or exclude outliers.
7) How many observations will be collected or what will determine sample size?
A series of three campaign emails will be sent out by OftW to their regular email lists, to roughly 4000 participants, as described above
Targeted dates: November 10, November 18, November 23, all in 2021, but these may be delayed for feasibility
Other
Anything else you would like to pre-register? (e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)
Exploratory and secondary hypotheses/questions/analyses
Secondary hypotheses and questions
Which treatment motivates a higher rate of...
Email open rates (note: as we have three observations per participant, we will need random effects or clustered standard errors), and
Click rates (with the same caveat)?
We consider these as secondary because the click and open rates do not necessarily strongly relate to the outcomes of interest, particularly among this set of already effectiveness-minded donors. These outcomes may simply reflect attention or curiosity about the content.
Exploratory: what factors (especially gender, university/student status, university subject) predict which treatment leads to greater donation (incidence and amount)
Note that our partner is planning to use this trial to inform future trials and experiments, particularly for the 'Giving Tuesday' season itself.
Power calculations
We did not have time to do even simple power calculations before the start date of this experiment. However, we will try to conduct these before we obtain any of the data, and update this preregistration.
Meet the team
We are a group of researchers and practitioners across a range of fields (Economics, Psychology, Marketing, Statistics) and organizations, particularly those interested in effective charitable giving and effective altruism. This is outlined in the Airtable (invite link), with embedded views below.
This project is organized by David Reinstein, who maintains this wiki/Gitbook and other resources.
As individuals and organizations, we are goal-driven and impact-driven: we are in this to improve the world, particularly through directing funds and support to the most effective causes and interventions. Because we share these common goals, we are better aligned for collaboration than typical academics and charitable organizations. We have an unprecedented opportunity to collaborate, learn what works, and 'move the needle'.
We are actively collaborating with the following organizations (links indicate publicly reportable trials)
One For the World (OftW)
Chloë Cudaback is the lead contact (communications manager). (Previously Jack Lewars)
Background on OftW
How does OftW differ from others in this space?
Chloe: Focus on youth and university students at a pivotal point in their life
Accessible messaging, more of a starting point, less gatekeeping
David: 1% is 'more manageable' as a starting point perhaps
Luke: Narrow focus on one type of charities: global health and poverty
OftW has a donor base of ~700 active donors, ~1650 pledged donors (who pledged but haven't started donating yet) and ~2000 lapsed donors.
80% (of donors?) are in the USA
Focus on global health charities
Some key goals
Reinstein/Lewars conversation notes
Activating more donors who took the pledge at university, so their donations actually start;
Retaining donors for longer once they activate;
Upselling donors to give more over time (either more as a raw amount, e.g. 'keeping pace' at 1% of their income; or more as a percentage, e.g. 'graduating' to take the 10% GWWC pledge)
Who/what/how to test, learn, and adapt
Pipeline/groups/segments
Pledgers
Active donors, i.e., "Activated pledgers" (Chloe is thinking of segments to this and how to appeal to them)
Second tier -- people who have given each month for 12+ months; "Legacy donors" (DR: maybe 1x per year high-value donors should be in this group)
Another group worth considering: 'pledge-curious supporters'
Goals/actions
'Activating' Pledgers as donors (pledged but not donated)
Active donors
Retain
Interested in knowing more about
Content -- expand our ability to tell stories about the beneficiaries
Ways to tell these stories
Frequency (of comms with supporters)
Communications contexts
Platforms: Social media, email flows
Telling stories in a corporate context
Typical audiences have been students and young professionals, but there is interest in corporate outreach
Zoom and lunchtime talks in corporate contexts (How many? Seems very promising!)
How many people are activating/pledging following these lunch-and-learns?
Typical donor journey:
We are in the process of creating these homepages and setting up conversion tracking. As OFTW has ~0 organic sign ups currently, we are testing for a variety of conversion routes, including: [Todo: clarify this]
university campus, someone I like tells me they are involved in OftW, asks me to come along with free food
at some point I take the pledge
It is not a highly controlled process
Some rough numbers
650 active donors
1500 people in pipeline (pre-activation date)
750 new people a year are recruited... thinks it would be 2-2.5k
OFTW has a donor base of ~700 active donors,
~1650 pledged donors (who pledged but haven't started donating yet) and
~2000 lapsed donors.
Ongoing/completed/upcoming experiments
Email upsell emotion/impact message trial (see below)
University experiment - redacted as being prepared
Homepage message testing
Activation trial
Message Test (Feb 2022)
Summary
Main Question: Do some message themes work better than others for drawing visitors to Giving What We Can’s landing page?
Main findings: 'Social proof messages' on Facebook ads were most effective at generating landing page views per dollar compared to other message themes (effectiveness, services, giving more, and values).
Future directions: There were significant differences in 'link clicks per dollar' on the different messages by age. We recommend a systematic test to determine if age makes a difference in the relative effectiveness of social proof and values messages. Future studies could explore why the social proof message was more effective in this study than the previous giving guide study and the importance of the message to “join” the movement as social proof.
Possible connection between this trial and the : Note that the two best-performing messages both prompted the user to “join” a movement or a group of people (perhaps an elite group); but beware .
to report below.
Pre-trial reporting template
General idea, main 'hypothesis' (if there is one)
In this test, we are aiming to find out if one 'theme' of messages resonates better with our target audience than others.
If we knew which 'themes' were most effective with our advertising, then we could create more ads on this theme and improve our conversion.
Specifically, which of the following themes resonate with our target audience the most:
effectiveness
giving more
social proof
On choosing an objective for this test: originally I planned to use link clicks, but this is not the highest-quality indicator of conversion, and when I tried to use newsletter signups, Facebook warned me that I might not see any conversions at all... So instead, the campaign will optimise for landing page views, which is slightly better than a link click and will generate enough conversions that we should see statistically significant results.
Point of contact (at organization running trial)
Grace Adams
Timing of trial (when will it/did it start and end, if known)
Trial will run for 7 days on GWWC's ad account, from 9.30am AEDT Friday 25 Feb to 9.30am AEDT Friday 4 Mar.
Digital location where project 'lives' (planning, material, data)
Working document can be found but all important details will be listed in this brief
Environment/context for trial
This test will take place on Meta platforms including Facebook and Instagram
Participant universe and sample size
We are targeting a "Philanthropy and Lookalikes (18-39)" audience, based in the UK, US, or the Netherlands
Estimates from Facebook: Reach is expected to be 1.4K-4.1K per day (7 days) per ad set (5 ad sets) = 49K-143K
Estimates from Facebook: Conversion is expected to be 10-30 landing page views per day (7 days) per ad set (5 ad sets) = 350-1050
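These ranges are just the per-day, per-ad-set estimates multiplied out; as a check:

```python
# Facebook's estimates: 1.4K-4.1K reach and 10-30 landing page views,
# per day per ad set, over 7 days and 5 ad sets.
days, ad_sets = 7, 5

reach_low, reach_high = 1400 * days * ad_sets, 4100 * days * ad_sets
views_low, views_high = 10 * days * ad_sets, 30 * days * ad_sets

print(reach_low, reach_high)  # 49000 143500 (the "49K-143K" above)
print(views_low, views_high)  # 350 1050
```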
Key treatment(s)
We are using the GWWC Brand Video by Hypercube as the creative across all tests. Although it did not perform as well as our other ads in the Giving Guide campaign, I think that it will interfere less with the messages we aim to test.
We are going to test a set of messages for each theme, please see them in the
Mock up of ad:
Treatment assignment procedure
This test has been set up as an A/B test through Facebook, testing the campaigns head to head; each campaign covers one theme, with the different ads as children.
This will allow us to test which theme performed better, not just which individual ad
A/B testing on Facebook will ensure that each user falls into a single treatment group
Outcome data
Primary measure will be cost per landing page view, but secondary measures such as CPC, 3 second video plays, email sign ups will also be tracked
Data will live on Meta ads platform
Progress/goals (early 2023)
Late-2024 update: This project is on hiatus/moved
Note from David Reinstein: The EA Market Testing team has not been active since about August 2023. Some aspects of this project have been subsumed by Giving What We Can and their Effective Giving Global Coordination and Incubation (lead: Lucas Moore).
Nonetheless, you may find the resources and findings here useful. I'm happy to answer questions about this work.
I am now mainly focused on making a success. I hope to return to some aspects of the EAMT and effective giving research projects in the future. If you are interested in engaging with this, helping pursue the research and impact, or funding this agenda, please contact me at daaronr@gmail.com.
Introduction
For a data-driven dynamic document covering (some of) our trials & evidence see
In the Partner Organizations and Trials section, you will find reports of the trials we have run with organizations, including and .
These trials are also cataloged in our Airtable: (that is...); links, categorization provided.
Our primary approach and goals
We want to identify the most effective and scalable strategies for marketing EA and EA-adjacent ideas and actions. To do this, we believe that running real-world marketing trials and experiments with EA-aligned organizations will provide the best evidence to act upon. By systematically varying the messaging, framing, and contexts, we can map out 'what works better where'.
We believe this approach is likely to be the most fruitful because:
Using naturally-occurring populations in real-world settings with meaningful costly choices and outcomes will lead to more relevant findings. In comparison to convenience samples of undergraduates or professional survey participants who are aware that they are doing a research study, we anticipate greater:
Internal validity: our results are less likely to be influenced by biases, such as acquiescence bias and hypothetical decision-making.
Key themes, priorities, and 'high-value questions'
This project primarily aims at:
Robust and generalizable insights that improve communication and messaging
Meaningful and relevant long-run outcomes, such as:
Creating new, strong EAs by getting people more interested and involved in EA ideas, actions, and the community
In the document below, we consider the shared goals, paths, and questions that apply across organizations. Specifically, these are actionable and promising themes and projects that can be implemented, measured, and communicated fluidly throughout the EA network.
Pre-giving-tues. email A/B
Context: Donation 'upsell' to existing pledgers
Question: Are effectiveness-minded (EA-adjacent) donors and pledgers more motivated to donate by
"A": (non-quantitative) presentation of impact and effectiveness (as in standard OftW pitch)
"B": Emotional appeals and 'identified victim' images
Further information on the experiment and outcomes is available in an in-depth replicable analysis, organized in a dynamic document
General idea, main 'hypothesis'
Are effectiveness-minded (EA-adjacent) donors and pledgers more motivated to donate by
"A": (non-quantitative) presentation of impact and effectiveness (as in standard OftW pitch)
"B": Emotional appeals and 'identified victim' images
In the context of One for The World's (OFTW) 'giving season upselling campaign', potentially generalizable to other contexts.
Academic framing: "Does the Identifiable Victims Effect (see e.g., the meta-analysis by Lee and Feeley, 2016) also motivate the most analytical and committed donors?"
Background and context
One for The World's (OFTW) 'giving season upselling campaign'
A total of 10 emails were sent over the course of November, in preparation for GivingTuesday
Point of contact (at organization running trial)
Academic-linked authors: David Reinstein, Josh Lewis, and potentially others
Timing of trial
Targeted dates: November 10, 18, 23, all in 2021, but may be delayed for feasibility
Digital location where project 'lives' (planning, material, data)
Present Gitbook, Google doc linked below, preregistration (OSF), and github/git repo
Environment/context for trial
Emails
... to existing OftW pledgers (asking for additional donations in Giving Season)
All 10 emails had the same CTA: make an additional $100 donation for the giving season/GivingTuesday on top of their recurring monthly pledge donation.
Participant universe and sample size
Roughly 4,000 participants: a series of three campaign emails will be sent out by OftW to their regular email lists, as described.
Key treatment(s)
Basically:
A list of ~4500 contacts (activated pledgers) was split into two treatment groups.
Treatment Group A received emails that were focused on the contact's impact
while Treatment Group B received emails that were focused on individual stories of beneficiaries
See
Treatment assignment procedure
See preregistration
Outcome data
Targeting: Donation incidence and amount in the relevant 'giving season' and over the next year, specifically described in prereg under
Data storage/form:
MailChimp data (Chloe is sharing this),
Reports on donations (Kennan is gathering this)
Optional/suggested additions
Planned analysis methods, preregistration link
Cost of running trial/promotion: Time costs only (as far as I know)
Proposed/implementing design (language)
Pre-registration work
Pre-registered on OSF in 'AsPredicted' format, content incorporated here
Preliminary results
Overview:
The Emotion treatment leads to significantly fewer people opening emails, but more people clicking on the in-email donation link (relative to the standard Impact information treatment). However, we are statistically underpowered to detect a difference in actual donations. More evidence is needed.
Chloe: those emails that appealed to emotional storytelling performed better (higher in-email click rate) than those that were impact-focused.
DR, update: I confirm that this is indeed the case, and this is statistically significant in further analysis.
Evidence on donations
(preliminary; we are awaiting further donations in the giving season) ...
This is 'hard-coded' below. I intend to replace this with a link or embed of a dynamic document (Rmarkdown). The quantitative analysis itself, stripped of any context and connection to OftW, is hosted
Note: We may wish to treat the 'email send' as the denominator, as the differing subject seemed to have led to a different number of opens
Treatment 1 (Impact): We record
1405 unique emails listed as opening a ‘control’ treatment email
29 members clicking on the donation link in an email at least once (2.1% of openers)
15 members making some one-time donation in this period (about 1.1% of openers, 0.075% of total)
Treatment 2 (Emotional storytelling):
1190 unique emails listed as opening an email (a significantly lower 'open rate', assuming the same shares of members were sent each set of treatment email)
56 members clicking on the donation link in an email at least once (4.7% of openers)
11 members making some one-time donation in this period (about 0.9% of openers, about 0.055% of total)
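For a rough check of the click-rate difference, here is a pooled two-proportion z-test on the counts above. This is a sketch: it treats openers as independent and ignores that each contact received several emails.

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-statistic for a difference in rates."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled rate under the null of no difference
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Clicks per opener, from the counts above: Impact 29/1405 vs Emotion 56/1190
z = two_prop_z(29, 1405, 56, 1190)
```

This gives z of roughly 3.8 (p well below 0.001), consistent with the significance claim in the overview above, with the caveat about non-independence noted in the lead-in.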
‘Initial impressions of preliminary outcomes’
The conversion rates are rather low (0.5%) … but maybe high enough to justify sending these emails? I’m not sure.
While people are more likely to Open at least one Impact email, they are more likely to Click to donate at least once if assigned the Emotion email
But we can't say much for actual donations.
The figure above seems like a good summary of the 'results so far' on what we can infer about relative incidence rates, presuming I understand the situation correctly. I plot:
Y-axis: how likely we would be to see a difference in donation incidence 'as small or smaller in magnitude' than the one in our data ... against
X-axis: hypothetical 'true differences in incidence rates' of various magnitudes
Implementation and management: Chloe Cudaback, Jack Lewars
Our data is consistent with ‘no difference’ (of course) … but it's also consistent with ‘a fairly large difference in incidence’
E.g., even if one treatment truly led to 'twice as many donations as the other', we still have a 33% chance or so of seeing a difference as small as the one we see
We can reasonably ‘rule out’ differences of maybe 2.5x or greater
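The figure's logic can be sketched with a normal approximation: given hypothetical 'true' rates, how likely is a count difference as small as the one observed? The parameters below (sends per arm, baseline rate, observed gap) are illustrative guesses, so the output will not match the figure's exact numbers.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def p_diff_as_small(n, p_low, ratio, observed_diff):
    """Normal-approximation probability of seeing a donation-count difference
    no larger in magnitude than `observed_diff`, if the true rates in the two
    arms (n sends each) are p_low and ratio * p_low."""
    p_high = ratio * p_low
    mu = n * (p_high - p_low)
    sigma = sqrt(n * (p_high * (1 - p_high) + p_low * (1 - p_low)))
    return phi((observed_diff - mu) / sigma) - phi((-observed_diff - mu) / sigma)

# Hypothetical numbers in the spirit of the text: ~2250 sends per arm,
# a true 2x difference, and an observed gap of 4 donations (15 vs 11)
chance = p_diff_as_small(2250, 11 / 2250, 2.0, 4)
```

Under these made-up parameters the chance comes out around 10%; the precise value is sensitive to the assumed baseline rate and sample size, which is the point of plotting it as a curve.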
The Life You Can Save (TLYCS)
Leads: Bilal Siddiqi, Neela Saldanha; Other partner contact: Jon Behar (Giving Games)
We have completed various trials in conjunction with , the most recent being the city-level YouTube test. There are a number of additional proposed trials and tests; however, at the moment these are limited to the private Gitbook.
Note that in the past TLYCS has worked with the Graduate Policy Workshop at Princeton University's School of Public and International Affairs, which produced the report embedded below.
Seek feedback on our work. This includes technical and industry feedback on implementation and academic expertise (literature reviews and frameworks to consider, methodology, and experimental design).
(Grouped by organizational partner.)
We include background information on each organization and its priorities for testing.
don_general_gs: (If observable), the amount the person donates during the 'Giving Season', as observed through the OftW/donational/Plaid network
don_general_1yr: (If observable), the amount the person donates during the 'Giving Season' and for the following year (ending 15 January 2023) as observed through the OftW/donational/Plaid network
d_continue_pledge_1yr: Whether the person is still an active OftW pledger a year after the current giving season (15 January 2023)
External generalizability/environmental relevance: the context we are testing is similar or identical to the context we care about.
We will "learn by doing" by encountering unanticipated obstacles and learning about practical implementation issues involved with advertising, promotion, and communication.
We can share what we learn with relevant EA organizations and audiences. They then can build on our findings, rather than having to repeatedly make mistakes themselves.
The trials themselves should also have a direct positive value in promoting EA.
There is limited downside risk. We are generally not testing risky messages and are careful to avoid diluting or misrepresenting EA's core ideas.
Having people consider and identify with key values and practices, such as making meaningful altruistic choices, considering effectiveness and impact in doing so, strong analytical and epistemic practices, and broad (or carefully considered) moral circles
Substantial impactful donation behavior and choices
Across a range of EA causes and groups (longtermism, global health, animal welfare)
8 unique emails donating (likely) through the link (0.057%/0.04%)
9 unique emails donating (likely) through the link (0.08%/0.045%)
Given the low conversion rates we don’t have too much power to rule out ‘proportionally large’ differences in conversion rates (or average amounts raised) between treatments …
Main point: given the rareness of donations in this context, our sample size doesn’t let us make very strong conclusions in either direction about donations
Reinstein and others work with charity partners, some of which are not EA-aligned (but perhaps moderately effective), in ways that inform EA giving. Several trials focus on the 'impact of impact information'.
Reinstein's research (along with others') considers 'how do potential donors respond to (different presentations of) impact information?' Reinstein and his academic partners ran several experiments, working with (and on) mainstream charities and fundraising platforms.
See work:
and discussion:
Other work is ongoing and cannot be publicly shared yet (see private gitbook if you have access).
July 20, 2021: GWWC launched a YouTube remarketing campaign. That means that when someone goes to the GWWC website, leaves, and then goes to YouTube we show them one of the following videos:
The algorithm decides which video to present to each person.
Q: Is each video assigned to a different situation or are videos randomly chosen to be displayed? If the latter, you could randomize videos by location and see if the different videos were more or less effective. Alternatively, just randomizing the whole campaign seems like a good idea to me....
A: The algorithm selects videos based on the likelihood of the user watching >30 seconds ... randomization by individual will be hard because users don't click and act right away. Instead, I think we have to randomize by geography.
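If we do randomize by geography, a deterministic hash-based assignment keeps each city's arm reproducible and balanced in expectation. A sketch (the city list, arm names, and salt are made up for illustration):

```python
import hashlib

def assign_arm(unit, arms=("video_A", "video_B"), salt="yt-geo-2021"):
    """Deterministically assign a geographic unit (e.g., a city) to an arm.

    Hashing (rather than random draws) means the assignment can be
    re-derived later by anyone with the salt, with no stored lookup table.
    """
    digest = hashlib.sha256(f"{salt}:{unit}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

cities = ["Oxford", "Birmingham", "Leeds", "Bristol", "Manchester", "Sheffield"]
assignment = {c: assign_arm(c) for c in cities}
```

With geographic units as the unit of randomization, outcomes then need to be tracked at the same geographic level, as discussed elsewhere in this section.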
Results summary (Early, JS Winchell; may need update)
Most important takeaway: It costs $1 to get a website visitor to watch 1h of your videos!
High level metrics
Cost: $205
Views: 6,071 (a view is when a user chooses to watch >30s of an ad)
Total watch time: 223 hours (~$1/h)
Interesting observations
1. Efficiency has significantly improved over 3 weeks
Cost per view has gone down from $0.05 per >30s view to $0.02 per >30s view
Views have increased 75% without increasing budget (from 220/day to start to 386 yesterday)
2. 10% of the time people watched the full video!
You can see this data by video if you are interested to control for video length
E.g., 5% of people chose to watch the entire 13 minutes of _
3. Your best video had a view rate (% of time people choose to watch >30s) twice as good as your worst video
4. You can see view rate by age, gender, and device in the "Analytics" tab
For the , older people and men were more likely to choose to continue watching
Possible next steps
Could add "similar audiences" which is when we let Google use machine learning to find people similar to your website visitors and also show ads to them
Could walk David Reinstein and Joshua Lewis through the UI so they can get a sense of the metrics/reporting available and how it could be used for research
University/city groups
EA seeks to amplify its impact through movement-building. Organizations like 80,000 Hours and CEA are putting substantial resources into developing and expanding the EA community. Building EA groups has been at the core of this agenda, especially in elite and influential places (such as top universities). Key aims include 'creating highly engaged EAs' and encouraging people to pursue impactful careers.
Resources and tips
(in-progress)
Our collaborative goals
Currently, university EA groups operate in conjunction with the Centre for Effective Altruism, but with high levels of autonomy. There is only limited collaboration between groups. Such collaboration could allow them to achieve economies of scale and scope, run more systematic and powerful trials, and share insights and methods that increase student engagement.
The EAMT hopes to help coordinate this, consolidate the evidence, and provide accessible tools to newly-formed groups. We want to help avoid repeating errors and 'reinventing the wheel' each time.
EA groups at universities
The efforts and experience of individual EA groups can provide contextual evidence and insights. The EAMT aims to aggregate this knowledge, find generalizable principles, and disseminate this to the wider EA community. We are focused on meaningful medium-term outcomes, e.g.:
Membership and participation in EA organizations, and markers of post-university involvement
How career plans are impacted (focusing on particular programs and paths)
How research and discourse at universities can be influenced
Relevant organizations and programs
The programs below also aim for generalizable principles; e.g., their 'starter toolkits' are implemented across a range of cities, universities, and settings.
Centre for Effective Altruism
CEA has , passing funding and efforts on to Open Philanthropy. However, CEA is still involved in promotion through the (UGAP), which offers guidance and resources to newly formed groups. Furthermore, CEA's (CBG) Program helps develop national and city-based groups (outside of universities).
University Group Accelerator Program
may be the best current source of centralized knowledge for approaches to outreach methods. These have been summarized from different data points; some formal testing, some anecdotal, and some intuitive.
Community Building Grants Program
CBG focuses on supporting city groups, providing grants to support their activities and resources to help with expansion. However, these resources and support systems currently lack supporting data on EA community building. (The identified this as a major bottleneck; we hope to collaborate to help them improve this.)
EA Group Organizers Survey
The is a collaboration between CEA and Rethink Priorities. It analyzes the changes in EA groups yearly, with two main components:
The growth and composition of EA groups and their activities
The organizers' own assessments of their group's status
The first component gives insight into priorities and progress. The second can help guide our research and provide insight into the tools required by group organizers to increase group interaction and outreach.
See especially:
Open Philanthropy
provides funding for part-time and full-time organizers helping with student groups focused on effective altruism, longtermism, rationality, or other relevant topics at any university (not just focus universities). This has replaced CEA's Campus Specialist and Campus Specialist Internship programs.
, a selective 2-year program that gives resources and support (including $100K+/year in funding) to particularly promising people early in their careers who want to work in areas that could improve the long-term future. (Intended partially for particularly strong Campus Specialist applicants.)
80,000 Hours
80,000 Hours is actively targeting university students and offering them guidance on high-impact career paths. (see private Gitbook, if you have access)
Further outreach
There are some further initiatives in this area but most of the material cannot be shared at the moment (see private Gitbook).
Independent group testing and coordination
In this section, we are putting together documents, trials, and knowledge currently being gathered by different EA groups. As we increase our collaboration with these groups, these trials, ideas and documents will become integrated with the Gitbook and EAMT's work, forming a basis for future work and testing.
Funnel Map
This is our basic understanding of the processes used to draw in new members to EA university groups and fellowships, and how members progress through different stages of engagement. Each stage gives us grounds for testing different variations of these approaches. This is not just about testing which methods attract the most new members (e.g., which 'call to action' to use at activity fairs), but also about increasing engagement and developing highly engaged EAs (e.g., fellowship program alternatives, discussion group topics, etc.).
(Above: a preview of funnel map; for full description and work in progress)
Stanford University
Awaiting response from Stanford EA.
University of Chicago
Currently limited to private Gitbook.
MIT, MIT Alignment org
Currently limited to private Gitbook.
EA Israel
This discusses their strategy in-depth. A lot of their findings are not specific to Israel or country-wide EA groups. Useful as a resource for EA groups.
Useful findings will be synthesised and integrated here in the future.
Independent University Group Outreach
We have been independently contacting organizers that are known to be actively seeking to test outreach methods, and also publicly via a post on the EA Forum. An important aspect of the work here is to bring together people who are active in this space but working independently.
The airtable below presents our current (non-exhaustive) list of groups or organisations that have relevant knowledge (strategy documents, marketing guides, etc), or have done some form of independent testing.
Writing credits for this page
Kynan Behan helped create and write this page.
Thus, we hope our efforts will be valuable to these initiatives and groups, by providing and sharing evidence on successful approaches to increasing engagement.
Note that the survey does not collect data from the group's *members*, although they do ask about the overall numbers of people who engaged with each group.
Workplaces/orgs
What's been/being done, what do we know?
"Innovations in fundraising" earlier work
The Innovations in Fundraising 'knowledge base' and … was the knowledge-sharing I tried to get going (as you can see, not very much was shared), as an academic with very limited funds. I also worked a bit with George Howlett at CEA on his 'Workplace Activism' project. For this part of the project I was focusing on:
How to get your organization to support effective charities (or at least, not limit their generosity to local causes)?
E.g., that offered giving incentives, and whether these were 'EA-promising'
How to get a fundraising event or giving game going within your org.
Consideration: Key obstacles/questions for workplace action
"Format questions:"
Which audiences
Next to consider: "Opportunities" (to 'do', measure, learn) ... we should make an inventory here
Discussion space
In case you don't like writing in this Gitbook, I created
Posts and writings
EA Forum posts
See the Google doc embedded below HERE and feel free to add comments or questions.
Related/relevant projects/orgs
Note 7 Mar 2023: I just started this page, it is far from complete
Workshop: a guided discussion ... 'why do you give?', 'I want to help more', etc. A workshop/worksheet on philanthropic goals. The 'five whys': keep asking 'why?' and people sometimes get down to base suffering. 'Guiding but not leading'.
Much of which is embedded into THIS Airtable view as well (which will have some further comments on the relevance, as well as organizations that are not-so-EA related, with discussion)
(a list of orgs in the 'EA effective giving' space; private gitbook atm)
Non or semi-EA initiatives
EA or 'effective giving' orgs working with foundations and wealthy donors
Research and information-gathering initiatives
innovationsinfundraising.org - "The IiF wiki collects and presents evidence on the most successful approaches to motivating effective and impactful charitable giving, and promotes innovative research and its application." This precedes and is partially integrated into the current resource
We are an academic collective and research non-profit, dedicated to providing public communication campaigns with cutting-edge research and rigorous tools for message development.
"Crowdsourcing" ... Recent research suggests that regular people can often be far more effective than experts at predicting which messages will best resonate with others in their community.
The challenge is that the “space” of messages for campaigns to decide between is enormous — there are very many things a campaign could say and many different ways to say them. Unfortunately, research shows that relying on theory and expert guidance about “what works” when designing campaign messages is unlikely to be effective by itself, because “what works” is difficult to predict and can change dramatically across contexts (e.g., see [1], [2], [3], [4]).
Efficient message search. We design research pipelines that allow campaigns to explore the large space of potential messages more efficiently, and to quickly zero in on the most impactful messaging strategies. Our methodology is based on a combination of large-scale adaptive online survey RCTs, Bayesian machine learning and surrogate metrics.
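The 'adaptive' part of such pipelines can be sketched with Thompson sampling: allocate more survey traffic to messages whose posterior conversion rate looks better. A minimal illustration with hypothetical messages and true rates (this is a generic sketch, not this group's actual pipeline):

```python
import random

def thompson_pick(stats, rng):
    """Sample each message's conversion rate from its Beta posterior
    (Beta(1,1) prior) and show the message with the highest draw."""
    draws = {m: rng.betavariate(1 + s, 1 + f) for m, (s, f) in stats.items()}
    return max(draws, key=draws.get)

rng = random.Random(42)
stats = {"impact": [0, 0], "story": [0, 0], "norms": [0, 0]}  # [successes, failures]
true_rates = {"impact": 0.04, "story": 0.06, "norms": 0.02}   # hypothetical
for _ in range(5000):
    msg = thompson_pick(stats, rng)
    converted = rng.random() < true_rates[msg]
    stats[msg][0 if converted else 1] += 1
```

After a few thousand simulated respondents, most traffic has shifted to the best-performing message ('story' here), which is the sense in which adaptive designs search the message space more efficiently than a fixed even split.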
To test content in more depth than an A/B trial permits
Better control over 'who is participating' and how much attention they are paying
Things more towards 'basic background research'
Closer to a 'representative sample'
Some participant recruitment platforms
: Created specifically for academic research. Our impression is that this is among the highest-quality panels, although there is some controversy over this.
Contexts allowing individual randomization & tracking of medium-term outcomes
GWWC web site at point of email signup
Email lists
immediate: subject headers w/ 'open rates' as dependent variable
Contexts for 'Immediate outcomes' (clicks etc)
Facebook; but the targeting algorithm may frustrate randomization. Can it be switched off?
Contexts allowing geographic randomization
See
This is helpful if the important outcomes can be tracked by ZIP code/post code/address.
Online display advertising
Google search
YouTube
Testing Rich Content
How to test rich content?
We can use some of the same strategies as above to test "rich content", i.e., short or even long talks, book chapters, podcasts, and so forth.
However, we may also want richer more detailed 'qualitative' feedback...
Paid participants may allow richer feedback (see )
Emails might be an opportunity
What to test in 'rich content'
Does the messenger matter?
Do the messenger's demographics and appearance matter?
Message customization (heterogeneity and targeted marketing)
We haven’t thought about this much but it seems important – it might be worth, for example, having different messaging for different cause areas and letting them be algorithmically targeted.
Imagery/non-content considerations
How many images to include on a page?
How much text to include in a page?
Targeting
See also
Question: If our aim is to change the culture of giving in general, what kind of people should we be targeting?
Influencers (People with lots of social influence)
Low-hanging fruit (i.e., people who are naturally predisposed towards effective giving, pledging, & EA)
Idea: Compare different outreach methods on the basis of "cost per pledge" (or per "whatever-metric-we-use"). (Outcomes: ... & ... )
Ideas/methods for targeting: platforms and audiences
Some audiences and approaches to targeting
Public lists of political donations (e.g., )
... donors to candidates sympathetic to a relevant cause area
Facebook
Cost of ads: benchmarks
Reinstein, FB ads tied to fundraisers.
Note: this information is subject to change; updated ~ Apr. 2022
My costs have been:
about $0.01 per impression
about $0.50 - $1.20 per click
Targeting at Universities ... Facebook's estimates
The estimated cost per impression (?'reach') and per click varies with the targeted audience. In general, narrower targeting is estimated to be more costly. I think this is because a larger audience allows FB to serve the ads to more of the people who tend to be click-happy.
Some data points:
For Oxford, ‘In College’, living in the UK: They claim we will get 4-18 clicks per day for $50 per day over 2 days (29 Mar 2022 check on FB ads manager)
If I put in Birmingham instead I get a fairly similar figure.
If I remove the only-one-university narrowing, it gets cheaper. They claim I’ll be able to get 86-250 clicks per day for the same cost …
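The implied cost-per-click ranges follow directly from the $50/day budget and the claimed clicks-per-day ranges above:

```python
def implied_cpc(daily_budget, clicks_low, clicks_high):
    """Implied cost-per-click range from a predicted clicks/day range."""
    return daily_budget / clicks_high, daily_budget / clicks_low

narrow = implied_cpc(50, 4, 18)    # Oxford, 'In College': 4-18 clicks/day
broad = implied_cpc(50, 86, 250)   # without the single-university narrowing
```

So the narrow Oxford targeting implies roughly $2.80 to $12.50 per click, versus about $0.20 to $0.60 without the university narrowing, i.e., an order-of-magnitude premium for the narrow audience.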
Meta allies
Research advancement manager:
EA groups (employees) within Meta
If you run a lot of ads FB will assign you external consultant helpers. They are somewhat helpful, but they don't seem to know everything.
FB tips for charities/fundraisers
28 Nov 2022, Zoroob:
Meta has just released a recorded series of videos to help non-profit organizations meet their year-end fundraising goals. Some of these materials may also be helpful for researchers using Meta ads (e.g., materials on designing effective ad creatives), so I am passing the info along. Blurb below.
The three-part series of virtual webinars provides nonprofits with advertising training and best practices around how to use Meta technologies to further their missions:
: Get started with Meta advertising with our today.
The session also features that enable donation transactions within the Facebook app.
: Learn what great nonprofit creative can look like with best practices from Meta.
Consider saving our to learn more about the five key creative considerations that apply to cause-driven campaigns.
: Introduce yourself to measurement best practices on Facebook and Instagram! Afterwards, explore split testing, lift measurement, and the experiments tool, on our
Other links and issues
(collecting data)
Seems particularly useful, but access is limited; they hope to make it more generally available sometime around mid-spring 2023.
Doing and funding ads
Guidelines and resources on how to get ads and marketing going, how to finance it, tips on how to do it right
For a trial to yield insight, we need to be able to track and measure meaningful outcomes, and connect these to the particular 'arm' of the trial the person saw ... (if they saw any arm at all)
Notes from conversations (need explanation)
In this section we discuss how to see the results of your promotions and trials, and how to access data sets of these results that you can analyze.
and
and
Aside notes on modes of testing and tracking
Tracking: See a page, track action afterwards
Putting ads on Youtube and testing click through
Pivot tables
Pivot tables
You may want to see or export crosstabs of one outcome, user feature, or design feature, by another. Sometimes you just want to see these quickly, but this might also be a way to extract the 'raw data' you wish to analyze elsewhere.
Start new pivot table
From within Ads Manager, from 'Ads Reporting' (3 Aug 2022, updated interface):
Click "Create Report" --> Pivot table
2. As before, make sure you've selected the right date range, and (redo) any relevant filters
Here I add a filter for 'campaign name' contains 'general', because I'm specifically trying to pull down information on 'which video people saw' in this group (which needs a special setting to access, as noted below).
3. "Customize pivot table" – "Breakdowns" ... the things you want this to disaggregate across (sums and averages within groups)
the 'campaigns', the 'ad names'
timing, demographics
Drill down to "Custom breakdowns", "Dynamic Creative Asset", to get it broken down by the text linked to the ads:
However, some breakdowns are not compatible with other breakdowns (maybe for privacy reasons?)
For example, if I tick 'Gender' I cannot have it broken down by 'Image, video, and slideshow', at least in the present case ... (perhaps because it narrows down too few observations?)
4. "Customize pivot table" – "Metrics"
Select the things you want reported, and deselect things that are not interesting or irrelevant to this case (like 'website purchases') or numbers that can be easily computed on your own
Normally I'd suggest leaving out the redundant 'Cost per Result', but it's probably good to keep as at least one sanity check on the data.
Other stuff like 'video play time' could sometimes be very relevant, but I'll leave it out for now
I added a few features I thought might be interesting or useful: Was anyone drawn in to pledge? When did each campaign start/end (double-check)? How many unique link clicks?
5. (Optional) Conditional formatting
This could also be helpful if you are using the Ads Manager tools in situ, but obviously this has no value for downloading.
6. Save report for later use, share
If you think the report is useful in-situ, you can also share a link
7. Export the data
As in ...
(or consider direct import into R using tools like the rfacebookstat package)
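Alongside R tools like rfacebookstat, the same data can be pulled directly from the Graph API's insights endpoint. A sketch that only builds the request URL, with no network call; the account ID and token are placeholders, and the API version and field names should be checked against the current Marketing API documentation:

```python
from urllib.parse import urlencode

def insights_url(account_id, access_token,
                 fields=("campaign_name", "impressions", "clicks", "spend"),
                 api_version="v19.0"):
    """Build a Graph API ad-account insights request URL.

    The version string and field list are assumptions; verify them
    against Meta's current Marketing API reference before use.
    """
    base = f"https://graph.facebook.com/{api_version}/act_{account_id}/insights"
    query = urlencode({"fields": ",".join(fields), "access_token": access_token})
    return f"{base}?{query}"

url = insights_url("1234567890", "YOUR_TOKEN")
```

Fetching this URL (e.g., with `requests.get`) returns JSON rows matching the pivot-table metrics above, which is convenient for scripted, repeatable downloads.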
Methods: Overview, resources
Sections
"Qualitative" design issues: How to design the 'content' of experiments and surveys to have internal validity and external generalizability
Clicking on a particular 'experience' in the 'container'...
(if you have been granted read and analyze permission), will open the useful 'Optimize Report' (which Google explains )
Optimize report: top
The overall start/end and 'sessions' are given first. What are "sessions"? The short answer: 'Sessions' are the number of 'continuously active' periods of an individual user. So individual users may have multiple sessions! (see below). Here, there have been 7992 such 'sessions' over 81 days.
I am not sure where we can learn 'how many users there were'.
("View full chart" can give you a day-by-day breakdown of the number of sessions.)
OR: Conversion rates section
The next section compares 'sessions' and 'conversions' by treatment, and does a Bayesian analysis. This seems the most useful part:
Relative conversion rates, analysis
Above, the 'Separate block' (SB) seems to be the best performing treatment. Google calculates a 2.69% conversion rate for this (here, presumably the rate of people checking 'any' of the follow-on boxes).
Considering the Analysis, Google Optimize "uses Bayesian inference to generate its reports... [and] chooses its priors to be quite uninformed." The exact priors are not specified (we should try to clarify this).
But if we take this seriously, we might say something like ...
if our initial priors gave each treatment an equal chance of having the highest conversion rate ('being best'), and assumed a [?beta] distributed conversion rate for each, centered at the overall mean conversion rate ...
then, ex-post, our posterior should be that the SB treatment has an 80% chance of being best, our 'Original' has a 17% chance of being the best ...
Google also gives confidence intervals for the conversion rates for each treatment, with boxplots and (95%) credible interval statistics:
The grey bar for the baseline is mirrored in all rows. The 95% CI for the 'improvement over the baseline' is given on the right. But this is a rather wide interval. More informatively, if we hover over the image, we are given more useful breakdowns:
Although this does not exactly tell us the 50% interval 'improvement over the baseline' (this would need a separate computation), we can approximately infer this.
But fortunately it is reported in data we can download; see below "Download (top right)".
From that data, we get, for each Variant, the modeled improvement at the 2.5th, 25th, 50th ('Modeled Improvement'), 75th, and 97.5th percentiles.
Our posterior thus implies (assuming symmetry, I think) that we should put (considering odds ratios, not percentage points):
a 2.5% chance of SB having an 18% (or more) lower rate of conversion than 'Original'
a 22.5% chance on SB being between 18% worse and 4% better
a 25% chance of being 4-20% better
We can also combine intervals, to make statements like ...
a 50% chance of being 4-36% better
a 47.5% chance of being 20-76% better (roughly, a 50% chance of being at least 20% better)
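To see roughly what Google's Bayesian report is doing under the hood, here is a minimal Monte Carlo sketch: flat Beta(1,1) priors, with hypothetical session/conversion counts (Optimize's actual priors and per-arm counts are not fully specified, so the numbers below are illustrative):

```python
import random

def prob_best(arms, draws=20000, seed=1):
    """Monte Carlo estimate of P(each arm has the highest conversion rate),
    assuming independent Beta(1, 1) priors over each arm's rate."""
    rng = random.Random(seed)
    wins = dict.fromkeys(arms, 0)
    for _ in range(draws):
        # One joint posterior draw of all arms' conversion rates
        sample = {name: rng.betavariate(1 + conv, 1 + sess - conv)
                  for name, (sess, conv) in arms.items()}
        wins[max(sample, key=sample.get)] += 1
    return {name: w / draws for name, w in wins.items()}

# Hypothetical (sessions, conversions) counts in the spirit of the report
arms = {"Original": (2700, 60), "Separate block": (2700, 73), "Variant 2": (2700, 62)}
p_best = prob_best(arms)
```

The same posterior draws could also be used to read off credible intervals for the 'improvement over baseline' (the ratio of a variant's sampled rate to the Original's), mirroring the percentile columns in the downloaded data.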
We report on this further, for this particular case, under
There is some repetition (can we 'mirror blocks'?)
Session balance
Above, even though the treatment has been assigned randomly (presumably a close-to-exact 1/3, 1/3, 1/3 split), the number of 'sessions' differs between the treatments ('variants').
Why? As far as I (DR) understand,
while each individual user (at least if they are on the same machine and browsing with cookies allowed) is given the same treatment variant each time...
the same users may 'end' a session (by leaving or being inactive for 30+ minutes), and return later, getting the same treatment but tallying another 'session'. This suggests that users in the "Separate Block" (SB) treatment are returning the most (but also see 'entrances' below).
Breakdown over time
The final section gives the day-to-day breakdown of the performance of each treatment, presumably along with confidence intervals. This seems relevant for 'learning and improving while doing' but possibly less relevant for our overall comparison of the pages/treatments.
Download (top right)
The 'Analytics data' gives us sessions and conversions by day and by treatment.
(Where no session occurs in a day for a treatment, it is coded as blank).
Clicking on 'view in analytics'
... this gives some other information, mainly having to do with the user experience.
"Unique page views" represent "the number of sessions during which that page was viewed one or more times." ... Recall "sessions" are periods of continuous activity.
"Entrances" seem potentially very important. According to Google:
Sessions are incremented with the first hit of a session, whereas entrances are incremented with the first pageview hit of a session.
In the present context, this suggests that the 'Separate block' page is inspiring users to come back more often, and to spend more time on average.
Sessions vs. Users
As noted, essentially: 'Sessions' are the number of 'continuously active' periods of an individual user
Analytics measures both sessions and users in your account. Sessions represent the number of individual sessions initiated by all the users to your site. If a user is inactive on your site for 30 minutes or more, any future activity is attributed to a new session. Users that leave your site and return within 30 minutes are counted as part of the original session.
The initial session by a user during any given date range is considered to be an additional session and an additional user. Any future sessions from the same user during the selected time period are counted as additional sessions, but not as additional users.
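The 30-minute inactivity rule above is easy to mis-remember; a minimal sketch (timestamps invented for illustration):

```python
from datetime import datetime, timedelta

# Sketch of Google Analytics' 30-minute inactivity rule: count
# 'sessions' from one user's hit timestamps (hypothetical data).
def count_sessions(hit_times, timeout=timedelta(minutes=30)):
    """A gap of 30+ minutes between hits starts a new session."""
    sessions, last = 0, None
    for t in sorted(hit_times):
        if last is None or t - last >= timeout:
            sessions += 1
        last = t
    return sessions

hits = [datetime(2022, 1, 1, 9, 0), datetime(2022, 1, 1, 9, 10),
        datetime(2022, 1, 1, 10, 0)]
count_sessions(hits)  # → 2: the 50-minute gap starts a new session
```

This is why one user can contribute several sessions to the counts above, and why session counts can differ across treatment arms even under balanced assignment of users.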
medium-term: all outcomes tied to email
LinkedIn
Facebook (presumably)
Surveys with professional participants
Surveys with undergraduates
Here generalizability may be a challenge, particularly extending inference from convenience samples to larger and more general populations. "Might be good to think of creative ways of doing that though, e.g., looking at which content creates the most extreme enthusiasm."
Does it depend on the audience?
What’s the optimal length?
How many buttons?
How many choice options?
The 'mysterious sauce' ... JS knows about (Video ads/Best-practice guidelines)... we don't always have a "theory" but it might be meaningful.
Internet activity ... those who watch/read/search for:
Videos relevant to a cause area
Reddit threads relevant to a cause area
Magazines/news sites relevant to a cause area
Search/visiting webpages about charity effectiveness/merit (e.g., Charity Navigator) 👍
Education
Courses/degrees/majors relevant to a cause area
(e.g., development econ/studies, animal behavior, AI)
People at high-status institutions (future influencers/policymakers)
Exploiting social network structure
Targeting "influencers" and "central" people (on the basis of "number of followers" / friends / etc.)
Key search terms (google 'effective giving' etc)
Podcast listeners (philanthropy, economics, development & global health ...)
Other approaches include time-series and difference-in-differences analyses of the response to switching the ad, trial, or page content on or off.
Nick: Branch and Amplitude/Segment ... to track someone throughout the whole funnel
See public 'open science' work in progress and preliminary results HERE
'Thanksgiving email' trial run in 2 subsequent years
Super-overoptimistic information (2018), Moderately overoptimistic information (2019)
Other partners
"Charity Elections" (in schools): trials in preparation, extensive consultation
80,000 Hours: trials in progress, preparation, and analysis; some work joint with Rethink Priorities. Note: we have limited permission to report on these trials
The Life You Can Save: Trials run and in preparation, limited permission to share
(HIP): Advising on surveys and approaches
GiveWell (discussions and consultation)
... And other organizations that didn't want us to report on this publicly
ICRC - quick overview
See public 'open science' work in progress and preliminary results HERE
Then search and select your desired ‘metrics’ (outcomes) of interest. “Users” and “sessions” seem pretty important, for example.
Next you can break this down by another group such as “city”. You can put in 'filters' too, if you like, but so far I don't see how to filter on outcomes, only on the dimensions or groups.
I don't know an easy way to tell it to “get all the rows on this at once.” but if you scroll to the bottom you can set it to show the maximum of 5000 rows.
Next, scroll up to the top and select export. I chose to export it as an Excel spreadsheet, as this imports nicely into R and other statistical/data programs.
We were able to do this in two goes, but for larger datasets this would be really annoying. I imagine there is some better way of doing this, maybe using an API interface for Google Analytics to pull all of it down.
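In the meantime, stitching multiple partial exports back together can be scripted with just the standard library. A sketch (the file-name pattern is hypothetical; adjust it to your downloads):

```python
import csv
import glob

# Sketch: stitch together multiple 5000-row Google Analytics exports.
# merge_tables works on in-memory tables (header row + data rows);
# load_exports shows one way to feed it files from disk.
def merge_tables(tables):
    """Concatenate tables that share a header; keep one header row."""
    header = tables[0][0]
    rows = [row for table in tables for row in table[1:]]
    return [header] + rows

def load_exports(pattern):
    tables = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            tables.append(list(csv.reader(f)))
    return merge_tables(tables)

# e.g. combined = load_exports("ga_export_part*.csv")  # hypothetical names
```

This assumes every export shares the same column order; check that before merging.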
A partial workaround fix is to do a ‘filter’ to discard rows you don’t need… click ‘advanced’ at the top and…
Reconciling FB/GA reports
Facebook's Ad Manager and Google Analytics often report results that seem to have discrepancies. Below, one particular case, and possible explanations.
What is going on 'in our latest trial'?
Facebook: We have 50k+ unique impressions, and 1335 clicks
Google Analytics records only 455 page views, 403 users
And only about 20 doing any sort of Engagement like scroll or click (if we read it correctly)
1. Where do the other 600 clicks end up? Ad blockers? Do they click the ad and shut down before the page comes up?
JS: main reasons [DR: slightly edited]
1. "Do they click the ad and shut down before page comes up?" Yup! Closing the page before the redirect fully loads. Facebook will be as generous as possible with their click reporting.
2. ... If a user clicks on the FB ad twice within 30 minutes, then Google Analytics would record that only as a single user and a single session.
3. If a user has JavaScript disabled or doesn’t accept cookies, then Google Analytics doesn’t track.
Leticia at Facebook: these can be mistaken clicks; this is common. You need a pixel to fix this, or can change the reported metric to 'landing page view'.
2. How is it possible that 455 people come to the page and only 20 (under 5%) of them actually do anything on the page?
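For reference, the drop-off at each step of this particular funnel can be tallied directly (counts taken from the case above):

```python
# Funnel accounting for the FB-vs-GA discrepancy described above.
funnel = [
    ("FB unique impressions", 50_000),
    ("FB reported clicks", 1_335),
    ("GA page views", 455),
    ("GA users", 403),
    ("GA engaged users", 20),
]
for (label, n), (_, prev) in zip(funnel[1:], funnel):
    print(f"{label}: {n} ({n / prev:.1%} of previous step)")
```

Seeing each step as a share of the previous one makes it easier to spot which hand-off (ad click → page load → engagement) is losing the most people.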
Facebook ads interface
How to get data from trials of Facebook ad
Using Meta ads manager reporting suite
Extracting simple results
Go to "the reporting suite in Meta ads manager"
How do you get to the 'reporting suite' in the Meta ads manager view, as above?
URL should look like:
Go to
2. Specify some filters:
This gets us the screen below
3. Specify the date range.
Here “Effective Giving Guide Lead Generation campaign … ran late November 2021 - January 2022"
(Careful in specifying the dates; the interface is weird)
After specifying these dates, more information comes up in the basic columns:
5. Export simple results for Campaigns
Click 'Reports' ... upper right.
We can 'create a custom report', which saves this for later tweaking, or merely 'export table data'. I will do the latter for now:
csv or xls?
.csv and .xls formats are about equally good; R and other software can import either one. I'll choose csv because it's a tiny bit simpler... but in other contexts, xls might be useful for exporting multiple sheets.
Note: I chose CSV and do not include summary rows, to avoid confusion later.
Exploring alternative: direct input into R
See tools like the rfacebookstat package; docs here
Now I import this data into R (I usually use code but let's do it the interactive way for illustration)...
It seems that the option 'include summary row' was probably not wanted here, and the row with a blank 'campaign name' could cause confusion.
It seems to have removed the "bid strategy" column, and added 'reporting starts' and ...'ends' from the filter. Otherwise, everything else seems the same as in the ad manager view, although some labels have changed.
Campaigns, ad sets, ads
What's the difference between these?
FB/Meta gives some explanation, although it leaves some open questions.
You set the advertising objective at the campaign level. Here you decide the end goal for your ads, like driving more likes to your Page. At the ad set level, you define your targeting strategy by setting up parameters like targeting, budget and schedule. Finally, your ads are creative visuals, like pictures or videos, that drive the audience to what you are trying to promote.
We see three tabs
Campaigns
Ad sets for 1 campaign
Ads for 1 campaign
Campaigns
Here we have 7 campaigns, each with separate budgets, and start and end dates (although these mainly overlap).
It looks like some campaigns were set up for direct comparison or "A/B" testing perhaps, with the exact same budgets and end dates, and similar names:
Ad sets
Here, there are 52 total 'ad sets' across all campaigns.
I'm going to export this as a csv too, in case it's useful.
Ads
There are also 52 "ads"; it seems in this case, one per ad set:
Ad sets with multiple ads?
In theory ad sets could contain multiple ads. I wonder when/whether/why it would be worth doing this.
__
Luke: In the Giving Guides trial ... we used a smart ad format where you upload lots of creatives (images, videos, post text etc) and it tests them all as a single ad. That particular ad format has a 1:1 relationship with the ad set, and then you investigate the success by pulling other specific reports for the attributes (e.g. “Post Text” or “Image or Video”)
The information in the 'ads' table seems the same as in the 'ad sets table' ... other than a link to preview the ad content itself (which I don't seem to have access to atm).
"Qualitative" design issues
Discussion of issues in designing experiments/studies that are not specifically 'quantitative', but are important for gaining clear and useful inference
Naturalness of setting versus 'pure' treatments
Academics usually try to make each treatment differ in precisely one dimension; these treatments are meant to represent the underlying model or construct as purely as possible. This can lead to setups that appear strange or artificial, which itself might elicit responses that are not representative or generalizable.
For example, in my '' (lab) work we had a trial that was (paraphrasing) 'we are asking you to commit to a donation that may or may not be collected. If the coin flips heads, we will collect the amount you commit; otherwise no donation is made'. It was meant to separate the component of the "give if you win effect" driven by the uncertain nature of the commitment rather than the uncertain nature of the income. However, when we considered bringing this to field experiments, there was no way to do it without making it obvious that this was an experiment, or a very strange exercise.
When we consider an experiment providing 'real impact information' to potential donors, we might be encouraged to use the exact write-up from GiveWell's page, for naturalness. However, this may not present the "lives per dollar" information in exactly the same way between two charities of interest, and the particular write-up may suggest certain "anchors" (e.g., whole numbers that people may want to contribute). Thus, if we use the exact GW language, we cannot be 100% confident that the provision of the impact information is driving any difference. We might be tempted to change it, but at a possible cost to naturalness and direct applicability.
There are very often tradeoffs of this sort.
Awareness of testing can affect results
In the present context, we have posted about our work, in general terms, on a public forum (). Thus the idea that 'people are running experiments to promote effective giving and EA ideas' is not a well-kept secret. If participants in our experiments and trials are aware of this, it may affect their choices and responses to treatments. This general set of problems is referred to in various ways, each emphasizing a different aspect; see 'experimenter demand', 'desirability bias', 'arbitrary coherence/coherent arbitrariness', 'observer bias' (?), etc.
Mitigating this, in our context, most of our experiments will be conducted in subtle ways (e.g., small but meaningful variations in EA-aligned home pages), and individuals will only see one of these (with variation by geography or by IP-linked cookies). Furthermore, we will conduct most of our experiments targeting non-EA-aligned audiences unlikely to read posts like this one. (People reading the EA forum post are probably ‘already converted’.)
Incentives and 'more meaningful responses'?
Other issues to consider
(To be fleshed out in more detail)
Universe (population) of interest, representativeness
Design study to measure 'cheap' behavior like 'clicks' (easier to observe, quicker feedback) versus meaningful and long-run behavior (like donations and pledges)
Abstract .... While effective, this geo-based regression (GBR) approach is less applicable, or not applicable at all, for situations in which few geographic units are available for testing (e.g., smaller countries, or subregions of larger countries). These situations also include the so-called matched market tests, which may compare the behavior of users in a single control region with the behavior of users in a single test region. To fill this gap, we have developed an analogous time-based regression (TBR) approach for analyzing geo experiments. This methodology predicts the time series of the counterfactual market response, allowing for direct estimation of the cumulative causal effect at the end of the experiment. In this paper we describe this model and evaluate its performance using simulation.
"Geo experiments" where only a single geo is targeted for a treatment seem fairly common in practice: you 'try something in a single market once and see what it does'.
This is probably reinventing the wheel of some existing approach in econometrics (difference-in-differences, event studies?), but which?
Trial reporting template
For each proposed/ongoing/past trial, let's try to report the following minimal details, with links (proposed template) If you don't have time and you have another clear presentation of most of this, please link or embed it.
"Concise reporting template"
Please keep your answers brief -- if you want to give more detail (which is not necessary) please link a later section or external page. _
Short version of this template (link copy-opens a new version for you to work in)
General idea, main 'hypothesis' (if there is one)
Firstly, what is this promotion trying to do (e.g., 'encourage signups for giving pledges')?
But more importantly, what are you trying to learn here... What might you have a better understanding of after the trial than you did before the trial?
E.g.,
Specifically:
Does the opportunity we offer to sign up for an 'accountability partner' increase or decrease the rate at which people DO XXX activity?
Does it lead to greater overall XXX-linked donations per visitor over the next 1-year interval?
Generally:
(Optional: brief on background theory and previous evidence)
Point of contact (at organization running trial)
You can enter more than 1 person here, including an external organizer (like JS Winchell), but ideally, also someone inside the organization.
Add 'academic/research lead' here if there is one
Timing of trial (when will it/did it start and end, if known)
Digital location where project 'lives' (planning, material, data)
The present Gitbook/and a nested Github repo folder could be ideal. Please give a precise link so others could access it.
Environment/context for trial
(Is it on a web page, a google advertisement, a physical mailing, etc)
Participant universe and sample size
Who will be targeted or who do you expect to be part of the trial?
(Somewhat optional) How many people (or 'units') do you expect to be involved (median guess)?
(Optional): How many do you expect will have a 'positive outcome' (e.g., a 'conversion')?
Key treatment(s)
Description, link exact language/content if possible
Treatment assignment procedure
At what level is it varied? (individual visitors, postal codes, days of the week, etc)
How are treatments assigned? ('blocked randomization', 'adaptive/Thompson sampling', etc.)
If you are using a 'set Google, Facebook etc algorithm', just input the settings you used here, and/or link the (Google, FB, etc) explanation
Outcome data
What measures (outcomes, other features) will be collected?
When and how
Where will the data be stored, who will have access
Optional/suggested additions
Planned analysis methods, preregistration link, IRB link, connection to other projects and promotions
Ex-post: Reporting results (brief)
Implementation and data collection
Did it go as planned? Any departures? (Timing, randomization, design changes, etc)
How much/what data was collected? How many observations?
Where is the data stored (also link/adjust the above), who has it, and under what conditions?
Basic results/outcomes
"Partners and stakeholders opinions": were they happy with the trial? Did they seem to think it was a success?
Simplest statement (e.g., "3% donated in the treatment versus 2.2% in the control, with an average amount raised of $4.30 in the treatment and $3.10 in the control")
Preliminary interpretation, with a statistical test if possible (e.g., 'Google Optimize states an 80% chance that the treatment outperformed the control'; 'a Fisher's exact test yields p=0.06 that a positive donation was more likely in the treatment than the control')
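If you want a Fisher's exact test without a stats package, the one-sided version can be computed directly from the 2x2 table. A sketch; the counts are made up, loosely echoing the '3% vs 2.2%' example above:

```python
from math import comb

# One-sided Fisher's exact test for a 2x2 conversion table.
# Table: [[a, b], [c, d]] = [[treat conv, treat non-conv],
#                            [ctrl conv, ctrl non-conv]]
def fisher_one_sided(a, b, c, d):
    """P(seeing >= a conversions in treatment | margins fixed),
    via the hypergeometric distribution."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    return p

p = fisher_one_sided(30, 970, 22, 978)  # one-sided p for 3.0% vs 2.2% conversion
```

For a two-sided test, or for large tables, a library implementation (e.g., `scipy.stats.fisher_exact`) is more practical; this sketch just shows the underlying computation.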
Adaptive design/sampling, reinforcement learning
Overview: conversation with Dillon Bowen
Dillon writes: I've run some very promising MTurk pilots using my adaptive experimentation software. Compared to traditional random assignment, it increases statistical power, identifies higher-value treatments, and results in more precise estimates of the effectiveness of top-performing treatments. From simulations, I estimate that the gains from adaptive experimentation are approximately equivalent to increasing your sample size by 2x-8x (depending on the distribution of effect sizes).
This would allow us to run studies like Eric Schwitzgebel + Fiery Cushman's study on philosophical arguments to increase charitable giving much more effectively
Overview: conversation with DB
Dillon Bowen: end of 3rd year of the Decision Processes PhD at Wharton.
Here is a stats package for estimating effect sizes in multi-armed experiments.
Adaptive experimentation software: Hemlock
I just made a getting started video:
Adaptive experimentation (discussion)
...running experiments with many arms and winnowing out the 'best ones' to learn the most/best.
In our case there are many 'knobs to turn', and some of the options are discrete. There are different versions of this approach for discrete vs. continuous treatments.
If we can order the different treatments (arms/knobs) as 'dimensions' we can infer more... We can do better by thinking of them as a 'multifactor experiment' across several separate dimensions, rather than as unrelated arms.
"Model running in the background" trying to figure out ‘things about the effectiveness of the interventions you might use’
'Explore only' or 'explore & exploit' at the same time
"Ex-post regret versus cumulative regret" … the latter suggests Thompson sampling. (Does Thompson sampling take into account the length of the future period?)
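To make the Thompson-sampling idea concrete, here is a minimal sketch for Bernoulli 'conversion' arms. The true rates are invented for illustration; this is not the algorithm any particular platform uses.

```python
import random

# Minimal Thompson-sampling sketch for Bernoulli 'conversion' arms.
# True conversion rates below are invented for illustration.
def thompson_choose(successes, failures):
    """Draw a rate from each arm's Beta posterior; pick the best draw."""
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

random.seed(0)
true_rates = [0.02, 0.05, 0.12]
succ, fail = [0, 0, 0], [0, 0, 0]
for _ in range(5000):
    arm = thompson_choose(succ, fail)
    if random.random() < true_rates[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1
pulls = [s + f for s, f in zip(succ, fail)]
# The best arm (index 2) should end up with the bulk of the traffic.
```

This is the 'explore & exploit' flavor: it keeps sampling weaker arms occasionally (exploration) while routing most traffic to the apparent winner, which is what makes cumulative regret small but final-stage inference tricky.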
Learning and inference
Ex-post … use machine learning to consider which characteristics matter and how much they matter … he doesn't know of papers that have looked at this, but assumes there are adaptive designs that incorporate it.
Statistical inference can be challenging with adaptive designs, but this is a ripe area of research
Dillon: has a paper on traditional statistical inference after an adaptive design.
Goals 'what kinds of inference':
The arm you are using relative to (the average arm?)
Which factors matter/joint distribution ….. Bayesian models
Notes: Implementing adaptive design on existing sites
We need a great web developer, a system so that a program Dillon writes is fed data on the factors (?) to assign a user to a treatment. Dillon will set up an ML model that is continuously updated … ‘next person clicking on this page gets this treatment … web dev makes sure it shows the recommended content’
We figure out what factors we want, what levels, have a basic web design … Dillon comes in and turns the ‘1000 dim treatment space and featurize it so his model can use it’.. Works with a dev to set up a pipeline.
Simple quant design issues
How many observations, how to assign treatments, etc.
Resources
Todo: Integrate further easy tools and guides, including those from Jamie Elsey
"Even a few observations can be informative"
Drawing from Lakens' excellent resource:
You are considering a new and an old message.
Suppose you are a 'believer' … your prior (light grey, up top) is that 'this new message nearly always performs better than the control treatment'
Suppose you observe only 20 cases and the treatment performs better only half the time. You move to the top black line posterior. You put very little probability on the new message performing much better than the control.
Now suppose you have the ‘Baby prior’, and think all of the following ten things are equally likely
less than 10% of people rate the new message better than the control
10-20% of people rate the new message better than the control
…
You run tests on 20 people, and you get 15 people preferring the new message.
Now you update substantially. From some calculations (starting from Lakens' code; 1 - pbeta(0.65, aposterior, bposterior)) you put about an 80% posterior probability on the new message being preferred by at least 65% of the population. (And only about 1.5% probability on the control being better.)
So if I really ‘am as uncertain as described in the example above’ about which of two messages are better (and by how much)...
... then even 20 randomly-selected people assessing both messages can be very informative. How often does this ‘strong information gain’ happen? Well, under the "baby prior", you would get information at least this informative in one direction or the other about half the time.
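The headline numbers above can be reproduced without R or SciPy. A sketch: with a uniform ('baby'-style) prior and 15 of 20 people preferring the new message, the posterior for the preference share is Beta(16, 6); we integrate its tail numerically. The trapezoid-rule integrator is mine, not Lakens' code.

```python
from math import comb

# Uniform prior + 15 of 20 preferring the new message -> Beta(16, 6)
# posterior for the share preferring it. Integrate the tail numerically.
def beta_tail(a, b, x0, steps=100_000):
    """P(p > x0) under Beta(a, b) with integer a, b (trapezoid rule)."""
    # 1/B(a, b) = (a+b-1)! / ((a-1)! (b-1)!) for integer parameters
    norm = comb(a + b - 2, a - 1) * (a + b - 1)
    total, width = 0.0, (1 - x0) / steps
    for i in range(steps + 1):
        p = x0 + i * width
        w = 0.5 if i in (0, steps) else 1.0
        total += w * p ** (a - 1) * (1 - p) ** (b - 1)
    return norm * total * width

p_at_least_65 = beta_tail(16, 6, 0.65)   # ~0.80: at least 65% prefer new message
p_control_better = 1 - beta_tail(16, 6, 0.5)  # ~0.015: control preferred
```

These match the roughly 80% and 1.5% figures quoted above, which is reassuring for the claim that even 20 observations can be very informative under this prior.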
Geographic segmentation/blocked randomization
Discussion of blocking/randomizing treatments by post/zip code or other region, allowing us to more accurately tie treatments to ultimate outcomes
Measurement needs are varied and come with a variety of limitations, e.g., data availability, ad targeting restrictions, wide-ranging measurement objectives, budget availability, time constraints, etc.
Kerman et al, 2017
Why 'Geo experiments'
In many contexts, the route to a meaningful outcome (e.g., a GWWC pledge) is a long one, and attribution is difficult. An individual may have been first influenced (1) by a YouTube ad while watching a video on her AppleTV, then (2) by a friend's post on Facebook, and finally moved to act (3) after having a conversation at a bar and (4) visiting the GWWC web site on her phone.
The same individual may not (or may) be trackable through 'cookies' and 'pixels' but this is already very limited and imprecise, and is being made harder by new legislation.
"Geographic targeting" of individual treatments/trials/initiatives/ads may help better track, attribute, and yield inference about 'what works'.
E.g., we might do a 'lift test':
select a balanced random set of US Zip codes for a particular repeated YouTube ad promoting GWWC, the "Treated group"
compare the rate of GWWC visits, email sign-ups, pledges, and donations in the next 6 months from these zip codes relative to all other zip codes (possibly throwing out, or finding a way to draw additional inference from, zip codes adjacent to the treated group).
We could also do multi-armed tests (of several types of ad or other treatment, with a similar setup as above)
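The basic lift computation in such a test is just a comparison of aggregated conversion rates across the treated and control geos; a toy sketch (all zip codes and counts invented):

```python
# Toy geo-level lift computation (all numbers invented).
# zip code -> (visits, pledges) over the measurement window
treated = {"94110": (1200, 18), "02139": (800, 9)}
control = {"60601": (1000, 8), "73301": (900, 7)}

def rate(groups):
    """Pooled conversion rate across a set of geos."""
    visits = sum(v for v, _ in groups.values())
    conversions = sum(c for _, c in groups.values())
    return conversions / visits

lift = rate(treated) / rate(control) - 1  # relative lift vs. control geos
```

A real analysis would also need a confidence interval, e.g., from geo-level variation or a permutation test over the zip-code assignment; this just shows the point estimate.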
There are a few well-known and researched approaches:
(emphasis added)
Geo experiments (Vaver and Koehler, 2011, 2012) meet a large range of measurement needs. They use non-overlapping geographic regions, or simply “geos,” that are randomly, or systematically, assigned to a control or treatment condition. Each region realizes its assigned treatment condition through the use of geo-targeted advertising. These experiments can be used to analyze the impact of advertising on any metric that can be collected at the geo level. Geo experiments are also privacy-friendly since the outcomes of interest are aggregated across each geographic region in the experiment. No individual user-level information is required for the “pure” geo experiments, although hybrid geo + user experiments have been developed as well (Ye et al., 2016).
Matched market tests (see e.g., Gordon et al., 2016) are another specific form of geo experiments. They are widely used by marketing service providers to measure the impact of online advertising on offline sales. In these tests, geos are carefully selected and paired. This matching process is used instead of a randomized assignment of geos to treatment and control. Although these tests do not offer the protection of a randomization experiment against hidden biases, they are convenient and relatively inexpensive, since the testing typically uses a small subset of available geos. These tests often use time series data at the store level. Another matching step at the store level is used to generate a lift estimate and confidence interval.
Where and how can we geographically block treatments?
Context/location
Geographic blocking? (How)
What if we can only apply the treatment to one, or a few, of many groups?
We still may be able to make valuable inferences, under specified conditions, through 'difference-in-differences', 'event study', and 'time-based' approaches. We consider this in the next section:
Facebook serves each ad variation to the people it thinks are most likely to click on it.
Thus, in comparing one ad variation to another... you may learn:
"Which variation performs best on the 'best audience for that variation' (according to Facebook)"
But you don't learn "which variation performs better than others on any single comparable audience."
Update 4 Oct 2022: We may have found a partial solution to this, with ads targeting 'Reach' rather than optimizing for other measures like 'clicks'. We are discussing this further and will report back.
Researchers are interested in running trials using Facebook ads. However, inference can be difficult. Facebook doesn't give you full control of who sees what version of an advertisement.
With A/B split testing etc.: they have their own algorithm, which presumably uses something like Thompson sampling to optimize for an outcome (clicks, or a targeted action on the linked site with a 'pixel'). Statistical inference is challenging with adaptive designs and reinforcement learning mechanisms. As the procedure is not transparent, it is even more difficult to make statistical inferences about how one treatment performed relative to another.
Segmentation and composition of population: Facebook's 'PageRank' algorithm determines who sees an ad. I don't think you can turn this off.
Divergent delivery and "the A/B test deception"
Further notes
Orazi, D. C., & Johnston, A. C. (2020). Running field experiments using Facebook split test. Journal of Business Research, 118, 189-198.
"Haven’t heard of an update since. They do something to mitigate the effects of targeting different audiences with the different treatments, but it’s still not quite random assignment"
"Bottom line: good news, bad news. I'm confirming that you're right: The "latest best possible settings" are still not giving you results that reflect the random experiment that a researcher in consumer psychology or advertising would be expecting. But the problems are worse than they may have seemed to you initially."
Notes on Facebook "Lift tests/Lift studies" with "Multiple Test Groups"
Do Facebook "lift tests/lift studies" with "multiple test groups" give us the freedom we want to …
Randomize/balance different ad content ‘treatments’ to comparable groups?
I find it strange/suboptimal that they aggregate across the geos in the control group, throwing away important variation here … that might tell us something about how much things 'typically vary by' without treatments. I wonder if there's another approach that brings that variation back?
Maybe this is 'because this is an easy extract to get from Google Analytics'? How do we get it?
The package is 5 years old with no recent updates … ages in this world; is there something better to use instead?
Does 'social accountability' help to encourage XXX activities and promises and the fulfillment of these? Does the 'fear of being held accountable' discourage people from making commitments?
How many/what shares are assigned to each treatment?
"Full analysis"
Who/what when will it be done?
Link to 'where' it will be done (both the follow-up to the pre-analysis plan and the full write-up, if applicable)
Possibly: briefly characterize the overall conclusions/state of analysis here (state the date last updated)
Keep in mind that a campaign can include multiple ad sets, each with different targeting, scheduling and budgeting options selected.
Some things are still unclear:
Can multiple 'ad sets' use the same 'ads'? (I think so)
Why do we seem to see budget and schedule choices listed under 'campaign' in the ads manager?
Note that RP is not a 'part of this Market Testing team', but we want to coordinate with them and benefit from the survey and profiling work they are doing/have done. I try to map/link the space here.
RP: "How many people have heard of EA" survey
Asks respondents to tick terms and people that they are familiar with (EA/non-EA, real/rare/fake). If they have heard of EA, we follow up with open-ended questions to detect actual understanding. We also ask about socio-demography and politics. Administered to a 'national sample'.
(We will follow up with attitude surveys among those who have heard of EA.)
We use Bayesian models to generate the posterior distributions of
share who know/understand EA within different groups,
weighted to be nationally representative (of each group).
Wild Animal Welfare/Suffering attitudes
Various survey projects ongoing
EA Attitudes and Longtermist Attitudes
Developing measures of attitudes towards EA/Longtermism
Conducting large national surveys looking at predictors of these attitudes (including differences across groups)
Standard ‘message testing’ (what arguments/framings work best for outreach (including differences across groups)
__
How to get at unique users for the key stats
Date filters don't seem to work on home page graphs ... choosing custom dates doesn't change it
No.
Josh: "what it says is something importantly different: you can compare the number of people who do the action you are interested in ... according to whether or not they see a given ad. So, you do have random assignment when comparing the effect of an ad to the effect of no ad. ... if we compare the lift for two different treatments (What these multi-cell lift tests are doing), we are doing almost exactly the same thing as we were without the lift functionality...
A and B are displayed to different audiences, so this test does not have random assignment."
Essentially this allows you to get the correct 'lift' of A and B, on their own distinct audiences, by getting the counterfactual audiences for each of these correct. But you cannot compare the lift of A and B on any comparable audience.
To help understand the context... "Facebook often randomizes the whole audience into different cells and THEN targets the ad WITHIN that audience. So there is random assignment at the initial stage, but that's irrelevant, because not everyone in the potential audience sees each ad"
DONATE TODAY: your donation can supply food parcels to a Syrian family
DONATE TODAY: your donation can supply food parcels (ca. 17CHF/parcel for one month) to a Syrian family
DONATE 50CHF TODAY: your donation can supply 3 food parcels (ca. 17CHF/parcel for one month) to a Syrian family
DONATE 150CHF TODAY: your donation can supply 9 food parcels (ca. 17CHF/parcel for one month) to a Syrian family
DONATE 50CHF TODAY: your donation can supply food parcels to a Syrian family
DONATE 150CHF TODAY: your donation can supply food parcels to a Syrian family
DONATE TODAY:
- With 50CHF you offer 4 Hygiene kits to Syrian families
- With 100CHF you offer 14 school kits to Syrian students
- With 150CHF you offer 9 Food parcels to Syrian families
Bayesian Credible intervals for 'impact of impact information' on probability of donating
Kagan and Fitz survey
Sample, Design, & Measures. We recruited a national online sample of 530 Americans. Participants read and reflected on an introduction to evidence-based giving, and then completed our main outcome measures of effective giving. Participants then completed a series of measures of their beliefs, behaviors, values, traits, sociodemographics, etc. The instrument, measures, and data are available upon request.
Was this a 'representative sample'? How were they recruited?
Note they 'read about EA first' ... perhaps making them vulnerable to demand effects?
DR: I've requested this data, but I think the authors are having trouble finding the time to dig this up
Primary Measures. To measure effective giving, we assessed several attitudes and behaviors; this summary presents results from a novel 7-item scale, the Support for Effective Giving Scale (SEGS) [α = .92], and an effective giving behavior allocation.
The items in SEGS assess general interest, desire to learn more, support for the movement, and willingness to share information with others, identify as an effective altruist, meet others who support the movement, and donate money based on effective giving principles. To approximate giving behavior, we presented participants with short descriptions of three causes (Deworm the World Initiative, Make-A-Wish Foundation, and a local high school choir) and had them allocate $100 between these groups and/or keeping it themselves.
Was the allocation purely hypothetical or incentivized in some way, perhaps 'one response was chosen'?
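For reference, the α = .92 reported for SEGS is Cronbach's alpha, the standard internal-consistency statistic for a multi-item scale. A minimal sketch of the computation (the data below are toy Likert responses, not the study's):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy data: six respondents answering three highly consistent Likert items.
toy = np.array([
    [5, 5, 4],
    [4, 4, 4],
    [2, 3, 2],
    [1, 2, 1],
    [4, 5, 5],
    [3, 3, 3],
])
alpha = cronbach_alpha(toy)  # high alpha: the items move together
```

Alpha near 1 means the items covary strongly (total-score variance far exceeds the sum of item variances), which is what the .92 figure is claiming for the seven SEGS items.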
Secondary Measures.
To measure beliefs, behaviors, and traits of people who endorse effective giving, we employed measures of: perceived social norms, charitable donation beliefs and behaviors, self-perceptions, empathy quotient (EQ), empathic concern & personal distress (IRI), the five moral foundations (MFQ-20), the five-factor personality model (TIPI), goal & strategy maximization (MSS), updated cognitive reflection tests (CRT), sociodemographics (e.g., age, gender & racial identity, income), politics & religion, familiarity with the ‘effective altruism’ movement, and state of residence
So far, the best overall model predicts 41% of the variance in support for effective giving.
Summarized in posts...
.... After participants read a general description of EA, they completed measures of their support for EA (e.g., attitudes and giving behaviors). Finally, participants answered a collection of questions measuring their beliefs, values, behaviors, demographic traits, and more.
The results suggest that the EA movement may be missing a much wider population of highly-engaged supporters. For example, not only were women more altruistic in general (a widely replicated finding), but they were also more supportive of EA specifically (even when controlling for generosity). And whites, atheists, and young people were no more likely to support EA than average. If anything, being black or Christian indicated a higher likelihood of supporting EA.
Moreover, the typical stereotype of the “EA personality” may be somewhat misguided. Many people – both within and outside the community – view EAs as cold, calculating types who use rationality to override their emotions—the sort of people who can easily ignore the beggar on the street. Yet the data suggest that the more empathetic someone is (in both cognition and affect), the more likely they are to support EA. Importantly, another key predictor was the psychological trait of ‘maximizing tendency,’ a desire to optimize for the best option when making decisions (rather than settle for something good enough).
Other data
Who gives effectively? Unique characteristics of those who have taken the Giving What We Can Pledge
we focus on individuals who have taken the Giving What We Can Pledge: a pledge to donate at least 10% of your lifetime income to effective charities. In a global survey (N = 536) we examine cognitive and personality traits in Giving What We Can donors and compare them to country-matched controls. Compared to controls, Giving What We Can donors were better at identifying fearful faces, and more morally expansive. They were higher in actively open-minded thinking, need for cognition, and two subscales of utilitarianism (impartial beneficence and instrumental harm), but lower in maximizing tendency (a tendency to search for an optimal outcome). We found no differences between Giving What We Can donors and the control sample for empathy and compassion, and results for social dominance orientation were inconsistent across analyses.
Tangential: 'Omnibus' lab survey at University of Exeter
Includes real donation choice question(s), rich survey and psychometric data, including 'mind in the eyes' empathy judgements
Students and nonstudents (local town population)
Consider Lown and XX paper... MITE empathy moderates the impact of political attitude, or something ... dissonance resolution
Feldman, Ronsky, Lown
mturk + qualtrics
ended up manipulating whether aid was government or charity, and domestic vs foreign; thought those would be moderated by MITE depending on their ideology/attitude?
Also consider ... Empathy Regulation and Close-Mindedness Leonie Huddy, Stanley Feldman, Romeo Gray, Julie Wronski, Patrick Lown, and Elizabeth Connors
Also asked about domestic welfare and foreign aid attitudes...
sample fairly large ... 1100 or so?
Surveys/Predicting EA interest
Caviola et al
Introduction, scoping work
Strategic considerations
Previous sections considered... 'How to get more people to care about this stuff'. 'How to get the "Einsteins" of the next generation interested in this.' And 'how do we introduce this to people?'
But an equally important concern may be... WHOM do we target? How do we do market profiling? Not just 'what do we present', but 'whom do we present it to'?
In this section, we cover the limited work that has been done on this, and the scope to do more.
Scoping and considering the value of doing this
Leander Rankwiler recently (17 Feb 2023) did a scoping exercise for this. See . This work focuses on "the rationale, literature research, and data collection", and comes to relatively negative conclusions ("it's much less valuable to pursue than previously assumed"). This particularly reflects concerns that doing, publicly reporting, and acting on this research to 'target promising groups' may do some harm (see fold).
Downside risks (Rankwiler)
Risk of harming the diversity (of personalities) within EA, by targeting the "typical" EA personality.
"Risk of negative public perception of the method of using personality traits to find promising users (à la Cambridge Analytica)"
He also sees many sources of (statistical) bias in any feasible analysis.
In the sections below, we present and link recent and ongoing direct work that may also be relevant and informative.
Analysis: Statistical approaches
What to do with the data after you collect it (and what you should put in a pre-analysis-plan).
Impact of treatment on 'rare event' incidence
Notes from slack:
I’m finding some issues like this in analyzing rare events … not quite that rare, but still a few per thousand or a few per hundred.
I’m taking 2 statistical approaches to the analysis (discussion, code, and data in links):
Randomization inference (simulation) … for a sort of
I think either of these could be ‘flipped around’ to be used for power calculation or ‘the Bayesian equivalent of power calculation’
My colleague Jamie Elsey has some expertise with the latter; , although it’s mainly frequentist and not Bayesian ATM.
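A minimal sketch of the randomization-inference approach mentioned above, run on simulated rare-event data (the incidence rates, sample sizes, and variable names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical rare-event data: donation incidence of a few per thousand,
# doubled by treatment (all numbers invented for illustration).
n_control, n_treat = 5000, 5000
y = np.concatenate([(rng.random(n_control) < 0.004),
                    (rng.random(n_treat) < 0.008)]).astype(int)
treat = np.concatenate([np.zeros(n_control, dtype=bool),
                        np.ones(n_treat, dtype=bool)])
observed = y[treat].mean() - y[~treat].mean()

# Randomization inference: under the sharp null of no effect, treatment
# labels are exchangeable, so we reshuffle them and recompute the statistic.
def ri_pvalue(y, treat, observed, n_perm=2000):
    stats = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(treat)
        stats[i] = y[perm].mean() - y[~perm].mean()
    return float((np.abs(stats) >= abs(observed)).mean())

p = ri_pvalue(y, treat, observed)
```

The same machinery 'flipped around' gives a simulation-based power calculation: simulate many datasets under an assumed effect size and report the share whose RI p-value falls below the chosen threshold.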
Open and robust science: Preregistration and Preanalysis plans
There are reasons 'some pre-registration' or at least 'declaring your intentions in advance' is worth doing even if you aren't aiming at scientific publication
Which statistical tests/methods
Fehr/SOEP analysis... followup
Your Place in the World: Relative Income and Global Inequality
See discussion in:
NBER Working Paper (2019/2021), Dietmar Fehr, Johanna Mollerstrom, and Ricardo Perez-Truglia
Attitudes towards global redistribution
"De-biasing" intervention (how rich participants are relative to Germans, how rich Germany is globally)
Tied to
German Socio-Economic Panel (SOEP), a representative longitudinal study of German households. The SOEP contains an innovation sample (SOEP-IS) allowing researchers to implement tailor-made survey experiments.
a two-year, face-to-face survey experiment on a representative sample of Germans. We measure how individuals form perceptions of their ranks in the national and global income distributions, and how those perceptions relate to their national and global policy preferences. [Their main result]: We find that Germans systematically underestimate their true place in the world’s income distribution, but that correcting those misperceptions does not affect their support for policies related to global inequality.
Why might this be relevant to our profiling:
They ask about support for global redistribution, international aid institutions, globalization, immigration, and more, and have an incentivized giving choice. These are (arguably) measures of support for some EA behaviors/attitudes.
I suspect that this data could be tied to a variety of rich (personality? demographic?) measures in the SOEP. A predictive model for actual EA/Effective giving targeting in other related contexts? If so, let's focus on things we are likely to observe in those other contexts (or at least likely to have proxies for). If there are any 'leaks' (not sure I'm using the term correctly)... missing a single feature could ruin the predictive power of the whole model.
Causal interpretations (very challenging)?
Here 'nearly immutable characteristics' (like ethnicity, age, parental background, maybe some deep psych traits) might be a bit more convincing
*Descriptive* (whatever we mean by that)
See in next section
Animal welfare attitudes: profiling/surveying
A brief outline and links to what has been done across organizations
Farmed/overall
Faunalytics
This is possibly the best meta-resource as well as a source of original research
Sentience institute
Our Animals, Food and Technology (AFT) survey tracks attitudes towards animal farming and animal product alternatives in the US. In 2020, as in the 2017 and 2019 iterations, we found significant opposition to various aspects of the animal farming industry, with a majority of people reporting discomfort with the industry, and strong support for a range of quite radical policy changes, such as banning slaughterhouses. The trend in attitudes between 2017 and 2020 is relatively stable, though slightly negative (not statistically significant). Notably, the number of people who consider animal farming to be one of the most important social issues fell from 2017 to 2019, and remained at this lower level in 2020.
Rethink Priorities
Some replication work on the above
Various work including
DR: I'm awaiting permission to share the list.
Paper: The moral standing of animals
Wild animal suffering
ACE
ACE - Wild Animal Suffering 1
ACE - Wild Animal Suffering 2
Other
(Rethink Priorities, in progress)
"Scientists’ Attitudes Toward Improving the Welfare of Animals in the Wild"
I very briefly discuss particular tools in the Bookdown:
A more organized categorization of barriers can be found in an airtable database (view below), linked to tables of specific tools, theories, barriers, etc. (This can be accessed HERE; it is not the airtable for this project, although we link in some content).
Specific tools
The above table links a set of specific tools, evaluating their relevance for effective giving:
We are considering a narrower set of tools (in a different airtable, the airtable for the current project...)
Specific proposed tools/interventions (GWWC focus, adding WIP)
Introduction...
Late-2024 update: This project is on hiatus/moved
Note from David Reinstein: The EA Market Testing team has not been active since about August 2023. Some aspects of this project have been subsumed by Giving What We Can and their Effective Giving Global Coordination and Incubation (lead: Lucas Moore).
Nonetheless, you may find the resources and findings here useful. I'm happy to answer questions about this work.
I am now mainly focused on making a success. I hope to return to some aspects of the EAMT and effective giving research projects in the future. If you are interested in engaging with this, helping pursue the research and impact, or funding this agenda, please contact me at daaronr@gmail.com.
Differentiating our work from previous research in psychology and economics, we write down our basic consensus and knowledge.
As Burum, Nowak, and Hoffman () state: “We donate billions of dollars to charities each year, yet much of our giving is ineffective. Why are we motivated to give, but not motivated to give effectively?”
... raises two related questions:
I. “Why don’t we give more to the most effective charities and to those most in need?” and
II. “Why are we not more efficient with our giving choices?”
To address this, we must understand what drives giving choices, and how people react to the presentation of charity-effectiveness information
In slightly more detail
There are two related and largely unresolved puzzles:
Why are people not more generous with the most highly effective causes? and
When they give to charity why do they not choose more effective charities?
There is some evidence on this, but it is far from definitive. We do not expect a single answer to these questions; a set of beliefs, biases, preferences, and underlying circumstances may be driving this. We would like to understand which of these are robustly supported by the evidence, and to get a sense of how much each contributes to the absence of effective giving. There has been only a limited amount of research into this, and it has not been systematic, coordinated, or heavily funded.
We seek to understand because we believe that there is potential to change attitudes, beliefs, and actions (primarily charitable giving, but also political and voting behaviour and workplace/career choices). Different charitable appeals, information interventions, and approaches may substantially change people's charity choices. We see potential for changing the “domain” of causes chosen (e.g., international versus US domestic) as well as the effectiveness of the charities chosen within these categories. (However, we have some disagreement over the relative potential for either of these.)
Our main ‘policy’ audience includes both effective nonprofit organisations and ‘effective altruists’. The EA movement is highly motivated, growing, and gaining funding. However, it represents a niche audience: the ‘hyper-analytic but morally scrupulous’. EA organisations have focused on identifying effective causes and career paths, but have pursued neither extensive outreach nor ‘market research’ on a larger audience (see , ).
(Lack of) previous synthesis on this
Academic work:
@loewensteinScarecrowTinMan2007
introduction to @Berman2018; @baron2011heuristics
Non-academic/unpublished:
'Behavior and Charitable Giving' (Ideas42, 2016),
'The Psychology of Effective Altruism' (Miller, 2016, slides only).
Overall, these have not been detailed or systematic. While , is probably the strongest, most relevant, and most insightful (and has some connection to the structure presented in the '' project), it does not drill deeply into the strength of the evidence and the relative importance of each factor. However, this may stem from a small amount of available evidence to survey.
Ideas42 wrote (ibid)
We did not find many field-based, experimental studies on the factors that encourage people to choose thoughtfully among charities or to plan ahead to give.
Definitions - "Efficiency" versus impact
A working definition is provided and discussed. I (Reinstein) provide a critical discussion of some standard economic models of giving in this context
Tools and trials: overview
See, and coalesce ideas from the links below (and more)
Discussion
Here, we propose methods for grouping, organizing, and categorizing these tools for motivating effective giving and action:
Theoretical frameworks --> tool categories
Certain outcomes are relevant to some tools only
Atheoretical 'trying different marketing colors' and tools that push several buttons
Existing categorisation
As well as
Some tools and tests of high-interest (overview, quick presentations)
Nick Fitz: "some quick types of different tests/questions EA orgs are interested in"
identifiable victims vs statistical (etc), (DR: Some groups have principled objections to presenting identified victims; which ones do not?)
emotional vs factual/statements,
videos v images v text,
Followup with Thomas Ptashnik
Further scoping, access, PhD partner
Thomas Ptashnik is a Psychology PhD student interested in working on this with us. He is using the SOEP-Core data and familiar with SEM/Latent variable methods.
We have gained access to the relevant data
Here is the link to the Fehr appendix that contains the survey items they created (starting at Appendix B on page 33).
Some salient example content:
These items correspond to the SOEP-IS surveys, which can be found here (use item names, like Q132, to search quickly)
These links also mention that individuals with preexisting data access can apply for expanded access. I [Thomas] have access to SOEP-core version 36 (1984-2020 surveys),..
DR: Some interesting content (at a quick peek)
From 2017...
Q380: 'What you value in your work' Likert items ... includes "Having much influence" and "Socially responsible and important work"
Q160: Optimism/pessimism about the future
Q162: ... bunch of Likerts on "attitudes towards life and the future" (e.g., 'The options that I have in life are determined by social circumstances.')
A proposed project
DR notes on 15 Dec 2021 meeting with
Does the Fehr/SOEP data provide valuable 'outcome measures' of EA and effective giving support?
I think we might see positive responses to the Fehr et al questions and donation choices as 'necessary but not sufficient' for people to become effective givers or even EAs. If (especially in spite of the de-biasing) people still don’t support international redistribution and international orgs, and don’t opt to give from the lottery earnings to the global poor person … I think they are very unlikely to be susceptible to an EA or effective giving (e.g., GiveWell) appeal. (See further discussion and debate on this below).
(But, as a check on this, it might be good to try to ask these same questions of a sample of actual EAs and effective givers, and a comparison group! #surveyexperiments)
Two projects on the same data
I envision two related projects on the same data: 1. Building a 'portable' model for prediction to aid targeting and 2. Building a 'deeper' model to aid understanding
I’m hoping that looking for predictors of (or ‘coherent factors explaining’) these responses in the SOEP data would prove useful for organizations like GWWC to consider ‘which groups to target in doing outreach’ (and perhaps especially ‘which groups to rule out’)
I hope we can do a sort of ‘leak-proof validated predictive ML model for this’
perhaps especially relevant for the German/EU context
Thomas: After talking it over with some colleagues, I think this approach is our best bet in terms of developing something with practical utility that still has a chance of being published in an academic journal. This is not my area of expertise, but if I remember correctly you have some R code already written. So I should quickly be able to put something together.
2. An exploratory model to help understand key factors that might be driving EA-adjacent attitudes and behaviors, offering insight into ‘what drives people towards or away from this mindset’.
Here we could engage the richer set of SOEP variables and consider latent factors
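As a rough illustration of latent-factor extraction, here is a sketch using principal components as a simple stand-in for the SEM/latent-variable methods mentioned above (the items, loadings, and the 'cosmopolitan altruism' trait are simulated, not SOEP data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated survey block: 500 respondents, 6 items driven by one
# hypothetical latent trait (e.g., 'cosmopolitan altruism') plus noise.
n, k = 500, 6
latent = rng.normal(size=n)
loadings = np.array([0.8, 0.7, 0.6, 0.75, 0.65, 0.7])
items = np.outer(latent, loadings) + rng.normal(scale=0.5, size=(n, k))

# Principal-component extraction from the item correlation matrix,
# as a crude stand-in for a formal latent-variable (SEM) model.
z = (items - items.mean(axis=0)) / items.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # ascending eigenvalues
share = eigvals[-1] / eigvals.sum()       # variance explained by top factor
scores = z @ eigvecs[:, -1]               # respondents' factor scores
```

With real SOEP items the extracted scores (or proper SEM factor scores) could then serve as candidate predictors in the Fehr et al. outcome models, rather than entering dozens of raw items separately.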
Anonymous colleague; caveats on 'the two goals'
If one simply wants to target people for giving to some specific EA-aligned cause, then in that case the hypothetical African Christian women are likely to give, and it doesn't matter so much how they get to that decision. Quite a different set of metrics is desired (the kind of things we are trying to get at) if one is trying to actually select/find 'effective altruists' [RT2]
Red team
But I'm less sure about: ..."would prove useful for orgs like GWWC to consider ‘which groups to target in doing outreach’ (and perhaps especially ‘which groups to rule out’)"
[suppose] you measure something like 'interested in giving to people in poverty in Africa' (or, at best, cosmopolitanism), and you find that the people highest in this are [Classical music fans], but the people most interested in EA stuff are [Techno ravers]. I think there are lots of reasons why this might occur. It could be that interest in EA is a combination of cosmopolitanism + interest in maximising effectiveness, but differences in the latter swamp the former. (If so the reasoning would at least be along the right lines, but would potentially be very practically misleading to GWWC)...
But I think what could be going on could be even worse, i.e.:
Red team analogy
(I think of this case as a bit like studying interest in Marxism by asking whether people are interested in helping the poor (or some such). In one sense you might think of this as a necessary condition: people who don't have any concern for this are not likely to be interested in Marxism. OTOH you'll probably mostly be picking up the 99% of people who are interested in helping the poor but not interested in the much more niche / slightly weirder thing that is closely related to helping the poor, but is also associated with slightly counterintuitive views like 'donating to the poor is not good, you need to be concerned with [systemic change and global revolution / AI safety]', etc.)
Red team:
[red team]
I guess it will be interesting to find out through your analysis:
Are these measures predicted by plain altruism + cosmopolitanism (which a priori we might say are more likely to be connected to EA)
Or are these measures predicted by egalitarianism + belief we should repay the third world / belief the rich should help the poor (which seem like they may be less closely connected with EA)*
DR and TP response to red team
Good points, and I even think “global redistribution” might rub some actual EAs the wrong way, as well as many EAs rejecting the 'repay our collective guilt' aspect.
Still, GWWC and TLYCS are pushing more for behaviors (esp. giving) than for intellectual alignment with EA. They are also pushing the traditional global poverty part of the EA agenda. I suspect the Fehr/Soep measures will pick up people more receptive to this than to longtermist 'avant garde' EA.
RT2: Is there any way you can think of to get at EA more as a style of thinking/justification of choices, as opposed to the possibly highly context-dependent choices themselves? Some kind of relevant psychometric measure is probably possible, e.g., need for cognition or something similar
RT1:
One option: create or use measures of maximising + cosmopolitanism + altruism (or of maximising cosmopolitan altruism) ... maybe then we are getting at the 'EA style of thinking'. And if we can show that these more abstract measures are connected to behavioural or otherwise more concrete measures of EA inclination (whether that's decisions/choices, signing up for a mailing list, or something else), then it does seem reasonable to think of these as capturing EA inclination.
Value of incentivized measures here
(DR ideas)
IMO it would be nice to have some meaningful behavioral (incentivized) measures on top of the ‘psych’ ones. The ‘donation to the very poor’ measure in Fehr et al gets at this a bit … although it's a pretty small probabilistic sacrifice. And I suspect it measures all three of the above except maximizing. And I don't think these things are all separable, so I think that the fact that it measures ‘altruism and willingness to sacrifice in a cosmopolitan-relevant context’ is good.
It would also be pretty nice to have a behavioral/incentivized measure of ‘maximizing in an altruistic context’ …If Fehr ea had asked them to (e.g.) allocate giving among a German poor person, an African poor person, and themselves, this might have been a decent measure.
(We have this choice in some other contexts though … not as rich data, but maybe worth digging into.) Why might that choice have been better (in some ways) than a hypothetical choice? Because I imagine in a hypothetical choice some people would be like "OK, they obviously want me to say I support the poor person in Africa, and I see the maximization arguments, so, fine." But when it involves real money, and even their own money, I expect that for some people other motives will outweigh the ‘maximizing motive’: "wait, I'd rather keep the money than give it to an African who will waste it"; "wait, if this is real, I'd rather help someone local".
Analysis Plan, sample, and variables under consideration (01/31/22, Ptashnik)
DR: See sidebar comments
Analysis plan
Lasso regression to identify the most salient cluster [DR: how is this defined?] of predictors for effective giving
I will use k-fold cross-validation to compare a lasso model with ridge regression and OLS to confirm it is the best method for handling our data [DR: 'best in what sense? I recommend the elastic net approach if possible.]
Bayesian and latent lasso
TP: There is now a Bayesian form of lasso, but the R packages to run this analysis are in their infancy and the results between the methods are strikingly similar (Steorts, 2015). So, on the first pass I will just use one of the methods above but may rerun the analysis time-permitting to check my assumption that results won’t change.
Similarly, there is latent lasso regression, but most of our constructs have only one indicator and the R package for this analysis also appears to be at a nascent stage.
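The model comparison in the analysis plan can be sketched as follows. This is an illustrative scikit-learn example on synthetic data (the sample/feature counts and penalty strengths are placeholders, not the actual SOEP analysis); an elastic net is included alongside lasso, ridge, and OLS per DR's suggestion.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the SOEP predictors: ~700 respondents, 40 candidate
# features, only 5 of which truly predict the donation outcome.
X, y = make_regression(n_samples=700, n_features=40, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),  # mixes L1 and L2
}

# 5-fold cross-validated R^2 for each candidate model.
cv_r2 = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
         for name, m in models.items()}
best = max(cv_r2, key=cv_r2.get)
```

In practice the penalty strengths would themselves be tuned inside the cross-validation (e.g., via LassoCV/ElasticNetCV), and with many mostly-noise features the penalized models should match or beat OLS out of sample.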
Sample
To start, I’m just considering the 2017 survey and the control group (i.e., those who weren’t notified of their position in the national and global income distributions; ~700 individuals). We can expand to the 2018 survey and the treatment group in future analyses using the same method (although some items may not be included across surveys).
Outcome Variable
Q280 and 281 in the SOEP-IS dataset developed by Fehr et al. (2019)
“You were paired with another household in Kenya or Uganda. This household belongs to the poorest 10 percent of households worldwide. Now, you have 50 EUR at your disposal and can split this amount between the other household and you in any way you want. If this task is selected for payout, you will receive the amount you decided to keep at the end of the interview. The amount you want to give the other household will be given in full to the other household (without transaction costs) at the end of the field period by Heidelberg University via a charitable organization. In full means that every given euro will be received by the other household 1:1. A leaflet with information about the donations will be given to you after you have made your decision. I ask you to make this decision alone now.”
“How much of the 50 EUR do you want to keep and how much do you want to give the other household?”
2017 survey questions:
Variables Under Consideration
Below I list variables in terms of the intended construct I’m trying to get at and the proxy measures that are available within the SOEP dataset.
Theoretical rationale for construct from 'charitable giving' review
Theoretical rationale for these constructs comes from the most comprehensive review on predictors of charitable giving I could find (Bekkers & Wiepking, 2007; also see Bekkers & Wiepking, 2011 and Wiepking & Bekkers, 2012 for follow-ups on this review). These reviews seem like a reasonable starting point because they are cross-disciplinary and only consider studies that involve real money to real charitable organizations. There were a surprising number of what I think of as common-sense variables that weren’t included in these reviews that I add in the table below (i.e., those without an asterisk).
Several variables were omitted because I did not think they were relevant or because other constructs exist that better get at the underlying effect. ...
Home ownership: Appears to just be an indicator of wealth, so using income is preferable.
Perceived financial position: Bivariate studies (Bennet & Kottasz; Havens et al., 2007) conclude those who perceive their financial situation as more positive are more generous donors. However, Fehr et al. (2019)—which has a more robust design—reports that “we find no evidence that perceived rank in the global income distribution affects support for global redistribution, donations to the global poor, globalization or immigration. If anything, when thinking about these policy preferences, it matters more how one compares to other people nationally than to others around the globe.” Given these findings and the fact that we are using the same data, it is probably sensible to omit this variable. That said, studies have found that confidence in the economy predicts giving (Okunade, 1996), so an interesting pivot could be to measure optimism (in both domain-specific and general forms).
[DR: I think 'previous failure to find significant effects' shouldn't be a reason to exclude!]
Variables held constant by the survey design (see Bekkers & Wiepking 2007 for detailed explanation): Solicitation, benefits, reputation, and efficacy.
Construct *outlined in review articles
Brief Rationale for Inclusion
Items from SOEP
DR comments:
A very interesting list of features
were these all asked before the charity questions? (I'm worried about reverse causality otherwise)
maybe remove 'unavailable' rows for space
We should discuss how the fitted model will be used and interpreted ... maybe identifying a few collections of useful subsets:
Give if you win/ conditional pledge
See giveifyouwin.org
Conditional pledges (‘Give if you Win’), esp.
Work with EA orgs at universities and in companies; possibly working with 80k hours &/or FoundersPledge, give opportunity for career guidance
Control: Ask about career goal/target, follow up in 1 year, ask for pledge then
Treatment: Same but ask initially for conditional pledge (‘if you attain the goal’)
See project (hope to scale up evidence from smaller contexts)
(Outcome: Pledge, give substantially (& effectively))
Some interventions are aimed at getting people to make substantial contributions, or pledge to do so (e.g., GWWC pledge) ... to effective charities
(Moral duty (of well-off))
What questions do we have what challenges are we facing?
What previous work has been done to investigate these questions?
What evidence is there so far on these questions?
What are the relevant theories of behavior for this work?
introduction to Caviola et al: "on how both incorrect beliefs and preferences for ineffective charities contribute to ineffective giving"
@greenhalghSystematicReviewBarriers2020 (qualitative, focuses on largest philanthropists only)
e.g., for GWWC they want to test 1% vs. 10% pledge asks;
for CES they want to test saving-democracy vs. representation messaging;
for the Humane League they want to test appeals featuring different types of animals; etc.
From 2019
... they seem to collect genetic data
The measures capture something like 'not being so parochial that you won't give to a non-German charity', which is (ex hypothesi) a necessary condition, but so minimal that it's not really informing us about the much more demanding thing
... it measures something more specific/narrow that may be orthogonal or even antagonistic to EA (e.g. interest in overseas charity/poverty specifically [even if it doesn't maximise effectiveness]). Thought experiment: how would a libertarian-leaning AI-safety concerned German EA respond to the questions?
[still, this] seems worthwhile... I'd just be very tentative about inferring anything about what GWWC should do etc
*of course EAs are overwhelmingly liberal/egalitarian, but liberal/egalitarians are overwhelmingly not EA, which I think is an important complication"
Thomas: This is the main point to highlight. We probably need to limit our generalizability to the people-oriented neartermist worldview bucket. As the comments above note, I'm not sure this worldview necessarily maps onto the longtermist individual concerned about, say, AI safety risk. However, as you point out, there is still utility in focusing on understanding individuals that have this worldview for GWWC and other EA orgs, and this worldview (according to the EA survey) is currently the largest in the community.
DR: Agreed, but we probably need to make sure not to water it down too much; ideally we would retain some notion of 'the importance of prioritization and cost-effectiveness' in the worldview we are targeting
TP: As you mentioned, it would be interesting to replicate this survey with explicitly EA endorsing individuals. Particularly, in seeing how well the ML model can predict cohorts that fall into the three different worldview buckets.
DR: yes, but the model that predicts 'EA/global-poverty-supporting types' within a general population may be unlikely to predict groups *among explicit EAs* ... still, the comparison could be interesting (and we've done a bit of this already with the EA survey)
TP: Also, as a long-term idea, it could be useful to consider developing more EA-oriented items for SOEP-IS (the survey Fehr and colleagues used) that take into account all the issues listed here.
DR: that would be great!
The risk otherwise is that theoretically we think these 3 things correspond to EA thinking... and actually they don't ...
Consider NFC, IRT, Rationality Quotient, etc. as predictors of EA-inclination
DR: My conception was maximizing + cosmopolitanism + altruism + willing-to-sacrifice/non-competitiveness. I think many people think “I should work to help humanity” but also think ‘yeah, but I’ll be a sucker if I give to charity while my neighbor gets a new swimming pool and Hawaii holiday…’ That’s where “willing-to-sacrifice/non-competitiveness” comes in, in my mind. (It needs a better name?) I think this last trait is more important for effective giving than for EA-intellectual-engagement… and it may not be important at all for the latter.
Thomas: In psychology, altruism captures this notion. Prosociality is a concept of helping others but allows for self-concern, while altruism is distinguished by a purer form of selflessness (I have a paper under review that goes into detail about this, which I can privately share....argh, the closed doors of academia). Fortunately, altruism is widely studied and there are even a few items that capture it in the SOEP dataset.
Place of residence and years of residence: Mixed findings and it appears to be a weak predictor regardless.
Immigration and citizenship status: Better captured by other variables. “Osili and Du (2005) found that immigrants in the United States are less likely to give to charitable organizations and also give less, but that these differences are due to differences in racial background, lower levels of income, and education” (Bekkers & Wiepking, 2007: 15).
Youth participation: Impacts donations through socialization, which is better captured through parental background. It also strengthens children's social bonds within the community, making them more likely to favor local causes over effective donations.
Volunteering: In simple bivariate analysis, volunteers are usually found to donate more to charity. However, differences between volunteers and non-volunteers often vanish in multiple regression analyses controlling for joint determinants of giving and volunteering (Bekkers, 2002; Bekkers, 2006a; Wiepking & Maas, 2006). Given SOEP only asks about time spent volunteering and does not categorize where one volunteers, this variable seems like a blunt tool that is likely to be insignificant.
Awareness of need: A strong predictor of general philanthropy, but Fehr et al. (2019) did not find significant effects for effective giving.
DR: I think 'failing to find significant effects' shouldn't be a reason to exclude this!
Higher-income households donate higher amounts than lower-income ones; however, the relationship with discretionary income is complex and unresolved (McClelland & Brooks, 2004). Income elasticity has been shown to be a salient predictor (Brooks, 2005), but for our purposes general net income seems the most sensible, since this is information EA organizations might be able to obtain or estimate.
“How satisfied are you with your household income?”
“How satisfied are you with your personal income?”
----------
“I earned [net income]”
----------
“What do you think is your monthly gross salary in one year?”
Age*
Unclear relationship: giving generally appears to increase over time and level off around retirement, but this relationship is highly dependent on covariates such as church attendance, number of children, and marital status.
Should be available. I’m waiting for confirmation.
Number of children*
Positively related to philanthropy in most studies, but the age of the children may influence the direction and magnitude of the effect, specifically when they are younger than 14 (Okten & Osili, 2004) and 18 (Okunade & Berl, 1997).
“According to ‘My Infratest’, these are the children in your household that were born in 2001 or later. Please state whether these children still live in your household.”
----------
…accompanied by companion question: “Do more children live in your household which were born in 2001 or later?”
Marital status*
Mostly found to be positively related to giving, although a number of studies find null effects (Apinunmahakul & Devlin, 2004; Carroll et al., 2006), calling into question the magnitude of this effect.
“What is your family status?”
Employment*
The employed generally donate more than the unemployed (Chang, 2005a&b); those who work more (days and hours) donate more (Bekkers, 2004; Yamauchi & Yokoyama, 2005); retirees are highly charitable; self-employed are less generous (Carroll et al., 2006); and public service employees are more likely to engage in philanthropy than for-profit workers (Houston, 2006).
…could confirm officially unemployed: “Are you registered as unemployed at the Employment Office?”
“What is your current occupational status as a self-employed?”
…closest question I could find that gets at something other than for-profit work: “Do you work for a public sector employer?”
Gender*
Mixed findings in general and no finding when looking at one-person households (Andreoni et al., 2003). Still, given the ubiquity of this variable, it is sensible to include it in the model even though I have little faith it will be significant.
Should be available. I’m waiting for confirmation.
Race*
Caucasians generally give more, but this finding is tempered by the cause (non-whites donate more to the poor and religious organizations; Brooks, 2004; Brown & Ferris, 2007; Smith & Sikkink, 1998).
Should be available. I’m waiting for confirmation.
Parental background*
Higher levels of parental education, parental religious involvement, and parental volunteering in the past are related to higher amounts currently donated by children (Bekkers, 2005a), while current parental income and church attendance also predict giving (Lunn et al., 2001; Marr et al., 2005).
I thought a proxy for parent’s occupational prestige might be a salient predictor. Questions 496-502 cover the mother’s background and have the exact same wording.
Questions split depending on occupation and all contain the header: “What was your father’s occupational status as…”
“A self-employed person?”
“A civil servant?”
Personality*
Donations have been found to increase with emotional stability and extraversion (Bekkers, 2006b), as well as openness to experience (Levy et al., 2002). General social trust has also been found to be a salient predictor (Brooks, 2005; Micklewright & Schnepf, 2007). Empathy has been found to be related to donations (Bekkers & Wilhelm, 2006), as has altruism.
Big Five Personality traits:
Agreeableness: “is considerate and kind to others”
Openness to experience: “is eager for knowledge”
The self-control scale. Sample item: “I am good at resisting temptation.” 10-item scale split between two links below.
Cognitive ability*
Persons with higher verbal scores (Bekkers & De Graaf, 2006), IQ (Millet & Dewitte, 2007), GPA (Marr et al., 2005), and ability to think in abstract terms (Levy et al., 2002) donate more.
Innovation exercise to assess emotional intelligence.
“What emotion was shown by the individual? For every emotion, please rate how strongly you perceived it. If you saw a group, please rate the emotion of the individual in the middle.”
For questions assessing quantitative skills (probabilities):
“Out of 1,000 people in a small town 500 are members of a choir. Out of these 500 members in the choir 100 are men. Out of the 500 inhabitants that are not in the choir 300 are men. What is the probability that a randomly drawn man is a member of the choir? Please indicate the probability in percent.”
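As a check on the intended answer to this conditional-probability item, here is a minimal sketch in plain Python (no special libraries; variable names are ours):

```python
# Survey item check: P(in choir | man) for the town of 1,000.
# Choir: 500 people, 100 of them men; non-choir: 500 people, 300 of them men.
men_in_choir = 100
men_not_in_choir = 300
total_men = men_in_choir + men_not_in_choir  # 400 men in the town

# Of the 400 men, 100 are choir members.
p_choir_given_man = men_in_choir / total_men
print(f"{p_choir_given_man:.0%}")  # prints "25%"
```

The item thus asks respondents to condition on the right base (the 400 men), not on the whole town or the choir, so the intended answer is 25 percent.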
Context*
Donations are influenced by behavior of coworkers in the same salary quartile (positive; Carman, 2006), income inequality (negative; Okten & Osili, 2004), individualistic cultures (positive; Kemmelmeier et al., 2006), and the stock market (positive; Drezner, 2006).
Stock market optimism: “Initially we focus on the next year (next 12 months). Do you expect the DAX [German blue-chip index] to show rather profit or loss compared to the current value?”
Numeric version: “Expressed in numbers: What [Profit/Loss] do you expect for the next year overall in percent?”
This same question stem of stock market optimism is used for items about the next two, ten, and thirty years
Occupational prestige*
Generally, positively related to donations (Carroll, McCarthy, & Newman, 2006).
Current occupation (open question):
----------
Occupation (answer choices included):
----------
Each occupation choice is then further refined:
Political orientation*
Previously, no differences were found for secular donations (Brooks, 2005), but Fehr et al. (2019: 26) find that “for right-of-center respondents, there are indications that higher national relative income is related both correlationally and causally to more giving to poor Germans and Kenyans.”
Item designed by Fehr et al. (2019):
“In politics people often talk about ‘left’ and ‘right’ to mark different political attitudes. If you think about your own political attitude: Where would you place yourself?”
Locus of control*
Persons with an internal locus of control are more likely to engage in philanthropy and other formal helping behaviors (Amato, 1985).
Ten item scale with the stem: “The following statements describe different attitudes towards life and the future. To which degree do you personally agree with the individual statements?”
Health*
People in better health donate more (Bekkers 2006b, Bekkers & De Graaf, 2006).
“How would you describe your current health?”
“How satisfied are you with your health?”
Mood*
Positive affect facilitates giving; negative moods may also facilitate giving in specific circumstances, but this is conditional on many factors (e.g., when helping involves minimal barriers, or when people are prompted to think about the negative feelings that would result from not helping; Cunningham et al., 1980; Weyant, 1978).
Short scale of emotions (angry, afraid, happy, sad):
“Thinking back on the past four weeks, please state how often you have experienced each of the following feelings very rarely, rarely, occasionally, often, or very often. How often have you felt...”
Values*
Endorsement of prosocial values has a positive association with charitable giving, as does being less materialistic (Sargeant et al., 2000) and caring about justice (Todd & Lawson, 1999).
Questions 172-175 on justice. For example, the stem: “To begin with, it is about situations which result in others’ advantage and your disadvantage, because you were penalized, exploited, or treated unfairly. To what extent do you agree with the following statements?” Followed by: “It makes me angry when others are undeservingly better off than me.”
----------
Previous donations*
Charitable giving is to some extent habitual behavior (Barrett, 1991; Barrett et al., 1997).
Not available for SOEP-IS
Optimism
Belief that the future could be better might provide motivation to help make it better.
“When you think about the future, are you…”
Likelihood of events (e.g., financially successful, not get any serious illness, successful at work, content in general) happening compared to other people the same age and gender.
Life satisfaction
Spending money on others has been shown to have a consistent, causal impact on well-being (Aknin, Barrington-Leigh, Dunn, Helliwell, Biswas-Diener, Kemeza, Nyende, Ashton-James, & Norton, 2010). “One possibility is reverse causality, that is, that those who are inherently happier by nature are also more likely to help individuals” (Moynihan, DeLeire, & Enami, 2015).
“In conclusion, we would like to ask you about your satisfaction with your life in general. How satisfied are you with your life, all things considered?”
Risk propensity
Cluelessness has been cited as a case against longtermism (Greaves & MacAskill, 2021). Thus, individuals who are predisposed to EA but are risk-averse may be more likely to make global health and development donations.
Stem: “What do you think about yourself: How prepared to take risks are you in general?”
“not ready to take risk at all ... ready to take risk”
“What did you think of when you made your estimate (i.e., the value) regarding your preparedness to take risks?”
Religious involvement*
One of the most studied variables in philanthropic studies. However, a large body of research finds that religious involvement is not related (or even inversely related) to secular giving (Brooks, 2005; Lyons & Nivison-Smith, 2006; Lyons & Passey, 2005). Still, given its prominence (and the fact that there are religious EA groups), it is worth including in our analysis.
Education*
Education has been found to have a positive relationship with secular giving (Yen, 2002) and with more EA-aligned giving (e.g., development aid versus emergency aid; Srnka et al., 2003), and there are conflicting results on whether education impacts the amount donated (cf. Schervish & Havens, 1997; Brooks, 2002).
“What type of vocational training or university degree did you receive?”
A handful of studies have found graduates of different fields to be differentially generous, although which groups are at the top is inconclusive (cf. Bekkers & De Graaf, 2006; Belfield & Beney, 2000).
A description of our most promising academic paper ideas based on the opportunities we have so far
Why list these here? By identifying specific hypotheses (for an academic paper), it will help:
Generate ideas for non-profits to test
Avoid indecision about what ideas to try
How this 'gitbook' works
Explain how to add content, embed, groups vs pages vs subpages, how we're organizing it, how/who to join/invite, payment/cost, the link with git/github (for tech people), formatting tweaks
What is this Gitbook (wiki) and what is it for?
Rather than chains of disconnected emails and many unlinked Google docs, I (David Reinstein) thought it would be better to organize our project with this well-structured format.
Other tech
Airtable, Slack, etc.
Airtable
Airtable is an online database that is user-friendly and social. We are using the airtable "GWWC+ testing/trial ideas" (ask for edit access) to keep a simple listing of key elements and structured information, in conjunction with this Gitbook.
The first table in the airtable (picture below) explains all the other tables
Charity ratings, rankings, messages
Considering 'what information and ratings are out there about charity effectiveness, and how is it / should it / could it be presented?'
What are the existing sources of information and ratings about charity effectiveness? How credible are these? How are these presented, and how could/should they be presented?
GiveWell
Innovationsinfundraising.org
Innovations in Fundraising was an academic impact project and resource. innovationsinfundraising.org was hosted as an interactive Dokuwiki.
It aimed:
To explain and promote practical fundraising innovations stemming from academic research, to encourage trials and experiments, to promote effective giving and encourage collaboration and knowledge-sharing.
A key resource was a linked interactive database of 1. relevant papers, and 2. relevant 'tools'. Our automation tools allowed us to update this content via an Airtable, integrating it into the formatted DokuWiki table.
The project is no longer being hosted. Please contact David Reinstein to request access to any of the resources (or the underlying Airtable).
Literature: animal advocacy messaging
DR: How do people respond to animal advocacy ads, and what appeals to them more? XXX redacted
There was no clear trend showing which tactics were most effective. Among the top ten, some used writing, pictures or virtual reality to show the suffering of animals on factory farms. Others added information about the health and environmental impacts of factory farming. Still others gave specific suggestions on how to eat less meat or discussed laws to improve how animals are treated on farms.
There was no clear trend showing which psychological strategies were most effective, although many different strategies were employed. Tactics often employed descriptions of how eating meat is becoming less normal, the emotions of farm animals, individual victims of factory farming, comparisons between farm animals and pets, and specific suggestions for how to eat less meat.

The journal-published version:
https://www.sciencedirect.com/science/article/pii/S0195666321001847?via%3Dihub
DR: Thanks. But I guess this stuff was mainly trying to appeal to the general public. XXX REDACTED I think the group that is being targeted is rather different.
Does anything in the above seem specifically relevant to this, like work trying to get people who are already interested in animals to pursue it more seriously?
Items 888-928 assess the ability to do expected utility calculations:
“Please imagine the following situation: You have the choice between a safe payment and a lottery. In detail: Do you prefer a 50% chance to win 300 Euro (with a 50% chance of winning nothing), or a safe payment of 160 Euro?”
“Now answer another question within 20 seconds. Continue the multiplication tables of the base 17 as far as possible. Starting with 17, 34, etc. The time is running - now.”
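The lottery item above has a simple risk-neutral benchmark worth keeping in mind when scoring responses. A minimal sketch of the comparison (variable names are ours, not from the survey):

```python
# Lottery item benchmark: expected value of the gamble vs. the safe payment.
p_win, prize, safe_payment = 0.5, 300, 160

expected_value = p_win * prize  # 0.5 * 300 = 150 Euro
# A risk-neutral respondent would take the safe payment (160 > 150);
# choosing the lottery instead signals some degree of risk-seeking.
prefers_safe_if_risk_neutral = safe_payment > expected_value
print(expected_value, prefers_safe_if_risk_neutral)  # 150.0 True
```

So choosing the safe payment here is consistent with risk neutrality or risk aversion, while choosing the lottery indicates risk-seeking preferences.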
This version is currently PUBLIC but unlisted. It doesn't contain information on our trials or marketing activities (as of 18 Jan 2022), but we hope to be adding and integrating some details soon. We hope to make most of this public in due time, in line with information sharing and open science.
What do the sections and groups mean?
"Groups" can hold multiple pages, and pages can have sub-pages. But groups cannot have subgroups, and groups have no direct link (while pages do). (In the 'git repo', groups seem to be represented by folders.)
How do I edit it and add content?
Getting started
If you have 'write (Editor) access' ....
Update: as of 15 Oct 2021, Gitbook has changed its protocols. You now need to:
click the icon in the upper right to 'start a change request',
and then 'submit' this request when you are ready (ideally with a brief, informative message explaining what you have done).
Give it a try. Once you 'submit', you or someone else can 'merge' it in.
In newly created blocks/elements "command-slash" (on mac) brings up a lot of cool options (scroll down)
Typing the "@" symbol offers a quick way to link other pages in this book
Merges and conflicts
If you have the Administrator status, you can merge in your own, or others' changes.
What if I get a 'conflict'? This can happen if two people edit simultaneously and both try to merge in their changes. It should be simple enough to resolve: find the icon for the bits indicating a conflict in the outline bar (the arrow/triangle), go to that section (or sections), and choose which version you want to keep.
It 'backs up' nicely to a set of easy-to-follow markdown files and folders. If you prefer to work offline in nice 'raw text formats' (rather than via the web interface), you should be able to edit those files in any interface and push/merge the content in (if you are familiar with git and GitHub).
The markdown and project organization syntax is a little bit distinct from others I've used, such as Rmd/bookdown.
The folders have meaning for the structure of sections, I think, but the SUMMARY.md file seems to govern most of it.
There is a particular dash-delimited 'description' section at the top of each .md file
And there are some special code elements like
{% embed url="URL HERE" %}
for embedded content (esp. Google docs),
... multi-tab elements:
...and callout boxes, including 'hints'
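As an illustration of how SUMMARY.md governs the page tree: in (legacy) GitBook it is just a nested markdown list of links, where nesting gives sub-pages. The file and page names below are hypothetical:

```markdown
# Summary

* [Introduction & explanation](README.md)
* [Charity ratings, rankings, messages](charity-ratings/README.md)
  * [GiveWell](charity-ratings/givewell.md)
* [Appendix](appendix.md)
```

Each linked .md file can additionally begin with the dash-delimited 'description' block mentioned above (`---`, a `description:` line, then `---`), which GitBook renders under the page title.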
A good way of starting with Airtable/databases is to think
These are just a bunch of spreadsheets or individual ‘data sets’; I’ll treat them as separate for now
Nice, it’s a bit easier to quickly add entries if I choose single- or multi-select field types, or checkboxes
Hey look, if I make this a “Link” field type I can easily add rows from sheet B into sheet A, that’s cool!
I can also ‘create new rows in B while adding them to A’
Cool, sheet B now has a column indicating where it has been entered into sheet A
Hmm, sheet A has stuff on it that is not relevant for our partner; let me create a simpler ‘view’ of sheet A, filtering out rows and hiding columns that are not relevant to our partner
"$670 provides an additional year of healthy life to a blood transfusion patient." (Note this is based on US data)
This seems implausible as an actual 'impact' of a $670 donation; it is not clearly considering the counterfactual
Updates 4 Oct 2022: There may be some promising developments within Charity Navigator; watch this space
(The Life You Can Save)
(Animal charity evaluators)
(Other)
Non-effectiveness ratings (for comparison)
Charity Navigator (mainly non-impact, see above)
Guidestar (AKA 'Candid')
ratings have little or nothing to do with impact.
- Guide Dogs for the Blind and Make-a-Wish are both top-rated ('Platinum') ... we know these are ineffective (classic examples)
- Against Malaria Foundation is unrated and "New Incentives" gets the lower 'Gold' rating -- both are top-rated on GiveWell.
Also, note the Guidestar criteria:
The Platinum Seal of Transparency indicates that the Foundation shares clear and important information with the public about our goals, strategies, capabilities, achievements and progress indicators that highlight the difference the Foundation makes in the world.
It's about transparency, not impact.
I (David Reinstein) took down innovationsinfundraising.org for several reasons including:
I didn't have time and funding to keep it updated, and I didn't want this to 'crowd out' others' work
Hosting costs (roughly $400 per year)
It was largely superseded (at least in my own work) by other resources and projects, including "EA Market Testing" (the present Gitbook, and linked resources)
I would consider reviving this in the future, and would be happy to join it with other maintained resources. Please contact me if you would like to pursue this.
Key details of the Innovations in Fundraising Project (as of roughly 2017)
Purpose: To explain and promote practical fundraising innovations stemming from academic research, to encourage trials and experiments, to promote effective giving and encourage collaboration and knowledge-sharing
Key innovations and ideas: default recognition, give more tomorrow
Funding and support: ESRC Impact Acceleration; University of Exeter; Centre for Effective Altruism (CEA)
We are partnering with: Employers, fundraisers, philanthropists, third-sector organisations
Scientific advisors and co-authors
Psychology: Dr. Nick Fitz, Ari Kagan (Duke)
Economics and Finance: ,
Statistics and Data science: Dr. Mark Kelson (Exeter)
Research and technical assistance: Katja Abramova, Janek Kretschmer. Previous contributors: Audrey Utchen, Agata Siuchinska, Samuel Dexter, Alexis Carlier, Louis Philipp Lukas, David Serero, Daisy Newbold-Harrop.
Wider project: The Innovations in Fundraising impact project aims to unlock generosity and increase the level and impact of charitable giving in the UK and abroad, while enhancing donors’ understanding and appreciation of their generosity.
IIF's key activities and resources included...
Knowledge exchange, tools
Build Innovations in Fundraising Wiki interactive knowledge and resource base on core issues
… including employee giving (schemes), incentive pay, philanthropy in the UK, practical fundraising research findings
Produce reports and guidelines to explain research results, industry knowledge, and best practice to a wider audience
Engagement, innovation
Engage banks, investment firms, fundraiser and other large employers, individually and in small groups, to discuss goals, processes, and opportunities for promoting employee giving in innovative ways, with a focus on high-impact charities
Hold meetings and focus groups to identify the necessary requirements and potential obstacles to implementing and testing systems allowing employees to commit potential incentive pay (bonuses) to charity
(We will collaborate with CEA in organising these, and these will also incorporate CEA initiatives and priorities)
Run pilots and controlled trials of ‘Give if You Win’ and other employee-giving innovations inside firms