1 of 83

EA market testing (public)

Introduction/overview

Introduction & explanation

See also

Our public reports of trials and analysis in the web-book here
The Barriers to Effective Giving living web book
Our regularly updated 'data analysis report' on all the trials and evidence, which you can download HERE as a protected zip file (need to request password, permission granted with consent of participating organizations)
EAMT Goals, FAQ; EAMT Progress and results

What is the "EA Market Testing Team"?

We are a small group of researchers and practitioners. This project is organized by David Reinstein , who maintains this wiki/Gitbook and other resources.

We aim to promote awareness and understanding of the (EA), and "to make giving effectively and significantly a cultural norm." We consider marketing campaigns, charitable appeals, events, and public communication, working both with our partner organizations and in independent surveys and trials. We want to improve the design and messaging of organizations like Giving What We Can and 80000 Hours to improve their outreach methods and maximize their impact. Measuring and testing 'what works and when': While we are also testing and analyzing this rigorously. We help run and track careful data collection and rigorous controlled trials, as well as helping to organize the reporting of less rigorous trials. We robustly analyze the results to better understand which approaches tend to have a more positive impact. Communication: We track, organize, and share what we have learned with the EA community, building and organizing resources and a knowledge base. This will address questions such as:

How to implement and test marketing campaigns?
What has worked to promote EA?
What motivates/blocks effective giving?
What profiles of people are most likely to be interested in effective altruism?

We strive to be transparent. We want to report and share our data, procedures, code, and evidence without overselling the results.

We believe this is the first organized collaboration of its kind. We aim to...

Coalesce our understanding and evidence on barriers and facilitators of effective altruism, effective giving, and effective action
Run a broad set of high-powered trials (large samples, high-stake real-world contexts, substantial differences between conditions)
- ... to gather evidence on what works best to promote meaningful actions in specific cases,
- ... while aiming at
Do profiling, survey, and segmentation activities and trials, building evidence on 'which types of people' are most responsive to effective giving messages and appeals
Share our results, data, and tools, with the relevant EA and research-interested communities. This will enable more and better outreach, promotion, testing, and insight.

What is our mission? ...

As EAMT has progressed, we have encouraged others to do work and pursue initiatives in the 'space' of studying EA messaging, and marketing EA and effective giving. We hope that the resources we have provided, and the connections we have made have contributed to this. As the space changes, the EAMT mission, scope, and activities are adjusting as well.

We are moving towards a heightened focus on

Advising, proposing, and helping to design and coordinate experiments, trials, and initiatives.
Transparent presentation of the results, rigorous statistical analysis
Synthesizing, sharing, and communicating this knowledge and skills base

This work provides substantial public goods, whose benefit is shared among the partner organizations and the EA community.

Other relevant/new organizations and initiatives

(including User Friendly, Good Impressions, and Altruistic Agency)
Rhetorical.org
Giving What We Can 'Bequests' (a project we have encouraged and advised)
Effective Altruism Psychology Lab (NYU)
Effective Giving Collaboration and Summit

We believe the EA Market Testing Team is the first organized collaboration of its kind.

Goals and FAQ

Goals/FAQ link (below in detail, scroll outside margin to skip past it)

What have we accomplished?

For an overview of our progress and ongoing work, see the 'progress and results' document we are building. (Below in detail, scroll outside margin to skip past it)

Note that we cannot publicly share details of ongoing and upcoming trials. We aim to share the results when it is possible. We aim to integrate shareable aspects of this private doc.

For a data-driven dynamic document covering (some of) our trials and evidence see HERE.

"Testimonials"

Luke Freeman, Executive Director of Giving What We Can

"The EA Market Testing team has been very helpful in helping us to pursue our mission of creating a world where giving effectively and significantly is a cultural norm. They have helped us at each stage along the process of ideation through to analysis so that we can base our outreach activities on sound theory and strong evidence. This is at a particularly important time as we have been scaling up our marketing activities to reach and engage new audiences with effective giving and the ideas of effective altruism more broadly. We look forward to an ongoing collaboration with EAMT so that we can continue to iterate and increase our impact.”

Grace Adams, Head of Marketing, Giving What We Can:

It’s been extremely useful to hear what others in EA, individuals and orgs are doing and sharing learnings between us. I hope that we can develop a set of tactics that we know successfully convert people and get them more involved in EA. A reliable set of best practices for marketing EA would be a great outcome.

Greg Gianoupolis, Charity Elections

"As a quick testimonial relevant to this stage of the process, David [Reinstein]'s support has been critical to the Charity Elections team's development of plans for marketing and program evaluation. Our first ad campaign was particularly impactful, generating one click to the Charity Elections page on the Giving What We Can website per $0.01 spent on the campaign. We will continue to incorporate his advice into our advertising to spread awareness of the Charity Elections program among high school students and teachers."

How to get involved?

If you are interested in getting involved with our project or have feedback for us, contact David Reinstein at daaronr AT gmail.com.

Next, check out the Gitbook content overview.

(For an explanation of this Gitbook's structure, content, and aims.

This quote comes from the 2022 Giving What We Can strategy document.
However, we are also careful to be efficient, recognizing the tradeoffs between rigorous experimental design and practical marketing.
Including testing templates, guides and implementation tips

Meet the team

We are a group of researchers and practitioners across a range of fields (Economics, Psychology, Marketing, Statistics) and organizations, particularly those interested in effective charitable giving and effective altruism. This is outlined in the , with embedded views below.

This project is organized by , who maintains this wiki/Gitbook and other resources.

As individuals and organizations, we are goal-driven and impact-driven: we are in this to improve the world, particularly through directing funds and support to the most effective rercauses and interventions. Because we share these common goals, we are better aligned for collaboration than typical academics and charitable organizations. We have an unprecedented opportunity to collaborate, learn what works, and 'move the needle'.

We are actively collaborating with the following organizations (links indicate publicly reportable trials)

Content overview

You can now ask questions of this gitbook using a 'chatbot': click the search bar and choose 'lens'.

Introduction

Our mission, what we are trying to do and why, most recent updates, and the organization of our team

Other key resources

EA Market Testing data analysis: dynamic document/notebook (Quarto site) covering our trials & evidence
Airtable view of the relevant trials (); links, categorization provided

Five Key Sections

1) Partner Organizations and Trials 🤝

In this section, you will find with organizations, including Giving What We Can and One For the World.

2) Marketing & Testing: Opportunities, Tools, Tips 🪧

Here we share tools to implement planned trials, as well as tips relevant to 'doing marketing'. We answer questions like how to set up campaigns and track outcomes on various platforms. See "Implementing ads..." and "Collecting data..."

3) Research Design and Methodology 🎨

We discuss qualitative and quantitative research design and methodology issues that are relevant to the trials we are running. Pages in this section will be linked in reports when relevant to a particular trial.

4) Profiling and Segmentation Project 🧮

Our profiling project aims to help better understand what sorts of people are amenable to EA-related ideas and to taking EA-favored actions.

5) (In)effective Altruistic Choices: Review of Theory and Evidence 📋

We've done a review of existing literature: to inform the trials we are running, and to identify important research topics. This includes What is known/models of effective giving and Principles and theories behind potential trials.

You can find references, tech support, and other resources in the appendix.

What is this 'Gitbook' meant for?

The three key aims of this public gitbook are to:

Convey who we are, what have accomplished, and the scope of our work to funders, people in the broader EA community, and people not yet involved in the project who would be interested in joining
Share tools and knowledge with people in the EA/global priorities community who will apply it to their work. We are building a knowledge base. Content in the public gitbook can inform and support a diverse set of projects (i.e., implementing marketing campaigns, fundraising initiatives, academic research)
Seek feedback on our work. This includes technical and industry feedback on implementation and academic expertise (literature reviews and frameworks to consider, methodology, and experimental design).
(Grouped by organizational partner.)
We include background information on each organization and its priorities for testing.

Progress/goals (early 2023)

Partners, contexts, trials

Introduction

For a data-driven dynamic document covering (some of) our trials & evidence see

In the Partner Organizations and Trials section, you will find reports of the trials we have run with organizations, including and .

These trials are also cataloged : (); .

Our primary approach and goals

We want to identify the most effective and scalable strategies for marketing EA and EA-adjacent ideas and actions. To do this, we believe that running real-world marketing trials and experiments with EA-aligned organizations will provide the best evidence to act upon. By systematically varying the messaging, framing, and contexts, we can map out 'what works better where'.

We believe this approach is likely to be the most fruitful because:

Using naturally-occurring populations in real-world settings with meaningful costly choices and outcomes will lead to more relevant findings. In comparison to convenience samples of undergraduates or professional survey participants who are aware that they are doing a research study, we anticipate greater:
- Internal validity: our results are less likely to be influenced by biases, such as acquiescence bias and hypothetical decision-making.
- : the context we are testing is similar or identical to the context we care about.
We will "learn by doing" by encountering unanticipated obstacles and learning about practical involved with advertising, promotion, and communication.
We can share what we learn with relevant EA organizations and audiences. They then can build on our findings, rather than having to repeatedly make mistakes themselves.
The trials themselves should also have a direct positive value in promoting EA.
There is limited downside risk. We are generally not testing risky messages and are careful to avoid diluting or misrepresenting EA's core ideas.

Key themes, priorities, and 'high-value questions'

This project primarily aims at:

Robust and generalizable insights that improve communication and messaging
Meaningful and relevant long-run outcomes, such as;
- Creating new, strong EAs by getting people more interested and involved in EA ideas, actions, and the community
- Having people consider and identify with key values and practices, such as making meaningful altruistic choices, considering effectiveness and impact in doing so, strong analytical and epistemic practices, and broad (or carefully considered) moral circles
Across a range of EA causes and groups (longtermism, global health, animal welfare)

In the document below (), we consider the shared goals, paths, and questions that are valid across organizations. Specifically, these are actionable and promising themes and projects that can be implemented, measured, and communicated fluidly throughout the EA network.

Giving What We Can

Luke Freeman is the lead contact.

Giving What We Can's mission is to make giving effectively and significantly a cultural norm. has updated their. They are looking to significantly increase their marketing activity by producing videos, funding ads, and conducting systematic and robust research. As such there will be a large crossover between our work and theirs. This section highlights our collaborative efforts.

Presentation: overview

Ideas and opportunities

We want to learn from existing work, run tests on the GWWC platform, and support research into this.

Stages of the funnel:

Awareness & Consideration
Increase casual visitors and raise curiosity
Conversion & Acquisition
Donate or pledge to donate
Retention
Fulfill and report pledge
Advocacy
Promoting GWWC to others

Some key questions

“What should the call to action be for the casual person in the funnel?”
Testing all parts of funnel/pledge journey; website, welcome messages/welcome packages, reminders and thank-you's

Completed studies: See sections below

Pledge page (options trial)

The presentation of options on GWWC's 'pledge page' were randomly varied at the individual browser level over a certain period to see which option increased pledges.

A summary of this has been shared as a post on the EA Forum.

Notes on content/building this

This follows the Trial reporting template, edited slightly for public reading.

We intend to redo and augment much of this analysis in a more transparent way; directly importing the data and doing our own analyses ....rather than Google's built-in tools. We intend to put this within the EAMT Analysis web-book here.

Summary of trial and results

Giving What We Can (GWWC) has three giving pledge options, displayed in the 'Original presentation version' below.

From April-July 2021 they ran a trial presenting its 'pledge page' options in three slightly different ways. Considering 'clicks on any button' as the outcome, and a Bayesian 'preponderance of evidence' standard...

"Separate Bullets for Other Pledges" was the most successful presentation. It only showed a box for "The Pledge", with the other options given in less prominent bullet points below. This had about a 20% higher incidence rate than the Original presentation.
"Pledge before Try Giving" was the least successful presentation this was like the one displayed above, but with "Try Giving" in the central position. This had about a 23% lower incidence rate than the Original presentation.

These results may only apply narrowly to the GWWC pledge case, and even here, we have some caveats. However, it loosely suggests that when making a call to action, it may be most effective to present the most well-known and expected option most prominently, and not to emphasize the range of choices (see further discussion below).

Getting people to take the GWWC pledge may be seen as an important outcome on its own. It on getting people engaged in the Effective Altruism community and other EA activities, such as EA career impact decisions.

General idea and main hypothesis

GWWC: How can we present pledge options to maximize positive outcomes (pledges, fulfillment)?

General: For those considering making substantial giving pledges (of a share of their income), how does the presentation of these 'pledge options' matter?

Theories and mechanisms to consider:

Tendency to choose 'middle options' (Simonson and Tversky 1992)
Too many options may lead to 'indecision paralysis'
The signaling power of choice; e.g., if there's a 'more virtuous choice' I may feel that my 'middle choice' looks less good by comparison

Background and context

GWWC has three distinct pledge options, as shown above

1. "Try Giving" (1% of income),

2. "The Pledge" (10% of income)

3. The "Further Pledge" (donate all income above a living allowance).

here (link from October 2020).

Three versions of this page were randomly presented (between 19-21 April and 10 July 2021)

The content of the key 'choice button' part varied between these three versions

"Original:" A block of three (in the order of commitment) 'The Pledge' (10%) in the center and highlighted (see above)
"Pledge before TryGiving": A block of 3 with "Try Giving" (1%) in the center and highlighted
"Separate Bullets for Other Pledges": A single block for 'The Pledge' (10%), with the other pledges given as clickable bullet points below (as well as a bullet for the 'company pledge' ... which had a different presentation in other versions)

The version presented stayed constant according to an individual's IP cookie tracking.

Points of contact, Timing of trial, Digital location of project/data, Environment

Points of contact

Julian Hazell (julian.hazell at givingwhatwecan.org), Luke Freeman

'Academic' contact: David Reinstein.

Timing of trial (when will it/did it start and end, if known)

Start: 19 April 2021 (or 21 April)? End: 10 July 2021 (Source: Google Analytics)

Digital location where project 'lives'

(Planning, material, data)

Statistics are available on Google Analytics/Optimizely. Reinstein has access to this and, is planning to input into R for more detailed analysis, to be reported in the analysis web book.

The present document is currently (11 May 2022) the only writeup.

Environment/context for trial

https://www.givingwhatwecan.org/pledge/ ... see above

Participant universe and sample size

'Everyone going to the above page' within the above time duration.
People interested in GWWC pledges'

Sample size: see below, from Google Analytics

Key treatment(s)

"Original" (Block of 3 in order of commitment, Middle Pledge in Center)

2. "Pledge before TryGiving" ... as above but with Try Giving and The Pledge swapped, and Try Giving (in the center) highlighted

3. "Separate Bullets for Other Pledges" (see below)

Treatment assignment procedure

Three versions of this page were randomly presented
Equal likelihood of

The non-exact balance below seems an imbalance in 'sessions' not in 'participants'.

Our analysis should focus on outcomes per participant; thus, the figures below may need some adjusting (although at first pass, the results go in the same direction). This doesn't seem to be adaptive assignment. In Google's help on 'create an A/B test' they state:

All variants are weighted equally by default in Optimize. A visitor who is included in your experiment has an equal chance of seeing any of your variants.

The version presented stayed constant

Outcome data

Statistics on Google Analytics: This records only 'pressed any button' (any pledge) as the successful outcome.

Ideally, for future trials, this would include...

One entry per page view over the interval, detailing

Whether pledged
Which pledge
Time and date of view, Time spent on page, Other clicks, Location of user, Any other information about user

Most importantly:

Number of page views over the interval, by treatment
Number of pledges over the interval
- by treatment
- by type of pledge
Follow-up donations etc (if connectable)

Ex-post: Reporting results (brief)

Implementation and data collection

See Google A/B, optimize interface for details on data extraction from the interface

From shared image from Google Analytics:

'Experiment sessions' (observations) by treatment (as labeled on Google Analytics shared image):

Original: 2588

Pledge before Try Giving: 2686

Separate Bullets for Other Pledges: 2718

Total: 7992 sessions (=2588+2686+2718)

3. Where is the data stored ... [noted above]

Basic results/outcomes

Quick interpretation

The "separate bullets for other pledges" seems to have been the most successful, with an 0.49% higher (percentage point) incidence rate than the 'Original', i.e., a 22% higher rate of pledging (2.69 vs 2.20).

These differences seem unlikely to be statistically significant in a conventional sense. Still, Google analytics (presumably a reasonable Bayesian) model states an 80% chance that this is the best treatment, and this seems useful and informative.

If anything, these result for 'separate bullets' seems potentially understated...

Note that GA is reporting conversions based on sessions (contiguous use periods) and not users. We can reasonably assume that a roughly equal number of users were assigned to each treatment (as per the design). As a result, we assume that roughly equal shares 'viewed the relevant page at least once' (because of the law of large numbers). However, the most successful treatment, the 'Separate block', is recording more sessions. Thus, the relative conversion rate, as a share of users, would be even higher than the one reported here, relative to the baseline.

Aside on statistics

Optimize uses Bayesian inference to generate its reports... Optimize chooses its priors to be quite uninformed.

DR: But this still doesn't tell us what these priors are. There's a lot of sensitivity to this choice, in my experience.

Dillon: there is possibly a more sophisticated approach to this than what Google is doing ... the better prior is an 'empirical Bayes' approach (but it may be controversial). See this guide to empirical Bayes

The "Pledge Before Try giving" treatment performed substantially worse than the original.

The poor performance of ‘pledge before try giving’ ...

The poor performance of ‘pledge before try giving’ appears even more substantial than the strength of ‘Separate Block’. It even seems to border on conventional statistical significance … I expect that in a standard comparison of the latter two treatments, we’d find conventional statistical significance.

These differences are meaningful–consider the 'posteriors':

Downloading the 'Analytics data' behind the above graphs, we see:

Variant

2.5th Percentile Modeled Improvement

25th Percentile Modeled Improvement

Modeled Improvement

75th Percentile Modeled Improvement

97.5th Percentile Modeled Improvement

Original

Pledge Before Try Giving

-50%

-33%

-23%

-11%

18%

Separate Bullets For Other Pledges

-18%

20%

36%

76%

This suggests it is very reasonable to think that 'Separate Bullets' is substantially better

Our 'posterior' probability thus infers that

a 2.5% chance that 'Separate Bullets' (SB) has an 18% (or more) lower conversion rate than 'Original'
a 22.5% chance on SB being between 18% worse and 4% better
a 25% chance of SB being 4-20% better
a 25% chance of SB being 20-36% better
A 22.5% chance of SB being 36-76% better
A 2.5% chance of SB being more than 76% better

We can also combine intervals, to make statements like ...

a 50% chance of being 4-36% better
a 50% chance of being 20-76% better

For 'Pledge before...' (PB) we can state, e.g.,

PB has a 75% chance of being at least 11% worse than Original
and a 50% chance of being at least 23% worse than Original

Intuitive interpretation

Perhaps giving people more options makes them indecisive. They may be particularly reluctant to choose a “relatively ambitious giving pledge” if a less ambitious option is highlighted.

This could also involve issues of self and social signaling. If the 'main thing' to do is a 10% pledge (as in "separate bullets"), then this may seem a straightforward way of conveying 'I am generous'. On the other hand, if the 'Further pledge' is fairly prominent, perhaps the signal feels less positive. And if the '1% pledge' is made central, 10% might seem more than a necessary signal.

The "pledge before try giving" may perform the worst because it makes the 'Try Giving' pledge a particularly salient alternative option. (In contrast, the "Original" at least makes 'The 10% Pledge' the central and the middle option.)

But in this case, why should the overall pledge rate (any button-press) be lower with more options (Original vs 'separate bullets'), and lower still when Try Giving is made central?

It's hard to say too much if we don't know the composition of the pledges people make.

Still, it might be that people mainly came in with the desire to take The Pledge (10%), as this is most heavily promoted. In such a case, making other pledge possibilities prominent may A. Cause people to rethink their choices and delay a decision (perhaps never returning) and/or B. Feel less comfortable with the overall 'signal' their pledge will send. This doesn't mean that the 'multiple boxes' environment are worse overall, but it may perform worse for those people coming here, as these were the people particularly attracted by the '10% is the main thing' signaling environment.

Caveats

I am assuming that the 'outcome being measured here' is whether the person 'clicked on any giving pledge'; this is what Luke has conveyed to me

I assume this is 'conversions ever from this IP', and 'sessions' represents 'how many different IPs came to the treatment'. If it's something else (e.g., each 'session' is a 'visit' from an individual), this could reflect these people converting in fewer sessions but not necessarily being more likely to convert overall. Even if this is 'by IP' the alternative interpretation 'not converting now but maybe later' may still have some weight if people are entering through multiple devices.

We should try to focus more carefully on 'whether this is having any effect on ultimate pledge-taking and pledge-follow-through behavior'.

I would be surprised if a moderate difference in the framing of a particular page should have such a large (2.69-1.71/1.71 = 57%) impact on the incidence of such a large life choice, involving at least tens of thousands of dollars. However, I still expect the incidence of 'click this button' to be likely related to that ultimate outcome, thus I suspect these results are still informative and useful as they stand.

Giving guides - Facebook

Along with GWWC, we tested marketing and messaging themes on Facebook in their Effective Giving Guide Facebook Lead campaigns. Across four trials we compared the effectiveness of different types of (1) messages, (2) videos, and (3) targeted audiences.

A summary of this has been shared as a post on the EA Forum:

We build the results and analysis transparently in the EAMT Analysis web-book here.

Context: Facebook ads on a range of audiences

... [with text and rich content promoting effective giving and a "giving guide" -- links people to a Giving What We Can page asking for their email in exchange for the guide]

Objective: Test distinct aiming to get people to download our Giving Guide. A key comparison:

Also informative about costs and the 'value of targeting different groups' in this context.

Key findings:

The cost of an email address via a Facebook campaign during Giving Season was .
“Only 3% of people give effectively,” seems to be an effective message for generating link clicks and email addresses, relative to the other messages.
Lookalike and animal rights audiences seem to be the most promising audiences.
Demographics are not very predictive on a per-$ basis.

Key caveats

Specificity and interpretation: All comparisons are not for 'audiences of similar composition' but for 'the best audience Facebook could find to show the ads, within each group, according to its algorithm'. Thus, differences in performance may combine 'better targeting' with 'better performance on the targeted group'. See our discussion of the 'divergent delivery' problem HERE. I.e., we can make statements about "what works better on Facebook in this context and maybe similar contexts", , as the targeting within each audience may differ in unobserved ways.

The outcome is 'click to download the giving guide'.

Previous writeup and results

Link to the previous Gdoc report

Message Test (Feb 2022)

Summary

Main Question: Do some message themes work better than others for drawing visitors to Giving What We Can’s landing page?

Main findings: 'Social proof messages' on Facebook ads were most effective at generating landing page views per dollar compared to other message themes (effectiveness, services, giving more, and values).

Future directions: There were significant differences in 'link clicks per dollar' on the different messages by age. We recommend a systematic test to determine if age makes a difference in the relative effectiveness of social proof and values messages. Future studies could explore why the social proof message was more effective in this study than the previous giving guide study and the importance of the message to “join” the movement as social proof.

Possible connection between this trial and the : Note that the two best-performing messages both prompted the user to “join” a movement or a group of people (perhaps an elite group); but beware .

to report below.

Pre-trial reporting template

General idea, main 'hypothesis' (if there is one)

In this test, we are aiming to find out if one 'theme' of messages resonates better with our target audience than others.

If we knew which 'themes' were most effective with our advertising, then we could create more ads on this theme and improve our conversion.

Specifically, which of the following themes resonate with our target audience the most:

effectiveness
giving more
social proof
values
services

On choosing an objective of this test, originally I planned to use link clicks, but this is not the most high quality indicator of conversion, and when I tried to use newsletter signups Facebook warned me that I might not see any conversions at all... So instead, the campaign will optimise for landing page views, which is slightly better than a link click and will generate enough conversions that we should [see?] we statistically significant results.

Point of contact (at organization running trial)

Grace Adams

Timing of trial (when will it/did it start and end, if known)

Trial will run for 7 days on GWWC's ad account, from 9.30am AEDT Friday 25 Feb to 9.30am AEDT Friday 4 Mar.

Digital location where project 'lives' (planning, material, data)

Working document can be found but all important details will be listed in this brief

Environment/context for trial

This test will take place on Meta platforms including Facebook and Instagram

Participant universe and sample size

We are targeting a "Philanthropy and Lookalikes (18-39)" audiences, based in UK, US or Netherland
Estimates from Facebook: Reach is expected to be 1.4K-4.1K per day (7 days) per ad set (5 ad sets) = 49K-143K
Estimates from Facebook: Conversion is expected to be 10-30 landing page views per day (7 days) per ad set (5 ad sets) = 350-1050

Key treatment(s)

We are using the GWWC Brand Video by Hypercube as the creative across all tests. Although it did not perform as well as our other ads in the Giving Guide campaign, I think that it will interfere less with our messages we aim to test.

We are going to test a set of messages for each theme, please see them in the

Mock up of ad:

Treatment assignment procedure

This test has been set up as an A/B test through Facebook, testing each campaign head to head, each campaign covers one theme, with the different ads as a child.
This will allow us to test which theme was better, not just which individual ads
A/B testing on facebook will ensure that the audiences fall into an individual treatment group

Outcome data

Primary measure will be cost per landing page view, but secondary measures such as CPC, 3 second video plays, email sign ups will also be tracked
Data will live on Meta ads platform

YouTube Remarketing

GWWC youtube remarketing campaign (trial)

See also the cross-organization notes on advertising, google, youtube, etc (=placeholder for now) and the tips on Doing and funding ads

YouTube Remarketing

July 20, 2021: GWWC launched a YouTube remarketing campaign. That means that when someone goes to the GWWC website, leaves, and then goes to YouTube we show them one of the following videos:

Algorithm decides which video to present to people.

Understanding assignment, proposing experimental design @Joshua Lewis’s questions:

Q: Is each video assigned to a different situation or are videos randomly chosen to be displayed? If the latter, you could randomize videos by location and see if the different videos were more or less effective. Alternatively, just randomizing the whole campaign seems like a good idea to me....

A: Videos are selected based on the likelihood of the user watching >30 seconds (by the algorithm) ... randomization by individual will be hard because users don't click and act right away. Instead I think we have to randomize by geography

Results summary (Early, JS Winchell; may need update)

Most important takeaway: It costs $1 to get a website visitor to watch 1h of your videos! High level metrics

Cost: $205
Views: 6,071 (a view is when a user chooses to watch >30s of an ad)
Total watch time: 223 hours (~$1/h)
Unique viewers: 4,937 (this is an estimate)
Average impressions per user: 5.8
View rate: 20% (20% of the time people choose to watch more than 30s)
CTR: 0.37%
Average CPC: $1.83
Conversions (users spending >30s on the website): 2
Thinking: 'This is not a good tactic for driving site traffic or donations (although we could optimize for this instead if we wanted)'

Interesting observations\

Efficiency has significantly improved over 3 weeks

Cost per view has gone down from $0.05 per >30s view to $0.02 per >30s view
Views have increased 75% without increasing budget (from 220/day to start to 386 yesterday)

2. 10% of the time people watched the full video! \

You can see this data by video if you are interested to control for video length
E.g., 5% of people chose to watch the entire 13 minutes of _this video

3. Your best video had a view rate (% of time people choose to watch >30s) twice as good as your worst video 4. You can see view rate by age, gender, and device in the "Analytics" tab

For the 13m video, older people and men were more likely to choose to continue watching

Possible next steps

Could add "similar audiences" which is when we let Google use machine learning to find people similar to your website visitors and also show ads to them
Could walk David Reinstein and Joshua Lewis through the UI so they can get a sense of the metrics/reporting available and how it could be used for research

One For the World (OftW)

Chloë Cudaback is the lead contact (communications manager). (Previously Jack Lewars)

Background on OftW

How does OftW differ from others in this space?

Chloe: Focus on youth and university students at a pivotal point in their life

Accessible messaging, more of a starting point, less gatekeeping

David: 1% is 'more manageable' as a starting point perhaps

Luke: Narrow focus on one type of charities: global health and poverty

OftW has a donor base of ~700 active donors, ~1650 pledged donors (who pledged but haven't started donating yet) and ~2000 lapsed donors.
80% (of donors?) are in the USA
Focus on global health charities
They focus on donations to GiveWell charities ... but technically OftW pledgers can give to any 510c3

Some key goals

Reinstein/Lewars conversation notes

Activating more donors who took the pledge at university, so their donations actually start;
Retaining donors for longer once they activate;
Upselling donors to give more over time (either more as a raw amount, e.g. 'keeping pace' at 1% of their income; or more as a percentage, e.g. 'graduating' to take the 10% GWWC pledge)
Acquiring new donors with fewer touchpoints, e.g. via online advertising, via our website etc. (we currently get ~0 organic sign-ups)

Chloe's OKRs Notion

Who/what/how to test, learn, and adapt

Pipeline/groups/segments

Pledgers
Active donors, i.e., "Activated pledgers" (Chloe is thinking of segments to this and how to appeal to them)
1. Second tier -- people who have given each month for 12+ months; "Legacy donors" (DR: maybe 1x per year high-value donors should be in this group)
One-time donors (these may or may not be pledgers)
Cancelled
Payment failures

Another group worth considering: 'pledge-curious supporters'

Goals/actions

'Activating' Pledgers as donors (pledged but not donated)
Active donors
- Retain
- Upsell (maybe only to the second tier?)
Acquiring pledges, perhaps from a 'pledge-curious group'

Interested in knowing more about

Content -- expand our ability to tell stories about the beneficiaries
- Ways to tell these stories
Frequency (of comms with supporters)

Communications contexts

Platforms: Social media, email flows
Telling stories in a corporate context

Typical audiences have been students and young professionals, but there is interest in corporate outreach

Zoom and lunchtime talks in corporate contexts (How many? Seems very promising!)
- How many people are activating/pledging following these lunch+learn?

Typical donor journey:

We are in the process of creating these homepages and setting up conversion tracking. As OFTW has ~0 organic sign ups currently, we are testing for a variety of conversion routes, including: [Todo: clarify this]

university campus, someone I like tells me they are involved in OftW, asks me to come along with free food
at some point I take the pledge
It is not a highly controlled process
asking us (staff) a question by email
joining a group call with others wanting to learn about effective donating (Kennan as dir. of chapter management)
taking the pledge
making a one-off donation

Some rough numbers

650 active donors

1500 people in pipeline (pre-activation date)

750 new people a year are recruited... thinks it would be 2-2.5k

OFTW has a donor base of ~700 active donors,

~1650 pledged donors (who pledged but haven't started donating yet) and

~2000 lapsed donors.

Ongoing/completed/upcoming experiments

Email upsell emotion/impact message trial (see below)

University experiment - redacted as being prepared

Homepage message testing
Activation trial

Pre-giving-tues. email A/B

Context: Donation 'upsell' to existing pledgers

Question: Are effectiveness-minded (EA-adjacent) donors and pledgers more motivated to donate by

"A": (non-quantitative) presentation of impact and effectiveness (as in standard OftW pitch)
"B": Emotional appeals and 'identified victim' images

Further information on experiment and outcomes in in-depth replicable analysis, organized in dynamic document here

General idea, main 'hypothesis'

Are effectiveness-minded (EA-adjacent) donors and pledgers more motivated to donate by

"A": (non-quantitative) presentation of impact and effectiveness (as in standard OftW pitch)
"B": Emotional appeals and 'identified victim' images

In the context of One for The World's (OFTW) 'giving season upselling campaign', potentially generalizable to other contexts.

Academic framing: "Does the Identifiable Victims Effect (see e.g., the meta-analysis by Lee and Feeley, 2016) also motivate the most analytical and committed donors?"

Background and context

One for The World's (OFTW) 'giving season upselling campaign''

10 emails total over the course of November were sent in preparation for GivingTuesday

Point of contact (at organization running trial)

Academic-linked authors: David Reinstein, Josh Lewis, and potentially others

Timing of trial

: November 10, 18, 23, all in 2021, but may be delayed for feasibility

Digital location where project 'lives' (planning, material, data)

Present Gitbook, Google doc linked below, preregistration (OSF), and github/git repo

Environment/context for trial

Emails ... to existing OftW pledgers (asking for additional donations in Giving Season)

All 10 emails had the same CTA: make an additional $100 donation for the giving season/GivingTuesday on top of their recurring monthly pledge donation.

Participant universe and sample size

Roughly 4000 participants, as described.

A series of three campaign emails will be sent out by OftW to their regular email lists, to roughly 4000 participants, as described.

Key treatment(s)

A list of ~4500 contacts (activated pledgers) was split into two treatment groups.
Treatment Group A received emails that were focused on the contact's impact
while Treatment Group B received emails that were focused on individual stories of beneficiaries

See preregistration, treatment specifics

Treatment assignment procedure

See preregistration How many ... conditions

Outcome data

Targeting: Donation incidence and amount in the relevant 'giving season' and over the next year, specifically described in prereg under

key dependent variable

Data storage/form:

MailChimp data (Chloe is sharing this),
Reports on donations (Kennan is gathering this)

Optional/suggested additions

Planned analysis methods, preregistration link here

Cost of running trial/promotion: Time costs only (as far as I know)

Proposed/implementing design (language)

(Link)

Pre-registration work

Pre-registered on OSF in 'AsPredicted' format, content incorporated here here

Preliminary results

Overview:

The Emotion treatment leads to significantly fewer people opening emails, but more people clicking on the in-email donation link (relative to the standard Impact information treatment). However, we are statistically underpowered to detect a difference in actual donations. More evidence is needed.

Chloe: those emails that appealed to emotional storytelling performed better (higher in-email click rate) than those that were impact-focused.

DR, update: I confirm that this is indeed the case, and this is statistically significant in further analysis.

Evidence on donations

(preliminary; we are awaiting further donations in the giving season) ...

This is 'hard-coded' below. I intend to replace this with a link or embed of a dynamic document (Rmarkdown). The quantitative analysis itself, stripped of any context and connection to OftW, is hosted HERE

Note: We may wish to treat the 'email send' as the denominator, as the differing subject seemed to have led to a different number of opens

Treatment 1 (Impact): We record

1405 unique emails listed as opening a ‘control’ treatment email
29 members clicking on the donation link in an email at least once (2.1% of openers)
15 members making some one-time donation in this period (about 0.11% of openers, 0.075% of total)
8 members emails donating (likely) through the link (0.057%/0.04%)

Treatment 2 (Emotional storytelling):

1190 unique emails listed as opening an email (a significantly lower 'open rate', assuming the same shares of members were sent each set of treatment email)
56 members clicking on the donation link in an email at least once (4.7% of openers)
11 members making some one-time donation in this period (about 0.9% of openers, about 0.055% of total)
9 unique emails donating (likely) through the link (0.08%/0.045%)

Note: We may wish to treat the 'email send' as the denominator, as the differing subject seemed to have led to a different number of opens

‘Initial impressions of preliminary outcomes’

The conversion rates are rather low (0.5%) … but maybe high enough to justify sending these emails? I’m not sure.
While people are more likely to O_pen_ at least one Impact email, they are more likely to Click to donate at least once if assigned the Emotion email
But we can't say much for actual donations.
Given the low conversion rates we don’t have too much power to rule out ‘proportionally large’ differences in conversion rates (or average amounts raised) between treatments …

The figure above seems like a good summary of the ‘results so far’ on ‘what we can infer about relative incidence rates’, presuming I understand the situation correctly …I plot Y-axis: ’how likely would a difference in donations ‘as small or smaller in magnitude’” than we see in the data between the incidence … against X-axis: if the “true difference in incidence rates” were of these magnitudes

Implementation and management: Chloe Cudaback, Jack Lewars

Our data is consistent with ‘no difference’ (of course) … but it's also consistent with ‘a fairly large difference in incidence’
E.g., even if one treatment truly lead to ‘twice as many donations as the other’, we still have a 33% chance or so of seeing a difference as small as the one we see
We can reasonably ‘rule out’ differences of maybe 2.5x or greater
Main point: given the rareness of donations in this context, our sample size doesn’t let us make very strong conclusions in either direction about donations

Preregistration: OftW pre-GT

Academic-linked authors: David Reinstein, Josh Lewis, potentially others going forward

Implementation and management: Chloe Cudaback, Jack Lewars

AsPredicted questions

1) Have any data been collected for this study already?

No, no data have been collected for this study yet.

2) What's the main question being asked or hypothesis being tested in this study?

Are effectiveness-minded (EA-adjacent) donors and pledgers more motivated to donate by

"A": A (non-quantitative) mention of impact and effectiveness (in line with the standard OftW pitch)
"B": Emotional appeals and 'identified victim' images

Framing this in terms of the psychology, social science, and philanthropy literature:

"Does the Identifiable Victims Effect (see e.g., meta-analysis by Lee and Feeley, 2016) also motivate the most analytical and committed donors?"

3) Describe the key dependent variable(s) specifying how they will be measured.

d_don_specific: Whether the person receiving the series of emails makes an additional 'one time gift' following the link at OftW, within the OftW interface, during the 'Giving Season', a time-period that (for this preregistration) we declare to begin on receipt of this first email and end on 15 January 2022.
don_specific: The total amount donated through the above
don_general_gs: (If observable), the amount the person donates during the 'Giving Season', as observed through the OftW/donational/Plaid network
don_general_1yr: (If observable), the amount the person donates during the 'Giving Season' and for the following year (ending 15 January 2023) as observed through the OftW/donational/Plaid network
d_continue_pledge_1yr: Whether the person is still an active OftW pledger a year after the current giving season (15 January 2023)

4) How many and which conditions will participants be assigned to?

Two conditions (treatments):

A. "Impact"

B. "Story/Emotion"

Assignment details

Participants (c 4000 people at various points in the One for the World pledge process) will be split into groups (blocks) by previous donation behavior or point in the process. (OftW have mentioned, pledgers still in school, active donors, and lapsed donors).

Within each group, they will be randomized (selection without replacement to ensure close-to-exact shares) into equal shares in treatments A and B.

Treatment specifics (i.e., 'experimental conditions')

A series of three emails will be sent, with participants remaining in the same treatment across all three emails.

See actual texts for design and timing HERE

Example content differences, from email 1:

A. Impact version:

As of 2021, One for the World has had a tremendous impact on the lives of those that are helped by our charity Top Picks programs:

[IMPACT SINCE 2021 GRAPHIC]

B. Story/Emotion version:

Here’s our first story this season from Eunice of Kenya. When asked how her life changed when she received the first cash transfer from our partner organization, GiveDirectly, she responded”

“I have been able to make new goals and achieve them since I started receiving this money [from GiveDirectly]. I have been able to buy a piece of land that would have taken [me] many years to earn [enough to buy the land]. I was also able to buy livestock, like goats. I have even managed to dress my family properly by buying them decent clothing. Lastly, I have even been able to [pay my children’s] school fees without any strain.” (Source GiveDirectlyLive)

[PICTURE OF EUNICE]

5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.

We will report all of the following analyses, with our preferred method in bold:

Binary outcomes:

Fisher's exact test
Bayesian Test of Difference in Proportions (as in here), with an informative beta distribution for the prior over the incidence rate in each treatment, with a parameter based on the incidence rates for similar campaigns in the prior 2 years.

Continuous outcomes:

Standard rank-sum tests (Mann–Whitney U test)
Simulation/permutation based tests for whether the mean (including 0's) is higher in group A or B (including 0's)
... same for median, but medians will almost always be 0, we anticipate
T-test with unequal variance

All tests will be 2-sided.

We will also report Bayesian credible intervals and other Bayesian measures for the proportion tests. We may also explore Bayesian approaches for the continuous outcomes, e.g., Bayesian beta regression.

We also anticipate reporting multiple-hypothesis-test corrections, but we are not pre-registering a method. Our approach to this is likely to follow that of List et al (2017), which this paper applied to a similar domain (charitable giving experiments with multiple donation-related outcomes).

We will report confidence intervals on our results as well as Bayesian credible intervals under flat and weakly informative priors. Where we have a 'near-zero' result, we will try to put reasonable bounds on it to convey the extent of our certainty that the true effect or parameter was fairly small.

Where situations arise that have not been anticipated in our preregistration and pre-analysis plan, we will try to follow the Don Green lab standard operating procedures unless there is a very strong reason to deviate from this, which we will specify.

6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.

Included: All individuals who received this mailing.

We will not exclude any observations from the sample, unless they make it clear to us that they are aware of this trial.

We will not Windsorise or exclude outliers.

7) How many observations will be collected or what will determine sample size?

A series of three campaign emails will be sent out by OftW to their regular email lists, to roughly 4000 participants, as described above

Targeted dates: November 10, November 18, November 23, all in 2021, but these may be delayed for feasibility

Other

Anything else you would like to pre-register? (e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

Exploratory and secondary hypotheses/questions/analyses

Secondary hypotheses and questions

Which treatment motivates a higher rate of...

Email open rates (note, as we have three obs per participant, we will need random effects or clustered standard errors). and
Use click rates (with same caveat)?

We consider these as secondary because the click and open rates do not necessarily strongly relate to outcomes of interest, particular among this set of already effectiveness-minded donors. These outcomes may simply reflect attention or curiosity about the content.

Exploratory: what factors (especially gender, university/student status, university subject) predict which treatment leads to greater donation (incidence and amount)

Note that our partner is planning to use this trial to inform future trials and experiments, particular for the 'Giving Tuesday' season itself.

Power calculations

We did not have time to do even simple power calculations before the start date of this experiment. However, we will try to conduct these before we obtain any of the data, and update this preregistration.

The Life You Can Save (TLYCS)

Leads: Bilal Siddiqi, Neela Saldhana; Other partner contact: Jon Behar (Giving Games)

We have completed various trials in conjunction with The Life You Can Save, the most recent being theAdvisor signup (Portland) city-level YouTube test. There are a number of additional proposed trials and tests, however, at the moment these considerations are limited to the private Gitbook.

Note that in the past TLYCS has worked with the Graduate Policy Workshop School of Public and International Affairs at Princeton University, who produced the 'Behavioral Insights to End Global Poverty' report embedded below.

Quick takeaways from the "Princeton" report

(From 'summary'... the report authors' takes are given except where italicized)

5 key principles: choice architecture, social norms, empathy, overhead cost aversion, and anchoring

Factual:

TLYCS demographics are predictable (White, Male, tech...)
Donations cluster at the end of the tax year

Pages and promotion

Social media channel is promising
The "Best charities" page underperforms: there is a high bounce rate
- DR: Maybe because Givewell etc do better at this?
"Visual presentation of charities does have an effect
- DR: Not clear how this is causally identified
Ran "social media tests' they claim are underpowered.
- DR: but these could be analyzed with Bayesian methods for actionable insights
They suggest simplified presentation/navigation, and a 'decision tree quiz' to reduce cognitive load

Advisor signup (Portland)

TLYCS ran a campaign in a single city involving 'donation advice'

General idea and main questions

Specific goal of TLYCS promotion: To get people to click on the ad and go to the 'landing page' of TLYCS. Here, they will fill out to request an appointment with a donation advisor. We will simultaneously be raising awareness for TLYCS.

General questions:

Can we get people to sign up for donation advice using videos in YouTube Ads?
- How many sign-up and what sorts of people?
Do these ads boost engagement with TLYCS in net? (E.g. donations, website activity, book downloads)
- "Lift test" on Portland market (analyze with difference-in-difference relative to other markets)
Which ads are best at this? (These ads differ in substance as well as in style)

Background and context

Participant universe and sample size

Location: Portland, OR

Audience: Top 10% of household income

People living in Portland, Oregon in the top 10% of household income (approximated by Google) will get an in-stream ad (ad plays before video user intended to watch)

Key treatment(s)

Exposure to a sequence of nine versions of YouTube ad videos. Frequency cap: 6/weeks

Three main 'theme/header' variations (similar, slightly different phrasings)

these variations were crossed with...

Three categories of videos within each theme:

"Bravery": Charlie Bresler explains how 'you can save lives without being brave' with small amounts of money for bednets, nutrient micro-doses, etc.
$10: Man giving out money to poverty-stricken people in Capetown. Text narrative overlaid describes that $5 can buy a slice of pizza, or an interocular lens to treat cataracts, etc. Leans towards 'identified victims/recipients'.
"I want to do good": Colorful puppets sing about giving and donating to save lives. Counters common arguments about 'breeding dependency', fear of administrative waste, etc.

These are organized and linked .

Note/limitation: Unfortunately, we were not able to track 'which video got more clicks'.

Each video comes with a site-link extension with a Call to Action:

Treatment assignment procedure

We assigned the particular video treatments to audiences using a YouTube/Google optimization algorithm. This chose videos to maximize the probability that a user chose 'Speak to an Advisor' and filled out the linked form.

Outcome data

How long people watched the videos for
Whether they 'clicked through'
Whether they filled out the form for advising (Algorithm is serving to optimize this)

Results (simple analysis)

Note: we present some more in-depth analyses and graphs in the Quarto , along with a code and data pipeline

Cost per user (first-pass)

A first pass and upper bound on impact and (lower bound on) cost/session

Assumptions/data interpretations

The numbers used in our data come from meaningful sessions from unique users
The 'date range' is the relevant one for being affected by the advertisements of interest
The 'comparison cities' are approximately randomly selected

Most optimistic (unrealistic) bound

Guiding assumption: a counterfactual 0 visits from Portland in this season

306 Portland Users (389 Portland site visits) in relevant 2021 period.
If these were all driven by the advertisement (and counterfactual was 0 visits), this is +306 Users and +389 visits
Cost $4k
-->Lower bound on cost of $13.07 per user ($10.28 per visit)

Year-on-Year (maybe reasonable) optimistic bound

Guiding assumption: a counterfactual 'sam as last year' in Portland

306 Portland Users (389 Portland site visits) in relevant 2021 period.
144 Portland Users (189 Portland site visits) in relevant 2020 period.
--> 306 - 144 =162 users uptick,
- (389 - 189 = 200 visits uptick)
--> $4k/162 = $24.69 Lower bound on cost per user
- ($4k/200 = $20 per visit)

Difference in Differences comparison to other cities

Guiding assumptions:

The cities used are fairly representative
'Uptick as a percentage' is unrelated to city size/visits last year
All the cities in the comparison group are 'informative to the counterfactual' in proportion to their total number of sessions

This yields

112.5% visits uptick (Year on Year) for Portland in 2020

For all North American cities other than Portland (with greater than 250 000 people):

The average is 46.5 users in the 2020 period and 64.5 users in the 2021 period, an uptick of about 38.8%. This is very similar to the result if we look at all cities which has an uptick of 43.1%.

38.8% uptick multiplied by 144 users = 55.9 (‘counterfactual uptick’ in users for Portland)
162 - 55.9 = 106 (uptick relative to counterfactual)
USD 4000 /106 = 37.7 USD cost per additional user through this ad

Note this is a midpoint estimate, we have not yet given statistical bounds.

In the graph below (pasted from the Quarto ), we show these year-on-year upticks in context.

Other outcomes

There were very few signups for the concierge advising service. Only about 16 in December 2021 globally, only 1 of which was from Portland.

Notes

Other detailed notes are in our private Gitbook. More formal and detailed analysis could be done if it seems merited.

Fundraisers & impact info.

Reinstein and others work with charity partners, some of which are not EA-aligned (but perhaps moderately effective), which inform EA giving. Several trials focus on the 'impact of impact information'

https://app.gitbook.com/u/WrM9GjKWCyRyoIjCKt7f0ddJwCr1's research (along with others) considers 'how do potential donors respond to (different presentations of) impact information'. Reinstein and his academic partners ran several experiments, working with (and on) mainstream charities and fundraising platforms.

See work:

and discussion:

Other work is ongoing and cannot be publicly shared yet (see private gitbook if you have access).

ICRC - quick overview

See public 'open science' work in progress and preliminary results HERE

https://daaronr.github.io/dualprocess/icrc-donation-suggestion-and-cost-info-trial-project-description-summary-timing-background.html

ICRC: quick overview

April 2021 mailing addressed to ICRC's existing donor base of:

active (regular donors),
warm (last donation between 12+ and 24 months ago) and
sleepy (last donation 24+month ago) donors.
169,919 donors (active donors (58,330; 34,14%); warm donors (48,672; 28.49%); sleepy donors (62,758; 36.73%))
Mailing goes out to donors in Switzerland (all parts: German, French and Italian)

ENGLISH

DONATE TODAY: your donation can supply food parcels to a Syrian family

DONATE TODAY: your donation can supply food parcels (ca. 17CHF/parcel for one month) to a Syrian family

DONATE 50CHF TODAY: your donation can supply 3 food parcels (ca. 17CHF/parcel for one month) to a Syrian family

DONATE 150CHF TODAY: your donation can supply 9 food parcels (ca. 17CHF/parcel for one month) to a Syrian family

DONATE 50CHF TODAY: your donation can supply food parcels to a Syrian family

DONATE 150CHF TODAY: your donation can supply food parcels to a Syrian family

DONATE TODAY: - With 50CHF you offer 4 Hygiene kits to Syrian families - With 100CHF you offer 14 school kits to Syrian students - With 150CHF you offer 9 Food parcels to Syrian families

CRS/DV: overview

Catholic Relief Services/DonorVoice experiment

See public 'open science' work in progress and preliminary results HERE

'Thanksgiving email' trial run in 2 subsequent years
Super-overoptimistic information (2018), Moderately overoptimistic information (2019)

Posts and writings

EA Forum posts

See the Google doc embedded below and feel free to add comments or questions.

University/city groups

EA seeks to amplify its impact through movement-building. Organizations like 80,000 Hours and CEA are into developing and expanding the EA community. Building EA groups has been at the core of this agenda, especially in elite and influential places (such as top universities). Key aims include 'creating highly engaged EAs' and encouraging people to pursue impactful careers.

(in-progress)

Our collaborative goals

Currently, university EA groups operate in conjunction with the Centre for Effective Altruism, but with high levels of autonomy. There is only limited collaboration between groups. Such collaboration could allow them to achieve economies of scale and scope, run more systematic and powerful trials, and share insights and methods that increase student engagement.

The EAMT hopes to help coordinate this, consolidate the evidence, and provide accessible tools to newly-formed groups. We want to help avoid repeating errors and 'reinventing the wheel' each time.

EA groups at universities

The efforts and experience of individual EA groups can provide contextual evidence and insights. The EAMT aims to aggregate this knowledge, find generalizable principles, and disseminate this to the wider EA community. We are focused on meaningful medium-term outcomes, e.g.:

Membership and participation in EA organizations, and markers of post-university involvement
How career plans are impacted (focusing on particular programs and paths)
How research and discourse at universities can be influenced

Relevant organizations and programs

The programs below also aim for generalizable principles; e.g., their 'starter toolkits' are implemented across a range of cities, universities, and settings.

Centre for Effective Altruism

CEA has , passing funding and efforts on to Open Philanthropy. However, CEA is still involved in promotion through the (UGAP), which offers guidance and resources to newly formed groups. Furthermore, CEA's (CBG) Program helps develop national and city-based groups (outside of universities).

University Group Accelerator Program

may be the These have been summarized from different data points; some formal testing, some anecdotal, and some intuitive.

Community Building Grants Program

CBG focuses on supporting city groups, providing grants to support their activities and resources to help with expansion. These resources and support systems currently lack data supporting EA community building. (The identified this as a major bottleneck; we hope to collaborate to help them improve this.)

EA Group Organizers Survey

The is a collaboration between CEA and Rethink Priorities. It analyzes the changes in EA groups yearly, with two main components:

The growth and composition of EA groups and their activities
The opinions of the group's status from the organizer's point of view

The first component gives insight into priorities and progress. The second can help guide our research and provide insight into the tools required by group organizers to increase group interaction and outreach.

See especially:

Open Philanthropy

provides funding for part-time and full-time organizers helping with student groups focused on effective altruism, longtermism, rationality, or other relevant topics at any university (not just focus universities). This has replaced CEA's Campus Specialist and Campus Specialist Internship programs.

, a selective 2-year program that gives resources and support (including $100K+/year in funding) to particularly promising people early in their careers who want to work in areas that could improve the long-term future. (Intended partially for particularly strong Campus Specialist applicants.)

80,000 Hours

80,000 Hours is actively targeting university students and offering them guidance on high-impact career paths. (see private Gitbook, if you have access)

Further outreach

There are some further initiatives in this area but most of the material cannot be shared at the moment (see private Gitbook).

Independent group testing and coordination

In this section, we are putting together documents, trials, and knowledge currently being gathered by different EA groups. As we increase our collaboration with these groups, these trials, ideas and documents will become integrated with the Gitbook and EAMT's work, forming a basis for future work and testing.

Funnel Map

This is our basic understanding of the processes used to draw in new members to EA university groups and fellowships, and how members progress through different stages of engagement. Each stage gives us grounds for testing through the different variations of these approaches. This is not just about testing which methods work for attracting the highest number of new members (i.e., which 'call to action' to use at activity fairs, etc), but also increasing engagement and developing high-level EAs (i.e., fellowship program alternatives, discussion group topics, etc).

(Above: a preview of funnel map; for full description and work in progress)

Stanford University

Awaiting response from Stanford EA.

University of Chicago

Currently limited to private Gitbook.

MIT, MIT Alignment org

Currently limited to private Gitbook.

EA Israel

This discusses their strategy in-depth. A lot of their findings are not specific to Israel or country-wide EA groups. Useful as a resource for EA groups.

Useful findings will be synthesised and integrated here in the future.

Independent University Group Outreach

We have been independently contacting organizers that are known to be actively seeking to test outreach methods, and also publicly via a on the EA Forum. An important aspect of the work here is to bring together people who are active in this space but working independently. The airtable below presents our current (non-exhaustive) list of groups or organisations that have relevant knowledge (strategy documents, marketing guides, etc), or have done some form of independent testing.

Writing credits for this page

Kynan Behan helped create and write this page.

Thus, we hope our efforts will be valuable to these initiatives and groups, by providing and sharing evidence on successful approaches to increasing engagement.
Note that the survey does not collect data from the group's *members*, although they do ask about the overall numbers of people who engaged with each group.

Workplaces/orgs

What's been/being done, what do we know?

"Innovations in fundraising" earlier work

and … was the knowledge sharing I tried to get going (as you can see, not very much was shared)… as an academic, with very limited funds. I also worked a bit with George Howlett at CEA on his ‘Workplace Activism’ project.For this part of the project I was focusing on

How to get your organization to support effective charities (or at least, not limit their generosity to local causes)?
E.g., that offered giving incentives, and whether these were ’EA-promising”
How to get a fundraising event or giving game going within your org.

Consideration: Key obstacles/questions for workplace action

"Format questions:"

Which audiences
Who (internal/external)
What formats? Fundraiser and a talk or just one?

Workshop, giving game, etc?

Workshop: a guided discussion ... why do you give etc. "I want to help more" etc. Workshop/worksheet. Philanthropic goals The 'five whys'? Keep asking why and they sometimes get down to base suffering. 'Guiding but not leading'.

Next to consider: "Opportunities" (to 'do', measure, learn) ... we should make an inventory here

Discussion space

In case you don't like writing in this Gitbook, I created

Other partners

"Charity Elections" (in schools): trials in preparation, extensive consultation
80000 hours: trials in progress, preparation, and analysis; some work joint with Rethink Priorities. Note: we have limited permission to report on these trials
The Life You Can Save: Trials run and in preparation, limited permission to share
High-Impact Professionals (HIP): Advising on surveys and approaches
GiveWell (discussions and consultation)
... And other organizations that didn't want us to report on this publicly

Related/relevant projects/orgs

Note 7 Mar 2023: I just started this page, it is far from complete

Organizations and 'do' initiatives

: Including (new) EA-aligned marketing groups

Which links THIS spreadsheet

Much of which is embedded into THIS Airtable view as well (which will have some further comments on the relevance, as well as organizations that are not-so-EA related, with discussion)

(a list of orgs in the 'EA effective giving' space; private gitbook atm)

Non or semi-EA initiatives

EA or 'effective giving' orgs working with foundations and wealthy donors

Research and information-gathering initiatives

innovationsinfundraising.org - "The IiF wiki collects and presents evidence on the most successful approaches to motivating effective and impactful charitable giving, and promotes innovative research and its application." This precedes and is partially integrated into the current resource

Rhetorical.org

We are an academic collective and research non-profit, dedicated to providing public communication campaigns with cutting-edge research and rigorous tools for message development.
"Crowdsourcing" ... Recent research suggests that regular people can often be far more effective than experts at predicting which messages will best resonate with others in their community.

On Adaptive design/sampling, reinforcement learning...

The challenge is that the “space” of messages for campaigns to decide between is enormous — there are very many things a campaign could say and many different ways to say them. Unfortunately, research shows that relying on theory and expert guidance about “what works” when designing campaign messages is unlikely to be effective by itself, because “what works” is difficult to predict and can change dramatically across contexts (e.g., see [1], [2], [3], [4]).
-->
Efficient message search. We design research pipelines that allow campaigns to explore the large space of potential messages more efficiently, and to quickly zero-in on the most impactful messaging strategies. Our methodology is based on a combination of large-scale adaptive online survey RCTs, Bayesian machine learning and surrogate metrics.

Marketing & testing: opportunities, tools, tips

Testing Contexts: Overview

Testing 'implementation strategies'

Contexts to test outreach messages

Contexts allowing individual randomization & tracking of medium-term outcomes

GWWC web site at point of email signup
Email lists
- immediate: subject headers w/ 'open rates' as dependent variable
- medium-term: all outcomes tied to email

Contexts for 'Immediate outcomes' (clicks etc)

Facebook; But the targeting algorithm may frustrate randomization. (see .) Can it be switched off?

Contexts allowing

See

This is helpful if the important outcomes can be tracked by ZIP code/post code/address.

Online display advertising
Google search
YouTube
LinkedIn
Facebook (presumably)

Testing Rich Content

How to test rich content?

We can use some of the same strategies as above to test "rich content", i.e., short or even long talks, book chapters, podcasts, and so forth.

However, we may also want richer more detailed 'qualitative' feedback...

Paid participants may allow richer feedback (see )

Emails might be an opportunity
Surveys with professional participants
Surveys with undergraduates

Here generalizability may be a challenge, particularly extending inference from convenience samples to larger and more general populations. "Might be good to think of creative ways of doing that though, e.g., looking at which content creates the most extreme enthusiasm."

What to test in 'rich content'

Does the messenger matter?

Does the messenger demographics and appearance matter?
Does it depend on the audience?
What’s the optimal length?

Message customization (heterogeneity and targeted marketing)

We haven’t thought about this much but it seems important – it might be worth, for example, having different messaging for different cause areas and letting them be algorithmically targeted.

Imagery/non-content considerations

How many images to include on a page?

How much text to include in a page?
How many buttons?
How many choice options?

The 'mysterious sauce' ... JS knows about ()... we don't always have a "theory" but it might be meaningful.

Targeting

See also

Question: If our aim is to change the culture of giving in general, what kind of people should we be targeting?

Influencers (People with lots of social influence)
Low-hanging fruit (i.e., people who are naturally predisposed towards effective giving, pledging, & EA)

Idea: Compare different outreach methods on the basis of "cost per pledge" (or per "whatever-metric-we-use"). (Outcomes: ... & ... )

Ideas/methods for targeting: platforms and audiences

Some audiences and approaches to targeting

Public lists of political donations (e.g, )
- ... donors to candidates sympathetic to a relevant cause area

Internet activity ... those who watch/read/search for:
- Videos relevant to a cause area
- Reddit threads relevant to a cause area
- Magazines/news sites relevant to a cause area
- Search/visiting webpages about charity effectiveness/merit (e.g., Charity Navigator) 👍

Education
- Courses/degrees/majors relevant to a cause area
  - (e.g., development econ/studies, animal behavior, AI)
- People at high-status institutions (future influencers/policymakers)

Exploiting social network structure
- Targeting "influencers" and "central" people (on the basis of "number of followers" / friends / etc.)

Key search terms (google 'effective giving' etc)

Podcast listeners (philanthropy, economics, development & global health ...)

Implementing ads, messages, designs

Practical issues with running experiments and trials and gathering data on these and making reliable inferences -- see subsections below

Doing and funding ads

Video ads/Best-practice guidelines

Targeted ad on FB, with variations: setup (next subsection: Collecting outcome data)

Other marketing/implementation resources

User Friendly: Design and implementation of marketing; EA aligned and savvy.
- See their EA Forum post
Opinion: Digital marketing is under-utilized in EA JS Winchell offers advice on how to do this right and leverage certain grants and do good marketing.
JS Winchell has started an agency called "Good Impressions"; they are working to implement and measure the impact of EA marketing for the highest-value clients.
Altruistic Agency provides free tech support and development to organizations in the effective altruism (EA) community.

Doing and funding ads

Guidelines and resources on how to get ads and marketing going, how to finance it, tips on how to do it right

Google ads grants

JS Winchell presentation:

Tomasik (2014) post

General grants -- see "grants" section

Video ads/Best-practice guidelines

Opinion: Digital marketing is under-utilized in EA:

Videos - Facebook

Youtube seminars

Email from JS:

The YouTube team holds quarterly workshops to explain how best to build and use your organic (not paid) YouTube channel. Based on previous discussions it sounds like this is something that might be of interest to your orgs.

Note: This is aimed at beauty and fashion brands but I'd imagine 80% of it would apply to GWWC/80k/1ftw

Agenda for the workshop:

Explain why YouTube is crucial for your brand identity
How to claim your narrative on this platform
Reach and engage new and existing audiences through content
A review of channel best practices
Enhance your channel's search and discovery potential
Develop an always on strategy

Facebook

Cost of ads: benchmarks

Reinstein, FB ads tied to fundraisers.

Note: this information is subject to change; updated ~ Apr. 2022

My costs have been:\
about $0.01 per impression
about $0.50 - $1.20 per click

Targeting at Universities ... Facebook's estimates

The estimated cost per impression (?‘reach’) and per click varies with the targeted audience. In general, narrower is estimated to be more costly. I think this is about ‘a larger audience allows FB to serve the ads to a larger number of people who tend to be click-happy’ Some data points:

For Oxford, ‘In College’, living in the UK: They claim we will get 4-18 clicks per day for $50 per day over 2 days (29 Mar 2022 check on FB ads manager)
If I put in Birmingham instead I get a fairly similar figure.
If I remove the only-one-university narrowing, it gets cheaper. They claim I’ll be able to get 86-250 clicks per day for the same cost …

Meta allies

Research advancement manager: Michael Zoorob

EA groups (employees) within Meta

If you run a lot of ads FB will assign you external consultant helpers. They are somewhat helpful, but they don't seem to know everything.

FB tips for charities/fundraisers

28 Nov 2022, Zoroob:

Meta has just released a recorded series of videos (Meta’s Nonprofit Advertising Education Series) to help non-profit organizations meet their year-end fundraising goals. Some of these materials may also be helpful for researchers using Meta ads (e.g., materials on designing effective ad creatives), so I am passing the info along. Blurb below.

The three-part series of virtual webinars provides nonprofits with advertising training and best practices around how to use Meta technologies to further their missions:
Advertising Basics: Get started with Meta advertising with our Nonprofit Advertising page today.
The session also features On-Facebook Donation Ads that enable donation transactions within the Facebook app.
Creative Best Practices: Learn what great nonprofit creative can look like with best practices from Meta.
Consider saving our Creative Considerations guide to learn more about the five key creative considerations that apply to cause-driven campaigns.
Measurement: Introduce yourself to measurement best practices on Facebook and Instagram! Afterwards, explore split testing, lift measurement, and the experiments tool, on our Advertising Measurement hub.
Visit our website to view resources to help your organization drive more positive change, and do more good.
These and other videos for non-profit advertisers can be found on the "On-Demand Video Library."

Targeted ad on FB, with variations: setup

Below, we give one example from a relevant context, illustrating (with screenshots) what choices you might make, what it would look like, and how to implement it.

Updates/general advice: (Sep 2022) To do 'any good tracking and optimization through 'Facebook, you should set up the Meta Pixel and Conversion API as soon as possible.

You may want to jump to the Optimizing and pixels (WIP) section.

Getting started

"Meta Business Suite"(https://business.facebook.com/) is the starting point of your ad campaign. If you have a Facebook Business account, you should have a "Meta Business Suite":

Next, click on "Ads manager" (See the megaphone on the left).

Link a page?

You have to link a "Facebook Page" or "Instagram Account" to your ad campaign to have a visible front ground of your business that users could connect with the ad. You can create a new page or manage access to an existing page or Instagram account:

The next step is to select "Create a campaign" and choose an "objective"... the interface gives you some idea of what these aim for:

Budget optimization

When creating a new "Traffic campaign" ('cold traffic campaign' referenced HERE) there are a lot of options to help you optimize your delivery while minimizing your expenses.

You need to opt-in to these tools by ticking "create A/B test" and "Budget Optimization" on the first page of your "ad campaign manager." Since there is no downside (we would like to learn which ad design works best), we decide to opt-in to each of these.

Budget optimization is closely related to the choice of the target group. In general, the larger the target group, the cheaper it becomes to reach a certain amount of "link clicks".

Targeting the ad

Suppose we wish to create a targeted ad for a particular Facebook audience. For example, we might wish to put an ad...

in the 'feed' of US Americans who are interested in charity or volunteering or philosophy
giving them a link to a page encouraging them to learn about EA

Targeting example

Here, I chose "Get more website Visitors". ... Then "Edit Audience". Below, I chose people in the US over age 18 who are interested in any of a set of things related to charity, volunteering, or philosophy. This is a very broad audience, with about 80 million potential people

Facebook estimates that spending $5 per day over 5 days will lead 358-1000 people seeing the ad and 72-208 clicks. That implies a cost of between 12 cents and 34 cents per click

We can use the "schedule and duration" function not only to automate the timing of our campaign, but also to estimate its cost. For example, we assume that we need 800 participants to click-through to start the 20 fundraisers (i.e., a rate of 2.5%).

Below, we see that FB estimates 172-497 link clicks per day for 10 Euros per day for (a different_ case.

Benchmarking these numbers

These numbers seem over-optimistic in general, we've seen figures of $1-2 per click elsewhere. Some potentially reliable figures below (sources "Wordstream" and re-reporting of Wordstream here)

From a recent relevant experience in our group's context...

The last campaign based on clicks I ran got 461 clicks for $244 USD over 2 weeks with 113k impressions. [i.e., $0.50 per click]

Note that (maybe obviously) 'clicking on a Facebook ad' is a rare thing for people to do. In the quote above, thats about 4 clicks per 1000 impressions.

Narrower targeting in the 'ads manager'

It seems you can target more carefully in the "Ads Manager".

If you don't have an existing contact list or comparison group, you may prefer to simply specify characteristics. That is "Create a Saved Audience". For example, you can specify age groups and then 'detailed targeting' categories, including, e.g., Schools (including universities):

More detailed targeting

Write a captionYou can specify

Demographics
Interests
Behaviors

"Include" seems to be the default when specifying these ... it 'expands the audience'. You can click 'narrow further' to constrain the audience.Don't forget to use the search tool within 'browse' to find ways to do careful targeting Exit with⌘↩

Create a saved audience

You can specify

Demographics
Interests
Behaviors

"Include" seems to be the default when specifying these ... it 'expands the audience'. You can click 'narrow further' to constrain the audience.

Don't forget to use the search tool within 'browse' to find ways to do careful targeting

During this process, you can see a concise statement of your choices, and the estimated audience size further up on the page:

How should we (EA, effective giving) target ads?

We have some evidence that narrower targeting helps. An obvious candidate is

Traffic choice

The next big choice is 'where do you want to drive traffic?'. You'll enter more details about the destination later.

Since we want people to click our web app, we chose "website".

Version testing

We may have several versions of the ad we want to try out, and we want Facebook to iterate towards the one that is more successful using their algorithm. Ideally, we would like to learn as much as we can about 'which ads perform better on which audiences'.

We can set up Facebook's ("meta") algorithm to dynamically optimize 'over which will get the most clicks.'

"Dynamic creative"

"Dynamic Creative" is an option to enhance this process. It takes multiple media (images, videos) and multiple ad components (such as images, videos, text and calls-to-action) and then mixes and matches them in new ways to improve your ad performance.

"Dynamic creative" can be either switched on or off. (Given that we want to optimize over several versions, I see no downside to this feature. Thus, we switch it on.)

Where do we actually specify, enter, and style our ad content?

Finally, we have to decide which delivery we want to optimize.

We may want the ad that gets the most "~~conversions~~ traffic to our page". Therefore, we choose the option "link clicks".

However, we might instead want FB to optimize the ad presentation in terms of which ad not just leads to the most 'clickthroughs' but leads to the most "conversions" or some other action taken on our page To do that we need to set up a "meta pixel". See Optimizing and pixels

Cost and cost controls

DR: In my past experience, you ended up paying Facebook based on the number of "clicks" you got not simply on how long your ad was up. But it's probably a combination of these, and there are probably different pricing plans. You can tell Facebook to put a limit on either of these do not go "over budget". Facebook will aim to spend your entire budget and get the most link clicks using the lowest cost bid strategy.

Currently EUR 315 is the max for new users ... but for our present pilot we may want less than this (check: how much do we expect to pay for 800 clicks, let's split this up into ... first 100 clicks, next 300 clicks,.. to see if its going OK )

Designing your ad

Finally, you enter the third and last page of the ad creation process. Here you have to verify your ID and Facebook page and choose the actual design of your ad versions. ["of which the most important one is whether you want to have a video or single image." (?) ]

The last step before publication is to specify the destination for your campaign.

We chose a website and simply copy the URL into the mask to make sure the ad is linking people to the right destination.

Payment (and monitoring)

Optimizing and pixels

Setting up the pixel

The pixel includes content from Facebook that needs to be integrated into your website/page of interest. (To do: link instructions for this).

Adding pixel 'events' to your web page

One simple way of doing this: "Events setup tool"

Once you are in the ads manager for an ad, go to the 'Events Manager':

"Add events", choose "from the pixel"

"Events setup tool"

Put the URL for your site in and 'Open website'

As seen below, this opens our page, and show what things have already been associated with a Pixel. Here the "create fundraiser" button on this page has been associated with a button on this page with the "Initiate Checkout". (We use default names Facebook is familiar with, even though there is no 'checkout' in this case).

("Facebook Pixel Helper" extension in Chromium might be helping here, but I'm not sure how).

"Track new button" lets you see what click options you could associate with a pixel.This highlights clickable things you can do this with. ('Create fundraiser' is not highlighted, probably because it's already been assigned).

For example, I could click 'who are we' on a page and associate it with 'view content'

I could 'add a value' to this, if it makes sense.

Can I use this later to have FB optimize for 'net value' of a user generated on the page? This might be a useful way to assign greater importance to certain things, even if they aren't actually monetized.

After this 'finish setup' ... it gives you the chance to see what you have asked it to do and confirm or cancel it.

Using the pixel events for Facebook ad optimization

Once you have nice pixels set up, you can use this in helping Facebook decide which versions of ads to serve, which audiences to serve them to, etc. You set up your ad, define an objective etc...

Define your goal as 'conversion', and define what 'conversion' corresponds to in terms of pixels:

Here we're choosing 'initiate checkout', which we defined as clicking on a 'create fundraiser' button on the first page of our site (early in the funnel)

The warning below might not matter as we

The warning below might not matter as we haven't had our page up for a while. But we have also been told elsewhere that before you can get the ad to optimize for conversions ... you first need to have the pixel set up and the ad running, optimizing for views. So this might still be a concern.

Facebook tracks people for a while. So in optimizing, you can change 'what time period of outcomes it attributes to which (version of the ad)':

I assume that the same 'conversions' target defined above is used in optimizing the 'dynamic creative' if you turn that on.

Collecting outcome data

For a trial to yield insight, we need to be able to track and measure meaningful outcomes, and connect these to the particular 'arm' of the trial the person saw ... (if they saw any arm at all)

Notes from conversations (need explanation)

In this section we discuss how to see the results of your promotions and trials, and how to access data sets of these results that you can analyze.

Facebook ads interface and Pivot tables
Google A/B, optimize interface and Google analytics interface

Aside notes on modes of testing and tracking

Tracking: See a page, track action afterwards
Putting ads on Youtube and testing click through
Geographic segmentation: Targeting zip codes and looking at activity from zip codes…

Other approaches include time-series and difference in difference in response to switching on or off the ad, trial, or page content. Nick: Branch and Amplitude/apply/segment ... to track someone throughout the whole funnel

Facebook ads interface

How to get data from trials of Facebook ad

Using Meta ads manager reporting suite

Extracting simple results

Go to "the reporting suite in Meta ads manager"

How to get to the 'reporting suite' in Meta ads manager view as above?

URL should look like: https://business.facebook.com/adsmanager/reporting/manage?act=ACCOUNTNUMBER&business_id=BUSINESSID

Go to https://business.facebook.com/adsmanager/
Click on the relevant account/campaign

2. Specify some filters:

This gets us the screen below

3. Specify the date range.

Here “Effective Giving Guide Lead Generation campaign … ran late November 2021 - January 2022" (Careful in specifying the dates; the interface is weird)

After specifying these dates, more information comes up in the basic columns:

5. Export simple results for Campaigns

Click 'Reports' ... upper right.

We can 'create a custom report', which saves this for later tweaking, or merely 'export table data'. I will do the latter for now:

csv or xls?

.csv and .xls formats are about equally good; R and other software can import either one. I'll choose csv because it's a tiny bit simpler... but in other contexts, xls might be useful for exporting multiple sheets.

Note: I chose CSV and do not include summary rows, to avoid confusion later.

Exploring alternative: direct input into R

See tools like the rfacebookstat package; docs here

Now I import this data into R (I usually use code but let's do it the interactive way for illustration)...

It seems that the option 'include summary row' was probably not wanted here, and that row with blank 'campaign name' could cause confusion.

It seems to have removed the "bid strategy" column, and added 'reporting starts' and ...'ends' from the filter. Otherwise, everything else seems the same as in the ad manager view, although some labels have changed.

Campaigns, ad sets, ads

What's the difference between these?

FB/Meta gives some explanation HERE, although it leaves some open questions.

You set the advertising objective at the campaign level. Here you decide the end goal for your ads, like driving more likes to your Page. At the ad set level, you define your targeting strategy by setting up parameters like targeting, budget and schedule. Finally, your ads are creative visuals, like pictures or videos, that drive the audience to what you are trying to promote.

Keep in mind that a campaign can include multiple ad sets, each with different targeting, scheduling and budgeting options selected.

Some things are still unclear: Can multiple 'ad sets' use the same 'ads'? (I think so) Why do we seem to see budget and schedule choices listed under 'campaign' in the ads manager?

We see three tabs

Campaigns
Ad sets for 1 campaign
Ads for 1 campaign

Campaigns

Here we have 7 campaigns, each with separate budgets, and start and end dates (although these mainly overlap).

It looks like some campaigns were set up for direct comparison or "A/B" perhaps, with the exact same budgets and end dates, and similar names:\

Ad sets

Here, there are 52 total 'ad sets' across all campaigns.

I'm going to export this as a csv too, in case it's useful.

Ads

There are also 52 "ads"; it seems in this case, one per ad set:

Ad sets with multiple ads?

In theory ad sets could contain multiple ads. I wonder when/whether/why it would be worth doing this. __ Luke: In the Giving Guides trial ... we used a smart ad format where you upload lots of creatives (images, videos, post text etc) and it tests them all as a single ad. That particular ad format has a 1:1 relationship with the ad set, and then you investigate the success by pulling other specific reports for the attributes (e.g. “Post Text” or “Image or Video”)

The information in the 'ads' table seems the same as in the 'ad sets table' ... other than a link to preview the ad content itself (which I don't seem to have access to atm).

Pivot tables

You may want to see or export crosstabs of one outcome, user feature, or design feature, by another. Sometimes you just want to see these quickly, but this might also be a way to extract the 'raw data' you wish to analyze elsewhere.

Start new pivot table

~~From within Ads Manager~~ From 'ads reporting' (3 Aug 2022 updated interface)

Click "Create Report" --> Pivot table

2. As before, make sure you've selected the right date range, and (redo) any relevant filters

Here I add a filter for 'campaign name ' contains 'general'. Because I'm specifically trying to pull down some information on 'which video people saw' in this group (which needs a special setting to access... as noted below)

3. "Customize pivot table" – "Breakdowns" ... the things you want this to disaggregate across (sums and averages within groups)

the 'campaigns', the 'ad names'
timing, demographics

Drill down to "Custom breakdowns", "Dynamic Creative Asset", to get it broken down by the text linked to the ads:

However, some breakdowns are not compatible with other breakdowns (maybe for privacy reasons?) For example, if I tick 'Gender' I cannot have it broken down by 'Image, video, and slideshow', at least in the present case ... (perhaps because it narrows down too few observations?)

4. "Customize pivot table" – "Metrics"

Select the things you want reported, and deselect things that are not interesting or irrelevant to this case (like 'website purchases') or numbers that can be easily computed on your own

Normally, I'd suggest leaving out the redundant 'Cost per Result' but it's probably good to have as at least one sanity and data check.

Other stuff like 'video play time' could sometimes be very relevant, but I'll leave it out for now

I added a few features I thought might be interesting or useful. Was anyone drawn in to pledge? When did each campaign start/end (doublecheck)? How many unique link clicks?

5. (Optional) Conditional formatting

This could also be helpful if you are using the Ads Manager tools in situ, but obviously this has no value for downloading.

6. Save report for later use, share

If you think the report is useful in-situ, you can also share a link

7. Export the data

As in Extracting simple results...

(or consider direct import into R using tools like the rfacebookstat package)

Google analytics interface

Add section: How to set up GA

Tracking the progress of a trial in GA

Some key 'flows and tips'

**'**Home'
'behavior', 'site content', 'all pages'
- remember to set date range!
Acquisition, all traffic, channels: here 'social' (probably) tells you who came from Facebook etc
- Acquisition, all traffic, Source/medium drills down into this

How delayed is the data you see... in various parts of GA?

suggests there may be substantial delay. But does this only apply to sites with a great deal of traffic?

Some open questions

How to 'exclude your own testing the page' from GA results?
How accurate are the timings presented?
How to get at unique users for the key stats
Date filters don't seem to work on home page graphs ... choosing custom dates doesn't change it

"Getting the data"

DR: I'm not sure how to get 'all the data', but I have been able to get data on, e.g.,

a set of outcomes,
over a set period of time, (a particular month and the same month in the previous year)
broken down by another feature (by city)

After logging in and selecting 'all domains'...

Select 'customization', 'custom reports', 'new custom report'

Then search and select your desired ‘metrics’ (outcomes) of interest. “Users” and “sessions” seem pretty important, for example.

Next you can break this down by another group such as “city”. You can put in 'filters' too, if you like, but so far I don't see how to filter on outcomes, only on the dimensions or groups.

I don't know an easy way to tell it to “get all the rows on this at once.” but if you scroll to the bottom you can set it to show the maximum of 5000 rows.

Next, scroll up to the top and select export. I chose to Export it as an Excel spreadsheet., as this imports nicely into R and other statistical/data programs.

We were able to do this in two goes, but for larger datasets this would be really annoying. I imagine there is some better way of doing this., maybe a way of using an API interface for Google Analytics to just pull all of this down.

A partial workaround fix is to do a ‘filter’ to discard rows you don’t need… click ‘advanced’ at the top and…

Google A/B, optimize interface

Understanding how this tool works to test different versions of pages. GWWC Pledge page trial as first context

Mapping the key non-obvious features of running and analyzing these A/B trials using the Google analytics/optimize system.

Reporting and considering this in the context of the GWWC Pledge page (options trial)

What is it? How to create a trial

Setup and requirements

Set 'variants' and give them weights (assign shares for each)

Page and audience targeting

Objectives: The things you are trying to measure and improve

"Activity": Trials created/started/changed/ended/archive

Viewing results and Google's analysis

Clicking on a particular 'experience' in the 'container'...

(if you have been granted read and analyze permission), will open the useful 'Optimize Report' (which Google explains here)

Optimize report: top

The overall start/end and 'sessions' are given first. What are "sessions"? The short answer: 'Sessions' are the number of 'continuously active' periods of an individual user. So individual users may have multiple sessions! (see Sessions vs. Usersbelow). Here, there have been 7992 such 'sessions' over 81 days.

I am not sure where we can learn 'how many users there were'.
("View full chart" can give you a day-by-day breakdown of the number of sessions.)

OR: Conversion rates section

The next section compares 'sessions' and 'conversions' by treatment, and does a Bayesian analysis. This seems the most useful part:

Relative conversion rates, analysis

Above, the 'Separate block' (SB) seems to be the best performing treatment. Google calculates a 2.69% conversion rate for this (here, presumably the rate of people checking 'any' of the follow-on boxes).

Considering the Analysis, Google Optimize "uses Bayesian inference to generate its reports... [and] chooses its priors to be quite uninformed." The exact priors are not specified (we should try to clarify this).

But if we take this seriously, we might say something like ...

if our initial priors gave each treatment an equal chance of having the highest conversion rate ('being best'), and assumed a [?beta] distributed conversion rate for each, centered at the overall mean conversion rate ...
then, ex-post, our posterior should be that the SB treatment has an 80% chance of being best, our 'Original' has a 17% chance of being the best ...

Google also gives confidence intervals for the conversion rates for each treatment, with boxplots and (95%) credible interval statistics:

The grey bar for the baseline is mirrored in all rows. The 95% CI for the 'improvement over the baseline' is given on the right. But this is a rather wide interval. More informatively, if we hover over the image, we are given more useful breakdowns:

Although this does not exactly tell us the 50% interval 'improvement over the baseline' (this would need a separate computation), we can approximately infer this.

But fortunately it is reported in data we can download; see below "Download (top right)".

From that data, we get:

Variant

2.5th Percentile Modeled Improvement

25th Percentile Modeled Improvement

Modeled Improvement

75th Percentile Modeled Improvement

97.5th Percentile Modeled Improvement

Original

Pledge Before Try Giving

-50%

-33%

-23%

-11%

18%

Separate Block For Other Pledges

-18%

20%

36%

76%

Our 'posterior' probability thus infers (assuming symmetry, I think) that we should put (considering odds ratios, not percentage points)

a 2.5% chance of SB having an 18% (or more lower rate of conversion than 'Original'
a 22.5% chance on SB being between 18% worse and 4% better
a 25% chance of being 4-20% better
a 25% chance of being 20-36% better
A 22.5% chance of being 36-76% better
A 2.5% chance of being more than 76% better

We can also combine intervals, to make statements like ...

a 50% chance of being 4-36% better
a 50% chance of being 20-76% better

We report on this further, for this particular case, under Basic results/outcomes

There is some repetition (can we 'mirror blocks'?)

Session balance

Above, even though the treatment has been assigned randomly (presumably a close-to-exact 1/3, 1/3, 1/3 split), the number of 'sessions' differs between the treatments ('variants').

Why? As far as I (DR) understand,

while each individual user (at least if they are on the same machine and browsing with cookies allowed) is given the same treatment variant each time...
the same users may 'end' a session (by leaving or being inactive for 30+ minutes), and return later, getting the same treatment but tallying another 'session'. This suggests that users in the "Separate Block" (SB) treatment are returning the most (but also see 'entrances' below).

Breakdown over time

The final section gives the day to day breakdown of the performance of each treatment, presumably, along with confidence intervals. This seems relevant for 'learning and improving while doing' but possibly less relevant for our overall comparison of the pages/treatments.

Download (top right)

The 'Analytics data' gives us sessions and conversions by day and by treatment.

(Where no session occurs in a day for a treatment, it is coded as blank).

Clicking on 'view in analytics'

... this gives some other information, mainly having to do with the user experience.

"Unique page views" represent "the number of sessions during which that page was viewed one or more times." ... Recall "sessions" are periods of continuous activity.

"Entrances" seem potentially very important. According to Google:

Sessions are incremented with the first hit of a session, whereas entrances are incremented with the first pageview hit of a session.

In the present context, this suggests that the 'Separate block' page is inspiring users to come back more often, and to spend more time on average.

Sessions vs. Users

As noted, essentially: 'Sessions' are the number of 'continuously active' periods of an individual user

Analytics measures both sessions and users in your account. Sessions represent the number of individual sessions initiated by all the users to your site. If a user is inactive on your site for 30 minutes or more, any future activity is attributed to a new session. Users that leave your site and return within 30 minutes are counted as part of the original session.
The initial session by a user during any given date range is considered to be an additional session and an additional user. Any future sessions from the same user during the selected time period are counted as additional sessions, but not as additional users.

Reconciling FB/GA reports

Facebook's Ad Manager and Google Analytics often report results that seem to have discrepancies. Below, one particular case, and possible explanations.

What is going on 'in our latest trial'?

Facebook: We have 50k+ unique impressions, and 1335 clicks
Google Analytics records only 455 page views, 403 users
- And only about 20 doing any sort of Engagement like scroll or click (if we read it correctly)

1. Where do the other 600 clicks end up? Ad blockers? Do they click the ad and shut down before the page comes up?

JS: main reasons [DR: slightly edited[
1. "Do they click the ad and shut down before page comes up?" Yup! Closing the page before the redirect fully loads. Facebook will be as generous as possible with their click reporting.
2. ... If a user clicks on the FB ad twice within 30 minutes, then Google Analytics would record that only as a single user and a single session.
3. If a user has JavaScript disabled or doesn’t accept cookies, then Google Analytics doesn’t track.

Leticia at Facebook: can be mistaken clicks, this is common.. need a pixel to fix this ..., can change it to 'landing page view'

2,. How is it possible that 455 people come to the page and only 20 (under 5%) of them actually even do anything on the page?

Survey/marketing platforms

Why use survey/experiment/marketing platforms?

To test content in more depth than an A/B trial permits
Better control over 'who is participating' and how much attention they are paying
Things more towards 'basic background research'
Closer to a 'representative sample'

Some participant recruitment platforms

: Created specifically for (academic research). Our impression is that this is among the highest quality panels, although there is some controversy over this.

CloudResearch: CR approved Mturk

CloudResearch: Prime Panels

Positly: https://www.positly.com/ Qualtrics (panel) Lucid

Dyndata

Trial reporting template

For each proposed/ongoing/past trial, we should report the following minimal details, with links (proposed template)

For each proposed/ongoing/past trial, let's try to report the following minimal details, with links (proposed template) If you don't have time and you have another clear presentation of most of this, please link or embed it.

"Concise reporting template"

Please keep your answers brief -- if you want to give more detail (which is not necessary) please link a later section or external page. _

Short version of this template (link copy-opens a new version for you to work in)

General idea, main 'hypothesis' (if there is one)

Firstly, what is this promotion trying to do (e.g., 'encourage signups for giving pledges')?

But more importantly, what are you trying to learn here... What might you have a better understanding of after the trial than you did before the trial?

E.g.,

Specifically:
Does the opportunity we offer to sign up for an 'accountability partner' increase or decrease the rate at which people DO XXX activity?
Does it lead to greater overall XXXlinked donations per visitor over the next 1 year interval?
Generally:
Does 'social accountability' help to encourage XXX activities and promises and the fulfillment of these? Does the 'fear of being held accountable' discourage people from making commitments?

(Optional: brief on background theory and previous evidence)

Point of contact (at organization running trial)

You can enter more than 1 person here, including an external organizer (like JS Winchell), but ideally, also someone inside the organization.

Add 'academic/research lead' here if there is one

Timing of trial (when will it/did it start and end, if known)

Digital location where project 'lives' (planning, material, data)

The present Gitbook/and a nested Github repo folder could be ideal. Please give a precise link so others could access it.

Environment/context for trial

(Is it on a web page, a google advertisement, a physical mailing, etc)

Participant universe and sample size

Who will be targeted or who do you expect to be part of the trial?
(Somewhat optional) How many people (or 'units') do you expect to be involved (median guess)?
(Optional): How many do you expect will have a 'positive outcome' (e.g., a 'conversion')?

Key treatment(s)

Description, link exact language/content if possible

Treatment assignment procedure

At what level is it varied? (individual visitors, postal codes, days of the week, etc)
How are treatments assigned ('blocked randomization', 'adaptive/Thomson sampling', etc.)
- If you are using a 'set Google, Facebook etc algorithm', just input the settings you used here, and/or link the (Google, FB, etc) explanation
How many/what shares are assigned to each treatment?

Outcome data

What measures (outcomes, other features) will be collected?
When and how
Where will the data be stored, who will have access

Optional/suggested additions

Planned analysis methods, preregistration link, IRB link, connection to other projects and promotions

Ex-post: Reporting results (brief)

Implementation and data collection

Did it go as planned? Any departures? (Timing, randomization, design changes, etc)
How much/what data was collected? How many observations?
Where is the data stored (also link/adjust the above), who has it, and under what conditions?

Basic results/outcomes

"Partners and stakeholders opinions": were they happy with the trial? Did they seem to think it was a success?
Simplest statement (e.g., "3% donated in the treatment versus 2.2% in the control, with an average amount raised of $4.3 in the treatment and $3.1 in the control')
Preliminary interpretation, with statistical test if possible (e.g., 'google Optimize states an 80% chance that the treatment outperformed the control', a Fisher's exact test yields a p=0.06 that a positive donations was more likely in the treatment than the control)
"Full analysis"
1. Who/what when will it be done?
2. Link to 'where' it will be done (both the 'follow up the pre-analysis plan, and the full write-up, if applicable)
3. Possibly: Briefly characterize the overall confusions/state of analysis here (state the date last updated)
Feeding synthesis and meta-analysis
1. Which generalizable questions does this inform?
2. Is data sharable? Key comparable outcomes?
3. What other work/trials does this relate to?
4. State of meta-analysis

Research Design, methodology

Methods: Overview, resources

Sections

"Qualitative" design issues: How to design the 'content' of experiments and surveys to have internal validity and external generalizability

Real-world assignment & inference: How to set up trials to have comparable groups

Adaptive design/sampling, reinforcement learning: Adjusting the treatments and design as you learn, to 'get to the highest value in the end'

Analysis: Statistical approaches: How to make inferences from the data after you have it (and plan this in advance)

What are our estimation goals?

Statistical power versus optimized learning

Fixed vs adaptive designs

See adaptive design notes

Resources

Rethink Priorities notes (some are works in progress)...

The https://declaredesign.org/ framework and R package seems very helpful. I (David Reinstein) am learning and trying to adapt it.

Dillon's 'Hemlock'

Reinstein 'research tools and data' airtable list

"Qualitative" design issues

Discussion of issues in designing experiments/studies that are not specifically 'quantitative', but are important for gaining clear and useful inference

Naturalness of setting versus 'pure' treatments

Academics usually try to make each treatment differ in precisely one dimension, these treatments are meant to represent the underlying model or construct as purely as possible. This can lead to setups that appear strange or artificial, which itself might bring responses it will not be representative or generalizable.

For example, in my '' (lab) work we had a trial that was (paraphrasing) 'we are asking you to commit to a donation that may or may not be collected. If the coin flips heads, we will collect the amount you commit, otherwise no donation is made'. It was meant to separate the component of the "give if you win effect" driven by the uncertain nature of the commitment rather than the uncertain nature of the income. However when we considered bringing this to field experiments, there was no way to do it without it making it obvious that this was an experiment or a very strange exercise.

When we consider an experiment providing 'real impact information' to potential donors, we might be encouraged to use the exact write-up from Givewell's page, for naturalness. However, this may not present the "lives per dollar" information in exactly the same way between two charities of interest, and the particular write-up may suggest certain "anchors" (e.g., whole numbers that people may want to contribute). Thus if we use the exact GW language we may not be 100% confident that the provision of the impact of information is driving any difference. We might be tempted to change it; but at a possible cost of naturalness and direct applicability.

There are very often tradeoffs of this sort.

Awareness of testing can affect results

In the present context, we have posted about our work, in general terms, on a public forum (). Thus the idea that ‘people are running experiments to promote effective giving and EA ideas’ is not a well-kept secret. If participants in our experiments and trials are aware of this it may affect their choices and responses to treatments. This general set of problem is referred to in various ways, referring to different aspects of this; see 'experimenter demand', 'desirability bias', 'arbitrary coherence/coherent arbitrariness', observer bias (?), etc.

Mitigating this, in our context, most of our experiments will be conducted in subtle ways (e.g., small but meaningful variations in EA-aligned home pages), and individuals will only see one of these (with variation by geography or by IP-linked cookies). Furthermore, we will conduct most of our experiments targeting non-EA-aligned audiences unlikely to read posts like this one. (People reading the EA forum post are probably ‘already converted’.)

Incentives and 'more meaningful responses'?

Other issues to consider

(To be fleshed out in more detail)

Universe (population) of interest, representativeness
Design study to measure 'cheap' behavior like 'clicks' (easier to observe, quicker feedback) versus meaningful and long-run behavior (like donations and pledges)
- attribution issues
- attrition issues (also see the quantitative sections)
Choice of impact measure/metric (also see the quantitative sections)

Real-world assignment & inference

Geographic blocks versus individuals How to block/stratify

See Geographic segmentation/blocked randomization for a mainly theoretical discussion of this

Facebook split-testing issues for how to do split testing on Facebook, and the limits to traditional design given their setup

Difference in difference/'Time-based methods'

Estimating Ad Effectiveness using Geo Experiments in a Time-Based Regression Framework Jouni Kerman, Peng Wang, and Jon Vaver Google, Inc. March 2017

Abstract .... While effective, this geo-based regression (GBR) approach is less applicable, or not applicable at all, for situations in which few geographic units are available for testing (e.g. smaller countries, or subregions of larger countries) These situations also include the so- called matched market tests, which may compare the behavior of users in a single control region with the behavior of users in a single test region. To fill this gap, we have developed an analogous time-based regression (TBR) approach for analyzing geo experiments. This methodology predicts the time series of the counterfactual market response, allowing for direct estimation of the cumulative causal effect at the end of the experiment. In this paper we describe this model and evaluate its performance using simulation.

Some specific notes/concerns

Geo experiments” where only a single geo is targeted for a treatment seem fairly common in practice. You ‘try something in a single market 1x only and see what it does’.\

This is probably reinventing the wheel some existing thing in Econometric (difference in difference, event studies?), but what?
I find it strange/suboptimal that they aggregate across the Geos in the control group, throwing important variation here … that might tell us something about how much things ‘typically vary by without treatments’. I wonder if there’s another approach that brings that variation back?
1. Maybe this is 'because this is an easy extract to get from Google Analytics'? How do we get it?
The package is 5 years old with no recent updates … ages in this world; is there something better to use instead

Facebook split-testing issues

Facebook trials: "divergent delivery" --> limited inference

The main point

Facebook serves each ad variation to the people it thinks are most likely to click on it.

Thus, in comparing one ad variation to another... you may learn:

"Which variation performs best on the 'best audience for that variation' (according to Facebook)"
But you don't learn "which variation performs better than others on any single comparable audience."

Update 4 Oct 2022: We may have found a partial solution to this, with ads targeting 'Reach' rather than optimizing for other measures like 'clicks'. We are discussing this further and will report back.

Researchers are interested in running trials using Facebook ads. However, inference can be difficult. Facebook doesn't give you full control of who sees what version of an advertisement.

With A/B split testing etc: They have their own algorithm, which presumably uses something like Thomson sampling to optimize for an outcome (clicks, or a targeted action on the linked site with a 'pixel'). Statistical inference is challenging with adaptive designs and reinforcement learning mechanisms. As the procedure is not transparent, it is even more difficult to make statistical inferences about how one treatment performed relative to another.
Segmentation and composition of population: Facebook's 'PageRank' algorithm determines who sees an ad. I don't think you can turn this off.
1. We haven't found a way to be able to set it to "show all versions of an ad to comparable populations"
2. (And even if you could, it would be difficult for you to specifically describe "which population" your results pertain to.)

Divergent delivery and "the A/B test deception"

Further notes

Orazi, D. C., & Johnston, A. C. (2020). Running field experiments using Facebook split test. Journal of Business Research, 118, 189-198.

"Haven’t heard of an update since. They do something to mitigate the effects of targeting different audiences with the different treatments, but it’s still not quite random assignment"

"Bottom line: good news, bad news. I'm confirming that you're right: The "latest best possible settings" are still not giving you results that reflect the random experiment that a researcher in consumer psychology or advertising would be expecting. But the problems are worse than they may have seemed to you initially."

Notes on Facebook “Lift tests/Lift Studies” with ’Multiple Test Groups”

Do Facebook “Lift tests/Lift Studies” with ’Multiple Test Groups” give us the freedom we want to …

Randomize/balance different ad content ‘treatments’ to comparable groups?
Make inferences about ‘which treatment (ad) performs better, holding the audience constant’?

See "‘Meta for developers’ on Lift Tests:"

No. ****Josh: "what it says is something importantly different: you can compare the number of people who do the action you are interested in ... according to whether or not they see a given ad. So, you do have random assignment when comparing the effect of an ad to the effect of no ad. ... if we compare the lift for two different treatments (What these multi-cell lift tests are doing), we are doing almost exactly the same thing as we were without the lift functionality...

A and B are displayed to different audiences, so this test does not have random assignment."

Essentially this allows you to get the correct 'lift' of A and B, on their own distinct audiences, by getting the counterfactual audiences for each of these correct. But you cannot compare the lift of A and B on any comparable audience.

To help understand the context... "Facebook often randomizes the whole audience into different cells and THEN targets the ad WITHIN that audience. So there is random assignment at the initial stage, but that's irrelevant, because not everyone in the potential audience sees each ad"\

Simple quant design issues

How many observations, how to assign treatments, etc.

Resources

Todo: Integrate further easy tools and guides, including those from Jamie Elsey

"Even a few observations can be informative"

Drawing from Lakens' excellent resource:

You are considering a new and an old message.

Suppose you are a ‘believer’ … your prior (light grey up to) is that ‘this new message nearly always performs better than the control treatment’

Suppose you observe only 20 cases and the treatment performs better only half the time. You move to the top black line posterior. You put very little probability on the new message performing much better than the control.

Now suppose you have the ‘Baby prior’, and think all of the following ten things are equally likely

less than 10% of people rate the new message better than the control
10-20% of people rate the new message better than the control
…
… 50-60% of people rate the new message better than the control
...
90-100% of people rate the new message better than the control

You run tests on 20 people, and you get 15 people preferring the new message.

Now you update substantially. From some calculations (starting from Lakens' code, pbeta(0.65, aposterior, bposterior)) you put about an 80% posterior probability that the new message is preferred by at least 65% of the population. (And only about 1.5% probability on the control being better)

So if I really ‘am as uncertain as described in the example above’ about which of two messages are better (and by how much)...

... then even 20 randomly-selected people assessing both messages can be very informative. How often does this ‘strong information gain’ happen? Well, under the "baby prior", you would get information at least this informative in one direction or the other about half the time.

Adaptive design/sampling, reinforcement learning

Overview: conversation with Dillon Bowen

Dillon writes: I've run some very promising MTurk pilots using my adaptive experimentation software. Compared to traditional random assignment, it increases statistical power, identifies higher-value treatments, and results in more precise estimates of the effectiveness of top-performing treatments. From simulations, I estimate that the gains from adaptive experimentation are approximately equivalent to increasing your sample size by 2x-8x (depending on the distribution of effect sizes).

This would allow us to run studies like Eric Schwitzgebel + Fiery Cushman's study on philosophical arguments to increase charitable giving much more effectively

Overview: conversation with DB

Dillon Bowen: End of 3rd year of decision processes in Wharton PHd.

Here is a stats package for estimating effect sizes in multi-armed experiments. https://dsbowen.gitlab.io/conditional-inference/

Adaptive experimentation software: Hemlock

I just made a getting started video: Welcome to Hemlock - YouTube

Adaptive experimentation (discussion)

...running experiments with many arms and winnowing out the 'best ones' to learn the most/best.

See: adaptive design, adaptive sampling, dynamic design, reinforcement learning, exploration sampling, Thompson's sampling, Bayesian adaptive inference, multifactor experiment

Treatment space

Discrete vs continuous: switches vs. knobs

In our cases of the ‘options are discrete’, many knobs to turn, although some are discrete. There is a different version of this for discrete vs continuous

If we can order the different treatments (arms/knobs) as 'dimensions' we can infer more... Can do better thinking of them as a ‘multifactor experiment’ rather than 2 unrelated … several separate dimensions

"Model running in the background" trying to figure out ‘things about the effectiveness of the interventions you might use’

'Explore only' or 'explore & exploit' at the same time

“Ex-post regret versus cumulative regret” … latter suggests Thompson sampling (Does Thompson's sampling take into account the length of the future period?)

Learning and inference

Ex-post … Use machine learning to consider which characteristics matter and how much they matter … although he doesn’t know of papers that have looked at this, but assumes there are adaptive designs that incorporate this.

Statistical inference can be challenging with adaptive designs, but this is a ripe area of research

Dillon: has a paper on traditional statistical inference after an adaptive design.

Goals 'what kinds of inference':

The arm you using relative to (? the average arm?)
Which factors matter/joint distribution ….. Bayesian models

Notes: Implementing adaptive design on existing sites

We need a great web developer, a system so that a program Dillon writes is fed data on the factors (?) to assign a user to a treatment. Dillon will set up an ML model that is continuously updated … ‘next person clicking on this page gets this treatment … web dev makes sure it shows the recommended content’

We figure out what factors we want, what levels, have a basic web design … Dillon comes in and turns the ‘1000 dim treatment space and featurize it so his model can use it’.. Works with a dev to set up a pipeline.

'Observational' studies: issues

Example: "Hours spent promoting" vs "number of fellows"

Consider a study where

EA groups are asked to voluntarily participate (with no direct compensation)
to report the 'time spent on each recruiting activity',
and to ask their fellows/members 'how did you hear about our group?'

Suppose this finds

'per hour spent by the organizers, far fewer people report "tabling" as the source, relative to 'a direct email'.

Should we interpret this as

'direct emails are a more efficient use of time than tabling, thus groups should spend less time doing tabling and more time sending emails?'

Maybe, but we should be careful; there are other explanations and interpretations we should delve into. Some of these could be partially addressed through survey design, others through careful analysis. Other 'causality' issues may require an experiment/trial/test to get at.

Statistical inference: chance and selection/selectivity

Random variation: With a small sample of groups, these numbers may be particularly high or low (for tabling, for emails, etc) by chance; the averages for a 'typical group' may turn out to be very different.

This is the standard issue of statistical inference about a population from a sample.
The issue of 'misrepresenting the population' tends to be worse with smaller samples (here small number of groups, and small numbers of observed outcomes in each group; e.g., only a few fellows)
However, 'as Bayesians know' you can still draw valuable decision-relevant inferences from small samples. IMO (Reinstein) the "problem of small samples" tends to be overstated because we mainly learn about statistics designed for a particular scientific frequentist approach.

Selection/selectivity: The groups that 'opted in' to be part of this survey may not be a 'random draw' from the population of relevant groups. It may represent more careful or more enthusiastic groups, perhaps groups that are particularly analytical and not so good socially, etc. If some of the 'fellows' within the groups don't complete the survey, this could add another 'selection bias'.

'Marketing causality' issues

Attribution with multiple sources: “How did you hear about this program?” This could be interpreted in several ways, probably “how did you first hear”. But in marketing sometimes people hear about something multiple times, and it’s hard to know which of these are pivotal in getting them to take action. (We could probably do something to make this question a bit more informative.)
“Lift”: some people might have signed up anyways even without the activities identify as ‘how they heard about it’. Other people may have been harder to reach, and for the latter (e.g.) tabling ‘Spoke to us while we were tabling’ may be pivotal.

Decisionmaking implications

Costs $\neq$ hours: the cost of these activities may not be fully proportional to the times spent … e.g., writing a professor may be mentally costly and possibly cost some other social capital. On the other hand tabling may be fun and social, and also generate interesting feedback (and other benefits that are harder to measure, like links with other groups also doing tabling)
Diminishing returns/hard limits on some activities … e.g., there may be only so many professors (or students) to email. After a few hours of this

Analysis: Statistical approaches

What to do with the data after you collect it (and what you should put in a pre-analysis-plan).

Impact of treatment on 'rare event' incidence

Notes from slack:

I’m finding some issues like this in analyzing rare events … not quite that rare, but still a few per thousand or a few per hundred.
I’m taking 2 statistical approaches to the analysis (discussion, code, and data in links):

Randomization inference (simulation) … for a sort of

I think either of these could be ‘flipped around’ to be used for power calculation or ‘the Bayesian equivalent of power calculation’

My colleague Jamie Elsey has some expertise with the latter; , although it’s mainly frequentist and not Bayesian ATM.

Open and robust science: Preregistration and Preanalysis plans

There are reasons 'some pre-registration' or at least 'declaring your intentions in advance' is worth doing even if you aren't aiming at scientific publication

Which statistical tests/methods

Profiling and segmentation project

Introduction, scoping work

Strategic considerations

Previous sections considered... 'How to get more people to care about '. 'How to get the "Einsteins" of the next generation interested in this.' And 'how do we introduce this to people?'

But, an equally-important concern may be... WHOM do we target? How do we do market profiling? Not just 'what do we present', but 'who do we present it to'

In this section, we cover the limited work that has been done on this, and the scope to do more.

Scoping and considering the value of doing this

Leander Rankwiler's recently (17 Feb 2023) did a scoping exercise for this. See "Detecting affinity for the ideas of effective altruism on social media". This work focuses on "the rationale, literature research, and data collection", and comes to relatively negative conclusions ("it's much less valuable to pursue than previously assumed"). This particularly reflects concerns that doing, publicly reporting, and acting on this research to 'target promising groups' may do some harm (see fold).

Downside risks (Rankwiler)

Risk of harming the diversity (of personalities) within EA, by targeting the "typical" EA personality.
"Risk of negative public perception of the method of using personality traits to find promising users (à la Cambridge Analytica)"

He also sees many sources of (statistical) bias in any feasible analysis.

In the sections below, we present and link recent and ongoing direct work that may also be relevant and informative.

Existing work/data

23 May 2022 update: After a long period with little relevant data, some work is underway from other groups: See sections below

Surveys/Predicting EA interest

Caviola et al

Awareness: RP, etc.

Note that RP is not a 'part of this Market Testing team', but we want to coordinate with them and benefit from the survey and profiling work they are doing/have done. I try to map/link the space here.

RP: "How many people have heard of EA" survey

Asks respondents to tick terms and people that they are familiar with (EA/non, real/rare/fake). If they have heard of EA, we follow-up with open-ended questions to detect actual understanding. We also ask about socio-demography and politics. Administered to a ‘national sample’.

(We will follow up with attitude surveys among those who have heard of EA.) We use Bayesian models to generate the posterior distributions of

share who know/understand EA within different groups,
weighted to be nationally representative (of each group).

Wild Animal Welfare/Suffering attitudes

Various survey projects ongoing

EA Attitudes and Longtermist Attitudes

Developing measures of attitudes towards EA/Longtermism
Conducting large national surveys looking at predictors of these attitudes (including differences across groups)
Standard ‘message testing’ (what arguments/framings work best for outreach (including differences across groups)

Kagan and Fitz survey

Sample, Design, & Measures. We recruited a national online sample of 530 Americans. Participants read and reflected on an introduction to evidence based giving, and then completed our main outcomes of effective giving. Participants then completed a series of measures of their beliefs, behaviors, values, traits, sociodemographics, etc. The instrument, measures, and data are available upon request.

Was this a 'representative sample'? How were they recruited?
Note they 'read about EA first' ... perhaps making them vulnerable to demand effects?
DR: I've requested this data, but I think the authors are having trouble finding the time to dig this up

Primary Measures. To measure effective giving, we assessed several attitudes and behaviors; this summary presents results from a novel 7-item scale, the Support for Effective Giving scale (SEGS) [ ⍺ = .92], and an effective giving behavior allocation.
The items in SEGS assess general interest, desire to learn more, support for the movement, and willingness to share information with others, identify as an effective altruist, meet others who support the movement, and donate money based on effective giving principles. To approximate giving behavior, we presented participants with short descriptions of three causes Deworm the World Initiative, Make a Wish Foundation, and a local high school choir and had them allocate $100 between these groups and/or keeping it themselves.

Was the allocation purely hypothetical or incentivized in some way, perhaps 'one response was chosen'?

Secondary Measures.
To measure beliefs, behaviors, and traits of people who endorse effective giving, we employed measures of: perceived social norms, charitable donation beliefs and behaviors, self perceptions, empathy quotient ( EQ ) , empathic concern & personal distress ( IRI ), the five moral foundations ( MFQ 20 ) , the five factor personality model ( TIPI ), goal & strategy maximization ( MS S ), updated cognitive reflection tests ( CRT ), sociodemographics (e.g., age, gender & racial identity, income), politics & religion, familiarity with ‘the effective altruism ’ movement , and state residence

So far, the best overall model predicts 41% of the variance in support for effective giving.

Summarized in posts...

.... After participants read a general description of EA, they completed measures of their support for EA (e.g., attitudes and giving behaviors). Finally, participants answered a collection of questions measuring their beliefs, values, behaviors, demographic traits, and more.

The results suggest that the EA movement may be missing a much wider population of highly-engaged supporters. For example, not only were women more altruistic in general (a widely replicated finding), but they were also more supportive of EA specifically (even when controlling for generosity). And whites, atheists, and young people were no more likely to support EA than average. If anything, being black or Christian indicated a higher likelihood of supporting EA.
Moreover, the typical stereotype of the “EA personality” may be somewhat misguided. Many people – both within and outside the community – view EAs as cold, calculating types who use rationality to override their emotions—the sort of people who can easily ignore the beggar on the street. Yet the data suggest that the more empathetic someone is (in both cognition and affect), the more likely they are to support EA. Importantly, another key predictor was the psychological trait of ‘maximizing tendency,’ a desire to optimize for the best option when making decisions (rather than settle for something good enough).

Longtermism attitudes/profiling

See

Rethink Priorities

RP has a remit and some funding to pursue this.

Animal welfare attitudes: profiling/surveying

A brief outline and links to what has been done across organizations

Farmed/overall

Faunalytics

This is possibly the best meta-resource as well as a source of original research

Sentience institute

Our Animals, Food and Technology (AFT) survey tracks attitudes towards animal farming and animal product alternatives in the US. In 2020, as in the 2017 and 2019 iterations, we found significant opposition to various aspects of the animal farming industry, with a majority of people reporting discomfort with the industry, and strong support for a range of quite radical policy changes, such as banning slaughterhouses. The trend in attitudes between 2017 and 2020 is relatively stable, though slightly negative (not statistically significant). Notably, the number of people who consider animal farming to be one of the most important social issues fell from 2017 to 2019, and remained at this lower level in 2020.

Rethink Priorities

Some replication work on the above

Various work including

__ DR: I'm awaiting permission to share the list.

Paper: The moral standing of animals

Wild animal suffering

ACE

ACE - Wild Animal Suffering Survey 1

ACE - Wild Animal Suffering Survey 2

Other

Wild Animal Welfare/Suffering attitudes (Rethink Priorities, in progress)

"Scientists’ Attitudes Toward Improving the Welfare of Animals in the Wild"

Other data

Who gives effectively? Unique characteristics of those who have taken the Giving What We Can Pledge

we focus on individuals who have taken the Giving What We Can Pledge: a pledge to donate at least 10% of your lifetime income to effective charities. In a global survey (N = 536) we examine cognitive and personality traits in Giving What We Can donors and compare them to country-matched controls. Compared to controls, Giving What We Can donors were better at identifying fearful faces, and more morally expansive. They were higher in actively open-minded thinking, need for cognition, and two subscales of utilitarianism (impartial beneficence and instrumental harm), but lower in maximizing tendency (a tendency to search for an optimal outcome). We found no differences between Giving What We Can donors and the control sample for empathy and compassion, and results for social dominance orientation were inconsistent across analyses.

Tangential: 'Omnibus' lab survey at University of Exeter

Includes real donation choice question(s), rich survey and psychometric data, including 'mind in the eyes' empathy judgements
Students and nonstudents (local town population)

Consider Lown and XX paper... MITE empathy moderates the impact of political attitude, or something ... dissonance resolution Feldman, Ronsky, Lown https://onlinelibrary.wiley.com/doi/full/10.1111/pops.12620

mturk + qualtrics

ended up manipulating whether aid was government or charity, and domestic vs foreig; thought those would be moderated by MITE depending on their ideology/attitude? Also consider ... Empathy Regulation and Close-Mindedness Leonie Huddy, Stanley Feldman, Romeo Gray, Julie Wronski, Patrick Lown, and Elizabeth Connors Also asked about domestic welfare and foreign aid attitudes...

sample fairly large ... 1100 or so?

Fehr/SOEP analysis... followup

Your Place in the World: Relative Income and Global Inequality

See discussion in:

NBER Working Paper (2019/2021), Dietmar Fehr, Johanna Mollerstrom, and Ricardo Perez-Truglia

Attitudes towards global redistribution
"De-biasing" intervention (how rich participants are relative to Germans, how rich Germany is globally)

Tied to

German Socio-Economic Panel (SOEP), a representative longitudinal study of German households. The SOEP contains an innovation sample (SOEP-IS) allowing researchers to implement tailor-made survey experiments.

a two-year, face-to-face survey experiment on a representative sample of Germans. We measure how individuals form perceptions of their ranks in the national and global income distributions, and how those perceptions relate to their national and global policy preferences. [Their main result]: We find that Germans systematically underestimate their true place in the world’s income distribution, but that correcting those misperceptions does not affect their support for policies related to global inequality.

Why might this be relevant to our profiling:

They ask about support for global redistribution, international aid institutions, globalization, immigration, and more, and have an incentivized giving choice. These are (arguably) measures of support for some EA behaviors/attitudes.

I suspect that this data could be tied to a variety of rich (personality? demographic?) measures in the SOEP. A predictive model for actual EA/Effective giving targeting in other related contexts? If so, let's focus on things we are likely to observe in those other contexts (or at least likely to have proxies for). If there are any 'leaks' (not sure I'm using the term correctly)... missing a single feature could ruin the predictive power of the whole model.

Causal interpretations (very challenging)?
- Here 'nearly immutable characteristics' (like ethnicity, age, parental background, maybe some deep psych traits) might be a bit more convincing
*Descriptive* (whatever we mean by that)
- Some things like "Previous donations" might be sort of colliders or 'confounds' (I'm a bit vague here) in interpreting other associations
- I tried to tackle some of this stuff

See in next section

Followup with Thomas Ptashnik

Further scoping, access, PhD partner

Thomas Ptashnik is a Psychology PhD student interested in working on this with us. He is using the SOEP-Core data and familiar with SEM/Latent variable methods.

We have gained access to the relevant data

Here's the link to the Fehr appendix that contains the survey items they created (starting at Appendix B on page 33).\

Some salient example content:

These items correspond to the SOEP-IS surveys, which can be found here (use item names, like Q132, to search quickly
2017: https://paneldata.org/soep-is/inst/soep-is-2017-f
2018: https://paneldata.org/soep-is/inst/soep-is-2018-f
These links also mention that individuals with preexisting data access can apply for expanded access. I [Thomas] have access to SOEP-core version 36 (1984-2020 surveys),..

DR: Some interesting content (at a quick peek)

From 2017...

Q380: What you value in your work likerts ... includes "Having much influence" and "Socially responsible and important work"

Q160: Optimism/pessimism about the future

Q162: ... bunch of Likerts on "attitudes towards life and the future" (e.g., 'The options that I have in life are determined by social circumstances.')

From 2019

... they seem to collect genetic data

A proposed project

DR notes on 15 Dec 2021 meeting with

Does the Fehr/SOEP data provide valuable 'outcome measures' of EA and effective giving support?

I think we might see positive responses to the Fehr et al questions and donation choices as ‘necessary but not sufficient' for people to become effective givers or even EAs. If (especially in spite of the de-biasing) people still don’t support international redistribution, international orgs, and don’t opt to give from the lottery earnings to the global poor person … I think they are very unlikely to be susceptible to an EA or effective giving (e.g., GiveWell) appeal. (See further discussion and debate on this below). (But, as a check on this, it might be good to try to ask these same questions on a sample of actual EA’s and effective givers, and a comparison group!. #surveyexperiments)

Two projects on the same data

I envision two related projects on the same data: 1. Building a 'portable' model for prediction to aid targeting and 2. Building a 'deeper' model to aid understanding

I’m hoping that looking for predictors of (or ‘coherent factors explaining’) these responses in the SOEP data would prove useful for organizations like GWWC to consider ‘which groups to target in doing outreach’ (and perhaps especially ‘which groups to rule out’)

I hope we can do a sort of ‘leak-proof validated predictive ML model for this’
perhaps especially relevant for the German/EU context

Thomas: After talking it over with some colleagues, I think this approach is our best bet in terms of developing something with practical utility that still has a chance of being published in an academic journal. This is not my area of expertise, but if I remember correctly you have some R code already written. So I should quickly be able to put something together.

2. An (exploratory model) to help understand key factors that might be driving EA-adjacent attitudes and behaviors, offering insight into ‘what drives people towards or away from this mindset’.

Here we could engage the richer set of SOEP variables and consider latent factors

Anonymous colleague; caveats on 'the two goals'

if one simply wants to target people for giving to some specific EA-aligned cause in terms of a donation. In that case of the hypothetical African Christian women are likely to give, and it doesn't matter so much how they get to that decision. Quite a different set of metrics is desired (the kind of things we are trying to get at) if one is trying to actually select/find 'effective altruists'[RT2] if one simply wants to target people for giving to some specific EA-aligned cause in terms of a donation. In that case of the hypothetical African Christian women are likely to give, and it doesn't matter so much how they get to that decision. Quite a different set of metrics is desired (the kind of things we are trying to get at) if one is trying to actually select/find 'effective altruists'

Red team

But I'm less sure about: ..."would prove useful for orgs like GWWC to consider ‘which groups to target in doing outreach’ (and perhaps especially ‘which groups to rule out’)"

[suppose] you measure something like 'interested in giving to people in poverty in Africa' (or, at best, cosmopolitanism), and you find that the people highest in this are [Classical music fans], but the people most interested in EA stuff are [Techno ravers]. I think there are lots of reasons why this might occur. It could be that interest in EA is a combination of cosmopolitanism + interest in maximising effectiveness, but differences in the latter swamp the former. (If so the reasoning would at least be along the right lines, but would potentially be very practically misleading to GWWC)...

But I think what could be going on could be even worse, i.e.:

The measures measure something like 'not being so parochial that you won't give to a non-German charity', which is (ex hypothes) a necessary condition, but so minimal it's not really informing us about the much more demanding thing
... it measures something more specific/narrow that may be orthogonal or even antagonistic to EA (e.g. interest in overseas charity/poverty specifically [even if it doesn't maximise effectiveness]). Thought experiment: how would a libertarian-leaning AI-safety concerned German EA respond to the questions?

[still, this] seems worthwhile... I'd just be very tentative about inferring anything about what GWWC should do etc

Red team analogy

(I think of this case as a bit like studying interest in Marxism by asking about whether people are interested in helping the poor (or some such) In one sense you might think of this as a necessary condition / people who don't have any concern for this are not likely to be interested in Marxism. OTOH you'll probably mostly be picking up the 99% of people who are interested in helping the poor but not interested in the much more niche / slightly weirder thing that is also closely related to helping the poor, but is also associated with slightly counterintuitive views like 'donating to the poor is not good, you need to be concerned with [systemic change and global revolution / AI safety] etc.)

Red team:

[red team]
I guess it will be interesting to find out through your analysis:
Are these measures predicted by plain altruism + cosmopolitanism (which a priori we might say are more likely to be connected to EA)
Or are these measures predicted by egalitarianism + belief we should repay the third world / belief the rich should help the poor (which seem like they may be less closely connected with EA)*
*of course EAs are overwhelmingly liberal/egalitarian, but liberal/egalitarians are overwhelmingly not EA, which I think is an important complication"

DR and TP response to red team

Good points, and I even think “global redistribution” might rub some actual EAs the wrong way, as well as many EAs rejecting the 'repay our collective guilt' aspect.

Still, GWWC and TLYCS are pushing more for behaviors (esp. giving) than for intellectual alignment with EA. They are also pushing the traditional global poverty part of the EA agenda. I suspect the Fehr/Soep measures will pick up people more receptive to this than to longtermist 'avant garde' EA.
- Thomas: This is the main point to highlight. We probably need to limit our generalizability to the people-oriented neartermist worldview bucket. As the comments above note, I'm not sure this worldview necessarily maps onto the longtermist individual concerned about, say, AI safety risk. However, as you point out, there is still utility in focusing on understanding individuals that have this worldview for GWWC and other EA orgs, and this worldview (according to the EA survey) is currently the largest in the community.
  - DR: Agreed, but we probably need to make sure not to water it down too much; ideally we would retain some notion of 'the importance of prioritization and cost-effectiveness' in the worldview we are targeting
- TP: As you mentioned, it would be interesting to replicate this survey with explicitly EA endorsing individuals. Particularly, in seeing how well the ML model can predict cohorts that fall into the three different worldview buckets.
  - DR: yes but the model that predicts "EA/global poverty supporting types within a general population may be unlikely to predict groups *among explicit EA's*" ... still, the comparison could be interesting (and we've done a bit of this already with the EA survey)
- TP: Also, as a long-term idea, it could be useful to consider developing more EA-oriented items for SOEP-IS (the survey Fehr and colleagues used) that take into account all the issues listed here.
  - DR: that would be great!

RT2: Is there any way you can think of to get at EA more like a style of thinking/justification of choices as opposed to possibly the highly context-dependent choices are themselves? Some kind of relevant psychometric things are probably possible e.g., need for cognition or something similar RT1:

One option create or use measures of maximising + cosmopolitanism + altruism (or of maximising cosmopolitan altruism) ... maybe we are getting at 'EA style of thinking'. And if we can show that these more abstract measures are connected to behavioural or otherwise more concrete measures of EA inclination (whether that's decisions/choices, signing up for mailing list or something else) then it does seem reasonable to think of these as capturing EA inclination.
The risk otherwise is that theoretically we think these 3 things correspond to EA thinking... and actually they don't ...
Consider NFC, IRT, Rationality Quotient etc. as predictors of EA-inclination \

DR: My conception was maximizing + cosmopolitanism + _altruism + willing-to-sacrifice/non-competitiveness …_I think many people think “I should work to help humanity” but also think ‘yeah but I’ll be a sucker if I give to charity while my neighbor gets a new swimming pool and Hawaii holiday…’That’s where “willing-to-sacrifice/non-competitiveness” comes in, in my mind. (It needs a better name?)I think this last trait more important for effective giving than for EA-intellectual-engagement… and it may not be important at all for the latter.

Thomas: In psychology, altruism captures this notion. Prosociality is a concept of helping others but allows for self-concern, while altruism is distinguished by a purer form of selflessness (I have a paper under review that goes into detail about this, which I can privately share....argh, the closed doors of academia). Fortunately, altruism is widely studied and there are even a few items that capture it in the SOEP dataset. \

Value of incentivized measures here

(DR ideas)

IMO it would be nice to have some meaningful behavioral (incentivized) measures on top of the ‘psych’ ones. The ‘donation to the very poor’ measure in Fehr et al gets at this a bit … although its a pretty small probablistic sacrifice. And I suspect it measures all three of the above except maximizing. And I don’t think these things are all separable, so I think that the fact that it measures ‘altruism and willing to sacrifice in a cosmopolitan-relevant context’ is good.
It would also be pretty nice to have a behavioral/incentivized measure of ‘maximizing in an altruistic context’ …If Fehr ea had asked them to (e.g.) allocate giving among a German poor person, an African poor person, and themselves, this might have been a decent measure.
(We have this choice in some other contexts though … not as rich data but maybe worth digging into). Why might that choice have been better (in some ways) than a hypothetical choice? Because I imagine in a hypothetical choice some people would be like “OK they obviously want me to say support the poor person in Africa, and I see the maximization arguments, so, fine.'But when it involves real money, and even their own money, I expect that for some people, other motives will outweigh the ‘maximizing motive’…“wait, I’d rather keep the money than give it to an African who will waste it”“wait, if this is real, I’d rather help someone local”.

Analysis Plan, sample, and variables under consideration (01/31/22, Ptashnik)

DR: See sidebar comments

Analysis plan

Lasso regression to identify the most salient cluster [DR: how is this defined?] of predictors for effective giving

I will use k-fold cross-validation to compare a lasso model with ridge regression and OLS to confirm it is the best method for handling our data [DR: 'best in what sense? I recommend the elastic net approach if possible.]

Bayesian and latent lasso

TP: There is now a Bayesian form of lasso, but the R packages to run this analysis are in their infancy and the results between the methods are strikingly similar (Steorts, 2015). So, on the first pass I will just use one of the methods above but may rerun the analysis time-permitting to check my assumption that results won’t change.

Similarly, there is latent lasso regression, but most of our constructs have only one indicator and the R package for this analysis also appears to be at a nascent stage.·

Sample

To start, I’m just considering the 2017 survey and the control group (i.e., those who weren’t notified of their position in the national and global income distribution (~700 individuals). We can expand to the 2018 survey and the treatment group in future analyses using the same method (although some items may not be included across surveys).

Outcome Variable

Q280 and 281 in the SOEP-IS dataset developed by Fehr et al. (2019)

You were paired with another household in Kenya or Uganda. This household belongs to the poorest 10 percent of households worldwide. Now, you have 50 EUR at your disposal and can split this amount between the other household and you in any way you want. If this task is selected for payout, you will receive the amount you decided to keep at the end of the interview. The amount you want to give the other household will be given in full to the other household (without transaction costs) at the end of the field period by Heidelberg University via a charitable organization. In full means that every given euro will be received by the other household 1:1. A leaflet with information about the donations will be given to you after you have made your decision. I ask you to make this decision alone now.”

“How much of the 50 EUR do you want to keep and how much do you want to give the other household?”

2017 survey questions: https://paneldata.org/soep-is/inst/soep-is-2017-f

Variables Under Consideration

Below I list variables below in terms of what the intended construct I’m trying to get at and the proxy measures that are available within the SOEP dataset.

Theoretical rationale for construct from 'charitable giving' review

Theoretical rationale for these constructs comes from the most comprehensive review on predictors of charitable giving I could find (Bekkers & Wiepking, 2007; also see Bekkers & Wiepking, 2011 and Wiepking & Bekkers, 2012 for follow-ups on this review). These reviews seem like a reasonable starting point because they are cross-disciplinary and only consider studies that involve real money to real charitable organizations. There were a surprising number of what I think of as common-sense variables that weren’t included in these reviews that I add in the table below (i.e., those without an asterix).

There were several variables omitted because I did not think they were relevant or other constructs exist that better get at the underlying effect. ...

Home ownership: Appears to just be an indicator of wealth, so using income is preferrable.

Perceived financial position: Bivariate studies (Bennet & Kottasz; Havens et al., 2007) conclude those who perceive their financial situation as more positive are more generous donors. However, Fehr et al. (2019)—which has a more robust design—reports that “we find no evidence that perceived rank in the global income distribution affects support for global redistribution, donations to the global poor, globalization or immigration. If anything, when thinking about these policy preferences, it matters more how one compares to other people nationally than to others around the globe.” Given these findings and the fact that we are using the same data, it is probably sensible to omit this variable. Although studies have found confidence in the economy (Okunade, 1996), so an interesting pivot could be to measure optimism (both domain-specific and general forms).

Place of residence and years of residence: Mixed findings and it appears to be a weak predictor regardless.

Immigration and citizenship status: Better captured by other variables. “Osili and Du (2005) found that immigrants in the United States are less likely to give to charitable organizations and also give less, but that these differences are due to differences in racial background, lower levels of income, and education” (Bekkers & Wiepking, 2007: 15).

Youth participation: Impacts donations through socialization, which is better captured through parental background. It also strengthens social bonds of the children in the community, making them less likely to make effective donations over local causes.

Volunteering: In simple bivariate analysis, volunteers are usually found to donate more to charity. However, differences between volunteers and non-volunteers often vanish in multiple regression analyses controlling for joint determinants of giving and volunteering (Bekkers, 2002, Bekkers, 2006a, Wiepking & Maas, 2006). Given SOEP only asks about time spent volunteering and does not categorize where one volunteers, this variable seems like a blunt tool that is likely to be insignificant.

Awareness of need: A strong predictor of general philanthropy, but Fehr et al. (2019) did not find significant effects for effective giving. DR: I think 'failing to find significant effects' shouldn't be reason to exclude this!

[DR: I think 'previous failire to find significant effects' shouldn't be reason to exclude!]

Variables held constant by the survey design (see Bekkers & Wiepking 2007 for detailed explanation): Solicitation, benefits, reputation, and efficacy.

Construct *outlined in review articles

Brief Rationale for Inclusion

Items from SOEP

Religious involvement*

One of the most studied variables in philanthropic studies. However, a large body of research finds that religious involvement is not related (or even inversely related) to secular giving (Brooks, 2005; Lyons & Nivison-Smith, 2006; Lyons & Passey, 2005). Still, given its prominence (and that fact that there are religious EA groups), it is worth including in our analysis.

“Do you belong to a church or religious group?”

-----------------

“What church or religious group do you belong to?”

Level of education*

Has been found to have a positive relationship with secular giving (Yen, 2002), more EA-aligned giving (e.g., development aid versus emergency aid; Srnka et al., 2003), and there are conflicting results on whether education impacts the amount donated (c.f., Schervish & Havens, 1997; Brooks, 2002).

“What type of vocational training or university degree did you receive?”

Field of study*

A handful of studies have found graduates of different fields to be differentially generous, although which groups are at the top is inconclusive (c.f., Bekkers & De Graaf, 2006; Belfield & Beney, 2000)

Not available for SOEP-IS

Income*

Higher income households donate higher amounts than lower ones, however, the relationship with discretionary income is complex and unresolved (McClelland & Brooks, 2004). Income elasticity has been shown to be a salient predictor (Brooks, 2005), but for our purposes, general net income seems like the most sensible since this is information EA organizations might be able to obtain or estimate.

“How satisfied are you with your household income?”

“How satisfied are you with your personal income?”

----------

“I earned [net income]”

----------

“What do you think is your monthly gross salary in one year?”

Age*

Unclear relationship: generally, appears to increase over time and level off around retirement, but this relationship is highly dependent on covariates such as church attendance, number of children, and marital status.

Should be available. I’m waiting for confirmation.

Number of children*

Positively related to philanthropy in most studies, but the age of the children may influence the direction and magnitude of the effect, specifically when they are younger than 14 (Okten & Osili, 2004) and 18 (Okunade & Berl, 1997).

According to ‘My Infratest’, these are the children in your household that were born in 2001 or later. Please state whether these children still live in your household.”

----------

…accompanied by companion question: “Do more children live in your household which were born in 2001 or later?”

Marital status*

Mostly found to be positively related to giving, although a number of studies finding null effects (Apinunmahakul & Devlin, 2004; Carroll et al., 2006) call into question the magnitude of this effect.

“What is your family status?”

Employment*

The employed generally donate more than the unemployed (Chang, 2005a&b); those who work more (days and hours) donate more (Bekkers, 2004; Yamauchi & Yokoyama, 2005); retirees are highly charitable; self-employed are less generous (Carroll et al., 2006); and public service employees are more likely to engage in philanthropy than for-profit workers (Houston, 2006).

…could confirm officially unemployed: “Are you registered as unemployed at the Employment Office?”

“What is your current occupational status as a self-employed?”

…closest question I could find that gets at something other than for-profit work: “Do you work for a public sector employer?”

Gender*

Mixed findings in general and no finding when looking at one-person households (Andreoni et al., 2003). Still, given the ubiquity of this variable, it is sensible to include it in the model even though I have little faith it will be significant.

Should be available. I’m waiting for confirmation.

Race*

Caucasians generally give more, but this finding is tempered by the cause (non-whites donate more to the poor and religious organizations; Brooks, 2004; Brown & Ferris, 2007; Smith & Sikkink, 1998).

Should be available. I’m waiting for confirmation.

Parental background*

Higher levels of parental education, parental religious

involvement, and parental volunteering in the past are related to higher amounts currently donated by children (Bekkers 2005a). While current parental income and church attendance also predict giving (Lunn et al., 2001; Marr et al., 2005).

I thought a proxy for parent’s occupational prestige might be a salient predictor. Questions 496-502 cover the mother’s background and have the exact same wording.

Questions split depending on occupation and all contain the header: “What was your father’s occupational status as…”

“A self-employed person?”

“A civil servant?”

“A white-collar worker?”

“A blue-collar worker?”

“What type of school leaving certificate did your father attain?”

“Did your father complete vocational training or a university degree?”

Personality*

Donations have been found to increase with emotional stability and extraversion (Bekkers, 2006b), as well as openness to experience (Levy et al., 2002). General social trust has also been found to be a salient predictor (Brooks, 2005; Micklewright & Schnepf, 2007). Empathy has been found to be related to donations (Bekkers & Wilhem, 2006), as well as altruism.

Big Five Personality traits:

Agreeableness: “is considerate and kind to others”

Openness to experience: “is eager for knowledge”

The self-control scale. Sample item: “I am good at resisting temptation.” 10-item scale split between two links below.

Cognitive ability*

Persons with higher verbal scores (Bekkers & De Graaf, 2006), IQ (Millet & Dewitte, 2007), GPA (Marr et al., 2005), and ability to think in abstract terms (Levy et al., 2002) donate more.

Innovation exercise to assess emotional intelligence.

“What emotion was shown by the individual? For every emotion, please rate how strongly you perceived it. If you saw a group, please rate the emotion of the individual in the middle.”

For questions assessing quantitative skills (probabilities):

“Out of 1,000 people in a small town 500 are members of a choir. Out of these 500 members in the choir 100 are men. Out of the 500 inhabitants that are not in the choir 300 are men. What is the probability that a randomly drawn man is a member of the choir? Please indicate the probability in percent.”

Items 888-928 assess the ability to do expected utility calculations:

“Please imagine the following situation: You have the choice between a safe payment and a lottery. In detail: Do you prefer a 50% opportunity to win 300 Euro while you do not win anything by 50% or a safe payment of 160 Euro.”

Quantitative skills:

“Now answer another question within 20 seconds. Continue the multiplication tables of the base 17 as far as possible. Starting with 17, 34, etc. The time is running - now.”

Context*

Donations are influenced by behavior of coworkers in the same salary quartile (positive; Carman, 2006), income inequality (negative; Okten & Osili, 2004), individualistic cultures (positive; Kemmelmeier et al., 2006), and the stock market (positive; Drezner, 2006).

Stock market optimism: “Initially we focus on the next year (next 12 months). Do you expect the DAX [German blue-chip index] to show rather profit or loss compared to the current value?”

Numeric version: “Expressed in numbers: What [Profit/Loss] do you expect for the next year overall in percent?”

This same question stem of stock market optimism is used for items about the next two, ten, and thirty years

Occupational prestige*

Generally, positively related to donations (Carroll, McCarthy, & Newman, 2006).

Current occupation (open question):

----------

Occupation (answer choices included):

----------Political orientation*

Each occupation choice is then further refined:

Blue-collar worker

White-collar worker

Civil servant

Apprentice/intern

Self-employed

Political orientation*

Previously, no differences were found for secular donations (Brooks, 2005), but Fehr et al. (2019: 26) find that “for right-of-center respondents, there are indications that higher national relative income is related both correlationally and causally to more giving to poor Germans and Kenyans.”

Item designed by Fehr et al. (2019):

“In politics people often talk about ‘left’ and ‘right’ to mark different political attitudes. If you think about your own political attitude: Where would you place yourself?”

Locus of control*

Persons with an internal locus of control are more likely to engage in philanthropy and other formal helping behaviors (Amato, 1985).

Ten item scale with the stem: “The following statements describe different attitudes towards life and the future. To which degree do you personally agree with the individual statements?”

Health*

People in better health donate more (Bekkers 2006b, Bekkers & De Graaf, 2006).

“How would you describe your current health?”

“How satisfied are you with your health?”

Mood*

Positive affect facilitates giving, while negative moods may also facilitate giving in specific circumstances but it is conditional on lots of factors (e.g., helping contains minimal barriers and when prompted to think about the negative feelings that would result from not helping; Cunningham et al., 1980; Weyant, 1978).

Short scale of emotions (angry, afraid, happy, sad):

“Thinking back on the past four weeks, please state how often you have experienced each of the following feelings very rarely, rarely, occasionally, often, or very often. How often have you felt...”

Values*

Endorse of prosocial values has a positive association with charitable giving. This is also true of individuals who are less materialistic (Sargeant et al., 2000) and care about justice (Todd & Lawson, 1999).

Questions 172-175 on justice. For example, the stem “To begin with it is about situations which result in others advantage and your disadvantage, because you were penalized, exploited or treated unfair. To what extent do you agree with the following statements?” Followed by “It makes me angry when other are undeservingly better off than me.”

----------

Prosocial work values, particularly of interest: “Socially responsible and important work” and “Having much influence.”

Previous donations*

Charitable giving is to some extent habitual behavior (Barrett, 1991; Barrett et al., 1997).

Not available for SOEP-IS

Optimism

Belief that the future could be better might provide motivation to influence it in becoming better.

“When you think about the future, are you…”

Likelihood of events (e.g., financially successful, not get any serious illness, successful at work, content in general) happening compared to other people the same age and gender.

Life satisfaction

Spending money on others has been shown to have a consistent, causal impact on well-being (Aknin, Barrington-Leigh, Dunn, Helliwell, Biswas-Diener, Kemeza, Nyende, Ashton-James, & Norton, 2010). “One possibility is reverse causality, that is, that those who are inherently happier by nature are also more likely to help individuals” (Moynihan, DeLeire, & Enami, 2015).

“In conclusion, we would like to ask you about your satisfaction with your life in general. How satisfied are you with your life, all things considered?”

Risk propensity

Cluelessness has been cited as a case against longtermism (Greaves & MacAskill, 2021). Thus, individuals that are predisposed to EA but are risk-adverse may be more likely to make global health and development donations.

Stem: “What do you think about yourself: How prepared to take risks are you in general?”

“not ready to take risk at all ... ready to take risk”

“What did you think of when you made your estimate (i.e., the value) regarding your preparedness to take risks?”

DR comments:

A very interesting list of features
were these all asked before the charity questions? (I'm worried about reverse causality otherwise)
maybe remove 'unavailable' rows for space\

We should discuss how the fitted model will be used and interpreted ... maybe identifying a few collections of useful subsets:

Profiling 'existing traffic'

Google Analytics (and other tools) collects or predicts the demographics and market profile of traffic on sites like givingwhatwecan.org.

This gives us a sense of the

'existing demographics' of those committed and/or interested ... in ways not picked up in (e.g.) the EA survey
who is 'interested but not committed' ... possibly low-hanging fruit

(In)effective Altruistic choices: Review of theory and evidence

Introduction...

Differentiating our work (previous research in psychology, economics) we write down what our basic consensus and knowledge
Existing theories, effective altruistic actions existing
What are the problems and questions we are dealing with?
What questions do we have what challenges are we facing?
What previous work has been done to investigate these questions?
What evidence is there so far on these questions?
What are the relevant theories of behavior for this work?

The challenge: drivers of effective/ineffective giving

Outcomes: Effective gift/consider impact)

Some interventions are aimed at getting people to consider effectiveness in their giving and make donations to effective causes in particular

(Effectiveness information and its presentation)

How can we best present information about effectiveness (dollar per impact, impact-per-dollar, GW ratings, etc)?

See discussions of previous work:

(Outcome: Pledge, give substantially (& effectively))

Some interventions are aimed at getting people to make substantial contributions, or pledge to do so (e.g., GWWC pledge) ... to effective charities

(Moral duty (of well-off))

Appendix

Innovationsinfundraising.org

Innovations in Fundraising was an academic impact project and resource. innovationsinfundraising.org was hosted as an interactive Dokuwiki.

It aimed:

To explain and promote practical fundraising innovations stemming from academic research, to encourage trials and experiments, to promote effective giving and encourage collaboration and knowledge-sharing.

A key resource was a linked interactive database of 1. relevant papers, and 2. relevant 'tools'. Our automation tools allowed us to update this content via an Airtable, integrating it into the formatted DokuWiki table.

The project is no longer being hosted. Please contact David Reinstein to request access to any of the resources (or the underlying Airtable).

I (David Reinstein) took down innovationsinfundraising.org for several reasons including:

I didn't have time and funding to keep it updated, and I didn't want this to 'crowd out' others' work
Hosting costs (roughly $400 per year)
It was largely superceded (at least in my own work) by other resources and projects, including "EA Market Testing" (the present Gitbook, and linked resources)

I would consider reviving this in the future, and would be happy to join it with other maintailed resources. Please contact me if you would like to pursue this.

Key details of the Innovations in Fundraising Project (as of roughly 2017)

Director: Dr. David Reinstein, Senior Lecturer, University of Exeter

Project partner: George Howlett, head of CEA Workplace Activism project

Purpose: To explain and promote practical fundraising innovations stemming from academic research, to encourage trials and experiments, to promote effective giving and encourage collaboration and knowledge-sharing

Key innovations and ideas: Give if You Win, default recognition, give more tomorrow

Funding and support: ESRC Impact Acceleration; University of Exeter; Centre for Effective Altruism (CEA)

We are partnering with: Employers, fundraisers, philanthropists, third-sector organisations

Scientific advisors and co-authors

Psychology: Dr. Nick Fitz, Ari Kagan (Duke)
Economics and Finance: Dr. Gerhard Riener (Dusseldorf), Dr. Paul Smeets (Maastricht)
Statistics and Data science: Dr. Mark Kelson (Exeter)

Research and technical assistance: Katja Abramova, Janek Kretschmer. Previous contributors: Audrey Utchen, Agata Siuchinska, Samuel Dexter, Alexis Carlier, Louis Philipp Lukas, David Serero, Daisy Newbold-Harrop.

Wider project: The Innovations in Fundraising impact project aims to unlock generosity and increase the level and impact of charitable giving in the UK and abroad, while enhancing donors’ understanding and appreciation of their generosity.

IIF's key activities and resources included...

Knowledge exchange, tools

Build Innovations in Fundraising Wiki interactive knowledge and resource base on core issues
… including employee giving (schemes), incentive pay, philanthropy in the UK, practical fundraising research findings
Produce reports and guidelines to explain research results, industry knowledge, and best practice to a wider audience

Engagement, innovation

Engage banks, investment firms, fundraiser and other large employers, individually and in small groups, to discuss goals, processes, and opportunities for promoting employee giving in innovative ways, with a focus on high-impact charities
Hold meetings and focus groups to identify the necessary requirements and potential obstacles to implementing and testing systems allowing employees to commit potential incentive pay (bonuses) to charity

(We will collaborate with CEA in organising these, and these will also incorporate CEA initiatives and priorities)

Run pilots and controlled trials of ‘Give if You Win’ and other employee-giving innovations inside firms

Followup with Thomas Ptashnik

Further scoping, access, PhD partner

Thomas Ptashnik is a Psychology PhD student interested in working on this with us. He is using the SOEP-Core data and familiar with SEM/Latent variable methods.

We have gained access to the relevant data

Here's the link to the Fehr appendix that contains the survey items they created (starting at Appendix B on page 33).\

Some salient example content:

These items correspond to the SOEP-IS surveys, which can be found here (use item names, like Q132, to search quickly
2017: https://paneldata.org/soep-is/inst/soep-is-2017-f
2018: https://paneldata.org/soep-is/inst/soep-is-2018-f
These links also mention that individuals with preexisting data access can apply for expanded access. I [Thomas] have access to SOEP-core version 36 (1984-2020 surveys),..

DR: Some interesting content (at a quick peek)

From 2017...

Q380: What you value in your work likerts ... includes "Having much influence" and "Socially responsible and important work"

Q160: Optimism/pessimism about the future

Q162: ... bunch of Likerts on "attitudes towards life and the future" (e.g., 'The options that I have in life are determined by social circumstances.')

From 2019

... they seem to collect genetic data

A proposed project

DR notes on 15 Dec 2021 meeting with

Does the Fehr/SOEP data provide valuable 'outcome measures' of EA and effective giving support?

Two projects on the same data

I envision two related projects on the same data: 1. Building a 'portable' model for prediction to aid targeting and 2. Building a 'deeper' model to aid understanding

I’m hoping that looking for predictors of (or ‘coherent factors explaining’) these responses in the SOEP data would prove useful for organizations like GWWC to consider ‘which groups to target in doing outreach’ (and perhaps especially ‘which groups to rule out’)

I hope we can do a sort of ‘leak-proof validated predictive ML model for this’
perhaps especially relevant for the German/EU context

Thomas: After talking it over with some colleagues, I think this approach is our best bet in terms of developing something with practical utility that still has a chance of being published in an academic journal. This is not my area of expertise, but if I remember correctly you have some R code already written. So I should quickly be able to put something together.

Here we could engage the richer set of SOEP variables and consider latent factors

Anonymous colleague; caveats on 'the two goals'

Red team

But I'm less sure about: ..."would prove useful for orgs like GWWC to consider ‘which groups to target in doing outreach’ (and perhaps especially ‘which groups to rule out’)"

But I think what could be going on could be even worse, i.e.:

The measures measure something like 'not being so parochial that you won't give to a non-German charity', which is (ex hypothes) a necessary condition, but so minimal it's not really informing us about the much more demanding thing
... it measures something more specific/narrow that may be orthogonal or even antagonistic to EA (e.g. interest in overseas charity/poverty specifically [even if it doesn't maximise effectiveness]). Thought experiment: how would a libertarian-leaning AI-safety concerned German EA respond to the questions?

[still, this] seems worthwhile... I'd just be very tentative about inferring anything about what GWWC should do etc

Red team analogy

Red team:

[red team]
I guess it will be interesting to find out through your analysis:
Are these measures predicted by plain altruism + cosmopolitanism (which a priori we might say are more likely to be connected to EA)
Or are these measures predicted by egalitarianism + belief we should repay the third world / belief the rich should help the poor (which seem like they may be less closely connected with EA)*
*of course EAs are overwhelmingly liberal/egalitarian, but liberal/egalitarians are overwhelmingly not EA, which I think is an important complication"

DR and TP response to red team

Good points, and I even think “global redistribution” might rub some actual EAs the wrong way, as well as many EAs rejecting the 'repay our collective guilt' aspect.

Still, GWWC and TLYCS are pushing more for behaviors (esp. giving) than for intellectual alignment with EA. They are also pushing the traditional global poverty part of the EA agenda. I suspect the Fehr/Soep measures will pick up people more receptive to this than to longtermist 'avant garde' EA.
- Thomas: This is the main point to highlight. We probably need to limit our generalizability to the people-oriented neartermist worldview bucket. As the comments above note, I'm not sure this worldview necessarily maps onto the longtermist individual concerned about, say, AI safety risk. However, as you point out, there is still utility in focusing on understanding individuals that have this worldview for GWWC and other EA orgs, and this worldview (according to the EA survey) is currently the largest in the community.
  - DR: Agreed, but we probably need to make sure not to water it down too much; ideally we would retain some notion of 'the importance of prioritization and cost-effectiveness' in the worldview we are targeting
- TP: As you mentioned, it would be interesting to replicate this survey with explicitly EA endorsing individuals. Particularly, in seeing how well the ML model can predict cohorts that fall into the three different worldview buckets.
  - DR: yes but the model that predicts "EA/global poverty supporting types within a general population may be unlikely to predict groups *among explicit EA's*" ... still, the comparison could be interesting (and we've done a bit of this already with the EA survey)
- TP: Also, as a long-term idea, it could be useful to consider developing more EA-oriented items for SOEP-IS (the survey Fehr and colleagues used) that take into account all the issues listed here.
  - DR: that would be great!

One option create or use measures of maximising + cosmopolitanism + altruism (or of maximising cosmopolitan altruism) ... maybe we are getting at 'EA style of thinking'. And if we can show that these more abstract measures are connected to behavioural or otherwise more concrete measures of EA inclination (whether that's decisions/choices, signing up for mailing list or something else) then it does seem reasonable to think of these as capturing EA inclination.
The risk otherwise is that theoretically we think these 3 things correspond to EA thinking... and actually they don't ...
Consider NFC, IRT, Rationality Quotient etc. as predictors of EA-inclination \

Value of incentivized measures here

(DR ideas)

IMO it would be nice to have some meaningful behavioral (incentivized) measures on top of the ‘psych’ ones. The ‘donation to the very poor’ measure in Fehr et al gets at this a bit … although its a pretty small probablistic sacrifice. And I suspect it measures all three of the above except maximizing. And I don’t think these things are all separable, so I think that the fact that it measures ‘altruism and willing to sacrifice in a cosmopolitan-relevant context’ is good.
It would also be pretty nice to have a behavioral/incentivized measure of ‘maximizing in an altruistic context’ …If Fehr ea had asked them to (e.g.) allocate giving among a German poor person, an African poor person, and themselves, this might have been a decent measure.
(We have this choice in some other contexts though … not as rich data but maybe worth digging into). Why might that choice have been better (in some ways) than a hypothetical choice? Because I imagine in a hypothetical choice some people would be like “OK they obviously want me to say support the poor person in Africa, and I see the maximization arguments, so, fine.'But when it involves real money, and even their own money, I expect that for some people, other motives will outweigh the ‘maximizing motive’…“wait, I’d rather keep the money than give it to an African who will waste it”“wait, if this is real, I’d rather help someone local”.

Analysis Plan, sample, and variables under consideration (01/31/22, Ptashnik)

DR: See sidebar comments

Analysis plan

Lasso regression to identify the most salient cluster [DR: how is this defined?] of predictors for effective giving

Bayesian and latent lasso

Similarly, there is latent lasso regression, but most of our constructs have only one indicator and the R package for this analysis also appears to be at a nascent stage.·

Sample

Outcome Variable

Q280 and 281 in the SOEP-IS dataset developed by Fehr et al. (2019)

You were paired with another household in Kenya or Uganda. This household belongs to the poorest 10 percent of households worldwide. Now, you have 50 EUR at your disposal and can split this amount between the other household and you in any way you want. If this task is selected for payout, you will receive the amount you decided to keep at the end of the interview. The amount you want to give the other household will be given in full to the other household (without transaction costs) at the end of the field period by Heidelberg University via a charitable organization. In full means that every given euro will be received by the other household 1:1. A leaflet with information about the donations will be given to you after you have made your decision. I ask you to make this decision alone now.”

“How much of the 50 EUR do you want to keep and how much do you want to give the other household?”

2017 survey questions: https://paneldata.org/soep-is/inst/soep-is-2017-f

Variables Under Consideration

Below I list variables below in terms of what the intended construct I’m trying to get at and the proxy measures that are available within the SOEP dataset.

Theoretical rationale for construct from 'charitable giving' review

There were several variables omitted because I did not think they were relevant or other constructs exist that better get at the underlying effect. ...

Home ownership: Appears to just be an indicator of wealth, so using income is preferrable.

Place of residence and years of residence: Mixed findings and it appears to be a weak predictor regardless.

[DR: I think 'previous failire to find significant effects' shouldn't be reason to exclude!]

Variables held constant by the survey design (see Bekkers & Wiepking 2007 for detailed explanation): Solicitation, benefits, reputation, and efficacy.

Construct *outlined in review articles

Brief Rationale for Inclusion

Items from SOEP

Religious involvement*

“Do you belong to a church or religious group?”

-----------------

“What church or religious group do you belong to?”

Level of education*

“What type of vocational training or university degree did you receive?”

Field of study*

Not available for SOEP-IS

Income*

“How satisfied are you with your household income?”

“How satisfied are you with your personal income?”

----------

“I earned [net income]”

----------

“What do you think is your monthly gross salary in one year?”

Age*

Should be available. I’m waiting for confirmation.

Number of children*

According to ‘My Infratest’, these are the children in your household that were born in 2001 or later. Please state whether these children still live in your household.”

----------

…accompanied by companion question: “Do more children live in your household which were born in 2001 or later?”

Marital status*

“What is your family status?”

Employment*

…could confirm officially unemployed: “Are you registered as unemployed at the Employment Office?”

“What is your current occupational status as a self-employed?”

…closest question I could find that gets at something other than for-profit work: “Do you work for a public sector employer?”

Gender*

Should be available. I’m waiting for confirmation.

Race*

Caucasians generally give more, but this finding is tempered by the cause (non-whites donate more to the poor and religious organizations; Brooks, 2004; Brown & Ferris, 2007; Smith & Sikkink, 1998).

Should be available. I’m waiting for confirmation.

Parental background*

Higher levels of parental education, parental religious

I thought a proxy for parent’s occupational prestige might be a salient predictor. Questions 496-502 cover the mother’s background and have the exact same wording.

Questions split depending on occupation and all contain the header: “What was your father’s occupational status as…”

“A self-employed person?”

“A civil servant?”

“A white-collar worker?”

“A blue-collar worker?”

“What type of school leaving certificate did your father attain?”

“Did your father complete vocational training or a university degree?”

Personality*

Big Five Personality traits:

Agreeableness: “is considerate and kind to others”

Openness to experience: “is eager for knowledge”

The self-control scale. Sample item: “I am good at resisting temptation.” 10-item scale split between two links below.

Cognitive ability*

Persons with higher verbal scores (Bekkers & De Graaf, 2006), IQ (Millet & Dewitte, 2007), GPA (Marr et al., 2005), and ability to think in abstract terms (Levy et al., 2002) donate more.

Innovation exercise to assess emotional intelligence.

“What emotion was shown by the individual? For every emotion, please rate how strongly you perceived it. If you saw a group, please rate the emotion of the individual in the middle.”

For questions assessing quantitative skills (probabilities):

Items 888-928 assess the ability to do expected utility calculations:

Quantitative skills:

“Now answer another question within 20 seconds. Continue the multiplication tables of the base 17 as far as possible. Starting with 17, 34, etc. The time is running - now.”

Context*

Stock market optimism: “Initially we focus on the next year (next 12 months). Do you expect the DAX [German blue-chip index] to show rather profit or loss compared to the current value?”

Numeric version: “Expressed in numbers: What [Profit/Loss] do you expect for the next year overall in percent?”

This same question stem of stock market optimism is used for items about the next two, ten, and thirty years

Occupational prestige*

Generally, positively related to donations (Carroll, McCarthy, & Newman, 2006).

Current occupation (open question):

----------

Occupation (answer choices included):

----------Political orientation*

Each occupation choice is then further refined:

Blue-collar worker

White-collar worker

Civil servant

Apprentice/intern

Self-employed

Political orientation*

Item designed by Fehr et al. (2019):

“In politics people often talk about ‘left’ and ‘right’ to mark different political attitudes. If you think about your own political attitude: Where would you place yourself?”

Locus of control*

Persons with an internal locus of control are more likely to engage in philanthropy and other formal helping behaviors (Amato, 1985).

Ten item scale with the stem: “The following statements describe different attitudes towards life and the future. To which degree do you personally agree with the individual statements?”

Health*

People in better health donate more (Bekkers 2006b, Bekkers & De Graaf, 2006).

“How would you describe your current health?”

“How satisfied are you with your health?”

Mood*

Short scale of emotions (angry, afraid, happy, sad):

Values*

----------

Prosocial work values, particularly of interest: “Socially responsible and important work” and “Having much influence.”

Previous donations*

Charitable giving is to some extent habitual behavior (Barrett, 1991; Barrett et al., 1997).

Not available for SOEP-IS

Optimism

Belief that the future could be better might provide motivation to influence it in becoming better.

“When you think about the future, are you…”

Likelihood of events (e.g., financially successful, not get any serious illness, successful at work, content in general) happening compared to other people the same age and gender.

Life satisfaction

“In conclusion, we would like to ask you about your satisfaction with your life in general. How satisfied are you with your life, all things considered?”

Risk propensity

Stem: “What do you think about yourself: How prepared to take risks are you in general?”

“not ready to take risk at all ... ready to take risk”

“What did you think of when you made your estimate (i.e., the value) regarding your preparedness to take risks?”

DR comments:

A very interesting list of features
were these all asked before the charity questions? (I'm worried about reverse causality otherwise)
maybe remove 'unavailable' rows for space\

We should discuss how the fitted model will be used and interpreted ... maybe identifying a few collections of useful subsets:

EA market testing (public)

Introduction/overview

Introduction & explanation

What is the "EA Market Testing Team"?

We believe this is the first organized collaboration of its kind. We aim to...

What is our mission? Updating...

Goals and FAQ

What have we accomplished?

"Testimonials"

How to get involved?

Next, check out the Gitbook content overview.

Meet the team

Content overview

Introduction

Five Key Sections

1) Partner Organizations and Trials 🤝

2) Marketing & Testing: Opportunities, Tools, Tips 🪧

3) Research Design and Methodology 🎨

4) Profiling and Segmentation Project 🧮

5) (In)effective Altruistic Choices: Review of Theory and Evidence 📋

You can find references, tech support, and other resources in the appendix.

What is this 'Gitbook' meant for?

Progress/goals (early 2023)

Partners, contexts, trials

Introduction

Our primary approach and goals

Key themes, priorities, and 'high-value questions'

Giving What We Can

Presentation: overview

Ideas and opportunities

Stages of the funnel:

Some key questions

Completed studies: See sections below

Pledge page (options trial)

Summary of trial and results

General idea and main hypothesis

Background and context

Points of contact

Timing of trial (when will it/did it start and end, if known)

Digital location where project 'lives'

Environment/context for trial

Participant universe and sample size

Key treatment(s)

Treatment assignment procedure

Outcome data

Ex-post: Reporting results (brief)

Implementation and data collection

Basic results/outcomes

Quick interpretation

These differences are meaningful–consider the 'posteriors':

Intuitive interpretation

Caveats

Giving guides - Facebook

Summary

Key caveats

Message Test (Feb 2022)

Summary

Pre-trial reporting template

General idea, main 'hypothesis' (if there is one)

Point of contact (at organization running trial)

Timing of trial (when will it/did it start and end, if known)

Digital location where project 'lives' (planning, material, data)

Environment/context for trial

Participant universe and sample size

Key treatment(s)

Treatment assignment procedure

Outcome data

YouTube Remarketing

YouTube Remarketing

Understanding assignment, proposing experimental design @Joshua Lewis’s questions:

Results summary (Early, JS Winchell; may need update)

One For the World (OftW)

Background on OftW

Some key goals

Who/what/how to test, learn, and adapt

Pipeline/groups/segments

Goals/actions

Interested in knowing more about

Communications contexts

Typical donor journey:

What is our mission? ...