Facebook split-testing issues

Facebook trials: "divergent delivery" --> limited inference

The main point

Facebook serves each ad variation to the people it thinks are most likely to click on it.

Thus, in comparing one ad variation to another... you may learn:

  • "Which variation performs best on the 'best audience for that variation' (according to Facebook)"

  • But you don't learn "which variation performs better than others on any single comparable audience."

Update, 4 Oct 2022: We may have found a partial solution: setting the ad objective to 'Reach' rather than optimizing for other measures such as 'clicks'. We are discussing this further and will report back.
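To see why this matters, here is a minimal simulation of divergent delivery. Everything in it is an assumption for illustration: the segment names, click rates, and delivery shares are invented, and this is not Facebook's actual delivery algorithm.

```python
# A minimal simulation of "divergent delivery" (illustrative only: the
# segments, click rates, and delivery shares are invented, and this is not
# Facebook's actual delivery algorithm).
import numpy as np

rng = np.random.default_rng(0)
n_impressions = 200_000  # impressions per ad variation

# True click rates for each ad in two hypothetical audience segments
true_ctr = {
    "A": {"seg1": 0.060, "seg2": 0.005},  # A only works well on seg1
    "B": {"seg1": 0.035, "seg2": 0.035},  # B works equally well everywhere
}

# Share of each ad's impressions the optimizer sends to seg1: it pushes A
# towards seg1 (where A's predicted CTR is highest) and has no preference for B.
share_seg1 = {"A": 0.90, "B": 0.50}

for ad in ["A", "B"]:
    in_seg1 = rng.random(n_impressions) < share_seg1[ad]
    p_click = np.where(in_seg1, true_ctr[ad]["seg1"], true_ctr[ad]["seg2"])
    observed_ctr = (rng.random(n_impressions) < p_click).mean()
    # CTR this ad would get on a common 50/50 audience
    common_ctr = 0.5 * true_ctr[ad]["seg1"] + 0.5 * true_ctr[ad]["seg2"]
    print(f"Ad {ad}: observed CTR = {observed_ctr:.4f}, "
          f"CTR on a common 50/50 audience = {common_ctr:.4f}")
```

In this toy setup the platform comparison says A 'wins' (roughly 5.5% vs 3.5% observed CTR), even though B would do slightly better than A on a common 50/50 audience (3.5% vs 3.25%): A looks best only because it was shown to the audience that suits it.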

Researchers are interested in running trials using Facebook ads. However, inference can be difficult: Facebook doesn't give you full control over who sees which version of an advertisement.

  1. Adaptive delivery within A/B split tests: Facebook uses its own algorithm, which presumably does something like Thompson sampling to optimize for an outcome (clicks, or a targeted action on the linked site, tracked with a 'pixel'). Statistical inference is already challenging under adaptive designs and reinforcement-learning mechanisms; because Facebook's procedure is not transparent, it is even more difficult to make statistical inferences about how one treatment performed relative to another. (See the sketch after this list.)

  2. Segmentation and composition of the population: Facebook's ad-delivery (ranking) algorithm determines who sees an ad. We don't think you can turn this off.

    1. We haven't found a way to set it to "show all versions of an ad to comparable populations".

    2. (And even if you could, it would be difficult to describe precisely which population your results pertain to.)
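To make point 1 above concrete, here is a minimal Thompson-sampling sketch. This is only a guess at the kind of adaptive allocation the platform might use (the true algorithm is not public), and the click rates are invented:

```python
# A minimal Thompson-sampling sketch: a guess at the *kind* of adaptive
# allocation the platform might use (the true algorithm is not public),
# with invented click rates.
import numpy as np

rng = np.random.default_rng(1)
true_ctr = [0.030, 0.032]      # hypothetical click rates for ads A and B
clicks = np.zeros(2)           # clicks observed per ad so far
impressions = np.zeros(2)      # impressions served per ad so far

for _ in range(50_000):
    # Draw a plausible CTR for each ad from its Beta posterior and serve the
    # ad whose draw is highest -- allocation depends on the data so far.
    sampled = rng.beta(1 + clicks, 1 + impressions - clicks)
    ad = int(np.argmax(sampled))
    impressions[ad] += 1
    clicks[ad] += rng.random() < true_ctr[ad]

print("impressions per ad:", impressions)      # typically very unbalanced
print("observed CTRs:    ", clicks / impressions)
```

Because each arm's sample size depends on the realized outcomes, the usual fixed-sample comparisons and standard errors for the two observed CTRs are no longer valid; and since we don't know what rule Facebook actually uses, we can't correct for it either.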

Divergent delivery and "the A/B test deception"

Further notes

Orazi, D. C., & Johnston, A. C. (2020). Running field experiments using Facebook split test. Journal of Business Research, 118, 189-198.

"Haven’t heard of an update since. They do something to mitigate the effects of targeting different audiences with the different treatments, but it’s still not quite random assignment"

"Bottom line: good news, bad news. I'm confirming that you're right: The "latest best possible settings" are still not giving you results that reflect the random experiment that a researcher in consumer psychology or advertising would be expecting. But the problems are worse than they may have seemed to you initially."

Notes on Facebook "Lift tests/Lift Studies" with "Multiple Test Groups"

Do Facebook "Lift tests/Lift Studies" with "Multiple Test Groups" give us the freedom we want to …

  • Randomize/balance different ad content 'treatments' to comparable groups?

  • Make inferences about 'which treatment (ad) performs better, holding the audience constant'?

No. Josh: "what it says is something importantly different: you can compare the number of people who do the action you are interested in ... according to whether or not they see a given ad. So, you do have random assignment when comparing the effect of an ad to the effect of no ad. ... if we compare the lift for two different treatments (What these multi-cell lift tests are doing), we are doing almost exactly the same thing as we were without the lift functionality...

A and B are displayed to different audiences, so this test does not have random assignment."

Essentially, this lets you estimate the correct 'lift' of A and of B, each on its own distinct audience, because the counterfactual (holdout) group for each audience is correct. But you cannot compare the lift of A and B on any single comparable audience (see the worked example below).

To help understand the context... "Facebook often randomizes the whole audience into different cells and THEN targets the ad WITHIN that audience. So there is random assignment at the initial stage, but that's irrelevant, because not everyone in the potential audience sees each ad"
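A small worked example of this (all numbers and segment names are invented): each cell's lift is internally valid, but the lifts are not comparable across cells.

```python
# A hypothetical multi-cell lift test: all numbers are invented.
# Within each cell, test vs. holdout is randomized, so each cell's lift is
# valid for that cell's delivered audience -- but the cells are delivered to
# different audiences, so the lifts are not comparable to each other.

# True incremental conversion ("lift") of each ad in two hypothetical segments
lift = {
    "A": {"seg1": 0.012, "seg2": 0.002},
    "B": {"seg1": 0.015, "seg2": 0.006},  # B beats A in *both* segments
}

# Share of each cell's delivered audience that falls in seg1 (divergent delivery)
seg1_share = {"A": 0.9, "B": 0.2}

for ad in ["A", "B"]:
    s = seg1_share[ad]
    cell_lift = s * lift[ad]["seg1"] + (1 - s) * lift[ad]["seg2"]
    common_lift = 0.5 * lift[ad]["seg1"] + 0.5 * lift[ad]["seg2"]
    print(f"Ad {ad}: lift measured in its own cell = {cell_lift:.4f}, "
          f"lift on a common 50/50 audience = {common_lift:.4f}")
```

Here B produces more incremental conversions than A in both segments, yet A shows the larger lift in its own cell (0.011 vs. 0.008), simply because A's cell was delivered mostly to the more responsive segment.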


See:

  • 'Meta for developers' on Lift Tests
  • "The A/B Test Deception: Divergent Delivery, Response Heterogeneity, and Erroneous Inferences in Online Advertising Field Experiments"
  • "Inferno: A Guide to Field Experiments in Online Display Advertising"