【中英双语】营销组合建模正卷土重来

朱利安·朗格(Julian Runge) 哈普里特·帕特(Harpreet Patter) 伊戈尔·斯科坎(Igor Skokan) | 文  

2024年03月28日 09:25  

A New Gold Standard for Digital Ad Measurement?

自从尼尔·博登(Neil Borden)在1949年创造了“营销组合(marketing mix)”这个术语以来,企业一直在寻找方法来分析和完善他们营销和推广其产品的方式。在很长一段时间里,针对这个问题的主要分析方法是“营销组合建模”,它使用总的销售和营销数据来建议对公司的营销工作进行战略调整。但在数字广告衡量领域,这种方法在很大程度上被认为是一个过时的庞然大物,很容易被新技术赋予的即时、精确和确定性属性击败。

Ever since Neil Borden coined the term “marketing mix” in 1949, companies have searched for ways to analyze and refine how they market and promote their products. For a long time, the leading analytic approach to this problem was “marketing mix modeling,” which uses aggregate sales and marketing data to suggest strategic adjustments to a firm’s marketing efforts. But in the realm of digital ad measurement, this approach was largely taken for an outdated behemoth, easily outmaneuvered by the immediate, precise, and deterministic attribution new technology enabled.

 

然而,营销组合建模现在正卷土重来。

Now, however, marketing mix modeling is making a comeback.

 

原因何在?一方面,数字广告生态系统的根本变化——比如,苹果公司对广告商能够追踪的内容的新限制——意味着用户层面对数字广告效果的确定性衡量只会变得更具挑战性。随着这些数据的枯竭,那些不适应的企业有可能突然发现自己一无所知。在这种新形势下,营销组合模型(MMM)有一种特有的优势: 它们能够纯粹从总体数据的自然变化中产生可靠的衡量和见解,而且不需要用户层面的数据。

Why? For one, fundamental changes to the digital ads ecosystem — such as Apple’s new limits on what advertisers are able to track — mean that deterministic user-level measurement of digital advertising effects is only going to get more challenging. As this data dries up, companies that don’t adapt run the risk of suddenly finding themselves in the dark. In this new landscape marketing mix models (MMMs) have a specific advantage: They’re able to produce dependable measurements — and insight — purely from natural variation in aggregate data, and don’t require user-level data.

 

然而,让MMM成为你的营销分析工具包的一部分并不像按开关那样容易。在错误的条件下,如果没有仔细的指导,它们可能不精确,而且可能误导公司的营销决策。

Making MMMs part of your marketing analytics toolkit isn’t as easy as flipping a switch, however. Under the wrong conditions and without careful guidance they can be imprecise and can misinform a company’s marketing decisions.

 

希望开始——或重新开始——使用MMM的企业需要使用广告实验来调整他们的数字营销方法。我们对数字广告商进行的一系列实地研究表明,有必要使用实验校准模型这一程序来减轻MMM评估中的潜在不精确性。在本文中,我们会深入探讨为何你应该做这件事,以及如何才能做到——并在数字广告衡量的新格局中蓬勃发展。

Companies that want to start — or restart — using MMMs need to use ad experiments to dial in their digital marketing approach. A set of field studies that we conducted with digital advertisers suggests that the process of using experiments to calibrate models is needed to alleviate potential imprecisions in MMM’s estimates. In this article, we dive into why you should, and how you can, do just that — and thrive in the new digital ad measurement landscape.

 

为何实验很重要

Why Experiments Are Important

 

MMM之所以出色,是因为它们利用总数据处理工作。可是,当你的广告策略与相关的关注动态、竞争动态在不同的广告渠道中差别很大时,MMM可能会步履维艰。高度个性化的广告活动,就像在数字渠道上经常使用的那样,会使这后一点变得特别突出。然而,有一种方法可以解决这个问题:通过实验校准来完善你的MMM,在一个得到充分理解的衡量计划指导下,你对它所提供的信息可能会感到更加自信。

MMMs are great because they work with aggregate data. But they can struggle when your ad strategies and related attentional and competitive dynamics vary a lot across ad channels. Highly personalized ad campaigns, as are often used on digital channels, can make this latter point particularly salient. There’s a way to account for this, however: by refining your MMM through experimental calibration, guided by a well-understood measurement plan, you can feel more confident in the information it’s giving you.

 

我们是怎么知道这一点的?在过去的两年里,我们对北美和欧洲的应用程序广告商进行了18个案例研究,对基于MMM的衡量和基于实验的衡量进行了比较。我们发现了一些重要的见解。

How do we know this? Over the last two years, we conducted 18 case studies with app advertisers in North America and Europe, comparing MMM-based with experiment-based measurements. We found a few important insights.

 

首先,通过广告实验进行校准是有效果的。在我们的案例研究中,校准将基于MMM的广告支出回报率估值平均修正了15%。其他报告发现,在包括快销消费品、家用电器、电信、房地产和汽车等众多垂直行业中,以及在包括亚太、美国、巴西、俄罗斯和南非的众多地区,平均校正率为25%。

First, calibration via ad experiments pays off. In our case studies, calibration on average corrected MMM-based return-on-ad-spend estimates by 15%. Other reports have found an average calibration correction of 25% across a multitude of verticals, including fast-moving consumer goods, home appliances, telecommunications, real estate, and automotive, and across a multitude of regions, including APAC, the U.S., Brazil, Russia, and South Africa.

 

第二,针对性越窄的数字广告似乎需要越多的校准。美国的定制受众广告需要的整体校准调整最高,达到56%。这表明,只依赖少数渠道的企业和拥有小众细分市场的小品牌可能希望更频繁地进行实验以完善他们的模型。

Second, more narrow targeted digital ads appear to require more calibration. Custom audience ads in the U.S. required the highest overall calibration adjustment of 56%. This suggests that companies that rely on just a few channels and smaller brands with niche market segments may want to run experiments to refine their models more frequently.

 

你可以期待未来会进行的广告实验

Ad Experiments You Can Expect to Run in the Future

用户层面的精确广告实验正四面楚歌,就像用户层面的广告衡量一样。随着在网站和应用程序中确定性地观察用户行为的能力下降,广告实验要么需要注重现场结果(如浏览量、点击量和其他现场指标),依靠差分隐私来对非现场结果和现场行为进行匹配,要么利用所谓的集群随机化。通过集群随机化,实验性广告的分配不再控制在用户层面,而是在地理区域等不甚精细的尺度上。

Precise user-level ad experiments are coming under siege the same way that user-level ad measurement is. As the ability to deterministically observe user behavior across websites and apps decreases, ad experiments will either need to focus on on-site outcomes (such as views, clicks, and other on-site metrics), rely on differential privacy to match off-site outcomes with on-site behavior, or make use of so-called clustered randomization. With clustered randomization, assignment of the experimental ads is no longer controlled at the user level, but at less granular scales, such as geographic regions.

 

比如,在地域广告实验问题上,身处某些邮政编码区、指定市场区域、州、甚至国家的消费者会看到实验性的广告活动,而其他地区的消费者则看不到。有活动的地理单元与没有活动的地理单元之间在销售和品牌认知上的差异被用来衡量实验性广告的逐步影响。地域广告实验可以提供一个基本事实作为校准MMM的依据。这种方法在谷歌和Meta的衡量套件中提供,长期以来一直用于电视广告,并已被Asos等领先的数字广告商采用。

For example, with geo ad experiments, consumers in certain ZIP codes, designated market areas, states, or even countries will see experimental ad campaigns, and consumers in others will not. Differences in sales and brand recognition between exposed and non-exposed geo units are used to measure the incremental impact of the experimental ads. Geo ad experiments can provide a ground truth to calibrate the MMM against. This approach is offered in Google’s and Meta’s measurement suites, has long been used in TV advertising, and has been adopted by leading digital advertisers such as Asos.

 

在一个更受数据限制的数字广告环境中,广告实验的其他途径可以通过差分隐私等技术来实现。差分隐私允许在不同的数据集之间进行信息匹配(在不同的应用程序和网站上观察到的),而不透露个人的信息。然后,在一个应用程序/网站上诱发的随机性(在某一数据集中)可以与在另一个应用程序/网站上(在另一个数据集中)观察到的购买等结果相匹配。

Other avenues for ad experimentation in a more data-constrained digital advertising environment may come via technologies such as differential privacy. Differential privacy allows for matching of information between different datasets (observed on different apps and websites) without revealing information about individuals. Randomization induced on one app/website (in one dataset) could then be matched to outcomes such as purchases observed on another app/website (in another dataset).

 

校准MMM

Calibrating an MMM

那么,你如何才能使用广告实验来校准你的MMM呢?我们想要强调三种校准方法,它们在执行的宽严尺度方面有所不同:

So how can you use ad experiments to calibrate your MMM? We would like to highlight three ways for calibration that differ in rigor and ease of implementation:

 

1. 比较MMM和广告实验的结果,以确保它们“相似”。这种方法是定性的,易于实施。 相似可能意味着,至少,两种方法会选择同样吸引人的广告变量/策略,或者意味着二者在方向上一致。如果结果不同,就对MMM进行调整,直到达到一致。

1. Compare the results of MMM and ad experiments to ensure that they are “similar.” This approach is qualitative and easy to implement. Similar can mean that, at a minimum, both approaches pick the same winning ad variant/strategy or that the two directionally agree. Should results be dissimilar, tweak and tune the MMM until agreement is achieved.

 

2. 利用实验结果在模型之间做出选择。作为对此定性方法更严格的扩展,营销分析团队可以建立一个不同模型的集合,然后决策者可以选择与广告实验结果最一致的模型来获得吸引人的关键结果(比如,每次逐步换算的成本)。

2. Use experiment results to choose between models. As a more rigorous extension to the qualitative approach, the marketing analytics team can build an ensemble of different models, then decision-makers can pick the one that agrees most closely with the ad experiment results for the key outcome of interest (e.g., cost per incremental conversion).

 

3. 将实验结果纳入MMM。这里,实验结果被直接用于对MMM的估计,而不仅仅是用来与MMM输出进行比较(上述第1点)或帮助进行模型选择(上述第2点)。这样做需要对统计建模有更深刻的理解。实验结果要么可以作为一个先行因素输入MMM(比如,假如你使用了贝叶斯模型),要么可以用来对模型的系数规定一个允许的范围。比如,假如你在特定渠道上的广告实验显示,广告支出的回报率为150%,而置信界限的下限为120%,上限为180%;你可以“强制”让该渠道的MMM系数估值处于该范围内。

3. Incorporate experiment results into the MMM. Here, the experiment results are used directly in the estimation of the MMM and not just to compare with the MMM output (#1 above) or to help with model selection (#2 above). Doing so requires a deeper understanding of statistical modeling. The experiment results can either enter your MMM as a prior (e.g., if you use a Bayesian model), or they can be used to impose a permissible range on the model’s coefficients. For example, say your ad experiment on a specific channel shows a 150% return-on-ad-spend with a 120% lower and 180% upper confidence bound; you can “force” your MMM coefficient estimate for that channel to be within that range.

 

第三种方法是最严格的,但也是最难实施的策略。如果你选择采用它,我们建议你将其与第二种方法结合使用。换言之,1) 确定一组候选模型,这些模型相对于实验输出会产生合理的评估;2) 将实验结果纳入MMM评估;3) 根据其他实验结果和专家评估,选择可以产生最平衡的结果的模型。

The third approach is the most rigorous, but it’s also the most difficult strategy to implement. If you choose to adopt it, we recommend doing so in conjunction with the second approach. In other words, 1) identify a set of candidate models that produce reasonable estimates vis-à-vis the experiment output; 2) incorporate the experiment results in MMM estimation; and 3) pick the model that produces the most balanced results against other experiment results and expert assessments.

 

在校准你的MMM时,还要注意MMM和实验运行可能在范围上有差别——比如,所有广告与仅限在线广告——也要留意可能存在互动效应——比如,线上、线下广告与销售之间,反之亦然。同时,要注意广告库存等动态效应。

When calibrating your MMM, also be mindful that MMM and experiment runs can be different in scope — for instance, all advertising vs. online only — and that there can be interaction effects — for instance, between online and offline ads and sales and vice versa. Also, be aware of dynamic effects such as ad stock. 

 

你应该多久校准一次?

How Frequently Should You Calibrate?

 

这是一个重要却又棘手的多层面问题。那些深深接受渐进衡量的广告商可能会选择一个“永远开机”的解决方案。在这种方案中,广告始终通过实验进行验证。这种方法对于那些大型跨国公司来说是十分有效,它们能够负担得起在任何特定的时间、在选定的地理区域“进入休眠”。根据我们过去几年与数字广告商的合作中所看到的情况,我们试图拼合一个粗略而简单的矩阵,以对校准频率的决定提供信息。

This is an important, but tricky and multifaceted question. Advertisers who deeply embrace incrementality measurement may choose an “always-on” solution where advertising is consistently experimentally validated. This approach can work well for large international companies that can afford to “go dark” in select geographies at any given time. Based on what we’ve seen over the last years working with digital advertisers, we’ve tried to put together a rough-and-simple matrix to inform decisions on calibration frequency.

该表旨在为刚接触实验校准MMM及基于MMM进行渐进衡量的营销人员提供一个粗略的指导——请对其持谨慎态度。根据我们的经验,并基于我们开展过的案例研究,你的广告越有针对性,你的广告策略越小众,你就越希望确保通过实验对支持你在某一渠道上进行营销决策的MMM进行校准。此外,你在某一渠道上花的钱越多,你资金的风险就越大,因此,对于广告支出较高的渠道,你应该确保更频繁地校准你的MMM。

The table aims to provide a rough guide to marketers new to experimental calibration of MMMs and MMM-based incrementality measurement — take it with a grain of salt. In our experience, and based on the case studies we’ve run, the more targeted your ads and the more niche your ad strategy, the more you want to make sure to experimentally calibrate the MMM supporting your marketing decisions on a channel. Further, the more you spend on a channel, the more money you put at risk, and hence, for channels with higher ad spend, you will want to make sure to calibrate your MMM more frequently.

 

企业应该根据他们对机构的了解、对运营的持续见解和优先事项,仔细审查、调整和充实这份指南。在任何情况下,在“不那么重要”的时候(所以,不要在销售旺季、新产品发布或超级碗之类的重大外部事件期间)以及在不太处于某一品牌广告战略中心位置的地方进行实验,这样做可能很有道理。

Companies should scrutinize, adapt, and enrich this guidance based on their institutional knowledge and ongoing operational insights and priorities. In any case, it can make sense to run experiments during “less-important” times (so, not during peak sales seasons, new product launches, or big external events such as the Superbowl) and in locations that are less central to a brand’s advertising strategy.

. . .

由于隐私权的进步从根本上改变了数字广告衡量的格局,我们建议将MMM作为营销分析工具包的一个关键部分。有一些优秀的供应商在销售或多或少可以即插即用的解决方案。此外,如果你不具备预先存在的MMM内部专业知识,有经验的顾问可以帮助你成功地与供应商整合,并建立一个内部基准模型。特别是在你严重依赖在线广告的情况下,请定期使用广告实验来校准你的MMM,以确保你衡量的准确性,确保你的数字营销决策有充分的根据。

As privacy advances fundamentally change the digital ad measurement landscape, we recommend embracing MMM as a key part of the marketing analytics toolbox. There are good vendors selling more or less plug-and-play solutions out there. Additionally, if you don’t harbor pre-existing internal MMM expertise, an experienced consultant can be helpful to successfully integrate with a vendor and set up an internal baseline model. Especially if you rely heavily on online advertising, regularly calibrate your MMM using ad experiments to make sure your measurements are accurate and your digital marketing decisions are well-informed.

 

如上所述,MMM和实验校准的结合很可能成为数据受限的在线环境中进行广告衡量的“新黄金标准”。至少,它提供了可靠和有效的衡量,直到差分隐私和可互操作的私人归因等新兴技术在数字广告衡量中真正站稳脚跟。

The combination of MMM and experimental calibration as described above may well become a “new gold standard” for ad measurement in data-constrained online environments. At a minimum, it provides reliable and effective measurement until nascent technologies such as differential privacy and interoperable private attribution gain a true foothold in digital ad measurement.

 

朱利安·朗格是一位行为经济学家和数据科学家。在三家消费科技公司组建数据科学团队后,他曾担任斯坦福大学和杜克大学的研究员,现在是波士顿东北大学的营销学助理教授。朱利安在应用数据科学与学术数据科学的联系方面发表了大量文章。他还建议企业如何利用算法、行为科学和实验来建立和发展更好的数字体验。

哈普里特·帕特是总部位于加利福尼亚州门洛帕克的Meta公司全球“营销科学”(Marketing Science)团队的一员。在加入Meta之前,哈普里特接受的是数学和计量经济学教育,在英国的银行部门担任过各种数据科学职务。他专事于研究因果推理框架、增益模型和广告实验。

伊戈尔·斯科坎是伦敦Meta公司全球“营销科学”团队的一员。他具有核科学和数学背景,在加入Meta之前曾在宏盟集团(Omnicom)各机构担任高级分析职务。他的工作和研究重点是当代计量经济学方法、整体归因和广告实验的交叉领域。

时青靖 | 编辑

最新评论
  • 梅普
    aggregate sales
更多相关评论