Beware value capture
Tech apps influence our behaviour through metrics. Don't follow them blindly.
Thanks for subscribing to SatPost.
Today, we will talk about how the design of tech apps influences our behaviours and values.
Also this week:
MBAs can’t find jobs
Apple Vision Pro problems
…and them fire posts (including Peacock app)
One of my favourite online hobbies is downloading 50+ page academic papers.
Do I read them? Mostly no. Instead, the PDF files eat up precious storage on my 2014 MacBook Air along with pirated TV shows and taunt me as a try-hard.
There is one exception: I have read a number of papers by C. Thi Nguyen, a philosophy professor from the University of Utah. Nguyen’s work primarily focuses on game design and “ways that our social structures and technologies shape how we think and what we value.”
A recent paper titled “Value Capture” looks at how certain metrics — which are often “simplified, standardized, and quantified” — can overly influence our behaviour in ways we might not actually want.
The term is a riff on “regulatory capture”, which is when a government regulator sets policies that align more with the industry it’s overseeing than with the public interest (example: officials at the Federal Communications Commission (FCC) lightly regulating the telecom industry because they intend to get jobs at those corporations afterwards).
Nguyen cites a few examples of how people internalize values based on metrics created by an external party. To emphasize the point, I’ll include one that is not directly tech-related (law school) and two that are (Fitbit, Twitter/X).
US Law Schools
What is the metric? The US News & World Report Law School Rankings.
What is the goal? To rank the quality of law schools so that students can make informed career choices.
How does value capture happen? Students use the rankings as the key criterion for choosing a school, and legal employers use them as a key criterion for hiring. The rankings are a useful heuristic, but Nguyen writes that over-reliance on this metric diminishes the "personalized process of evaluation" in which "some students might care about the research prowess of the faculty, while others care about the school's connections to high-paying corporate firms and still others about the school's strong support for social activism." A lot of people are making 5+ year commitments (time studying, clerking, first job) based on a single metric, which is kind of wild when you think about it. Meanwhile, the schools themselves prioritize resources for things that improve their rankings rather than what is best for legal education. To be sure, this happens across countless colleges and universities (not just law schools).
Fitbit
What is the metric? Step count.
What is the goal? To incentivize activity, movement and exercise.
How does value capture happen? Nguyen tells a relatable story about a friend who went on vacation with his partner and another couple. The other couple had Fitbit step goals and — even though they were on vacation — would skip dinners and outings if they hadn’t hit their daily step count. In this case, the couple allowed the value embedded in Fitbit (step count) to override the values that usually come with vacations (creating new memories, hanging out with friends, smashing onion rings, etc.).
Twitter/X
What is the metric? Likes and retweets.
What is the goal? Engagement.
How does value capture happen? People want to learn new ideas or persuade others on important topics. Yet, to achieve that sweet engagement — which delivers dopamine through likes and retweets — it is much better to post extreme takes on hot-button issues rather than a sober long-form analysis. This phenomenon happens across all types of media. Remember the old newspaper aphorism, "if it bleeds, it leads" (meaning people gravitate to news stories about sex, violence, and tribal issues). The instant engagement is so addictive that a user internalizes the value system of Twitter (likes, retweets) over deep and nuanced thinking (long-form writing).
Why do these metrics work? Because they are easy to understand and track.
Why is ease of understanding and tracking important? Because consumer-facing products are more addictive when gamified. And gamification relies on metrics such as points, leveling up, rankings, rewards or badges.
Also, well-constructed games — which provide increasingly difficult challenges — engage dopamine pathways to keep users motivated.
But do we actually want the reward that a metric is motivating us to strive for? If not, then it could be seen as a form of value capture, as Nguyen writes, and could potentially be harmful "by making the formulation of our values responsive, not to our own interests, psychology, and experience, but to the interests of large-scale institutions."
To understand why large-scale institutions — such as a university or tech firm — create these types of metrics, we turn to the work of political scientist James C. Scott. In his 1998 book “Seeing Like A State”, Scott explains how government bodies have increased the “legibility” of their states over the centuries.
TLDR: in pre-modern societies, the governing authority didn’t actually know much about its citizens (e.g. who owned how much land, who had how much wealth). Over time, various identifiers were created to track such details, including last names, population censuses and land registries. The result is that the population became more “legible” and the government could perform activities such as levying taxes, providing services and creating a land-holding system (modern cities are built as grids because grids are more legible).
While “legibility” allows the state to govern larger towns and cities, it also makes the governing process very impersonal (we are all literally a number — Social Security, driver’s license — in a database).
There is clearly a tension between what large-scale institutions want and what individuals want.
Let’s look at Twitter/X to see why the need for legibility can lead to value capture.
Thankfully, another banger paper from Nguyen is about the app and is entitled “How Twitter Gamifies Communication”.
One important idea: Twitter/X flattens the reward structure for how people communicate with each other.
What does “flattening reward structures” mean?
First, we need to understand that in order to manage, measure, and track the sentiment of over 300 million users, Twitter/X can only have so many buttons on its app. "Likes" and "retweets" are the most prevalent interactions, while "replies" and "quote-retweets" require more effort and are used much less.
Second, let me walk you through two types of tweets. The first one is an extremely dumb meme that makes you laugh for about two seconds. The second one is a deeply philosophical musing that you’re still thinking about two weeks later. Even though the second one was more impactful, you probably did the same action for both: pressed the like or retweet button.
This is a flat reward structure, because it uses the same action even though one piece of content is significantly more impactful.
In a podcast with Ezra Klein, Nguyen uses the example of the movie review site Rotten Tomatoes to explain the problem with flattening rewards.
Matt Strohl has this incredible blog post that really helped me think about these things. And the blog post is called “Against Rotten Tomatoes”.
…the way that Rotten Tomatoes aggregate scores is they don’t care about whether someone had a profound experience. All it cares about is whether the review was slightly in the positive or slightly on the negative…
…if you go on Rotten Tomatoes and it just aggregates things and it just compares, then something that’s divisive like that will show up as a 50%, which is a failure. On the other hand, some movie where everybody vaguely likes it just a little bit— they’re like, oh, that was fine. That was pretty good. That was entertaining enough. If everyone has that same reaction, Rotten Tomatoes registers that as 100% likes and that rises to the top.
What you can see happening in the Rotten Tomatoes case is that all these rich, qualitative reactions are flattened because they’re passing through this binary data collection filter.
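Nguyen's point about binary data collection is easy to see in a few lines of code. Here is a toy sketch of that aggregation logic (the function name, threshold, and review scores are all made-up illustrations, not Rotten Tomatoes' actual methodology):

```python
# Toy sketch of binary score aggregation. All numbers are invented
# for illustration; this is not Rotten Tomatoes' actual method.

def tomatometer(review_scores, fresh_threshold=6.0):
    """Collapse each 0-10 review score to fresh/rotten, then report % fresh."""
    fresh = [score >= fresh_threshold for score in review_scores]
    return 100 * sum(fresh) / len(fresh)

# A divisive film: half the critics loved it, half hated it.
divisive = [10, 10, 10, 10, 1, 1, 1, 1]

# A film everyone found mildly pleasant.
mild = [6.5, 6.5, 7, 6, 6.5, 7, 6, 6.5]

print(tomatometer(divisive))  # 50.0 (reads as a "failure")
print(tomatometer(mild))      # 100.0 (rises to the top)
```

The binary filter throws away everything interesting: the profound-but-divisive film and a coin flip produce the same score, while uniform mild approval looks like a masterpiece.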
For many, the goal of using Twitter/X is to go viral and gain new followers.
Achieving these goals and creating good content are often in conflict due to the flattened reward structure. Consider the viral — but kind of crappy — content you have seen, such as clickbait, listicles, or life-hack threads.
I have been very guilty of posting cringey-ass stuff to gain new followers, often going for engagement rather than posting something I truly care about. That's value capture right there.
The desire for engagement drives quirky behavior across all social media. Take a look at the gym thirst traps on Instagram. The bizarre thumbnails on YouTube. The cringe-worthy humblebrags on LinkedIn. And the guaranteed-to-cause-injury challenges on TikTok.
It's difficult to resist engaging in this wacky behavior due to the expert design of these apps. Never forget that these platforms are run by the smartest and best-paid behavioral psychologists, UX/UI designers, and software engineers in the world.
This genre of content only exists due to social media incentives and has also been called audience capture or algorithm capture.
My heavy usage of Twitter/X has definitely resulted in some type of value capture and I've noticed changes in my daily life:
Always searching for “content”: While reading a book, watching a film or listening to a podcast, my mind is constantly thinking “oh, that would get some sweet sweet engagement” (instead of just purely enjoying the activity).
Looking for a meme: Related to the previous point, I’m always thinking about how to turn a news story into a meme template (download the Mematic app if you want this obsession, too).
The itch: If I go a few days without posting a banger — in the ballpark of 5,000 likes or more — I’ll start to stress out and scramble to find some viral content (embarrassing).
Argumentative: I've always enjoyed healthy debates, but I've noticed that my fuse has been shorter since I've been more active on Twitter/X. The stimulus-response loop moves much faster online, and I have conditioned myself to respond quickly. When you add in the fact that many replies can be triggering, my brain defaults to thinking "this person is trolling me and I'll respond in kind," even in real-life conversations with friends (yes, I realize this isn't cool).
Time allocation: I frequently break up deep work sessions to see if I can score a quick viral tweet. The value in writing an article worth reading and sharing (which takes time) is overtaken by the allure of some instant dopamine (one low-hanging meme away).
Slot machine effect: Every single time I post, my brain enters a 10-15 minute fog as I watch the early engagement come in. The slot machine effect hooks people through variable rewards. Each play has a different payout and it is the anticipation that keeps us playing. Otherwise, boredom sets in. Smartphone apps have all incorporated this technique and, boy, does it work.
But here is the thing.
I have always been semi-addicted to some form of digital dopamine that has a "number go up" aspect. It used to be Facebook (scrolling through the newsfeed and posting lit photos from university pub crawls). Then gaming (that FIFA life). Then fantasy sports (lost money here). Then online poker (lost a lot of money here). Then day trading (lost even more money here).
I will probably always be chasing some metrics and just have to deal with them better.
Andy Grove — the legendary 3rd CEO of Intel — wrote about the need to balance metrics (or in his parlance “indicators”):
“Indicators tend to direct your attention toward what they are monitoring. It is like riding a bicycle: you will probably steer it where you are looking. If, for example, you start measuring your inventory levels carefully, you are likely to take action to drive your inventory levels down, which is good up to a point. But your inventories could become so lean that you can’t react to changes in demand without creating shortages. So because indicators direct one’s activities, you should guard against overreacting. This you can do by pairing indicators, so that together both effect and counter-effect are measured.”
Grove's text was intended for business managers. However, there are two useful ideas here for dealing with value capture: 1) "guard against overreacting"; and 2) "pairing indicators".
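Grove's "pairing indicators" idea (measure both the effect and its counter-effect) can be sketched in code using his own inventory scenario. The function, names, and thresholds below are all illustrative assumptions:

```python
# Hypothetical sketch of paired indicators, per Grove's inventory example.
# Names and thresholds are invented for illustration.

def review_supply(inventory_days, stockout_rate,
                  max_days=30, max_stockouts=0.02):
    """Driving inventory down is only 'good' while the counter-effect
    (stockouts) stays in bounds. Flag whichever side has drifted."""
    if inventory_days > max_days:
        return "reduce inventory"
    if stockout_rate > max_stockouts:
        return "inventory too lean: rebuild buffer"
    return "balanced"

print(review_supply(45, 0.00))  # "reduce inventory"
print(review_supply(10, 0.05))  # "inventory too lean: rebuild buffer"
print(review_supply(20, 0.01))  # "balanced"
```

Tracking either number alone invites overreaction; tracking the pair makes the trade-off visible.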
Basically, can you balance what an app wants you to do with what you really want?
Despite all of the value capture mentioned about Twitter/X, the platform has been a huge net positive for me.
I’ve learned a ton, laughed my ass off and met so many interesting people including many of you readers. Compared to Twitter/X, none of the aforementioned digital distractions — FIFA, fantasy sports, day trading — has provided the same laughs, network, audience and insights (history, art, pop culture, tech, finance and “8 secret Microsoft Excel hacks I should have known yesterday”).
I know that many users have a love-hate relationship with Twitter/X, but the experience really comes down to curating your feed and choosing where to engage. I avoid politics and if I notice that an exchange is headed towards a fruitless, time-consuming argument, I simply move on (I'm willing to exchange ideas but replies on Twitter/X can quickly turn ad hominem).
Chasing metrics isn’t inherently good or bad. The key is knowing the game you’re playing.
As the saying goes, “play stupid games, win stupid prizes.”
These are the questions to ask about the metrics you are pursuing: What is the reward? Does it incentivize behaviors that align with who you are? Do the pros (connection, learning, opportunities, fulfillment) outweigh the cons (addiction, stress, productivity)?
Charlie Munger famously said, "Show me the incentive and I'll show you the outcome." Metrics often set the incentive and poorly designed ones can lead to unintended consequences.
The canonical example is dubbed the Cobra Effect: in India during British rule, the story is that colonial authorities were worried about a growing venomous cobra population. So, they offered a bounty for every cobra head. People claimed their bounties, but the cobra population actually went up. Why? Because entrepreneurial locals were breeding cobras to bring the heads in for the payout.
A more recent example is Domino's Pizza, which used to offer a 30-minute guarantee on pizza deliveries. If the pizza was not delivered within half an hour, the customer could receive it for free (with the cost being covered by the delivery driver). This metric encouraged drivers to speed and resulted in a number of car accidents. One victim of a crash sued Domino's and was awarded $79 million. As a result, the chain discontinued the promotion.
Likewise, we are at risk of unintended consequences when we substitute externally-designed metrics for what we actually want.
Let’s look at other apps besides Fitbit and Twitter/X, and try to understand how value capture might be occurring.
For single people who want to find a long-term connection, Tinder flattens the value to “are you swipe-able based on four photos and a short bio”.
For people who want to be entertained, Netflix flattens the value to “here is what 100 other people with your profile saw on our platform — which doesn’t have many films prior to 2000 — and watched for at least 20 minutes”.
For tourists that want to immerse themselves in a new culture, TripAdvisor flattens the value to “here are five tour vendors that offer generic enough experiences appealing to the most travellers who are willing to give a 4-star review.”
For folks who want to learn a new language, Duolingo flattens the value to “earn this random badge without ever actually exchanging words with a native speaker.”
Again, there is nothing wrong with these value sets if it is actually what you want. However, we often blindly let externally-set metrics guide our values.
Why can't we have better metrics?
Referring to James C. Scott’s work on “legibility”, Nguyen writes that large-scale institutions create metrics that are “narrowed by design” and “trade away informational nuance, richness, and contextual sensitivity” so that individuals are easier to track and the data is usable at scale. The “richness” of any single user’s needs is not important. And, to be fair, if a value is important enough, we should not fully outsource it anyway.
Nguyen suggests considering which values you are comfortable with allowing an external party to decide.
“…we engage in real value outsourcing all the time, because we simply don’t have the time to think through everything for ourselves. When I bought my dishwasher, I just looked at some reviews and bought what the experts recommended. I am outsourcing my determination of ‘best’ dishwasher to those experts, it seems, just like the prospective law school student outsources their determination of the ‘best’ law school to the USN&WR.
But outsourcing my dishwasher values seems significantly different from outsourcing my values in health, education, or career satisfaction. First, dishwasher values are fairly thin. The functionality of dishwashers is, if not utterly one-dimensional, far less multidimensional than the value of an education, health, or career satisfaction.
Dishwashers are simple tools with clear, generally agreed-upon functions. The good of that functionality has low entanglement with the complex, subtle, and variable phenomena of our individual experience. Such objects —which aim at simple, impersonal targets — are good candidates for outsourcing, when we need to save on some cognitive resources.”
Let me end with a meta-point: even this newsletter embeds a value system.
The metrics for this reward system are e-mail replies from you readers, high open rates and new subscribers.
I like to believe that I follow my own curiosity.
But during periods when the newsletter metrics have lagged, I’ll write an article based on a subject line that I think will get clicks rather than writing about the thing that most interests me (you’ll know I’m on a really cold streak if you see the subject line: “This 3-Minute Read Will Change Your Life”).
Knowing about value capture is one thing.
Actually shaking it is another challenge entirely.
Today’s SatPost is brought to you by Bearly.AI
Why are you seeing this ad?
Because I co-founded an AI-powered research app called Bearly AI. And I really like putting blue buttons in this email.
If you press this blue button below, you can save hours of work with AI-powered tools for reading (instant summaries), writing (ChatGPT) and text-to-image art (literally type some text and get a wild image).
It’s all available in one keyboard shortcut (and an iPhone app).
Links and Memes
MBAs Can't Find Jobs: The most viral chart over the past week came courtesy of the Wall Street Journal, showing that graduates from top-tier MBA programs are struggling to secure employment. The key statistic here is the "share of job-seeking MBA graduates without a job three months after graduation", which has risen in recent years. Juicy example: in 2021, 8% of Harvard MBA graduates were without a job three months after graduation; by 2023, that figure had risen to 20% (I'm using HBS as a bellwether for other schools).
A potential short-term explanation for this trend is that recent MBA graduates are entering a tough job market for popular industries such as consulting, banking, and tech management. Despite this, hiring figures are not significantly different from the pre-pandemic period.
My longer-term take — and I say this as someone who dabbled hard in the accreditation game with both an MBA and a CFA — is to short the MBA outside of the top 15 schools: 1) AI is coming for white-collar jobs, and the random management roles that were a dime a dozen during the zero-interest-rate world are gone; 2) business schools are a cash cow and MBA programs have been milked too hard (lowering standards to increase admissions and nab that >$100K tuition); and 3) everything I wrote about Twitter/X earlier (as in: you can flex your expertise and insights right now by writing for free online without any gatekeepers holding you back).
Apple Vision Pro Hiccups: It has not been a great year for Apple so far. Microsoft recently passed the iPhone maker in market cap ($2.93T vs. $2.92T). US federal courts may force Apple to remove the blood oxygen monitoring feature in the Watch over a patent dispute. And the DOJ is set to hit the company with an antitrust suit.
Against that backdrop, Apple is about to release its most-anticipated product in a decade: the Apple Vision Pro, which just started pre-orders (Tim Cook dropped an incredible 90-second video of how the headset is made).
One major challenge for Apple is showing people how the headset works. Anyone who has seen a toddler play with an iPhone knows how intuitive that device is. Per Bloomberg's Mark Gurman, Apple is training retail employees to conduct 25-minute demos for each customer, which include:
Scan a user's face to find the correct foam cushion and band size
Choose among 25 "light seal" sizes to prevent light from entering the headset
Scan lens prescription info for those who wear glasses (hundreds of different lenses are in-store for fitting)
Walk through the new UX interactions (eye pointing, finger gesturing)
Let the user try all the main use cases (photo apps, spatial photos, app windows, immersive movies)
I'm long-term bullish on the Vision Pro, but that’s a lot of steps to fumble a $3,499 sale. Meanwhile, Spotify, YouTube and Netflix won’t launch native apps for V1 of the Vision Pro (these firms are hitting back at Apple for its 30% App Store iPhone tax but you can still use the web versions). And early reviews are complaining about the headset’s weight, which has led to some glorious jokes.
Some other baller links:
Zuck’s new goal…is to build artificial general intelligence (AGI). He tells The Verge that Meta is leaning hard into AGI and wants to open source it once it is “responsible” to do so. As part of the leap, Meta has ordered 340,000 of Nvidia’s H100 GPU chips ($10B-ish order). The only other company with a chip order of that size is Microsoft.
The Tesla bot folds a shirt…which is a very difficult task for AI bots. It does so in a controlled environment and is kind of slow, but the potential to prevent marriage fights is immense.
What happened to Boeing? The Odd Lots podcast breaks down how America’s leading aerospace firm went from an engineering-first culture to a financial-engineering one. This transition has led to a series of accidents, including the most recent one involving a mid-air door blowing out of the fuselage.
…and here are them fire posts:
Finally, last Saturday saw a big streaming experiment: NBC purchased the rights to a single NFL playoff game (KC Chiefs vs. Miami Dolphins) for $110m. NBC then put the game on its streaming app Peacock and forced anyone who wanted to watch the game to download the app. The memes on X were out of control that day. Two popular running jokes were: 1) how Boomers wouldn’t be able to figure out how to download the app; and 2) that there would be a biblical amount of pirating.
By all accounts, the move was a huge coup. Peacock averaged 23m viewers (the app only has 30m subs) and it became the most-streamed US event in history. The Town podcast says it's a win even if the subscriber churn is large. The NFL is still an event, and it is very hard for other team sports to pull this off.