The Enduring Challenges of Content Moderation

Olivier Kamanda
Mar 19, 2023

Every minute, we generate 16 million tweets, 1 million hours of streaming video, and 66K Instagram photos. And all of that user-generated content is garnering more of our attention each day. Americans, in particular, now spend nearly as much time watching user-generated videos as they do watching traditional TV. And if you consider all the tweets, online reviews, news & traffic alerts, feed posts, and product recommendations, you’ll understand how much we’ve come to rely on these troves of user-generated content to navigate the real and digital worlds.

But with the emergence of social media over the last two decades, we’ve also become aware of the risks this content poses to society. The 2016 US presidential election made ‘fake news’ and ‘misinformation’ household terms. Gen Z and other digital natives are coming of age while navigating cyberbullying. And the ubiquity of mobile phone cameras has led to a stubbornly pervasive trove of grotesque and exploitative images.

So why is it that, in the age of rapidly evolving AI capabilities, we can’t solve the content moderation problem?

What is content moderation?

Let’s start with a definition:

Content moderation is the process by which a given form of expression is evaluated against a set of predefined standards.

At its simplest, content moderation restricts speech or content that would (a) be deemed harmful to members of a given online community or (b) undermine the integrity of the community’s medium.

In the early days of internet adoption, bulletin board systems (BBS) and user forums were the main channels for engagement. Often BBSs and user forums relied on certain members to play the role of moderator. Moderators would review each post submitted by users and determine whether it would be acceptable to ‘release’ to the broader community.

a screenshot from the old days of AOL, courtesy of ForeverTwentySomethings.com

As technology improved, users moved to more dynamic forms of communication like Internet Relay Chat (IRC) and America Online (AOL) chat rooms. These platforms still relied on moderators, but rather than proactively screening content before it was made public, moderators would monitor content after it was posted and retroactively remove anything that violated the platform’s terms of service (or ban the offending users).

Today, the scope of content moderation is . . . more complex. We’re no longer just dealing with text in chat rooms; we now have pictures, streaming audio, live video, virtual reality, NFTs, and other forms of digital content. And that content can appear on mobile phones, smart watches, and Internet-of-Things (IoT) devices.

Also, there aren’t just more types of content. We have more people posting content than ever before, with over 5 billion people now online, representing 64% of the world’s population.

So how does content moderation work?

Broken down into its component parts, content moderation requires:

  1. Written Policies
  2. Enforcement System
  3. Monitoring & Evaluation

Written Policies

Promoting Openness vs Preventing Harm

Before a line of code is written, a given platform will need to prescribe the terms of expected behavior on its site. This serves two purposes: (i) articulating a vision of how the platform should be used and (ii) setting boundaries on restricted or prohibited behavior.

Providing Clarity vs Preventing Circumvention

Policies for these online platforms aim to strike a balance between welcoming many different types of user posts while limiting harmful, abusive, or offensive experiences. These policies are often enumerated in a set of rules whose acceptance is part of the legal terms of service.

Drafting these policies is not trivial. Finding the right way to describe all the possible ways people can abuse your site is a herculean task. In general, platforms use these policies to explain the ‘spirit of the law’ rather than providing an exhaustive and detailed list of what content can and can’t be posted. This helps preserve the platforms’ flexibility in enforcement.

Flexibility is important because many of these harms are hard to define. In 1964, when Supreme Court Justice Potter Stewart was asked to explain his criteria for what constitutes pornography, he responded: “I know it when I see it.” Not the most illuminating clarification, but we understand his dilemma.

Nearly sixty years later, we still don’t have a better working definition. And the list of content risks in need of definitions keeps growing: profanity, incitement, misinformation, gaslighting, and revisionism, to start.

In the absence of exhaustive definitions, content moderators may resort to developing internal guidelines for these hard-to-pin-down harms, with tests and criteria the public will never see. These guides may help platforms determine what content rises to the level of misinformation, profanity, or pornography. But if these guidelines were published alongside the content moderation policies, malicious actors would seize the opportunity to circumvent those internal criteria. So platforms resort to broadly worded public policies and more detailed internal guidelines.

Consistency vs Cultural Differences

Providing a consistent user experience helps drive user growth. Consistency can include reliable product features as well as content moderation that meets users’ expectations. A consistently positive experience on one of these platforms makes it that much more likely that users will invite their friends to join.

But users in Rome and Riyadh may have different expectations about what kind of content is acceptable. Ignoring or violating these cultural norms risks alienating large swathes of users. Should an adult content policy allow or prohibit nudity? If adult content, however defined, is only accessible to users based on age, what age should that be? 16? 18? 21? Often the default answers to these questions mirror the cultural norms of those building the platforms (i.e. American product and engineering teams).

Promoting Freedom of Expression vs Preventing Harm

Technology products reflect the values of the people who build them. As such, major social media platforms founded in Western countries tend to carry the banner of free speech. Empowering users to express their creativity seems almost indisputably positive. And if your operating model aims to take advantage of network effects, then inviting more people to create and share more content also makes business sense. But companies also need to make sure their platforms aren’t being used to harm users or society at large.

Today, platforms in the United States are not liable for the content that users post, thanks to Section 230 of the Communications Decency Act. But that may change. This year the Supreme Court is hearing two cases that may challenge the liability shield (I wrote about these cases and other potential legal changes in tech here). In those cases, Google, Meta (formerly Facebook), and Twitter were sued by families of victims of terrorist attacks. These families claim the terrorist content (i.e. recruitment and training videos) available on these platforms aided the terrorist organizations’ cause and was a factor in the victims’ deaths. If any of the Section 230 provisions are rolled back, you can expect that platforms will no longer be as permissive on free speech matters.

Enforcement System

Assuming coherent policies have been crafted that reflect the platform’s principles, the next step is to enforce those policies. Enforcement considerations fall along two dimensions: method and posture.

Enforcement Method

The enforcement method describes the process for ensuring content complies with the stated policies. Generally there are two options: manual reviews and dynamic reviews.

Manual Reviews

In a manual review, each piece of content is sent to someone who uses their judgment to assess the content against a given set of policies. As compared to dynamic reviews, manual reviews are often treated as providing the authoritative decision about whether a piece of content is compliant. For many policies, manual reviews are more accurate than dynamic reviews.

But, as we know, there is an overwhelming amount of user-generated content created each day, so throngs of human reviewers are needed to keep up with the volume. And these human review teams are often distributed across countries, time zones, languages, and cultures. To ensure consistent enforcement across them, each of the publicly available written policies needs to be translated into detailed guidelines for reviewers to follow.

These human reviewer teams come at considerable cost, and not just in terms of the platforms’ bottom line. Reviewers risk being exposed to the worst content on the internet. A 2019 report in The Verge, entitled “The Trauma Floor,” highlighted the trauma that some teams endure from “the daily onslaught of disturbing posts: the hate speech, the violent attacks, the graphic pornography.” The author explains:

The moderators told me it’s a place where the conspiracy videos and memes that they see each day gradually lead them to embrace fringe views. One auditor walks the floor promoting the idea that the Earth is flat. A former employee told me he has begun to question certain aspects of the Holocaust. Another former employee, who told me he has mapped every escape route out of his house and sleeps with a gun at his side, said: “I no longer believe 9/11 was a terrorist attack.”

Because of these financial, emotional, and psychological costs, platforms have also come to deploy dynamic reviews.

Dynamic Reviews

In dynamic (or automated) reviews, content is reviewed programmatically, either by heuristics or by machine learning models. Dynamic review systems can range in complexity from simple linear regressions to multi-billion-parameter large language models. Their value derives from being able to review orders of magnitude more content than manual reviewers, and much faster: a finely tuned dynamic review system can process billions of pieces of content within a day, while each human reviewer may only get through several hundred.
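
To make the idea concrete, here is a minimal sketch of what a single dynamic review step might look like, combining a cheap keyword heuristic with a machine-learned score. The blocked terms, thresholds, and stand-in scoring function are illustrative assumptions, not any platform’s actual pipeline.

```python
# Illustrative sketch of a dynamic review step: a cheap keyword heuristic
# runs first, and remaining items are scored by a (stand-in) ML classifier.
# Terms, thresholds, and the fake score are assumptions for illustration only.

BLOCKED_TERMS = {"buy followers now", "spam-link.example"}  # toy heuristic list

def model_score(text: str) -> float:
    """Stand-in for a trained classifier returning P(policy violation)."""
    return 0.9 if "free crypto giveaway" in text.lower() else 0.1

def dynamic_review(text: str) -> str:
    """Return an enforcement decision for a single piece of content."""
    lowered = text.lower()
    # Heuristic pass: exact matches against a known-bad term list.
    if any(term in lowered for term in BLOCKED_TERMS):
        return "remove"
    # Model pass: high-confidence violations are removed automatically,
    # borderline cases are routed to a human (manual) review queue.
    score = model_score(text)
    if score >= 0.8:
        return "remove"
    if score >= 0.5:
        return "send_to_manual_review"
    return "allow"

if __name__ == "__main__":
    for post in ["Join my free crypto giveaway!!", "Lovely sunset tonight"]:
        print(post, "->", dynamic_review(post))
```

Even in this toy version, the thresholds embody a trade-off: lowering them catches more violations but removes more compliant posts, which is exactly the precision-versus-recall tension discussed under Monitoring & Evaluation below.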

But the sheer volume of content that can be reviewed dynamically means mistakes happen at scale too. Imagine a dynamic review system that reviews 10 billion pieces of content a day. Even if it correctly labeled content 99% of the time (a very high threshold), that would still mean 100 million posts, images, videos and other pieces of content would be incorrectly enforced. In any other context, being right 99% of the time would be a mark of success. But in this case it can create a quality control nightmare, exposing many users to offensive content. At scale, the relationship between quantitative and qualitative success begins to fall apart.
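
To make that arithmetic explicit (the volume and accuracy figures are the round numbers assumed above, not measured values):

```python
# Error volume at scale: 99% accuracy on 10 billion items per day
# still produces 100 million enforcement mistakes per day.
daily_items = 10_000_000_000
accuracy = 0.99
mistakes_per_day = daily_items * (1 - accuracy)
print(f"{mistakes_per_day:,.0f} mis-labeled items per day")  # 100,000,000
```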

Enforcement Posture

The other dimension at play in enforcement is the enforcement posture — how proactive or reactive you are in searching for non-compliant content.

Proactive Posture

Proactive enforcement intentionally searches or filters through user-generated content, deciding whether each piece of content is compliant or not. It’s akin to the practice in the early BBS days of a moderator reviewing each post before it was published, or an AOL chat room host terminating users whose messages violated the terms of service.

Proactive enforcement minimizes the risk of exposing users to non-compliant content. But it presumes that the policies are well-defined and the platforms have confidence that they’ll be able to accurately identify bad content.

Reactive Posture

A reactive posture is best when enforcement is more nuanced and the platform may not be well-positioned to detect when users are exposed to harmful or offensive content. This may be the case if the actual harms occur ‘offline’. For example, let’s say you paid $100 for a high-end coffee maker from an ad on Amazon, only to find after unboxing it that it was a shoddy knockoff. Amazon wouldn’t know that the merchant/advertiser was a scammer until you flagged it. Only once you reported it, assuming the report was credible, could Amazon move quickly to enforce its policies on scams or misleading claims against the merchant.

Enforcement Matrix

Taken together, the enforcement method and enforcement posture determine how platforms choose to enforce each of their policies. People assume that content moderation on most big tech platforms comes from quadrant I (proactive & dynamic), with machine learning models scanning content as soon as it is submitted. But the options are more distributed. The four quadrants below illustrate how the choices play out.

Quadrant IV (Reactive & Dynamic): Even the best crafted policies can’t cover all new risks. In these cases, a platform’s best bet is to respond as quickly as possible. For example, when new patterns of hate speech or political misinformation/disinformation emerge regionally (particularly when limited to a particular language or subculture), there is often a lag before content moderation teams notice. But a reactive & dynamic enforcement approach means that once the pattern of abuse has been flagged, automated enforcement tools can learn to identify new instances of the pattern and remove similar forms of offending content in the future.

Quadrant III (Reactive & Manual): This is the natural starting point for most policy enforcement. In the early days, platform teams may not have a lot (or any) data about how each content risk manifests on its platform. Building a team of manual reviewers supports the ability to apply judgment to each of the early instances, so as not to set the wrong precedent. Over time, these human judgments can be used to train a machine learning model, which will eventually shift some of the enforcements to quadrant I.

Quadrant II (Proactive & Manual): Sometimes enforcement teams may identify a problematic piece of content that deserves a second review. In this scenario, the enforcement team may need a subject matter expert (e.g. legal, medical) to provide a second opinion before the enforcement decision.

Quadrant I (Proactive & Dynamic): The ideal end-state. Generally, this reflects a mature enforcement process that builds upon the merits of the other enforcement positions.

Each platform applies a mix of these positions across all of the policies it enforces. Whether a platform takes a proactive & dynamic approach to enforcing ‘hate speech’ or ‘adult content’ will be determined by the platform’s risk profile and resourcing capacity (i.e. engineering infrastructure or labor hours). And these decisions will almost certainly change over time as content patterns evolve.
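
In practice, that mix can be expressed as a simple configuration mapping each policy to an enforcement method and posture. The policy names and assignments below are hypothetical, chosen only to illustrate how a platform might record these choices.

```python
from enum import Enum

class Method(Enum):
    MANUAL = "manual"
    DYNAMIC = "dynamic"

class Posture(Enum):
    PROACTIVE = "proactive"
    REACTIVE = "reactive"

# Hypothetical policy-to-enforcement mapping; a real platform would tune
# this per policy based on its risk profile and resourcing capacity.
ENFORCEMENT_CONFIG = {
    "spam":                 (Method.DYNAMIC, Posture.PROACTIVE),  # Quadrant I
    "child_safety":         (Method.MANUAL,  Posture.PROACTIVE),  # Quadrant II
    "new_scam_pattern":     (Method.MANUAL,  Posture.REACTIVE),   # Quadrant III
    "emerging_hate_speech": (Method.DYNAMIC, Posture.REACTIVE),   # Quadrant IV
}

def quadrant(policy: str) -> str:
    """Map a policy's (method, posture) pair onto the matrix above."""
    method, posture = ENFORCEMENT_CONFIG[policy]
    if posture is Posture.PROACTIVE:
        return "I" if method is Method.DYNAMIC else "II"
    return "IV" if method is Method.DYNAMIC else "III"

print(quadrant("spam"))  # -> I
```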

Monitoring & Evaluation

Despite criticism of the effectiveness of platforms’ content moderation efforts, these platforms invest heavily in understanding what’s working, what’s not working, and what to do about the latter.

Internal Metrics

The most direct form of feedback comes from within the platform itself. Although each platform’s content moderation teams are different, most will measure precision and recall metrics to monitor their performance.

Precision helps you understand how much you over-flagged (i.e., unnecessarily removed compliant content). Recall helps you understand how much you missed (i.e., violating content that should have been removed but wasn’t). Content moderation teams will likely monitor these metrics regularly (monthly, weekly, or even daily), and the teams’ performance evaluations may be tied to improving them.
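
Both metrics fall straight out of the counts of correct and incorrect enforcement decisions; the counts in this sketch are made up for illustration.

```python
# Precision and recall from (made-up) enforcement counts.
true_positives = 900    # violating content correctly removed
false_positives = 100   # compliant content removed by mistake (over-flagging)
false_negatives = 300   # violating content the system missed

precision = true_positives / (true_positives + false_positives)  # 0.90
recall = true_positives / (true_positives + false_negatives)     # 0.75

print(f"precision={precision:.2f}, recall={recall:.2f}")
```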

But the techniques that improve one can often depress the other. If you wanted to improve your precision metrics for a hate speech policy, you might decide to take down only the most egregious and blatant violations. But this would ignore subtler forms of hate speech that may grow more popular precisely because they fly under the radar.

On the other hand, optimizing for recall may lead you to take an expansive view of hate speech, enforcing against any content that might be considered offensive and leaving no safe haven for satire, sarcasm, or parody.

In practice, content moderation teams will often cycle between which metric they optimize in a given quarter, half, or year based on business goals. When new product features are launched, content moderation teams might be instructed to play it safe and avoid bad press by optimizing for recall. But later on, as the product feature matures and companies want to compete for users, they may optimize for precision to bring a larger universe of users and content onto the platform.

User Reports

Nearly all platforms have a way for users to flag or report offensive or illegal content. Platforms want to make users happy. So content moderation teams often mine these reports to understand whether the users think the policies are too lenient or too strict.

But unstructured user report forms (“please share any feedback”) may generate responses that don’t provide enough evidence about how exactly the policies have been violated. Further, some of the platforms’ own product decisions make it harder to act on user reports. Features like end-to-end encryption in messaging, vanishing content, and dynamically generated content increase the difficulty of recreating the user experience that was reported.
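
One common mitigation is to structure the report form so it captures the evidence reviewers need up front. The fields below are a hypothetical sketch, not any platform’s actual report schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical structured user report. Asking which policy the user believes
# was violated, plus a pointer to the specific content, gives reviewers far
# more to act on than a free-text "please share any feedback" box.
@dataclass
class UserReport:
    content_id: str            # identifier of the reported post, image, or video
    reported_policy: str       # e.g. "hate_speech", "scam", "adult_content"
    description: str           # optional free-text context from the reporter
    screenshot_attached: bool  # useful when content is ephemeral or encrypted
    reported_at: datetime

report = UserReport(
    content_id="post_12345",
    reported_policy="scam",
    description="Seller shipped a counterfeit coffee maker.",
    screenshot_attached=True,
    reported_at=datetime.now(),
)
```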

Regulation & Litigation

Sometimes evaluating how well a platform is moderating content isn’t up to the platform to decide! Given the outsized role these platforms play in political discourse, e-commerce, and culture, it’s no surprise that national governments and their courts have an interest in regulating what is or is not acceptable online.

The EU has taken the lead in regulating tech platforms that target its 450 million citizens. But its General Data Protection Regulation (GDPR) ostensibly aims to improve privacy protections and data use, rather than content moderation.

And what if government regulation isn’t truly in the public interest? India has pressured big tech platforms to do more to combat misinformation. The government has threatened regulations that would require platforms to remove any accounts it believes spread fake news. Civil society groups, however, believe that such tactics are an attempt to co-opt tech companies into silencing critics. Google and other companies have pushed back, while launching their own initiatives to promote digital literacy and combat misinformation in the country.

Finally, in the United States, Congress has yet to regulate tech platforms, despite 44% of Americans supporting such action. It will be months before the Supreme Court returns a decision on Section 230 liability (and it may decide not to touch the law at all). So it seems that, for the foreseeable future, tech platforms will continue having to solve the content moderation challenges on their own.

Olivier Kamanda

Product @Google. Term Member @Council on Foreign Relations; former White House Presidential Innovation Fellow; eng+law+policy