
How Does State-Level Censorship Work? A Deep Dive into the Leaked Chinese Firewall

An analysis of a 500 GB data leak related to China's Great Firewall, revealing its architecture, components like Cyber Narrator and TSG Galaxy, and how the technology is exported to Kazakhstan, Ethiopia, Pakistan, and Myanmar.

Hi, Habr!

In this series of articles, we'll break down a recent data leak related to the Great Firewall of China (GFW). We'll figure out how it works, who it serves, and how people get around it.

We downloaded and studied 500 GB of leaked data so you don't have to. This article is an introduction: today we'll cover what the GFW is, the history of its creation, and how such systems work. We'll also explain what this "Geedge" thing is.

What Is the GFW?

For those who've been asleep for the last 10 years

The Great Firewall (GFW) is an enormous state system for controlling the internet. It has been developing for over two decades and has become a model for internet blocking systems worldwide.

It has grown and expanded since the 1990s, so to understand how it works, it's worth looking briefly at its history.

The 1990s: The Era of First Blocks

When the internet came to China in 1994, the authorities faced a problem: how do you control the flow of information in a country of 1.2 billion people? The first attempts came early, blocking individual foreign websites through DNS servers and by IP address.

By 1997–1998, the Ministry of Public Security of the PRC and the China Internet Information Center had begun building the first filters. These were basic systems that looked only at addresses and names, not at content, operating mainly at the L3/L4 level. They included DNS filtering (redirecting requests for banned sites) and IP blocking (completely cutting off traffic to certain addresses).
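To make the difference between these two early techniques concrete, here is a minimal Python sketch. It is not code from the leak: the blocklists, the sinkhole address, and the function names are hypothetical, and all addresses come from reserved documentation ranges. DNS filtering lies about a banned name (including its subdomains), while IP blocking drops packets by destination address without looking at what they carry.

```python
import ipaddress

# Hypothetical blocklists; a real system would load these from a central rule feed.
BLOCKED_DOMAINS = {"example-banned.org", "news.example.com"}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range
SINKHOLE_IP = "198.51.100.1"  # bogus answer returned for banned names


def is_domain_blocked(qname: str) -> bool:
    """True if qname or any of its parent domains is on the blocklist."""
    labels = qname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains inherit the block.
    return any(".".join(labels[i:]) in BLOCKED_DOMAINS for i in range(len(labels)))


def resolve(qname: str, real_answer: str) -> str:
    """DNS filtering: answer banned names with a fake address, pass the rest through."""
    return SINKHOLE_IP if is_domain_blocked(qname) else real_answer


def allow_packet(dst_ip: str) -> bool:
    """IP blocking: drop any packet whose destination is in a banned network."""
    addr = ipaddress.ip_address(dst_ip)
    return not any(addr in net for net in BLOCKED_NETWORKS)


print(resolve("www.example-banned.org", "192.0.2.10"))  # 198.51.100.1
print(resolve("example.net", "192.0.2.20"))             # 192.0.2.20
print(allow_packet("203.0.113.7"), allow_packet("192.0.2.20"))  # False True
```

Both checks are cheap because they never look at content, which is also why they were easy to get around: a proxy or VPN running on an unlisted address defeats both.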

Fang Binxing is generally considered the creator and chief developer of the Great Firewall: he was involved from the very beginning and is still involved today.

Fang Binxing — the father of the Chinese firewall

The 2000s: The Emergence of DPI

In the early 2000s, the internet grew rapidly in China, and it became clear that simple blocking wasn't enough: people found workarounds such as proxy servers and VPNs. The authorities realized a more thorough approach was needed.

That's when the first Deep Packet Inspection (DPI) systems appeared. DPI lets a filter analyze not only where traffic is going (the addresses in the packet headers) but also what it contains (the payload), in real time.

At the time, most websites still used HTTP rather than HTTPS, so both the requested URL and the page content traveled unencrypted, which made it very easy for China to inspect and block that traffic.
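Here is a minimal sketch of why plaintext HTTP was so convenient to filter. It assumes one complete HTTP request in a single buffer (real DPI has to reassemble TCP streams), the keyword list borrows the two words from the Lao example later in this article, and the function name is hypothetical. A header-only filter sees just the destination; a DPI filter can read the Host header, the decoded query string, and the body.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical keyword list, taken from this article's own example.
KEYWORDS = ["tiananmen", "independence"]


def inspect_http(payload: bytes) -> tuple[str, list[str]]:
    """Return (host, matched keywords) for one plaintext HTTP request."""
    text = payload.decode("utf-8", errors="replace")
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    _method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Header-level view: the Host header alone is enough to block by site name.
    host = headers.get("host", "")

    # Payload-level view: decode the path, query parameters, and body, then search them.
    parts = urlsplit(target)
    searchable = [unquote(parts.path).lower(), body.lower()]
    for values in parse_qs(parts.query).values():
        searchable.extend(v.lower() for v in values)

    hits = [kw for kw in KEYWORDS if any(kw in s for s in searchable)]
    return host, hits


request = (
    b"GET /search?q=martial+law+Tiananmen HTTP/1.1\r\n"
    b"Host: search.example.com\r\n"
    b"\r\n"
)
print(inspect_http(request))  # ('search.example.com', ['tiananmen'])
```

Under HTTPS the same request is encrypted, and a DPI box typically sees little more than the destination IP and the server name (SNI) sent in the TLS handshake, which is why later filtering leaned so heavily on those.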

Starting in 2002–2003, the first DPI systems were installed at the nodes of China's main internet providers:

China Telecom
China Unicom
China Mobile

These three companies control most of the country's internet infrastructure, so filtering at their nodes let the state control content far more easily.

2003–2008: Building the Ecosystem Around the Great Firewall

From 2003 to 2006, the Chinese authorities developed and deployed an additional layer of control in the GFW: the "Golden Shield Project", built for mass surveillance of users.

Essentially, they developed yet another censorship system on top of the existing one and deployed it at ALL points where a user could access the internet.

The main firewall itself operated directly at the internet providers and filtered traffic at their level.

During those years, the GFW transformed from a set of tools into a unified architecture of state censorship.

The DPI systems were also upgraded: they began detecting VPN handshakes and filtering requests that contained certain keywords.
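"Detecting a VPN handshake" usually means matching the fixed shape of a protocol's opening messages. The leak's actual signatures aren't covered here, so this sketch uses WireGuard as an illustration. It's a modern VPN protocol whose handshake messages have fixed sizes: an initiation is a 148-byte UDP payload starting with type 1 and three zero bytes, and a response is 92 bytes with type 2. The function name is hypothetical. A single check like this would also misfire on unrelated UDP traffic, so real systems combine many signals.

```python
# Message types and fixed UDP payload sizes from the WireGuard protocol description.
HANDSHAKE_SHAPES = {
    1: 148,  # handshake initiation
    2: 92,   # handshake response
}


def looks_like_wireguard_handshake(udp_payload: bytes) -> bool:
    """Heuristic: does this UDP payload have the shape of a WireGuard handshake?"""
    if len(udp_payload) < 4:
        return False
    msg_type = udp_payload[0]
    # Bytes 1..3 of every WireGuard message header are reserved and set to zero.
    if udp_payload[1:4] != b"\x00\x00\x00":
        return False
    return HANDSHAKE_SHAPES.get(msg_type) == len(udp_payload)


# 148 bytes total; everything after the 4-byte header is dummy zeros here.
initiation = bytes([1, 0, 0, 0]) + bytes(144)
print(looks_like_wireguard_handshake(initiation))             # True
print(looks_like_wireguard_handshake(b"GET / HTTP/1.1\r\n"))  # False
```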

For example, say our friend Lao goes to an internet cafe and searches for information about martial law at Tiananmen and about Hu Yaobang. The Golden Shield Project's "black box", installed one hop before the provider, uses DPI to pick out the keywords "independence" and "Tiananmen", and now police might approach Lao to check him for extremism.

In later years, the Golden Shield Project was integrated with the social credit system, so searching for certain words would automatically lower our friend Lao's social reliability score by a set number of points.

At the same time, the state created control centers from which filtering rules could be changed in real time and user activity could be monitored.

2008–Present

After 2008 (the Beijing Olympics, unrest in Xinjiang and Tibet), the GFW began integrating even more tightly with the state security system. One could say that censorship and surveillance became one and the same.

In parallel, they developed systems for:

Social media monitoring, by creating and popularizing their own alternatives (Weibo, WeChat, QQ)
Tracking individual users through their IP and MAC addresses
Throttling (slowing down) traffic to undesirable resources
Blocking VPNs — tools for circumventing censorship

2015–2020: Expansion

After 2015, as the leadership consolidated its domestic political position, the GFW received additional funding and staff.

It began to be deployed at regional levels — each province got its own version of the GFW.

AI and machine learning were also integrated into the GFW to recognize "dangerous content", and the system was linked to the social credit system, so people began to be punished for certain internet activity. It also became possible to block an individual user without blocking the entire site.

The GFW evolved from a single agency's tool into a template for other systems.

2020–2024: Global Spread

By the 2020s, China understood the value of its system and began actively exporting not just the technology itself but the entire model of state control. This is known from leaked documents of Geedge, the main company doing R&D on technologies for the GFW.

The GFW model then became a product sold to countries in Central Asia, Africa, Southeast Asia, and other regions.

What Does the GFW Consist of Now?

About a month ago, 500 GB of GFW-related data leaked, and the community (including us) is actively analyzing it. According to this data, the GFW is currently being developed by two organizations.

The first is MESA Lab — a state institution engaged in R&D of control technologies. The second is Geedge Networks, a commercial company founded in 2018 in Hainan. It is led by Fang Binxing as chief scientist, and the CTO is Zheng Chao, who was previously a lead researcher at MESA.

The connection between them is closer than it appears. Most of Geedge's core team came from MESA, as well as from universities such as Harbin Institute of Technology and Beijing University of Posts and Telecommunications. The git commits in the leaked Geedge code contain names of people who appear on MESA's employee list.

In other words, people moved from a government structure to a commercial one but continued working on the same thing. MESA handles development for the state; Geedge packages those developments into commercial products and exports them to other countries.

Main Components of the GFW

Cyber Narrator

Cyber Narrator is a monitoring system that operates at the ISP level. Its main task is to record every user action on the internet. The system logs all visited sites, DNS queries, source and destination IP addresses, timestamps, traffic volume, and types of protocols used.

Essentially, it maintains a complete activity log for every user. Cyber Narrator runs in the background on ISP networks and collects information continuously.
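The fields listed above map naturally onto a flat log record. The sketch below is a hypothetical schema, not Cyber Narrator's actual format (the leak's data layout isn't described here): one JSON line per observed flow, with placeholder addresses.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical activity-log entry holding the fields listed above."""
    timestamp: str   # ISO 8601 time in UTC
    src_ip: str      # subscriber address
    dst_ip: str      # server address
    protocol: str    # e.g. "HTTP", "TLS", "DNS"
    site: str        # visited host name, when it can be recovered
    dns_query: str   # name the subscriber asked to resolve, or "" if none
    bytes_up: int    # traffic volume sent by the subscriber
    bytes_down: int  # traffic volume received by the subscriber


def to_log_line(rec: FlowRecord) -> str:
    """Serialize one record as a JSON line, a common append-only log format."""
    return json.dumps(asdict(rec), sort_keys=True)


def from_log_line(line: str) -> FlowRecord:
    """Parse a JSON line back into a record."""
    return FlowRecord(**json.loads(line))


rec = FlowRecord(
    timestamp=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    src_ip="10.1.2.3",
    dst_ip="192.0.2.10",
    protocol="TLS",
    site="example.com",
    dns_query="example.com",
    bytes_up=1200,
    bytes_down=48000,
)
line = to_log_line(rec)
print(line)
```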

TSG Galaxy

TSG Galaxy aggregates information from all Cyber Narrator instances across the country, so all data about who visited which sites, when, and how ends up in one place.

The system allows searching by IP addresses, time, and sites, builds user profiles, and identifies behavioral patterns.
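The search and profiling described above can be sketched as filtering over stored records plus simple aggregation. This assumes a hypothetical minimal record (timestamp, subscriber IP, site) and a linear scan over an in-memory list. A national-scale system would use distributed, indexed storage instead, and the function names are only illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal record forwarded by a collector."""
    ts: datetime
    src_ip: str
    site: str


def search(events, ip=None, site=None, start=None, end=None):
    """Return events matching every given filter; the time window is [start, end)."""
    return [
        e for e in events
        if (ip is None or e.src_ip == ip)
        and (site is None or e.site == site)
        and (start is None or e.ts >= start)
        and (end is None or e.ts < end)
    ]


def profile(events, ip):
    """Build a simple behavioral profile for one subscriber IP."""
    mine = search(events, ip=ip)
    return {
        "sites": Counter(e.site for e in mine),     # how often each site is visited
        "hours": Counter(e.ts.hour for e in mine),  # when the subscriber is active
    }


events = [
    Event(datetime(2024, 5, 1, 9, 15), "10.0.0.1", "news.example"),
    Event(datetime(2024, 5, 1, 9, 40), "10.0.0.1", "video.example"),
    Event(datetime(2024, 5, 1, 23, 5), "10.0.0.1", "news.example"),
    Event(datetime(2024, 5, 1, 10, 0), "10.0.0.2", "news.example"),
]
morning = search(events, site="news.example",
                 start=datetime(2024, 5, 1, 9), end=datetime(2024, 5, 1, 12))
print(len(morning))  # 2
print(profile(events, "10.0.0.1")["sites"])
```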

Cyber Narrator is just a data collector, while TSG Galaxy is the computation center: a storage and analytics system that shows the complete picture of internet activity at a national scale.

Tiangou Secure Gateway

Tiangou Secure Gateway is usually located in the capital or in a major city where the security agencies are based.

From Tiangou, administrators can change filtering rules across the entire country in real time:

Add or remove keywords from the blacklist
Block IP addresses and domains
Monitor user activity
View statistics and generate reports
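One plausible way to push rule changes "in real time" is a versioned rule set that a central point edits and edge devices pull. This is an assumption for illustration, not the protocol Tiangou actually uses, and the class and method names are hypothetical. Each change bumps a version number, and an appliance installs a snapshot only if it is newer than what it already has, so stale or repeated updates are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    """Hypothetical central rule store; every change bumps the version."""
    version: int = 0
    keywords: set = field(default_factory=set)
    domains: set = field(default_factory=set)

    def add_keyword(self, kw: str) -> None:
        self.keywords.add(kw.lower())
        self.version += 1

    def remove_keyword(self, kw: str) -> None:
        self.keywords.discard(kw.lower())
        self.version += 1

    def block_domain(self, domain: str) -> None:
        self.domains.add(domain.lower())
        self.version += 1

    def snapshot(self) -> dict:
        """Read-only copy of the current rules, as it would be pushed to devices."""
        return {
            "version": self.version,
            "keywords": frozenset(self.keywords),
            "domains": frozenset(self.domains),
        }


class Appliance:
    """Hypothetical edge device that keeps the newest rules it has received."""

    def __init__(self) -> None:
        self.rules = {"version": -1, "keywords": frozenset(), "domains": frozenset()}

    def sync(self, snap: dict) -> bool:
        """Install snap only if it is newer; return True when the rules changed."""
        if snap["version"] <= self.rules["version"]:
            return False
        self.rules = snap
        return True


central = RuleSet()
devices = [Appliance() for _ in range(3)]
central.add_keyword("Tiananmen")
central.block_domain("Example.org")
snap = central.snapshot()
print([d.sync(snap) for d in devices])  # [True, True, True]
print([d.sync(snap) for d in devices])  # [False, False, False]
```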

Tiangou is the control point. All management of the censorship system passes through it.

TSGX (the DPI Device)

TSGX is the actual hardware appliance that uses DPI for blocking. It is server equipment physically installed in the ISP's network.

It's placed at critical points — on backbone channels where all traffic passes through a few nodes, at the country's border for controlling international traffic, and at major provider nodes.

The device intercepts all traffic passing through it, analyzes it using DPI, performs real-time filtering, blocks unwanted connections, and collects data for sending to TSG Galaxy and Cyber Narrator.

TSG-OS

TSG-OS is the operating system that runs on the DPI appliance. It is based on Linux and includes drivers for working with network equipment, DPI modules for traffic analysis, management interfaces for administrators, communication protocols for connecting with the main control point, and fast device update capabilities.

Through TSG-OS, administrators can customize the behavior of the appliance to meet the needs of a specific country or region.

How It All Works in Practice

When a user in a client country tries to open a blocked site, their request first passes through the DPI appliance — the "black box" installed at the provider.

The device runs TSG-OS, which analyzes traffic using DPI. If the system finds prohibited words or detects the use of VPN or Tor, it immediately blocks the connection.

At the same time, the appliance sends information about the attempt to the main database (Cyber Narrator), which records the IP address and the time of the event.

Then this data reaches the main computation center (TSG Galaxy), where all statistics are collected and processed.

Finally, through the interface of the main security centers (Tiangou Secure Gateway), the authorities see everything: who tried to connect, when, to which resource, and what words were in the request.
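The flow above can be condensed into a toy pipeline with three stand-ins: an appliance that decides whether to forward a request, a collector playing the role of Cyber Narrator, and a summary step playing the role of TSG Galaxy. Everything here is hypothetical: the keyword set, the protocol labels (assumed to come from an upstream classifier), and all names. It only shows the order of operations described above: block first, then log, then aggregate.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone

KEYWORDS = {"tiananmen"}                  # hypothetical rules pushed from the control point
BLOCKED_PROTOCOLS = {"wireguard", "tor"}  # hypothetical labels from an upstream classifier


@dataclass(frozen=True)
class Request:
    src_ip: str
    host: str
    protocol: str  # protocol label, assumed to be assigned before this step
    text: str      # decoded request text, when it is visible


class Collector:
    """Stand-in for Cyber Narrator: an append-only event log."""

    def __init__(self) -> None:
        self.log: list[dict] = []

    def record(self, event: dict) -> None:
        self.log.append(event)


def appliance(req: Request, collector: Collector) -> bool:
    """Stand-in for the DPI box: return True to forward the request, False to block it."""
    if req.protocol in BLOCKED_PROTOCOLS:
        reason = "vpn/tor"
    elif any(kw in req.text.lower() for kw in KEYWORDS):
        reason = "keyword"
    else:
        return True
    # A blocked attempt is reported to the collector along with who, where, and when.
    collector.record({
        "ts": datetime.now(timezone.utc).isoformat(),
        "src_ip": req.src_ip,
        "host": req.host,
        "reason": reason,
    })
    return False


def galaxy_summary(log: list[dict]) -> Counter:
    """Stand-in for TSG Galaxy: count blocked attempts per subscriber."""
    return Counter(e["src_ip"] for e in log)


narrator = Collector()
sample = [
    Request("10.0.0.1", "search.example", "http", "q=Tiananmen"),
    Request("10.0.0.1", "vpn.example", "wireguard", ""),
    Request("10.0.0.2", "news.example", "http", "q=weather"),
]
print([appliance(r, narrator) for r in sample])  # [False, False, True]
print(galaxy_summary(narrator.log))
```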

Regional Versions of the GFW in China

As we mentioned earlier, each province in China has its own GFW, and from the leaked documents it becomes clearer where exactly which filters are used:

Xinjiang (codename J24): Maximum control, Tor blocking. Highest severity.
Fujian: Traffic and VPN control. Medium-high severity.
Jiangsu: Content filtering. Medium severity.
Rest of China: Standard filtering. Standard severity.

Geedge's Clients and Technology Export

From the leaked documents, we can see where exactly Geedge has deployed or is deploying systems based on the GFW architecture:

Kazakhstan (codenames K18, K24): Active. The first major Geedge client outside China.
Ethiopia (codename E21): Active. Being considered as a model for expansion in Africa.
Pakistan (codename P19): Active. Used to control social unrest and block undesirable content.
Myanmar (codename M22): Active. Deployed to suppress protests after the coup.
Unknown country (codename A24): Early stages.

Funding for GFW Developers

MESA receives funding directly from the Chinese state budget as a government research institution. According to documents from 2016, the MESA team received contracts worth over 35 million yuan per year, and those were only the documented amounts. By 2024 the figures are likely significantly higher, given the national importance of the projects.

Geedge operates on a commercial model. The company receives money through contracts with ISPs in client countries, through international organizations and intermediary companies (Thales Group, Investcom Holding, ATOM, and others), and through diplomatic initiatives like the "Belt and Road."

Using intermediaries allows hiding the direct connection between the Chinese state and the control system in the client country. Geedge doesn't call itself "a Chinese company that sells censorship" — it positions its products as "cybersecurity solutions" or "network traffic management."

Conclusions from Part One

The GFW has transformed from an internal tool into an exportable product. Geedge took the proven architecture and packaged it into modular products: Cyber Narrator for monitoring, TSG Galaxy for data analysis, Tiangou for management, and the DPI appliance as the physical hardware for installation. Quite the SaaS offering...

The result is a system that can be quickly deployed in any country. No years of development needed. Just install the "black box" at all providers, connect it to the control center, and the system starts working. Geedge has already deployed this in Kazakhstan, Ethiopia, Pakistan, and Myanmar. Each country gets the same architecture but with localized settings.

Where will these countries' cooperation with Geedge lead? Time will tell.

In the next part, we'll cover GFW bypass methods and take a closer look at the contents of the services supplied by Geedge. We'll also analyze the DPI metrics used by the GFW and the methods they use to de-anonymize users. Stay tuned!