Hoàng Nguyên Phong is a PhD candidate at Stony Brook University, where he studies online security and privacy. He is also a visiting research fellow at the University of Toronto’s Citizen Lab, where he focuses on internet censorship measurement. His recent paper “How Great is the Great Firewall? Measuring China’s DNS Censorship” explores China’s restrictive cyber policies and how they affect both the Chinese and the global internet. In this lightly edited interview, we discussed the two-way nature of the firewall, how China’s censorship regime differs from other systems, and why anti-censorship technology is important.
Q: What first motivated you to conduct this study and research censorship in China more broadly?
A: During my PhD, the main topics I have worked on are internet censorship and surveillance, because I believe these two topics are very important today. The internet has become an indispensable means of communication in modern life, and everyone relies on it, especially during this difficult time with Covid-19. And governments, not only the Chinese government, have been increasingly using technology to control the internet. China stands out among those internet censorship regimes as having arguably the most sophisticated system. Because of that sophistication, it functions very differently from the systems in countries like Iran or Russia. China designed its system in such a complicated way that it is harder to detect and harder to circumvent. As a technical guy, that hooked me.
How exactly is China’s internet censorship system different?
Every single communication on the internet needs something called a DNS (Domain Name System) lookup. It’s just like calling someone on your phone: when you type ‘mom,’ your phone has to look up her number. In the same way, when you type thewirechina.com, your computer needs to look up its IP address. Basically, DNS is the phonebook of the internet, and before every type of online activity, there’s a DNS lookup.
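To make the “phonebook lookup” concrete, here is a minimal sketch of the query a computer sends, built by hand in the classic DNS wire format (RFC 1035). The function name `build_dns_query` and the fixed transaction ID are illustrative choices, not anything from the paper. The point to notice is that the domain name travels in cleartext (traditionally over UDP port 53), which is what lets an on-path censor read it and decide whether to interfere.

```python
import struct

def build_dns_query(domain: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query asking for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = "recursion desired"), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte
    labels = domain.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    # QTYPE 1 = A (IPv4 address), QCLASS 1 = IN (Internet)
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question

packet = build_dns_query("thewirechina.com")
print(len(packet))
print(packet[12:])  # the question section: the domain name is plainly readable
```

Anyone on the network path between the computer and its resolver can see `thewirechina` and `com` in those bytes without decrypting anything.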
In Iran, for example, they block websites by tampering with that process. Instead of returning the correct IP address, the censor injects a fake one. In Iran’s case, it is a static IP address, which makes it easy to detect. But in China’s case, they don’t inject a static IP; they change this IP dynamically.
When they see a user look up a domain name that they want to censor — for example, thewirechina.com — the Great Firewall will send a response saying that this is the IP address of thewirechina.com, which is, of course, the wrong IP address. And the Great Firewall doesn’t just inject one forged DNS response. Usually two fake responses are injected back-to-back to make detection and circumvention harder. Some specific domains (like google.sm) can trigger the firewall to inject three fake responses, but those cases are rare.
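Because the firewall injects forged answers on top of the genuine one, a single query can receive several conflicting responses. The sketch below shows one simple way a measurement script might flag that pattern: group responses by transaction ID and name, and count how many distinct answer sets arrived. This is an illustrative heuristic, not the paper’s detection method; `DnsResponse`, `suspected_injections`, and all IP addresses (taken from reserved documentation ranges) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class DnsResponse:
    query_id: int        # transaction ID copied from the query
    qname: str           # domain name the response claims to answer
    answers: frozenset   # IPv4 addresses in the answer section

def suspected_injections(responses):
    """Map (query_id, qname) -> number of distinct answer sets, for queries
    that got more than one. Identical duplicates (retransmits) count once."""
    distinct = defaultdict(set)
    for r in responses:
        distinct[(r.query_id, r.qname)].add(r.answers)
    return {key: len(sets) for key, sets in distinct.items() if len(sets) > 1}

# Hypothetical capture: two forged answers plus the genuine one for a censored name
capture = [
    DnsResponse(0x1234, "thewirechina.com", frozenset({"192.0.2.10"})),
    DnsResponse(0x1234, "thewirechina.com", frozenset({"198.51.100.7"})),
    DnsResponse(0x1234, "thewirechina.com", frozenset({"203.0.113.5"})),
    DnsResponse(0x5678, "example.com", frozenset({"203.0.113.80"})),
]
print(suspected_injections(capture))
```

Comparing whole answer sets, rather than individual addresses, avoids mistaking an ordinary response that lists several IPs for tampering.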
How responsive is the Great Firewall? How quickly are topics or sites added as they become politically sensitive?
This is not something they do at a constant rate, like 10 domains every day. It fluctuates based on political events in China, because we know that the Great Firewall was built by the Chinese government, and it serves their interests. Over the nine months that we worked on this study, we found several coincidences. When political events happen, related domains get blocked within a couple of days. And when no political events happen, you might see nothing for a week.
For example, because I have been following you guys, I know exactly when thewirechina.com got blocked. You launched publicly in April 2020, and the first detections in my system show that the Great Firewall started poisoning the DNS resolutions of your website around September 27, 2020.
I’ll have to go back and see what was happening around then, and what we were publishing! So when a new website like thewirechina.com is added, is it manually added or are there certain terms that automatically get a site blocked?
That is a very good question. And no one actually knows, because the Chinese government has published no official technical documents and never tells anyone how they do it, or why. But in the paper, we tried to figure out whether blocking is automatic or manual. The answer is both. We found evidence that they scan for particular keywords and then block domains that match. And when something is automated by machine, there will be errors.
For example, there was one case covered by a reporter from the [tech-focused news outlet] Rest of the World. The keyword involved was JAV, which stands for Japanese Adult Video [a type of pornography], and pornography is illegal in China. So that is a keyword that triggers blocking. But in one weird case, the Jawa Pos [an Indonesian newspaper] also got blocked, just because its name has similar letters. That day, the reporter from Rest of the World helped me reach out to the editor of the Jawa Pos, and they said, ‘No, we have a very good relationship with the Chinese Embassy here; they even invited us to their press conference, and last week they contacted us to share some news.’ So they were really puzzled about why they got blocked. But again, when something is automatic, of course there are some errors.
Our system also found evidence that some things are blocked manually, because there are some websites that have been publishing negative stuff about China and things that the Chinese government doesn’t want to hear. And that doesn’t get blocked right away, it takes time, sometimes months or half a year, so we know it wasn’t automatically blocked.
How much manpower does maintaining and updating that censorship system require?
Probably thousands of people. There is a cyber army sitting there on the internet, just looking for content that could be offensive to the Chinese Communist Party, and then they report it to a central team. These people are reading the news every day. And when they detect that a news outlet is reporting something harmful to the government, that website is blocked. Just last week [in early June], the Washington Times ran four or five reports in a row about the Uyghurs, then about nuclear problems near Hong Kong, and then about the origins of Covid-19. On the 15th of June, that website got blocked. Before that, it was accessible in China.
Is the process different for a whole website to be blocked versus one article on a website to be blocked?
Ten years ago, most websites were not encrypted. So when you went to a particular article on, say, CNN or BBC, if that article was offensive, only the connection to that article could be blocked and the rest would be fine. But now that almost every website is encrypted, they cannot selectively block one article. They have to block the entire domain. More and more internet traffic is encrypted. I believe that 90 percent of websites out there use HTTPS now, which means they are encrypted. So it’s impossible to pick out a single article.
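A rough way to see why encryption forces whole-domain blocking: with plain HTTP, an observer on the path can read the full address of the article, while with HTTPS only the hostname leaks (through the DNS lookup and, typically, the TLS Server Name Indication field). The function below is a simplified model for illustration only; it ignores newer measures such as encrypted DNS or Encrypted Client Hello, and `visible_to_censor` is a hypothetical name.

```python
from urllib.parse import urlsplit

def visible_to_censor(url: str) -> dict:
    """Simplified model of what an on-path observer can read from a web request."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Plain HTTP: host, path and query string all travel in cleartext
        return {"host": parts.hostname, "path": parts.path, "query": parts.query}
    if parts.scheme == "https":
        # HTTPS: path and query are encrypted; the hostname still leaks
        # via the DNS lookup and (typically) the TLS SNI field
        return {"host": parts.hostname}
    raise ValueError(f"unsupported scheme: {parts.scheme}")

print(visible_to_censor("http://www.bbc.com/news/world-123"))
print(visible_to_censor("https://www.bbc.com/news/world-123"))
```

With only the hostname to go on, the censor’s choice is all or nothing for the whole site.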
How about a specific comment on Weibo, for example? How does that get blocked?
That is not done by the Great Firewall. Content on social media platforms is handled by teams of moderators. We call that self-censorship. Companies that operate in China have to comply with Chinese law, and China says, ‘hey, if you guys run this platform, you have to have a team of people sitting there to police what is posted.’ So those posts are removed by the administrators, or moderators, of those platforms.
What was the goal of doing your recent study?
Until now, there have been several internet censorship measurement studies. Normally, they test a couple of thousand websites, which is really small, and they report blockings that have already happened. But what drew me in is that the Great Firewall is evolving every day. And I believe that the blocking policies of the Great Firewall are a timely reflection of Chinese government policy. That’s why I built this system: to get news of blockings out as soon as possible. If an announcement isn’t timely, it isn’t really useful anymore.
Your study also illustrates how China’s censorship impacts the global internet. Would you introduce how that happens and why it is important?
Every country has something called an international internet gateway, where all the internet connections from the country go out and all the internet connections from the world come in. That is where they put the firewall. Ideally, this firewall should only act in one direction: it should only block and filter for people inside China. But the Great Firewall also affects people outside China. In the paper, we found that people in countries geographically close to China, such as Korea or Taiwan, have a chance of having their DNS polluted by this firewall, even though they have nothing to do with China. When a DNS resolution path from one country to another (e.g., from Korea to Singapore) happens to pass through the Great Firewall, that DNS resolution will be manipulated if the DNS packet contains a censored domain. You can picture internet routing using a submarine cable map: if a packet sent from Korea to Singapore passes through China, it can be manipulated if censored content is detected in that packet. So the Great Firewall is two-way. And one of the new findings in the paper is that they also block websites hosted inside China, which means that people outside China cannot visit these websites.
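The collateral damage he describes depends only on the route a DNS packet takes, not on where the user or the resolver is. The toy model below captures that: a lookup is tampered with if any hop on its path is the firewall and the domain is censored. The country-level paths, `CENSOR_HOPS`, and `BLOCKLIST` are hypothetical simplifications (real routing is per network, not per country, and the real blocklist is far larger).

```python
# Hypothetical: hops where the firewall sits, and a one-entry blocklist
CENSOR_HOPS = {"CN"}
BLOCKLIST = {"thewirechina.com"}

def resolution_is_tampered(path, domain):
    """True if a DNS lookup along `path` (a sequence of hops) would be poisoned:
    the path crosses the firewall AND the queried domain is censored."""
    crosses_firewall = any(hop in CENSOR_HOPS for hop in path)
    return crosses_firewall and domain in BLOCKLIST

# Korea -> Singapore, once via China and once via an alternative route
print(resolution_is_tampered(["KR", "CN", "SG"], "thewirechina.com"))
print(resolution_is_tampered(["KR", "JP", "SG"], "thewirechina.com"))
```

Neither endpoint is in China in either case; only the transit hop differs.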
Why would they want to do that?
That is a great question. I’m about to conduct another study to really understand this. But one of the cases that we found was a government website called beian.gov.cn. This domain is where website owners go to register for a license number. In China, in order to operate a website, you have to have a license. Any foreign business that wants to operate in China and run its website there needs this number. By blocking this site, they basically block people outside China from registering for a license. Now, only people in China can do that. If you are outside China, you cannot go to that website because it is being censored.
So this firewall is a two-way firewall. Until this point, all of the previous studies have thought that the firewall is meant to be one-way, made to restrict people from inside China from accessing censored content outside. One of the new findings of this paper is that it is also restricting people from outside China from accessing websites inside China that the Chinese government doesn’t want you to go to.
I’ve definitely noticed this in my reporting. When I try to go on some government websites, it takes a long time to load and often it doesn’t load at all.
Yes, the slow loading comes from another method, bandwidth throttling, which slows the connection to the point that it just discourages you from continuing. I’m working with The Citizen Lab [a research laboratory at the University of Toronto] right now. Years ago, there were tons of Chinese government websites, or websites in China, that our researchers could just go to and look for information. Now, they just cannot visit those websites.
How can we mitigate the impact of the firewall on the global internet?
As I said earlier, in the way they designed the DNS censorship, they dynamically change the IPs they use to poison DNS responses. Before this study, people thought it was impossible to figure out which fake IP address the Great Firewall was going to use. But in fact, in the paper we found that there is a specific set of IPs the Great Firewall uses. Based on this, and on the way the Great Firewall does the poisoning, we propose ways for public DNS resolvers, such as those run by Google and Cloudflare, to sanitize those polluted records from their results.
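If the forged addresses come from a known set, a resolver can drop any answer that points into that set before caching or returning it. The sketch below shows that filtering step in its simplest form. The two addresses in `KNOWN_FORGED` are placeholders from reserved documentation ranges, not the set reported in the paper, and `sanitize_answers` is a hypothetical helper rather than any resolver’s real API.

```python
import ipaddress

# Placeholders only: stand-ins for the firewall's forged-IP set, not real data
KNOWN_FORGED = {ipaddress.ip_address(a) for a in ("192.0.2.10", "198.51.100.7")}

def sanitize_answers(answers):
    """Return the A-record answers whose address is not a known forged IP."""
    return [a for a in answers if ipaddress.ip_address(a) not in KNOWN_FORGED]

print(sanitize_answers(["192.0.2.10", "203.0.113.5"]))
print(sanitize_answers(["198.51.100.7"]))  # everything forged: nothing usable left
```

An empty result is a signal to keep waiting for, or re-request, a genuine answer rather than to cache the poisoned one.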
For end users, when you ask for the IP address of, for example, thewirechina.com, the Great Firewall will inject the fake response, but the actual correct response is still on the way to your computer. So by ignoring the fake response, and only accepting the correct response, your computer would be able to know the correct IP and still be able to connect to the website.
Think of it this way: you are talking with a friend and asking for some information, say, their phone number. There’s a guy sitting between you and your friend, and he is yelling out a number. Because he is sitting between the two of you, his voice is louder, and it reaches your ear before your friend’s voice does. So by ignoring what the bad guy is telling you, and only listening to what your friend is telling you, you get the correct phone number.
The way the computer works is that it accepts whatever arrives first. So basically, that’s how the Great Firewall successfully censors, because the bad signal arrives at your computer first. And the DNS protocol says whatever packet arrives at your computer first, accept it.
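The race he describes can be simulated in a few lines. `accept_first` models what a typical stub resolver does, taking the earliest response; `accept_filtered` models the alternative, keeping the socket open for a short window after the first reply and discarding answers recognized as forged. The arrival times, the `FORGED` set, and the 500 ms window are all hypothetical values chosen for illustration, and the addresses come from documentation ranges.

```python
from dataclasses import dataclass

@dataclass
class Arrival:
    at_ms: float  # when the response reached the client
    ip: str       # address in the answer

# Hypothetical forged addresses, not the firewall's real set
FORGED = {"192.0.2.10", "198.51.100.7"}

def accept_first(arrivals):
    """Typical stub-resolver behavior: whichever response lands first wins."""
    return min(arrivals, key=lambda a: a.at_ms).ip

def accept_filtered(arrivals, window_ms=500.0):
    """Listen for `window_ms` after the first reply, drop known-forged answers,
    and return the earliest remaining one (or None if nothing genuine arrived)."""
    start = min(a.at_ms for a in arrivals)
    in_window = sorted((a for a in arrivals if a.at_ms - start <= window_ms),
                       key=lambda a: a.at_ms)
    genuine = [a for a in in_window if a.ip not in FORGED]
    return genuine[0].ip if genuine else None

# Two injected answers arrive quickly; the real one comes from farther away
arrivals = [Arrival(12.0, "192.0.2.10"), Arrival(13.5, "198.51.100.7"),
            Arrival(180.0, "203.0.113.5")]
print(accept_first(arrivals))
print(accept_filtered(arrivals))
```

The window length is a trade-off: too short and the genuine answer is missed (with a 100 ms window here, the function returns `None`), too long and every lookup gets slower.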
So you’re suggesting that the protocol be changed to not accept the first packet that gets to your computer?
BIO AT A GLANCE | |
---|---|
AGE | 29 |
BIRTHPLACE | Tien Giang, Vietnam |
CURRENT POSITION | PhD candidate at SUNY Stony Brook, research fellow at the Citizen Lab – University of Toronto |
The protocol has already been standardized for more than two decades, so you cannot go out and change it for everyone on the internet. But this information can be useful for people who design censorship circumvention tools. For users inside China who want to visit censored websites, developers can build this into their software or browser to filter out the bad responses.
And it’s not just users. This technique could be very important for big companies like Google and Cloudflare, because right now, if you want to go to a Chinese government website that is being blocked, you cannot visit it. So if they [the companies] really want to improve the quality of their service, [this technique would help]. I can’t speak for these companies, and I know that many companies don’t want to get into the business of anti-censorship, or go against the Chinese government. So this is just a suggestion in the paper. But if they don’t [use the technique we describe], it would not be that surprising.
They may not want to get involved with censorship within China, but when you’re talking about South Korea or another nearby country, you’d think that they would want to make sure that the internet was open?
Yeah, exactly. The government has total control over Chinese users, but the internet was created for openness, for collaboration. So you shouldn’t be creating a zone, inside or outside China, where users see different things. To me, that totally violates the free flow of information.
People talk a lot about the splintering of the global internet, but this paper actually shows how connected the global internet still is, because China’s actions impact the whole internet. Did you think at all about that?
For years, we have been seeing articles with titles like “China is making its own internet.” But this paper shows that it’s not just that: the way they’re doing it also affects the entire internet. I believe the internet is a common medium that people around the world share, so what China is doing is really harmful. That is one of the points of the paper. We want to send a meaningful message to the operators of the Great Firewall: of course we cannot agree on everything, but at least design it in a way that doesn’t affect people outside China.
Do you think there will be a future where China operates its own internet? Is that possible with the way the internet is put together?
That is a really hard question, especially because I am an advocate for the openness of the internet. The fathers of the internet, they created this technology to be open and useful for people around the world, not to be segmented. Since I published the paper, every couple of days, I tweet what domains get blocked. And sometimes there’s an error in the system. They block a domain and no one knows why. And I have been noticing that some [of the blockages] are getting removed. This is something that I have not publicly said yet. But I’m happy about this.
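Reporting additions and removals every few days comes down to comparing two snapshots of the set of domains found to be blocked. The sketch below shows that comparison with plain set differences. It is an illustration of the idea, not his platform’s code; `diff_blocklists` and the example domains are hypothetical.

```python
def diff_blocklists(previous, current):
    """Compare two snapshots of blocked domains.
    Returns (newly_blocked, unblocked), each sorted for stable reporting."""
    newly_blocked = sorted(set(current) - set(previous))
    unblocked = sorted(set(previous) - set(current))
    return newly_blocked, unblocked

# Hypothetical snapshots from two measurement runs
previous = {"alpha.example", "beta.example"}
current = {"beta.example", "gamma.example"}
print(diff_blocklists(previous, current))
```

The second list is what makes removals like the ones he mentions visible at all: a block that quietly disappears shows up as a domain present in one snapshot and absent from the next.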
The creators of the firewall don’t have to agree with me on everything. They don’t have to remove everything from the block list, because they have different points of view, but at least when something goes wrong, fix it.
In March, I reported that Signal, the messaging app, was blocked. The way they blocked it was to block any domain containing signal.org. There was one website that I believe is totally innocent: signal.org.il. That is a charity website in Israel that has nothing to do with Signal. It was a mistake. A couple of months later, I saw that the block had been removed.
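Nobody outside knows exactly how the firewall’s matching rules are written, but the signal.org.il case is what an over-broad “contains” rule would produce. The sketch contrasts a naive substring match with a match on domain-label boundaries, which blocks a domain and its subdomains only. Both function names are hypothetical, and this is not a claim about the firewall’s actual implementation.

```python
def naive_match(domain: str, rule: str) -> bool:
    """Over-broad rule: block if the rule text appears anywhere in the domain."""
    return rule in domain

def suffix_match(domain: str, rule: str) -> bool:
    """Block only the rule's domain itself and its subdomains."""
    domain, rule = domain.lower().rstrip("."), rule.lower().rstrip(".")
    return domain == rule or domain.endswith("." + rule)

for name in ("signal.org", "updates.signal.org", "signal.org.il", "notsignal.org"):
    print(name, naive_match(name, "signal.org"), suffix_match(name, "signal.org"))
```

The substring rule catches the Israeli charity (and any name that merely contains the text), while the label-boundary rule catches only Signal’s own domain and its subdomains.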
So are you saying that people involved in the firewall are looking at your tweets?
I hope so. There are two cases of this: the Signal case and another charity website in the UK. I tweeted about it, greatfire.org, the organization that tracks all of this blocking in China, retweeted my tweet, and a couple of weeks later, I saw that the block on that website had been removed.
Do you think the findings in your paper will help develop anti-censorship technology?
I strongly believe that the findings in my paper will help the development of effective anti-censorship technology. People in China, not just Chinese citizens but also foreigners living there, are affected by this, so developing anti-censorship technology is very important. There’s a whole community out there doing this work and continuing to develop new technologies, and I, as a computer scientist doing internet measurements, will keep running my platform so that they can build more effective anti-censorship mechanisms.
MISCELLANEA | |
---|---|
BOOK REC | Reset: Reclaiming the Internet for Civil Society by Ronald Deibert |
FAVORITE MUSIC | Vietnamese Bolero |
FAVORITE FILM | Coco |
PERSONAL HERO | My hard-working parents, who are currently living in Saigon. |
Some people argue that the VPN usage numbers in China are actually very low, not because people in China don’t have access to VPNs, but because they don’t want to jump the firewall. What do you say to that argument?
I still believe that anti-censorship tools are very important. If these tools are helping some people in China, then in the next ten years, we may be able to change that opinion. A very sad reality right now is that a lot of Chinese users don’t even know there’s another internet that the government doesn’t want them to see. From my personal experience: I was working as a teaching assistant for first-year courses. These are 18-year-old students. Some of them came from China, and others had been outside China for several years for high school before they got into the university. In class, they were debating whether Baidu or Google is better. Some said, we don’t need Google; Baidu does a better job searching for Chinese content, which is true in some ways. The other guy was like, ‘Yeah, but then you see the things that the government wants you to see, and you don’t see anything else.’ That shows how Chinese censorship is affecting people’s minds. I’m not saying that whatever we have in the West is the best, but people should be educated in critical thinking, and think about every subject from both sides, not just from a single point of view.
Do you have any other China related work in the pipeline?
Right now, I am conducting a follow-up study to actively monitor what the Chinese government doesn’t want people outside China to see. As you said earlier, you as a reporter sometimes go to Chinese websites and find they are blocked. Even within my community, people have been asking that question a lot: what are they trying to hide?
Katrina Northrop is a journalist based in New York. Her work has been published in The New York Times, The Atlantic, The Providence Journal, and SupChina. @NorthropKatrina