We show that large language models can be used to perform deanonymization at scale. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision from pseudonymous online profiles and conversations alone, matching what would take a dedicated human investigator hours. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over the top candidates to verify matches and reduce false positives. Unlike prior deanonymization work (e.g., on the Netflix Prize), which required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground truth to evaluate our attacks. The first links Hacker News accounts to LinkedIn profiles, using cross-platform references that appear in the profiles. The second matches users across Reddit movie discussion communities, and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
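Reporting recall at a fixed precision usually means sweeping a confidence threshold and letting the method abstain on matches that score below it. Below is a minimal sketch of how such a figure can be computed. It assumes each query profile receives one proposed match with a confidence score and a flag saying whether that match is correct. The function `recall_at_precision` and its inputs are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Sequence


def recall_at_precision(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_true: int,
    min_precision: float = 0.9,
) -> float:
    """Best recall achievable at any confidence threshold whose precision
    is at least `min_precision` (hypothetical helper, not from the paper).

    confidences[i]: score of the proposed match for query i (assumed input).
    correct[i]:     whether that proposed match is the true counterpart.
    n_true:         number of queries that have a true counterpart at all.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if n_true <= 0:
        raise ValueError("n_true must be positive")

    # Rank proposed matches from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)

    best_recall = 0.0
    tp = 0  # correct matches among the k most confident predictions
    for k, i in enumerate(order, start=1):
        if correct[i]:
            tp += 1
        # A threshold cannot split tied scores, so only evaluate cut points
        # where the next prediction has a strictly lower confidence.
        if k < len(order) and confidences[order[k]] == confidences[i]:
            continue
        precision = tp / k    # share of emitted matches that are correct
        recall = tp / n_true  # share of true pairs that were recovered
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
    return best_recall


# Toy example: four proposed matches, the third one wrong.
scores = [0.9, 0.8, 0.7, 0.6]
hits = [True, True, False, True]
print(recall_at_precision(scores, hits, n_true=4, min_precision=0.9))  # 0.5
```

Here `n_true` counts the query profiles that actually have a counterpart in the other database. Abstaining on low-confidence matches therefore lowers recall but not precision, which is why a method can report high precision alongside partial recall.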


I’ll try to explain my thinking.
The condition for markets to exist as self-reproducing and self-stabilizing objects is government, usually in the form of a state entity, which is itself an economic actor that competes with other states and cooperates with them within free trade zones. Important note: government forms from market activity, specifically from the control of estates. Taxation is a form of rent, for example. I am not putting the state before the market.
It is in governments’ interest to:
Maximize economic output.
Do so by cleverly tricking economic actors outside their own taxation system, i.e. through trade agreements with built-in asymmetries.
Minimize damage to domestic production, since outsourcing can erode cornerstones of the economy.
Throw in the internet. We can now communicate and trade with actors that are not in the same tax system. First and foremost, this leads to issues with intellectual property. I’d cite geolocked internet radio stations and piracy. Japan doesn’t care about its citizens pirating manhwa, and vice versa, Korea doesn’t care about anime piracy, and so on. Then there is trade in physical objects. Say you need a laptop battery for your MacBook M1 running Linux, and a Chinese seller has batteries in stock that are cheaper and better than Apple’s own (this happens rather frequently). Even with border taxes factored in, you are still getting the best deal. Some might find ways of circumventing customs, which sweetens the pot further. Obviously, this can cause problems for the domestic economy.
Trade speeds up and global supply chains gain importance as cross-border communication speeds up. At the level of national governments, a distinct threat presents itself. There is less control over market activity, which accelerates the self-polluting nature of trade; in other words, the boom and bust cycle shortens. As a national government you’d want to lengthen the boom and bust cycle, because crises, along with expansionist nations, are the natural killers of states.
Everything you are seeing, from Chat Control to China’s firewall, is an attempt to stabilize economies. The internet enables people to build structures that are wholly outside state control. The state fails to direct the economy as planning starts happening between turfs. Because its functioning is decentralized across nations, the internet can help form structures that oppose the state, should it falter.
Let’s not forget one of the biggest threats to the economy: open source. Patents and DRM are threatened by the unstoppable pace of Blender, OpenOffice and co. It’s as if people said YOLO, let’s stop exchanging goods and services and at the same time solve very real and pressing problems, some of the biggest ones, in fact. It works with much less friction than anything before. It exists as this hobbyist thing that we cannot call economic in any current sense of the word, and it would not exist if it weren’t for the internet.
India and China have smartphone ownership rates of over 85%. There are no significant technological constraints unless you need exceptionally high download and upload speeds and low latency. China has pretty decent internet speeds, faster than most European countries. I also don’t believe at all that there is a lack of demand for practical access. The internet is generally a sensible thing to have access to, no matter who you are.
Thanks for explaining your thoughts. So to paraphrase you: you are saying that the market, and by proxy nations too, are still adapting to the concept of the internet, and one way to cope with its effects is to restrict access?
I am saying that the internet, as an international object, is antithetical to nations, since its control panel sits not in one nation but in all of them. Nations therefore seek to nerf it, only for it to return stronger and even harder to regulate as more and more people adopt internationalized organizational patterns. As a corollary, a real cultural unification is happening across borders as a secondary effect. I’ve read people calling it “discordization”, because people are starting to talk the way people talk in Discord chatrooms.
Yes, so you do have to restrict access and, notably, deanonymize users. California is trying to force OSes to implement age checking, which is of course a way to unmask people online. Protectionism cannot merely be understood as a set of possible tax policies; it is precisely the regaining of nation-centralized control in every sphere of life. States do not want people to be able to choose whom to hang out with if the pool is the entire world. States have no interest in letting their subjects learn about reality beyond the threshold at which a person’s understanding exceeds the boundaries of countries.
What I am getting at, exactly, is the social structure that humans find themselves in. When relations and hierarchies are on the brink of flattening, that is, when everyone is linked to everyone else symmetrically, as in a family or within small communities 5000 years ago, states, companies and even small businesses will feel compelled to operate in ways that preserve their asymmetrical standing in society. As it happens, the internet is extremely good at producing flat social structures; anonymity, reach, openness and near-infinite scalability make this possible. You may be able to neutralize one netizen or manipulate one online community, but by the time you have, five hundred heads of the hydra have regrown. The costs don’t work out.
You have a lot to say on this. It’s good that someone thinks about these things. I’m sorry that I can’t really give you a good discussion. I don’t know enough about markets etc., and I don’t want to spend too long online.
I agree that you can’t really stamp out openness and anonymity online (which is beautiful in a way), but I think they will mostly be reserved for technically capable users in the cracks and niches of the web who can navigate the restrictions. This is a massive tragedy.
This brings us to the current state of the web, with age restrictions popping up everywhere, deanonymization, etc. I think we are in agreement about where it is going. But where do you think we should be heading? I’m sure you have opinions on that.
I mean, I have a lot to say. I don’t expect people to engage in discussions, nor do I really want to start them, because it eats up a lot of time on my end as well.
You’re right, but we don’t know whether the more technically capable users will create elegant solutions for the rest.
Opinions, probably. I try not to judge things or impose expectations, though.