I was thinking about this recently… By going to a federated system, one that essentially copies all of your content from one instance to another, when you delete a comment, does that comment get deleted on every instance? Is that even possible?
All online systems suffer from this problem.
Bots are scraping websites daily, including the crawlers run by places like archive.org, which compile everything and save it for posterity. Half the time, your data is already saved by a third party, even if you delete it off a website.
Further, all databases have the option to flag something as “Deleted” and keep the original data while not showing the data on the main web page. Just because you “deleted” something online doesn’t fucking mean anything at all materially. It just means they are hiding it from end-users. The data is very likely still there. This is why people who are bulk-deleting their comments on Reddit are shocked to find those comments later restored… because they were never actually deleted to begin with. They were just flagged in the database as “deleted” and to not be shown to end-users.
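For anyone who hasn’t seen what a “deleted” flag looks like in practice, here is a minimal Python sketch using an in-memory SQLite database. The table name, column names and helper functions are all made up for illustration; this is not Reddit’s (or anyone’s) actual schema, just the general soft-delete pattern described above.

```python
import sqlite3

# In-memory database standing in for a site's comment store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO comments (body) VALUES (?)", ("my embarrassing comment",))


def soft_delete(comment_id):
    # "Deleting" only flips a flag; the row and its text stay in the table.
    conn.execute("UPDATE comments SET deleted = 1 WHERE id = ?", (comment_id,))


def restore(comment_id):
    # Undoing the "delete" is just flipping the flag back.
    conn.execute("UPDATE comments SET deleted = 0 WHERE id = ?", (comment_id,))


def visible_comments():
    # Roughly what the public web page would query.
    return [row[0] for row in conn.execute("SELECT body FROM comments WHERE deleted = 0")]


def all_stored_comments():
    # What the operator can still read, flag or no flag.
    return [row[0] for row in conn.execute("SELECT body FROM comments")]


soft_delete(1)
hidden_view = visible_comments()      # end users see nothing
stored_view = all_stored_comments()   # the text is still on disk
restore(1)
restored_view = visible_comments()    # and it can come right back
```

The point of the sketch is that no DELETE statement ever runs, so restoring a comment is a one-line update for whoever controls the database.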
Unless you are running your own server and your own service, you are at the mercy of strangers who are in full control over whatever data you share with them.
This has always been true, since the beginning of the internet.
This is why parents in the 90’s told kids to not post personal stuff online.
Because once it is sitting on a hard drive on a server owned by someone else, it is not legally any longer your data, it is now the data of the person who owns/operates that server and the hard drive.
Sorry for this message being kind of aggressive, I am very tired of everyone just figuring this out for the first time and thinking somehow it only applies to the Fediverse.
It applies to every single service you sign up for on the internet. You’re storing your data with someone else, and you don’t control the server software, database software, or hardware. That data is no longer yours. You are effectively hanging out on someone else’s property, and what you do on their property is being recorded.
This is not a Fediverse problem, this is an Internet problem.
EDIT: Forgot to add, it’s also the problem that the Fediverse is trying to help solve by allowing individuals to run their own instances and thus be in greater control of what happens to their own data.
I disagree. It is not an internet problem, it is a result of the fundamental properties of data that we couldn’t change if we wanted to.
Yes, this is essentially the Analog Loophole. If you show something to someone it is effectively impossible to stop them from taking a copy.
I used this loophole in the early 2000’s when I tricked my PC into thinking my VCR was a second monitor connected by S-video. I didn’t have enough storage space for all those episodes of Sealab 2021 and Aqua Teen Hunger Force I was pirating, so I recorded them to VHS to save costly data storage space.
I still have those tapes squirreled away somewhere.
This is astute and correct. It’s not necessarily even an internet problem so much as a “this is simply how data and transferring data works” problem.
No, centralized social networks suffer less from this problem. If all data is stored on one platform, only that platform needs to delete it and it’s gone. If they don’t, they risk enforcement by authorities. In the fediverse, every instance has to delete it and there are too many to effectively enforce.
Just to add some nuance;
Companies do delete data on individuals when it has no more economic value to them, unless they’re required by regulation to retain that data. Yes, it’s true the world is storing terabytes more data per day, but my company holds on to customer records for 5 years, and if they don’t do business with us in those 5 years we will physically delete that data everywhere. There are many use cases like this where old data isn’t stored because it doesn’t make economic sense to. Maybe that changes when there’s a next-gen parquet file that can store a decade’s worth of records in the size of a few KB, but at a certain point data does rot.
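A retention rule like that 5-year one can be sketched in a few lines of Python. Everything here is an assumption for illustration: the customer names, the dates, the purge_inactive helper, and treating “5 years” as 5 × 365 days. A real purge would also have to reach backups and every downstream copy, which this does not attempt.

```python
from datetime import date, timedelta

# Roughly five years; the exact cutoff rule is an assumption.
RETENTION = timedelta(days=5 * 365)


def purge_inactive(customers, today):
    """Keep only customers whose last transaction falls inside the retention window."""
    cutoff = today - RETENTION
    return {name: last_seen for name, last_seen in customers.items() if last_seen >= cutoff}


# Hypothetical records: customer name -> date of last business with us.
customers = {
    "alice": date(2017, 3, 1),
    "bob": date(2022, 11, 15),
}

kept = purge_inactive(customers, today=date(2023, 6, 30))
```

With these made-up dates, alice’s record is older than the window and gets dropped, while bob’s is kept.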
If it leaves your box it’s no longer yours. Even if it doesn’t leave on the wire and you delete it from disk there are readily found forensic tools that can recover lost data if you get an old drive in hand. It has been said the internet never forgets, and it keeps being proven true time and again whenever someone gets called out for something they said 10 years ago.
Expect the future, own your past, make your marks and grow as you go.
Is it okay to encrypt a home server hard drive in this case?
That’s always an option, and my usual go-to when disposing of drives at least. It gets a bit scary to do so with the main prod data though: lose a key and everything is toast. If you have a solid means to keep crypto keys secure and redundant, then by all means. It can put a hit on CPU and disk performance depending on how many random reads/writes it has to do. I wouldn’t think it’s a great plan with a lot of fedi services just because of that factor. My Mastodon instance has something like 116GB of attachment data in almost half a million objects; that’s a lot of encrypt/decrypt action to maintain.
I’m not all that concerned with ACTUAL privacy/encryption but rather more concerned with lower-level things like stalking, harassment, employers doing research about their employees’ non-work habits, insurance companies, etc.
I’m not talking about doing anything illegal and hiding from authorities who can use forensics on your data. Just general anti-corporate snooping and anti-harassment privacy protection.
Like, I feel more inclined to sign up and use something more like Raddle.me instead of lemmy because the owner of that site has a philosophical mission in favor of privacy.
because the owner of that site has a philosophical mission in favor of privacy.
Daniel Micay, the head programmer of GrapheneOS, thankfully stepped down from his position, but not before entirely torching the goodwill of Louis Rossmann, who liked GrapheneOS because it respected his privacy. Louis was then accused by Daniel of trying to destroy the GrapheneOS project and threatened with “exposure”, which Louis expertly documented and which led to the GrapheneOS developer stepping down because of how absolutely unhinged he looked accusing Louis of this.
https://www.youtube.com/watch?v=4To-F6W1NT0
How are you so sure that the owner won’t pop off on you in such a way in the future? With Lemmy at least you can 1. run your own instance and be in tighter control of your data, 2. if you really want to make it more secure, contribute to the codebase, or 3. make your own fucking fork of the codebase that is more secure and privacy oriented. Raddle may be open source, but it doesn’t look like you’re encouraged to run your own Raddle.
Also, you’re still handing your data off to a stranger, who has made promises. What about those promises makes you think this stranger will keep them? It’s still inherently a risk, even if they never end up doing anything nefarious. You just don’t know their mind and can’t know their mind, and being just a user instead of someone who actually knows them in person, you’re only basing it on promises they’ve made in an attempt to try to draw people to use their service. Are you really sure the code that is running on Raddle.me is exactly the same as the open sourced codebase? This is a question that regularly gets asked in respect to Signal Messenger, is the code on the servers the same as what is actually released. How far does this “trust” based on words alone, go?
To quote Mark Zuckerberg about people sharing information with him and why:
people just submitted it
i don’t know why
they “trust me”
dumb fucks
You know whose mind you can know and trust? Your own. Thus making your own instance.
And last but not least… You’re already here. You’re making a post about this here. You have an account. You have 23 posts and 352 comments. Sorry to say but you’re just not that worried about this issue, so this feels a little like concern trolling.
Definitely not concern trolling. Just finally thinking about all this stuff. Thanks for the insight.
It’s one’s own line and what you’re looking to accomplish. Privacy can have a lot of different faces.
There’s public/profile data, does a site demand full identity authentication to get an account, is that info public on your profile, is your comment/browsing/post history public or concealed? All those things still generally will reside with the service and be readily available if someone asks.
There’s the privacy of data in flight, my ISP actually has it in their TOS that they reserve the right to collect browsing data and sell it to third parties after the FCC (US based) gutted what little network privacy/neutrality we had in the past administration, so since then virtually all outgoing traffic goes over a pair of VPNs just to avoid, or at least make more difficult being another data-point in the internet marketing machine.
There’s the privacy of data at rest, can anyone on my own network or that comes into contact with my systems read things that they shouldn’t be? File permissions or to the extreme end full disk encryption comes into play.
All personal preference and risk tolerances. Some are fine with putting all their personal info and that of their contacts in public hands, that’s why places like Facebook exist to begin with. I’m pretty far on the other end of that spectrum.
Never rely on being able to delete anything that has been published/posted. If you want privacy, don’t post it. Yes, some systems make it easier to delete a post, but you can never rely on it being deleted everywhere (someone could have made a screenshot, etc.).
At the very least your data will live on in some backups past the point of deletion for a while longer and might very well be restored (even accidentally due to some unrelated data loss) after you deleted it.
This. Not being able to ensure permanent deletion ≠ unprivate. It is a public community / set of websites, you have to lower your expectation of privacy (for services like this.)
I think you have a pretty weird understanding of “privacy” if you think that you have it when posting a comment in a publicly-accessible forum.
If you post it in a place I can find it, I can scrape it, store it, use it for my own purposes, in perpetuity. You might be able to convince a government to tell me to stop, but there is no guarantee I haven’t stored it somewhere you and they don’t know about.
That’s simply the nature of information. You don’t get to control my memory. Once you’ve put an idea in my head, you don’t get to take it back. That idea you put in my head is now my idea. It’s my thought.
You can’t unring the bell. You can keep a thought private, or you can post it. But once you’ve posted it, you can’t make it truly private again.
Then again this is the idea the European ‘right to be forgotten’ wants us to believe.
No, the right to be forgotten is about data that can be used to identify you stored by a service provider. It’s not a right to have every record on the internet purged.
I guess you could force Instances one by one to forget you, but a single provider only has to make sure they deleted the data they stored.
No, that right is to have info traceable to you personally removed, not to have every word you ever stated removed. As long as they anonymise it, they’re good legally and can keep all other data online.
They also only have to send a data deletion request to those they shared it with. Any data that got scraped or otherwise taken from them by a third party is technically not protected under that law, and would require a separate deletion request from you to that third party. And that kind of copying happens to be exactly the technique used to federate.
Not to forget that the law only applies to services hosted in or aimed at European Union citizens. For example, an American Lemmy instance aimed specifically at American citizens isn’t bound by it, even if you join as a European Union citizen. If they market to the whole world or such, then they are bound by it. But then, with a US-based server it’s already nearly impossible to be GDPR compliant, as US law is by default at odds with GDPR. Hence big SNSs having EU subsidiaries and servers (and still having huge disagreements, lawsuits and fines about how data gets shared between those and non-EU servers). Point being, with federated systems, there are bound to be servers with your data that are outside the scope of the GDPR. The whole thing is more complex than “I live in the EU so all sites need to comply when it regards me”.
Wishful thinking. They’ve deluded themselves into thinking data can be externally controlled. The fact that the Pirate Bay is still in operation should have given them a hint.
I beg to differ. It’s indeed possible to scrape and store any comment indefinitely, but there are certainly ways to limit the size and prevalence of that happening. With rate limiting, bot detection and legal enforcement you can reduce the likelihood that someone will scrape and store all your comments. By accepting that everything will be scraped, you are unnecessarily conceding privacy.
What the hell are you even talking about?
A post in a publicly accessible forum is a billboard on the highway. You put it up and anyone can read it. You have zero expectation of privacy after having done that.
Changing the speed limit on the highway (“Rate Limiting”) in no way affects the fact that you put up the billboard in the first place. People may be driving by a little slower, but they’re only reading what you chose to present for them to read.
Scraping does not infringe on privacy. The privacy infringement is that you made the post in the first place. Under normal circumstances, you are the only person at all capable of infringing on your privacy. Exceptions would be someone spoofing your credentials to create the post without your authorization, but someone who does that victimizes both you and the forum hosting your post.
What you’re talking about is more closely related to intellectual property protections like copyright. A musician can play their song over the radio without surrendering copyright protection. Nobody else can make (commercial) use of that song just because it has appeared in a public space.
Besides the technical considerations others have posted: who exactly owns what you post? If you send a letter to someone, is it you or the receiver who owns the letter? If you post something publicly, is it you or the collective public that owns that post now?
Thinking about this via copyright is not very helpful, as short texts are rarely complex enough to fall under copyright. However, most large social media force you to sign over the copyright of anything you post to them anyways.
And the law (IANAL) does not consider public texts to be “personal data”. This is reserved for your real name, address, email, IP etc. Stuff that can be used to identify you, but not more.
From what I recall copyright applies to any original composed media at the point of its creation. There are fair use exemptions in the US for things like critique and parody but not for replication and redistribution. That all said, I would think logically that the public posting of a private message would potentially fall under a violation if someone really pushed it, but it gets fuzzy. A public statement no longer has any reasonable expectation that it be constrained to any particular person so I think at that point the horse left the stable.
An area that I find interesting in a legally dubious sense is the space of revenge porn sites. If the person posting it was also the one filming it, with consent at the time I would think that technically they own the rights to it to do as they will, but that’s not the case according to plenty of places with laws against it.
Not a lawyer by any means, but I tend to make them nuts when talking to them just with all the ‘what if’ corner scenarios my mind cooks up.
If you’re talking to the public, nothing you say is private. That includes federated systems like Mastodon and Lemmy. If you want privacy and federation, use an encrypted Matrix chat. There’s still of course the caveat that the people you’re talking with can leak your chats, since they have a copy of them, so don’t talk to glowies.
What’s a glowie?
deleted by creator
You can’t truly delete anything period, anything posted publicly can be copied. What’s more important is if it’s verifiable. I can trivially edit your post locally and take a screenshot and pretend it’s you, but there’s nothing verifying you actually said it.
It’s possible through cryptography (digital signatures) to verify that something was actually said, but most of the time we verify things through trust: we trust centralized services to have an accurate record of what happened. We trust social networks to not alter the original content posted to them. We trust archive organizations to store an original copy securely as it was at the time.
But that trust can be broken. u/spez himself has admitted to altering comments (this happened in 2016, huge red flag), and we can only trust that archivers did their job properly.
You can prove that a post was truly made and unedited via a cryptographic signature, but even then you’re still trusting that all the clients you are using are not doing anything nefarious in between. Unless you read the source code and compile your own applications you can’t know for sure, so still, trust is a big part.
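To make the “prove it’s unedited” part concrete, here is a rough Python sketch. Big assumption up front: a real system would use public-key signatures (Ed25519 or similar) so that anyone can verify without holding a secret. The Python standard library doesn’t ship those, so this swaps in HMAC with a shared key purely to show the sign-then-verify step. The key, the post text and the function names are all made up.

```python
import hashlib
import hmac

# Assumption: stands in for the author's signing key. With real public-key
# signatures, verifiers would only need the public half.
SECRET_KEY = b"hypothetical-author-key"


def sign(post_text: str) -> str:
    # Produces a tag that only a holder of SECRET_KEY can compute for this exact text.
    return hmac.new(SECRET_KEY, post_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(post_text: str, tag: str) -> bool:
    # Constant-time comparison, so the check doesn't leak how much of the tag matched.
    return hmac.compare_digest(sign(post_text), tag)


original = "I never said that."
tag = sign(original)

genuine_ok = verify(original, tag)               # untouched text checks out
edited_ok = verify("I totally said that.", tag)  # an edited copy does not
```

Note this only covers the text and the key. It says nothing about whether the client that displayed the post, or the one that computed the tag, was honest, which is the trust problem described above.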
But if you can prove a post was made, how do you unprove it? I don’t really see how that’s mathematically possible. So when you “delete” something on the internet, you can’t really remove it completely.
So what does “deleting” something actually mean? What it really means is “please stop hosting this and monetizing it on your server”, and it’s not even possible to be sure they deleted all of it internally; you can only really check that they are no longer showing it to the public. That’s easy enough to do when it’s a centralized service, but for anything decentralized it means going to every single server and getting them all to delete it. You can send out a signal asking them nicely to delete it (I don’t know if Lemmy has this), but even then there is no way to force a server to fully delete something. You could put some rules in place that the content needs to be publicly inaccessible, otherwise the instance gets defederated or something, but I don’t know how hard it would be to implement something like that. The resources required to verify that all instances have stopped serving it, and don’t begin to serve it again later, may be far too high to be practical.
Posting something on a public forum will never be private, no matter where exactly. There are so many ways for this content to get “saved”, like web scrapers, web archives, screenshots etc.
It could be possible if all instances get the “delete” message and adhere to it. Federated servers are not the best for privacy, but is any service that lets you post to the general public? You don’t even need an account to view content on Lemmy or /kbin. And everybody can set up something malicious to federate everything and do an analysis or whatever on it. Furthermore you need to trust your instance owner, since that person knows the most about you (also your PMs within the server, your IP, email, etc.). So you now need to trust a person instead of a company. But I think the fediverse is a nice alternative to all the corporate versions. And at the moment we are not the product. But someone will benefit from the openness of the fediverse.
I’m also pretty sure that message edits and deletions are federated, but as others have said, it completely relies on the receiving server respecting those messages.
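Here is a toy Python sketch of what “relies on the receiving server” means. The activity shape follows the ActivityStreams “Delete” type that ActivityPub uses. Everything else is an assumption for illustration: the Instance class, the instance names, the URLs and the honors_deletes switch. It is not how Lemmy or any real server is implemented.

```python
import json


def make_delete_activity(actor, object_id):
    # Shape follows the ActivityStreams "Delete" activity; the IDs passed in are made up.
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Delete",
        "actor": actor,
        "object": object_id,
    }


class Instance:
    """Hypothetical remote server holding its own copy of federated posts."""

    def __init__(self, name, honors_deletes=True):
        self.name = name
        self.honors_deletes = honors_deletes
        self.posts = {}

    def receive(self, activity):
        # Nothing forces a server to act on a Delete; it is only a request.
        if activity["type"] == "Delete" and self.honors_deletes:
            self.posts.pop(activity["object"], None)


post_id = "https://example.social/post/1"
instances = [
    Instance("polite.example"),
    Instance("hoarder.example", honors_deletes=False),
]

# Federation already copied the post to every instance.
for inst in instances:
    inst.posts[post_id] = "original text"

# The home instance broadcasts the delete to everyone it knows about.
activity = make_delete_activity("https://example.social/u/alice", post_id)
for inst in instances:
    inst.receive(activity)

still_holding = [inst.name for inst in instances if post_id in inst.posts]
wire_format = json.dumps(activity)  # roughly what would go over the wire
```

After the broadcast, the well-behaved instance has dropped its copy and the other one still has it, and the sender has no way to tell the difference from the outside unless it goes and looks.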
I mean, everything you post is archived somewhere else in the same way. There’s multiple copies of everything on Reddit elsewhere, even if you delete your account and comments successfully without them restoring it.
In every other way than the one you mention, federation is more private. Especially since the data isn’t being collected to be sold.
The data isn’t being collected to be sold /yet/.
Eh, if I wanted to I could scrape various instances and do whatever I want. The question is whether I would be able to pin information to identities without the accompanying metadata, or whether I would only get a big text corpus.
The benefit of the fediverse is that no instance can sell the whole package, because most instances only have the posts of most users and nothing more. Reddit has way more additional data that can be used to track and identify someone.
Agreed, and the emphasis on the Lemmy GitHub page that it has “full delete” doesn’t have any kind of disclaimer about this. Newcomers to the project just don’t realize it.
I agree with everyone that this is an issue that affects all data on the Internet and everyone should treat what they publish with this in mind.
However, it may be worth requesting a feature for broadcasting delete requests to federated instances, though it’s not going to be a high priority as the devs prioritize scaling, bug fixes, and mobile app development.
Also, one would hope you could trust your admins to defederate from instances who aren’t respecting delete requests.
AFAIK deletes do get federated. Also, it would be trivial to implement soft delete where the data is retained but not shown.
In some cases soft deletes are good and useful because it allows things “deleted” for spam and rule breaking to be retained and used to build a case against a bad user/bot or train spam filters, etc.
Public chats are, well, public. If you are in a public chat then everyone can see what you say. Encryption or any other attempt to make it private is silly here. If you are in a private, encrypted group, then only those people can see what you say (unless someone leaks). If you are in an E2EE personal chat with one other person, then only the 2 of you know what is being said. If you send a regular email, that is the same as a postcard and anyone can look at what it says. You choose where and how you want to speak and adjust accordingly.
Not just posts. Any message you are sending to anyone is unencrypted and can easily be looked at by instance owners.
Unextended ActivityPub makes it so that everything is shared without privacy. Any operation is “best-effort” and also depends on the goodwill of the target (i.e. a server could refuse all deletes on purpose, or refuse to deliver messages/posts, without giving any hint).