Well not quite but close. I’m holding a hard disk that has ALL of Wikipedia’s text in 10 different languages.
Yes, you can download all of Wikipedia, and yes, it easily fits on a hard drive. Isn’t that amazing? Text is incredibly dense compared to images and video. It’s around 22 GiB for English Wikipedia alone and 56 GiB for the 10 languages I downloaded.
I also have all of Wiktionary on the same hard drive. It’s around 16.4 GiB.
It also connects you to a huge swath of humanity and the editors that brought that content to you.
Yeah it’s pretty incredible. Wikimedia is the kind of project that almost feels like a small glimpse into a better world. What the internet could have been. It’s got some problems of course but it’s still a huge success.
Uh, Wikipedia is what the internet is.
Wikipedia’s not a glimpse of a better world, it’s a glimpse of our current, existing world. Because Wikipedia exists.
It’s not like that hard drive came through a portal from another universe.
You’re going to be the savior of humanity after the apocalypse.
Not the sum. The summary.
Last time I looked into downloading Wikipedia it said it was 50 GB for English text and 100 GB with images. How’d you get it in half the space?
It’s only the raw text in JSON Lines files. No media and no markup. I think I downloaded a compressed dump and then used wikiextractor to extract the text.
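If anyone wants to check the size of their own extract, here is a rough Python sketch for walking JSON Lines output like that. It assumes the files are named `wiki_*` inside subfolders and that each line is a JSON object with a `text` field; both are assumptions about how the extract was produced, so adjust them to whatever your files actually look like. `iter_articles` and `summarize` are just names I made up for this.

```python
import json
from pathlib import Path


def iter_articles(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(root, pattern="wiki_*"):
    """Return (article_count, total_text_bytes) for all matching files under root.

    The file pattern and the "text" field name are assumptions about the
    extract layout, not something guaranteed by any particular tool.
    """
    count = 0
    total_bytes = 0
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        for article in iter_articles(path):
            count += 1
            # Measure the text as UTF-8, which is how it sits on disk.
            total_bytes += len(article.get("text", "").encode("utf-8"))
    return count, total_bytes
```

Calling `summarize("path/to/extract")` returns the article count and the total bytes of article text, which should land somewhat below the on-disk size because the JSON keys and titles are not counted.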
Does it include each article’s edit history, talk page, etc?
The dictionaries for Aard2 are 21 GB in compressed .slob format (text only).
No idea what that means. But thank you for adding more info.
OK yes, some supporting info: Aard2 is an offline Wikipedia app that uses small compressed data files in .slob format.
Slob compression is best visualized as putting a sleeping bag into a stuff sack, except it’s all your possessions and you’re stuffing them into an old Chevy Metro.