Well, not quite, but close. I’m holding a hard disk that has ALL of Wikipedia’s text in 10 different languages.
Yes you can download all of Wikipedia and yes it can easily fit in a hard drive. Isn’t that amazing? Text is incredibly dense compared to images and video. Around 22 GiB for English Wikipedia alone and 56 GiB for the 10 languages I downloaded.
I also have all of Wiktionary in the same hard drive. It’s around 16.4 GiB.
Last time I looked into downloading Wikipedia, it said it was 50 GB for English text and 100 GB with images. How’d you get it in half the space?
It’s only the raw text, stored as JSON Lines files. No media and no markup. I think I downloaded a compressed dump, then used wikiextractor to extract the plain text.
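For a sense of what that output looks like: this is a sketch of reading one line of the extracted data, assuming wikiextractor’s `--json` mode, which (as I recall) emits one JSON object per article with fields like `id`, `url`, `title`, and `text` — check against your own extraction, since field names may vary by version.

```python
import json

# A single line from wikiextractor's JSON output: one object per article.
# Field names assumed from its --json mode; the text value here is made up.
sample_line = (
    '{"id": "12", "url": "https://en.wikipedia.org/wiki?curid=12", '
    '"title": "Anarchism", "text": "Anarchism is a political philosophy..."}'
)

article = json.loads(sample_line)
print(article["title"])      # Anarchism
print(len(article["text"]))  # length of the plain text, no wiki markup
```

Since each article is a self-contained line, you can stream through the whole dump with a plain `for line in open(...)` loop without ever holding more than one article in memory — which is part of why the extracted text is so compact and easy to work with.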
Does it include each article’s edit history, talk page, etc?
The dictionaries for Aard2 are 21 GB in .slob compressed format (text only).
No idea what that means. But thank you for adding more info.
OK yes, some supporting info: Aard2 is an offline Wikipedia app that uses small compressed data files in .slob format.
Slob compression is best visualized as putting a sleeping bag into a stuff sack, except it’s all your possessions and you’re stuffing them into an old Chevy Metro.