• NoSpotOfGround@lemmy.world · 3 days ago

    The real meat of the story is in the referenced blog post: https://blog.codingconfessions.com/p/how-unix-spell-ran-in-64kb-ram

    TL;DR

    If you’re short on time, here’s the key engineering story:

    • McIlroy’s first innovation was a clever linguistics-based stemming algorithm that reduced the dictionary to just 25,000 words while improving accuracy.

    • For fast lookups, he initially used a Bloom filter, perhaps one of its first production uses; interestingly, Dennis Ritchie provided the implementation. They tuned it to such a low false-positive rate that they could skip actual dictionary lookups (a toy version is sketched after this list).

    • When the dictionary grew to 30,000 words, the Bloom filter approach became impractical, leading to innovative hash compression techniques.

    • They computed that 27-bit hash codes would keep the collision probability acceptably low, but 27 bits per word still needed compression (the arithmetic is checked below).

    • McIlroy’s solution was to store differences between sorted hash codes, after discovering these differences followed a geometric distribution.

    • Using Golomb's code, a compression scheme designed for geometric distributions, he achieved 13.60 bits per word, remarkably close to the theoretical minimum of 13.57 bits (a toy Rice-coded version appears below).

    • Finally, he partitioned the compressed data to speed up lookups, trading a small memory increase (final size ~14 bits per word) for significantly faster performance (also sketched below).
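
    To make the Bloom-filter step concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the sizes, the SHA-256-based hashing, and the test words. The original used hand-rolled hash functions tuned for the hardware, not anything shown here.

    ```python
    # Minimal Bloom filter sketch. All parameters are illustrative
    # assumptions, not the ones used by the original spell.
    import hashlib

    class BloomFilter:
        def __init__(self, size_bits: int, num_hashes: int):
            self.size = size_bits
            self.k = num_hashes
            self.bits = bytearray(size_bits // 8 + 1)

        def _positions(self, word: str):
            # Derive k bit positions by slicing one SHA-256 digest
            # (a convenience here, not the original hashing scheme).
            digest = hashlib.sha256(word.encode()).digest()
            for i in range(self.k):
                yield int.from_bytes(digest[4 * i : 4 * i + 4], "big") % self.size

        def add(self, word: str) -> None:
            for pos in self._positions(word):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, word: str) -> bool:
            # Can return a false positive, never a false negative.
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(word))

    bf = BloomFilter(size_bits=100_000, num_hashes=5)
    for w in ["receive", "separate", "occasion"]:
        bf.add(w)
    print(bf.might_contain("receive"))   # True
    print(bf.might_contain("recieve"))   # almost certainly False
    ```

    Skipping the confirming dictionary lookup, as they did, is what makes the false-positive rate the one quantity worth tuning.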
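
    The 27-bit choice is easy to sanity-check with back-of-the-envelope arithmetic (mine, not quoted from the post): with n words hashed uniformly into 2^27 values, a random non-word matches a stored hash with probability about n/2^27, and the same parameter yields the 13.57-bit floor mentioned above.

    ```python
    # Back-of-the-envelope check of the 27-bit hash width.
    import math

    n = 30_000            # dictionary size
    p = n / 2**27         # chance a random non-word hits a stored hash
    print(f"false-positive rate ~ {p:.6f} (about 1 in {round(1 / p)})")  # ~1 in 4474

    # The gaps between sorted hashes are nearly geometric with this same p,
    # so the entropy of one gap is the compression floor per word:
    H = (-(1 - p) * math.log2(1 - p) - p * math.log2(p)) / p
    print(f"entropy floor ~ {H:.2f} bits per word")   # ~13.57
    ```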
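
    The delta-plus-Golomb trick itself fits in a few lines. This sketch uses Rice coding, the power-of-two special case of Golomb's code, because it is simpler to write down; McIlroy's actual parameter and bit layout differ, so treat the printed figure as ballpark.

    ```python
    # Sketch of delta + Golomb-Rice coding of sorted hash codes.
    import random

    def rice_encode(gap: int, k: int) -> str:
        q, r = gap >> k, gap & ((1 << k) - 1)
        return "1" * q + "0" + format(r, f"0{k}b")   # unary quotient, k-bit remainder

    def rice_decode(bits: str, k: int):
        q = bits.index("0")                          # length of the unary prefix
        r = int(bits[q + 1 : q + 1 + k], 2)
        return (q << k) | r, bits[q + 1 + k:]        # decoded gap, leftover bits

    random.seed(1)
    hashes = sorted(random.sample(range(2**27), 30_000))   # stand-in word hashes
    gaps = [hashes[0]] + [b - a for a, b in zip(hashes, hashes[1:])]

    k = 12   # near log2(mean gap ~ 4474); my choice, not spell's exact parameter
    encoded = "".join(rice_encode(g, k) for g in gaps)
    print(len(encoded) / len(gaps), "bits per word")   # ~13.7, near the 13.57 floor
    ```

    Storing gaps rather than absolute values is the whole trick: the gaps are small and geometrically distributed, which is exactly the distribution Golomb's code is optimal for.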
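
    Finally, the partitioning step, continuing from the sketch above (it reuses rice_decode, gaps, encoded, k, and hashes). The group size is made up for illustration; the point is that a small index of (hash reached, bit offset) per group lets a lookup decode a handful of gaps instead of the whole stream.

    ```python
    # Partitioned lookup: binary-search a small index, then decode at most
    # GROUP gaps. Continues from the previous sketch (rice_decode, gaps, ...).
    import bisect

    GROUP = 64   # words per partition; illustrative, not spell's value

    def build_index(gaps, k):
        index, offset, total = [], 0, 0
        for i, g in enumerate(gaps):
            if i % GROUP == 0:
                index.append((total, offset))   # hash reached so far, bit offset
            offset += (g >> k) + 1 + k          # encoded length of this gap
            total += g
        return index

    def contains(target, encoded, index, k):
        keys = [h for h, _ in index]
        i = bisect.bisect_right(keys, target) - 1    # group that could hold target
        value, pos = index[i]
        bits = encoded[pos:]
        for _ in range(GROUP):
            if value == target:
                return True
            if value > target or not bits:
                return False
            gap, bits = rice_decode(bits, k)
            value += gap
        return value == target

    index = build_index(gaps, k)
    print(contains(hashes[1234], encoded, index, k))   # True
    print(contains(123_456_789, encoded, index, k))    # almost surely False
    ```

    The index is the small memory increase the post mentions; paying it is what brings the table to roughly 14 bits per word while keeping lookups fast.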

  • Troy · 4 days ago

    Long article for one sentence of trivia and no info on the algo itself. The death of the internet is upon us.

    • Em Adespoton · 4 days ago

      Doesn’t even name the algorithm, and somehow spells LZMA wrong, despite just having written it out longhand.

      Well, it’s PC Gamer.

      [edit] I still can’t figure out if they’re referencing LZW encoding… the L and Z being the same Lempel and Ziv as in LZMA, but with Welch supplying a different solution for the rest of the algorithm due to size constraints.

      • Troy · 4 days ago

        Probably mostly AI-written.

    • GrabtharsHammer@lemmy.world · 4 days ago

      I’d like to imagine they took the short trivia fact and applied the inverse of the compression algorithm to bloat it into something that satisfied the editor.

    • rice@lemmy.org · 3 days ago

      The blog post it links to has all the info, but it’s more a series of changes to the dictionary than one fixed algorithm.

  • 0x0@programming.dev · 3 days ago

    Only 1 GiB of RAM? Moooom!
    Shut up, Johnny; Voyager’s still out there with way less.

    • rmuk@feddit.uk · 3 days ago

      Yeah, but I’ve not got two hundred Firefox tabs open on Voyager.