Basically, it’s a calculator that can take letters, numbers, words, sentences, and so on as input and produce a mathematically “correct” sounding output, where “correct” is defined by language patterns in the training data.

This core concept is in most if not all “AI” models, not just LLMs, I think.

  • howrar · 4 upvotes · 2 months ago

    mathematically “correct” sounding output

    It’s hard to say because that’s a rather ambiguous way of describing it (“correct” could mean anything), but it is a valid way of describing its mechanisms.

    “Correct” in the context of LLMs would be a token that is likely to follow the preceding sequence of tokens. In fact, the model computes a probability for every possible token, then takes a random sample according to that distribution* to choose the next token, and it repeats that until some termination condition is met. The distribution itself is learned with what we call maximum likelihood estimation (MLE) in machine learning (ML): we’re learning a distribution that makes the training data as likely as possible. MLE is indeed the basis of a lot of ML, but not all.

    *Oversimplification.
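
    To make the two halves concrete (fit a distribution by MLE, then sample from it token by token), here is a minimal toy sketch. It assumes a bigram model, where the next token depends only on the previous one, so the MLE is just normalized counts. Real LLMs condition on a long context with a neural network and fit it by gradient descent, so this is a stand-in for the idea and not how an LLM is implemented. The names `fit_bigram_mle`, `generate`, and the `<eos>` end marker are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_bigram_mle(tokens):
    """Bigram MLE: P(next | prev) = count(prev, next) / count(prev as a predecessor)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def generate(model, start, max_tokens=10, end_token="<eos>", rng=None):
    """Sample one token at a time until the end token or the length limit."""
    rng = rng or random.Random()
    out = [start]
    while len(out) < max_tokens and out[-1] != end_token:
        dist = model.get(out[-1])
        if dist is None:  # token never seen as a predecessor in training
            break
        tokens, probs = zip(*dist.items())
        # Random sample according to the learned distribution.
        out.append(rng.choices(tokens, weights=probs, k=1)[0])
    return out


# Tiny hypothetical "training data".
corpus = "the cat sat <eos> the dog sat <eos> the cat ran <eos>".split()
model = fit_bigram_mle(corpus)

print(model["the"])  # learned probabilities for tokens following "the"
print(generate(model, "the", rng=random.Random(0)))
```

    In this toy corpus “the” is followed by “cat” twice and “dog” once, so the learned probabilities are 2/3 and 1/3, and sampling picks “cat” about twice as often as “dog”.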