I’ve been saying this for about a year since seeing the Othello GPT research, but it’s nice to see more minds changing as the research builds up.

Edit: Because people aren’t actually reading and just commenting based on the headline, a relevant part of the article:

New research may have intimations of an answer. A theory developed by Sanjeev Arora of Princeton University and Anirudh Goyal, a research scientist at Google DeepMind, suggests that the largest of today’s LLMs are not stochastic parrots. The authors argue that as these models get bigger and are trained on more data, they improve on individual language-related abilities and also develop new ones by combining skills in a manner that hints at understanding — combinations that were unlikely to exist in the training data.

This theoretical approach, which provides a mathematically provable argument for how and why an LLM can develop so many abilities, has convinced experts like Hinton, and others. And when Arora and his team tested some of its predictions, they found that these models behaved almost exactly as expected. From all accounts, they’ve made a strong case that the largest LLMs are not just parroting what they’ve seen before.

“[They] cannot be just mimicking what has been seen in the training data,” said Sébastien Bubeck, a mathematician and computer scientist at Microsoft Research who was not part of the work. “That’s the basic insight.”

  • kromem@lemmy.world (OP)
    10 months ago

    Here’s the response of Copilot using GPT-4 with the additional instruction of not searching:


    Hello, this is Bing. I enjoy solving cryptic crosswords. 😊

    The answer to the clue is MAELSTROM. Here is how I solved it:

    • The clue has two parts: a definition and a wordplay. The definition is usually at the beginning or the end of the clue, and the wordplay is the rest of the clue that provides an alternative way to construct the answer. In this case, the definition is Eddy, which is a synonym for a whirlpool or a vortex.

    • The wordplay is mother receiving broadcast alerts. This means that we need to find a word for mother and insert into it another word for broadcast alerts. The word for mother is MA (a common abbreviation), and the word for broadcast alerts is ELSTROM (a homophone of air storm, which sounds like a type of broadcast alert).

    • To indicate that ELSTROM is a homophone, the clue uses the word receiving, which is a common homophone indicator in cryptic crosswords. Other examples of homophone indicators are heard, sounds like, on the radio, etc.

    • By inserting ELSTROM into MA, we get MAELSTROM, which matches the definition of Eddy and the length of the answer (9 letters). Therefore, MAELSTROM is the correct answer. 🎉

    • bionicjoey
      10 months ago

      You can see clearly that it has no understanding of the wordplay. Though I’ll concede it’s impressive that it got the right answer at all.
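
The gap between the answer and the explanation can be checked mechanically. Below is a minimal sketch, not anything posted in the thread: it tests Copilot's stated parse (ELSTROM inserted into MA) against one alternative reading, which is my assumption rather than something the thread confirms: "mother" = MOM, "receiving" as a container indicator, and "broadcast" as an anagram indicator acting on ALERTS. The helper names are hypothetical.

```python
from collections import Counter

ANSWER = "MAELSTROM"


def is_anagram(a: str, b: str) -> bool:
    """True if a and b use exactly the same letters."""
    return Counter(a) == Counter(b)


def container_parses(answer: str, outer: str, fodder: str):
    """Find every way `outer` can be split around an anagram of `fodder`
    so that head + anagram + tail spells `answer`."""
    parses = []
    for cut in range(1, len(outer)):
        head, tail = outer[:cut], outer[cut:]
        if len(head) + len(fodder) + len(tail) != len(answer):
            continue
        if not (answer.startswith(head) and answer.endswith(tail)):
            continue
        middle = answer[len(head):len(answer) - len(tail)]
        if is_anagram(middle, fodder):
            parses.append((head, middle, tail))
    return parses


# Copilot's stated parse: ELSTROM placed inside MA.
copilot_parse = container_parses(ANSWER, "MA", "ELSTROM")

# Assumed alternative: an anagram of ALERTS placed inside MOM.
assumed_parse = container_parses(ANSWER, "MOM", "ALERTS")

print("Copilot's parse as an insertion:", copilot_parse)
print("Assumed parse:", assumed_parse)
```

Under these assumptions, Copilot's parse only works as plain concatenation (MA followed by ELSTROM), which leaves "receiving" doing nothing as a container, while the assumed reading does split as M + AELSTR + OM.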

      • kromem@lemmy.world (OP)
        10 months ago

        Confabulation, and surface statistics in general, often get in the way of the meat and potatoes of critical reasoning in the SotA models.

        A good example of this is comparing a plain variation of a common puzzle with the same variation after changing the tokens to other representations and having the model repeat the adjectives while it works through CoT.

        Often as soon as it makes a mistake and has that mistake in context, it just has no way of correcting course. A lot of my current work is related to that and using a devil’s advocate approach to self-correction.

        But in reality, we won’t see a significant jump in things like being able to identify self-ignorance until hardware shifts in the next few years.

        • bionicjoey
          10 months ago

          Or, there is no critical reasoning and better hardware will only hide that fact better. It will always be a Chinese room.