A new study from the Columbia Journalism Review showed that AI search engines and chatbots, such as OpenAI’s ChatGPT Search, Perplexity, DeepSeek Search, Microsoft Copilot, Grok, and Google’s Gemini, are just wrong, way too often.

  • criitz@reddthat.com · 48 up, 1 down · edited · 1 day ago

    When LLMs are wrong they are only confidently wrong. They don’t know any other way to be wrong.

    • 4am@lemm.ee · 25 up · 1 day ago

      They do not know right from wrong; they only know the probability of the next word.

      LLMs are a brute-forcing of the imitation of intelligence. They do not think; they are not intelligent.

      But I mean people today believe that 5G vaccines made the frogs gay.

    • kubica@fedia.io · 8 up · 1 day ago

      We only notice when they are wrong, but they can also be right just by accident.

    • Imgonnatrythis@sh.itjust.works · 3 up, 2 down · 1 day ago

      This does seem to be exactly the problem. It is solvable, but I haven’t seen any system that does it. They should be able to calculate a confidence value based on the number of corroborating sources, the quality ranking of those sources, and how much interpolation of data is being done vs. straightforward regurgitation of facts.
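
A rough sketch of what such a confidence value could look like, written as a scoring heuristic over retrieved sources rather than anything an LLM is known to compute internally. Everything here is an assumption for illustration: the `Source` fields, the `interpolation` input (0 for pure quotation, 1 for fully inferred), and the specific weighting are hypothetical, and no search product is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class Source:
    quality: float  # hypothetical quality ranking in [0, 1]
    agrees: bool    # whether this source supports the generated answer


def confidence(sources: list[Source], interpolation: float) -> float:
    """Combine the three signals from the comment into one score in [0, 1].

    interpolation: assumed estimate of how much of the answer was inferred
    rather than restated from sources (0 = restated, 1 = fully inferred).
    """
    if not 0.0 <= interpolation <= 1.0:
        raise ValueError("interpolation must be in [0, 1]")

    total_quality = sum(s.quality for s in sources)
    if total_quality == 0:
        return 0.0  # no usable sources, so no basis for confidence

    # Quality-weighted share of sources that back the answer.
    agreement = sum(s.quality for s in sources if s.agrees) / total_quality

    # More agreeing sources help, with diminishing returns: 1/2, 2/3, 3/4, ...
    n_agree = sum(1 for s in sources if s.agrees)
    corroboration = n_agree / (n_agree + 1)

    # Arbitrary choice: fully interpolated answers lose half their score.
    penalty = 1.0 - 0.5 * interpolation

    return agreement * corroboration * penalty
```

The hard part is not this arithmetic but getting trustworthy inputs, especially `agrees` and `interpolation`, which is where the disagreement in the replies comes in.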

      • TaviRider@reddthat.com · 4 up, 1 down · 22 hours ago

        I haven’t seen any evidence that this is solvable. You can feed in more training data, but that doesn’t mean generative AI technology is capable of using that in the way you describe.

      • xthexder@l.sw0.com · 2 up, 1 down · 24 hours ago

        I’ve been saying this for a while. They need to train it to be able to say “I don’t know”. They need to add questions to the dataset that lack enough information to answer, so that it can learn to tell what is and isn’t fact instead of hallucinating.
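
One way the dataset idea could be sketched: mix answerable questions with ones whose context lacks the answer, and give the latter an explicit "I don't know" target. The function name, the record layout, and the 20% default are all assumptions for illustration, and whether training on such data actually reduces hallucination is exactly what is disputed above.

```python
import random

IDK = "I don't know."


def build_training_set(answerable, unanswerable, idk_fraction=0.2, seed=0):
    """Mix answerable and unanswerable questions into one training list.

    answerable:   list of (context, question, answer) tuples
    unanswerable: list of (context, question) tuples whose context
                  does not contain the answer
    idk_fraction: assumed target share of "I don't know" examples
    """
    if not 0.0 <= idk_fraction < 1.0:
        raise ValueError("idk_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # How many unanswerable items are needed to hit the target share,
    # capped by how many are actually available.
    wanted = round(idk_fraction * len(answerable) / (1.0 - idk_fraction))
    picked = rng.sample(unanswerable, min(wanted, len(unanswerable)))

    examples = [
        {"context": c, "question": q, "target": a} for c, q, a in answerable
    ]
    examples += [
        {"context": c, "question": q, "target": IDK} for c, q in picked
    ]
    rng.shuffle(examples)  # avoid grouping all refusals at the end
    return examples
```

With 8 answerable and 5 unanswerable questions at the default fraction, this yields 10 examples, 2 of which have the refusal target.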