• antonim@lemmy.dbzer0.com · 1 day ago

    to fool into errors

    tricking a kid

    I’ve never tried to fool or trick AI with excessively complex questions. When I tested it (a few different models over a period of time: ChatGPT, Bing AI, Gemini), I asked things as simple as “what’s the etymology of this word in that language” or “what is [some phenomenon]”. The models still produced responses ranging from shoddy to absolutely ridiculous.

    completely detached from how anyone actually uses

    I’ve seen numerous people use it the same way I tested it, basically a Google search that you can talk with, with similarly shit results.

    • archomrade [he/him]@midwest.social · 1 day ago
      Why do we expect a higher degree of trustworthiness from a novel LLM than we do from any given source or forum comment on the internet?

      At what point do we stop hand-wringing over LLMs failing to meet some perceived level of accuracy and hold the people using them responsible for verifying the responses themselves?

      There’s a giant disclaimer on every one of these models that responses may contain errors or hallucinations; at this point I think it’s fair to blame the user for ignoring those warnings and not the models for not meeting some arbitrary standard.

      • antonim@lemmy.dbzer0.com · edited 9 minutes ago
        Why do we expect a higher degree of trustworthiness from a novel LLM than we do from any given source or forum comment on the internet?

        The stuff I’ve seen AI produce has sometimes been more wrong than anything a human would produce, and even if a human did produce it and post it on a forum, anyone with half a brain could respond with a correction. (E.g. claiming that an ordinary Slavic word is actually a loan from Latin.)

        I certainly don’t expect any trustworthiness from LLMs; the problem is that people do expect it. You’re implicitly agreeing with my argument that LLMs give problematic responses not just when tricked, but also when used as intended, as knowledgeable chatbots. There’s nothing “detached from actual usage” about that.

        At what point do we stop hand-wringing over LLMs failing to meet some perceived level of accuracy and hold the people using them responsible for verifying the responses themselves?

        at this point I think it’s fair to blame the user for ignoring those warnings and not the models for not meeting some arbitrary standard

        This is not an either-or situation; it doesn’t have to be formulated like this. Criticising LLMs that frequently produce garbage is in practice also directed at the people who use them. When someone on a forum says they asked GPT and pastes its response, I will at the very least point out the general unreliability of LLMs, if not criticise the response itself (very easy if I’m somewhat knowledgeable about the field in question). That is also, in practice, directed at the person who posted it, e.g. by making them come off as naive and uncritical. (It is of course not meant as a real personal attack, but even detached and objective criticism has a partly personal element to it.)

        Still, the blame is on both. You claim that:

        There’s a giant disclaimer on every one of these models that responses may contain errors or hallucinations

        I don’t remember seeing them, but even if they are there, the general promotion and the ways in which LLMs are presented are telling people otherwise. A few disclaimers do little to shape people’s opinions compared to the extensive media hype and marketing.

        Anyway, my point was merely that people do regularly misuse LLMs, and that it’s not at all difficult to make them produce crap. As for who should be blamed for the whole situation, we probably don’t disagree too much.