• kromem@lemmy.world
    English · 37 up · 3 down · edited · 3 months ago

    OP, you do realize that this paper is about image generation and classification based on related data sets and only relates to the image processing features of multimodal models, right?

    How do you see this research as connecting to the future scope of LLMs?

    And why do you think that the same leap we’ve now seen with synthetic data transmitting abstract capabilities in text data won’t occur with images (and eventually video)?

    Edit: Which LLMs do you see among the models they tested? Quoting the paper:

    Models. We test CLIP [91] models with both ResNet [53] and Vision Transformer [36] architecture, with ViT-B-16 [81] and RN50 [48, 82] trained on CC-3M and CC-12M, ViT-B-16, RN50, and RN101 [61] trained on YFCC-15M, and ViT-B-16, ViT-B-32, and ViT-L-14 trained on LAION400M [102]. We follow open_clip [61], slip [81] and cyclip [48] for all implementation details.

    • Xerxos@lemmy.ml
      English · 20 up · 3 months ago

      I don’t see how that paper has anything to do with OP’s theory.

      • kromem@lemmy.world
        English · 6 up · 1 down · 3 months ago

        I mean, if we’re playing devil’s advocate to the “WTF is OP talking about” position, I can kind of see the argument: exponentially growing needs for additional training data, combined with edge cases being underrepresented in synthetic data sources (which leads to model collapse), could be extrapolated into a belief that we’ve hit a plateau caused by a training data bottleneck.

        In theory there’s an inflection point at which models become sophisticated enough that they can self-sustain with generating training data to recursively improve, and whether we will hit that point or not is an open question with arguments on both sides.

        I agree that this paper in relation to the title isn’t exactly the best form of the argument, but I can see how someone who only partly understands what’s being covered could have felt it confirmed their existing beliefs about where models currently are and where they will be in the future.

        The only thing I’ll add is that I was just getting a nice laugh out of looking at whether Gary Marcus (a common AI skeptic) has ever been right about anything to date, and saw he had a long post about how deep learning was hitting a wall and we were a long way off from LLMs understanding human text…four days before GPT-4 was released.

        In my experience, while contrarian positions against continuing research trends can be correct in an “even a broken clock is right twice a day” sense, I wouldn’t put my bets on the reversal of a trend that, in its pacing and replication, seems to be accelerating, not decelerating.

        Regarding OP’s claim in particular, the work over the past 18 months with synthetic data sets from GPT-4 giving tiny models significant boosts in critical reasoning skills during fine-tuning should give anyone serious pause on “we’re hitting diminishing returns and model collapse.”

        • General_Effort@lemmy.world
          English · 1 up · 3 months ago

          In theory there’s an inflection point at which models become sophisticated enough that they can self-sustain with generating training data to recursively improve

          That sounds surprising. Do you have a source?