• 6 Posts
  • 274 Comments
Joined 7 months ago
Cake day: August 27th, 2025



  • World models aren’t just for robotics (though they definitely WILL be used for that). They’re for reasoning under uncertainty in domains where you can’t see the outcome in advance. E.g.:

    Medical diagnosis: you can’t physically “embody” whether a treatment will work. But a system that understands disease progression, drug interactions, and physiological constraints (not by pattern-matching text, but by learning causal structure) - well, that’s fundamentally different from an LLM hallucinating plausible-sounding symptoms.

    Financial modeling, engineering simulations, climate prediction…all domains where the “embodied experience” is simulation, not physical interaction. You learn how the world actually works by understanding constraints and causality, not by predicting the next token in a Bloomberg article.

    The point isn’t “robots will finally work.” The point is: understanding causality is cheaper in the long run and more reliable than memorizing correlations. Embodiment is just the training signal that forces you to learn causality instead of surface patterns.

    My read is that LeCun’s betting that a system trained to predict abstract state transitions in any domain (be that medical, financial, physical) will generalize better / hallucinate less than one trained to predict text.

    Whether that’s true? Fucked if I know - that’s why it’s (literally) the billion-dollar question. If he cracks it…it’s big.

    But “it won’t cook dinner” misses the point (and besides which, it might actually cook dinner and change lightbulbs, so…)






  • As I mentioned elsewhere (below) I am currently conducting similar testing across 4 different 4B models (Qwen3-4B Hivemind, Qwen3-4B-2507-Instruct, Phi-4-mini, Granite-4-3B-micro), using both grounded and ungrounded conditions. Aiming for 10,000 runs, currently at 3,500.

    Not to count chickens before they hatch - but at ctx 8192, hallucination flags in the grounded condition are trending toward near-zero across the models tested (so far). If that holds across the full campaign, useful to know. If it doesn’t hold, also useful to know.
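
    For anyone wanting to sanity-check a “near-zero” claim like this on their own runs, here is a minimal sketch (not the harness used in this campaign) of tallying hallucination flags per model and condition, with a Wilson score interval so that a rate of 0 out of N is not over-read. The record layout `(model, condition, flagged)` and the function names are assumptions made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def wilson_interval(flags, runs, z=1.96):
    """Wilson score interval (z=1.96 is roughly 95%) for a flag rate.

    Unlike the naive p +/- z*sqrt(p(1-p)/n), this stays informative
    when the observed rate is at or near zero.
    """
    if runs == 0:
        return (0.0, 1.0)
    p = flags / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def flag_rates(records):
    """Aggregate (model, condition, flagged) tuples.

    Returns {(model, condition): (flags, runs, (low, high))}.
    The tuple layout is a hypothetical log format, not a real one.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [flags, runs]
    for model, condition, flagged in records:
        counts[(model, condition)][0] += int(flagged)
        counts[(model, condition)][1] += 1
    return {
        key: (flags, runs, wilson_interval(flags, runs))
        for key, (flags, runs) in counts.items()
    }


if __name__ == "__main__":
    # Placeholder records only; these are not results from any real run.
    toy = [("model-a", "grounded", False)] * 3 + [
        ("model-a", "ungrounded", True),
        ("model-a", "ungrounded", False),
    ]
    for key, (flags, runs, (low, high)) in flag_rates(toy).items():
        print(key, f"{flags}/{runs}", f"[{low:.3f}, {high:.3f}]")
```

    The point of the interval: zero flags in 100 runs still leaves an upper bound of a few percent, so how near “near-zero” really is depends heavily on the run count per cell.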

    I have an idea for how to make grounded state even more useful. Again, chickens not hatched blah blah. I’ll share what I find here if there’s interest. I’m intending to submit the whole shooting match for peer review (TMLR or JMLR) and put it on arXiv for others to poke at.

    I realize this is peak “fine, I’ll do it myself” energy, but I got sick of ChatGPT’s bullshit and wanted to try something to ameliorate it.



  • I dunno. Some strange relic from the 1980s?

    Kidding aside, it’s shocking how bad raw YT is. We watch it via SmartTube (or PipePipe as needed). I can’t believe people watch “raw” YouTube…it’s unwatchable.

    If they ever quash SmartTube and PipePipe…well…I imagine PeerTube, Nebula, Libby and Curiosity Stream will suddenly become a great deal more popular.

    I don’t think the powers that be fully grasp the (very delicate) knife edge they walk. They only stay in business so long as they aren’t annoying enough to be replaced. Actually, who am I kidding - they know that exactly, and they play the “gently, gently” boil-the-frog game like grandmasters.













  • That seems like… a little much. I do agree that upvotes/downvotes indeed gamify the system, but on the whole would say that the end-effect on Reddit results in a big bunch of hoomons acting in typical hoomon ways, which is with deep undercurrents of fickle, ignorant, selfish, feel-good behavior.

    I’d argue that’s a restatement of my position with better adjectives :)

    I’d say I agree with most of the things you wrote, but remain unconvinced that upvote/downvote is so absolutely toxic as to merit tossing. And of course, I don’t think it’s going to happen, anyway.

    Aggregate behaviour amongst naked apes? Yeah, I would tend to agree. Now what?

    Well, 2 options:

    1. Kill all the apes (or just wait 15 more minutes)

    2. Enjoy Lemmy

    I’m trending towards 2 myself