cross-posted from: https://lemmy.ca/post/37011397


The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages.

  • renzev@lemmy.world
    2 hours ago

    This sounds like a great thing for deaf people and just in general, but I don’t think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.

  • TheRealKuni@lemmy.world
    3 hours ago

    And yet they turned down having thumbnails for seeking because it would be too resource intensive. 😐

  • Phoenixz
    4 hours ago

    As VLC is open source, can we expect this technology to also be available for, say, Jellyfin, so that I can once and for all have subtitles done right?

    Edit: I think it’s great that VLC has this, but this sounds like something many other apps could benefit from.

  • m8052@lemmy.world
    5 hours ago

    What’s important is that this is running on your machine locally, offline, without any cloud services. It runs directly inside the executable

    YES, thank you JB

  • m-p{3}A
    6 hours ago

    Now I want some AR glasses that display subtitles above someone’s head when they talk à la Cyberpunk that also auto-translates. Of course, it has to be done entirely locally.

    • Obi@sopuli.xyz
      4 hours ago

      I guess we have most of the ingredients to make this happen. Software-wise we’re there; hardware-wise I’m still waiting for AR glasses I can replace my normal glasses with (which I wear 24/7 except for sleep). I’d accept having to carry a spare in a charging case so I swap them out once a day or something, but other than that I want them to be close enough in weight and comfort to my regular glasses and just give me AR features like overlaid GPS, notifications, etc. And indeed, instant translation with subtitles is a function that I could see having a massive impact on civilization tbh.

      • vvv@programming.dev
        2 hours ago

        I think we’re closer with hardware than software. The Xreal/Rokid category of HMDs is comfortable enough to wear all day, and I don’t mind a cable running from behind my ear under a clothes layer to a phone or mini PC in my pocket. Unfortunately you still need to BYO cameras to get the overlays appearing at the correct points in space, but cameras are cheap, so I suspect these glasses will grow some cameras in the next couple of iterations.

      • m-p{3}A
        4 hours ago

        I believe you can put prescription lenses in most AR glasses out there, but I suppose the battery is a concern…

        I’m in the same boat, I gotta wear my glasses 24/7.

    • shyguyblue@lemmy.world
      7 hours ago

      I was just thinking, this is exactly what AI should be used for. Pattern recognition, full stop.

      • snooggums@lemmy.world
        7 hours ago

        Yup, and if it isn’t perfect that is ok as long as it is close enough.

        Like getting name spellings wrong or mixing homophones is fine because it isn’t trying to be factually accurate.

        • vvv@programming.dev
          2 hours ago

          I’d like to see this fix the most annoying part about subtitles: timing. Find a transcript or any subs on the Internet and have the AI align them with the audio properly.
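
The simplest version of the alignment idea in the comment above is a constant offset: the subtitles are correct but start uniformly early or late. Here is a minimal sketch of that case, assuming you already have the start times of the subtitle cues and a non-empty list of speech-onset times from some speech detector or recogniser. The names `estimate_offset`, `cue_starts`, and `speech_onsets` are hypothetical, and nothing here reflects how VLC does it.

```python
from bisect import bisect_left


def nearest_distance(sorted_times, t):
    """Distance from t to the closest value in sorted_times (must be non-empty)."""
    i = bisect_left(sorted_times, t)
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(abs(t - c) for c in candidates)


def estimate_offset(cue_starts, speech_onsets, max_shift=10.0, step=0.1):
    """Grid-search the constant shift (in seconds) that best lines cues up with speech.

    A positive result means the subtitles should be delayed by that amount.
    """
    onsets = sorted(speech_onsets)
    steps = int(round(max_shift / step))
    best_shift, best_cost = 0.0, float("inf")
    for k in range(-steps, steps + 1):
        shift = k * step
        # Cost: total distance from each shifted cue to its nearest speech onset.
        cost = sum(nearest_distance(onsets, c + shift) for c in cue_starts)
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


if __name__ == "__main__":
    cues = [1.0, 5.0, 9.0]        # subtitle cue start times, in seconds
    speech = [3.5, 7.5, 11.5]     # detected speech onsets, in seconds
    print(f"suggested shift: {estimate_offset(cues, speech):+.1f} s")
```

Subtitles cut for a different release often drift or have scenes added and removed, so a single offset will not be enough in those cases; this only covers the easy one.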

        • TJA!@sh.itjust.works
          6 hours ago

          The problem is that now people will say that they don’t need to create accurate subtitles because VLC is doing the job for them.

          Accessibility might suffer from that, because all subtitles are now just “good enough”

          • snooggums@lemmy.world
            1 hour ago

            Regular old live broadcast closed captioning is pretty much ‘good enough’ and that is the standard I’m comparing to.

            Actual subtitles created ahead of time should be perfect because they have the time to double check.

          • Railcar8095@lemm.ee
            5 hours ago

            Or they can get OK ones with this tool, and fix the errors. Might save a lot of time

          • TachyonTele@lemm.ee
            6 hours ago

            I have a feeling that if you care enough about subtitles you’re going to look for good ones, instead of using “ok” ai subs.

          • LandedGentry@lemmy.zip
            5 hours ago

            Honestly though? If your audio is even half decent you’ll get like 95% accuracy. Considering a lot of media just wouldn’t have anything, that is a pretty fair trade-off to me.

            • TheMachineStops@discuss.tchncs.de
              1 hour ago

              From experience, AI translation is still garbage, especially for languages like Chinese, Japanese, and Korean, but if it only subtitles in the original language, such as creating English subtitles for English audio, then it is probably fine.

          • shyguyblue@lemmy.world
            29 minutes ago

            I imagine it would be not-exactly-simple-but-not-complicated to add a “threshold” feature. If the AI is less than X% certain, it can request human clarification.

            Edit: Derp. I forgot about the “real time” part. Still, as others have said, even a single botched word would still work well enough with context.
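
For an offline pass (not the real-time case), the threshold idea in the comment above could look roughly like this. It is only a sketch: it assumes the recogniser reports a confidence between 0 and 1 for each segment, and `Segment`, `split_by_confidence`, and the 0.8 default are hypothetical names and values, not anything taken from VLC.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds
    end: float         # seconds
    text: str
    confidence: float  # 0.0 to 1.0, assumed to be reported by the recogniser


def split_by_confidence(segments, threshold=0.8):
    """Separate segments that can be used as-is from those a human should check."""
    accepted, needs_review = [], []
    for seg in segments:
        if seg.confidence >= threshold:
            accepted.append(seg)
        else:
            needs_review.append(seg)
    return accepted, needs_review


if __name__ == "__main__":
    segments = [
        Segment(0.0, 2.1, "Hello there.", 0.97),
        Segment(2.1, 4.0, "General Kenobi.", 0.55),
    ]
    accepted, needs_review = split_by_confidence(segments)
    for seg in needs_review:
        print(f"check {seg.start:.1f}-{seg.end:.1f}s: {seg.text!r}")
```

In a live setting there is nobody to ask, so the same split could instead drive a visual cue, for example showing low-confidence text in a different style.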

    • LandedGentry@lemmy.zip
      5 hours ago

      Yeah, it’s pretty wonderful to see how far auto-generated transcription/captioning has come over the last couple of years. A wonderful victory for many communities with various disabilities.

  • VerPoilu@sopuli.xyz
    7 hours ago

    I hope Mozilla can benefit from a good local translation engine that could come out of it as well.

        • viking@infosec.pub
          1 hour ago

          And it takes forever. I’m using the TWP plugin for Firefox (which uses external resources, configurable to Google, Bing, or Yandex Translate), and it’s near instantaneous. The local one from Mozilla often takes 30 seconds, and sometimes hangs until I refresh the page.

  • SuperCub@sh.itjust.works
    6 hours ago

    Haven’t watched the video yet, but it makes a lot of sense that you could train an AI using already subtitled movies and their audio. There are times when official subtitles paraphrase the speech to make it easier to read quickly, so I wonder how that would work. There’s also just a lot of voice recognition everywhere nowadays, so maybe that’s all they need?