• rockSlayer@lemmy.world
      38 upvotes · 7 months ago

      Well, that’s part of the thing. Web scraping isn’t covered by policies. They could ban your IP or any accounts you have, but web scraping itself will always be acceptable. It’s why projects like NewPipe and Invidious don’t care about YouTube’s cease-and-desist letters.

        • krippix@feddit.de
          English · 1 upvote · 7 months ago

          In what way?

          HTML definitely adds more overhead than JSON if you only care about the data.

        • folkrav
          1 upvote · 7 months ago

          Parsing absolutely comes with a lot more overhead. Many websites rely heavily on JS interactivity nowadays, so depending on the site, the HTML you get back from your HTTP request often doesn’t contain the full contents you’re looking for.
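
To make the two points above concrete, here is a rough sketch using only the Python standard library. The payloads, the `h2 class="title"` markup, and the helper names (`titles_from_json`, `titles_from_html`, `TitleParser`) are all made up for illustration and don't correspond to any real site or API. It compares pulling the same data out of a JSON body and out of HTML, then shows what a scraper sees when the page is only a JS-rendered shell.

```python
import json
from html.parser import HTMLParser

# Hypothetical payloads, invented for illustration only.
JSON_BODY = '{"posts": [{"title": "First post"}, {"title": "Second post"}]}'

STATIC_HTML = """
<html><body>
  <div class="post"><h2 class="title">First post</h2><p>text</p></div>
  <div class="post"><h2 class="title">Second post</h2><p>text</p></div>
</body></html>
"""

# What a JS-heavy site might return before any script has run:
# an empty container that the bundle would fill in later.
JS_SHELL_HTML = """
<html><body>
  <div id="app"></div>
  <script src="/bundle.js"></script>
</body></html>
"""


class TitleParser(HTMLParser):
    """Collects the text of every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())


def titles_from_json(body):
    # With JSON, the data is already structured: one call, then index.
    return [post["title"] for post in json.loads(body)["posts"]]


def titles_from_html(body):
    # With HTML, you walk the markup and pick the data out yourself.
    parser = TitleParser()
    parser.feed(body)
    parser.close()
    return parser.titles


if __name__ == "__main__":
    print(titles_from_json(JSON_BODY))
    print(titles_from_html(STATIC_HTML))
    print(titles_from_html(JS_SHELL_HTML))
```

The JS shell case yields nothing because the titles never appear in the response body. To get them you would need to execute the page's scripts (for example with a headless browser) or find the endpoint the scripts call, which is extra work on top of the parsing itself.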

    • freebread@lemm.ee
      English · 9 upvotes · 7 months ago

      Still waiting for the news that they took down old.reddit. Without the third party apps, that was the only way it could still be usable.