Project idea for learning

Great Blue Heron · 1 year ago

some_guy@lemmy.sdf.org · 1 year ago

The best language for automation is the one you know best. The second best is one you have to learn.

I think you could do this in bash with youtube-dl.

Diabolo96@lemmy.dbzer0.com · edit-2 1 year ago

Indeed. While my bash-fu is rudimentary at best, I don’t think Bash can be used for web scraping? But I think he could use RSS to get the posts, then extract the YouTube links with a regex and use the dump feature of yt-dlp* to get the video category, title, etc., using jq to parse the JSON. Then it’s probably just a matter of using curl to do the API calls and voilà.

*yt-dlp is better maintained than youtube-dl, or so I heard.
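
A rough sketch of the "extract links, then dump metadata" steps above, written in Python rather than bash so it is self-contained. The regex only covers the common `youtube.com/watch?v=` and `youtu.be/` URL shapes, and the function names (`extract_video_ids`, `fetch_metadata`) are made up for illustration. The second function assumes `yt-dlp` is installed and on the PATH, and that its `--dump-json` output has `title` and `categories` fields.

```python
import json
import re
import subprocess

# Matches the two most common YouTube URL shapes and captures the 11-character video ID.
# Other shapes (shorts, embeds, playlists) are deliberately not handled in this sketch.
YOUTUBE_ID_RE = re.compile(
    r"https?://(?:www\.|m\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([A-Za-z0-9_-]{11})"
)


def extract_video_ids(feed_text):
    """Return unique video IDs found in the raw feed text, in order of first appearance."""
    seen = []
    for match in YOUTUBE_ID_RE.finditer(feed_text):
        video_id = match.group(1)
        if video_id not in seen:
            seen.append(video_id)
    return seen


def fetch_metadata(video_id):
    """Ask yt-dlp for the video's metadata as JSON and keep only title and categories.

    Assumes the yt-dlp executable is available; nothing is downloaded.
    """
    result = subprocess.run(
        ["yt-dlp", "--dump-json", f"https://www.youtube.com/watch?v={video_id}"],
        capture_output=True,
        text=True,
        check=True,
    )
    info = json.loads(result.stdout)
    return {"title": info.get("title"), "categories": info.get("categories")}
```

Feeding the raw RSS text to `extract_video_ids` and then calling `fetch_metadata` on each ID would cover everything up to the final API calls, which depend on whichever API the results are posted to.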

some_guy@lemmy.sdf.org · 1 year ago

I built two scrapers for a website that hosts images and videos using bash.

They’re educational, I swear! /s

I looked through the HTML and figured out regexes for their media. The scripts parse all the links on the thumbnail pages and then load the corresponding primary pages with curl. On each of those pages, the script then uses wget to grab the file. Some additional pattern matching names the file after the post.
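
The flow described above (collect links from a thumbnail page, open each post page, find the media file, name it after the post) might look roughly like this in Python. This is not the original bash scripts. The `/post/` path filter, the `src="..."` media pattern, and the use of `<title>` as the post name are all assumptions about a hypothetical site, and the actual fetching and downloading (curl and wget in the original) is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def post_links(thumbnail_html, base_url):
    """Return absolute URLs of post pages linked from a thumbnail page.

    Assumes post pages live under a /post/ path (hypothetical layout).
    """
    collector = LinkCollector()
    collector.feed(thumbnail_html)
    return [urljoin(base_url, href) for href in collector.links if "/post/" in href]


# Assumed patterns: media referenced via src="...", post name taken from <title>.
MEDIA_RE = re.compile(r'src="([^"]+\.(?:jpg|png|gif|mp4|webm))"')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S)


def media_target(post_html, post_url):
    """Return (media_url, filename) for a post page, or None if nothing matches."""
    media = MEDIA_RE.search(post_html)
    title = TITLE_RE.search(post_html)
    if not media or not title:
        return None
    media_url = urljoin(post_url, media.group(1))
    extension = media_url.rsplit(".", 1)[1]
    # Replace anything that is unsafe in a filename with underscores.
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", title.group(1).strip()).strip("_")
    return media_url, f"{safe_name}.{extension}"
```

Regexes over HTML are brittle, so a sketch like this tends to break whenever the site changes its markup, which is probably part of why it feels convoluted.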

It’s probably convoluted, but you can accomplish a lot in bash if you want to.

Diabolo96@lemmy.dbzer0.com · 1 year ago

Man, there’s something really wrong with Lemmy lately. I only got the notification for your comment 8 days after you sent it. It’s the third time this has happened, but this must be the longest delay so far before a notification reached me.

some_guy@lemmy.sdf.org · 1 year ago

Yes, there’s a discussion about this on my instance. Someone there provided a link to where this was getting addressed. Some aspects of federation have been broken for a bit.

https://github.com/LemmyNet/lemmy/issues/4288#issuecomment-1878442186

Diabolo96@lemmy.dbzer0.com · 1 year ago

Hope it gets fixed soon.

some_guy@lemmy.sdf.org · 1 year ago

Seems like it. My inbox had five replies yesterday (after >1w of only local replies). Today, even more. Yesterday, the GUI was partially broken. Today looks normal.