I love Aria2, but I’m building a web scraper / crawler and I need to download hundreds of thousands of files. Aria2 locks up around the 20,000-file mark. Is there another download manager that could handle this, or a more recent fork of Aria2?

I believe I have a workaround: use the API to check how many files are queued and sleep until the count drops below 1,000. I’m not sure that’s the most effective approach, though, and it significantly slows down the download pipeline.
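
Roughly what I have in mind, as a sketch. It assumes aria2c is running with --enable-rpc on the default port 6800 and no RPC secret (add a "token:..." first param if you use --rpc-secret):

```python
import json
import time
import urllib.request

RPC_URL = "http://localhost:6800/jsonrpc"

def rpc(method, params=None):
    """Call a single aria2 JSON-RPC method and return its result."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": "qcheck",
        "method": method,
        "params": params or [],
    }).encode()
    req = urllib.request.Request(RPC_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

def wait_for_queue(max_waiting=1000, poll_seconds=5):
    """Block until aria2's waiting queue drops below max_waiting."""
    while True:
        stats = rpc("aria2.getGlobalStat")
        if int(stats["numWaiting"]) < max_waiting:
            return
        time.sleep(poll_seconds)

# before pushing the next batch of URLs with aria2.addUri, call wait_for_queue()
```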

The issue seems to lie with connections timing out in aria2, which causes those downloads to lock up until they are manually cleared. I have my timeout set to 10 seconds, but that doesn’t seem to matter. I’ve considered running a scheduled task to clean them up, but was going to give downloading with Python a try first.
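
The cleanup task I had in mind would look roughly like this. It reuses the rpc() helper from the sketch above, and treating “0 B/s right now” as stalled is my own heuristic, not anything aria2 defines:

```python
def clear_stalled():
    # list active downloads, asking only for the fields we need
    for dl in rpc("aria2.tellActive", [["gid", "downloadSpeed"]]):
        if int(dl["downloadSpeed"]) == 0:
            rpc("aria2.forceRemove", [dl["gid"]])           # drop the stuck transfer
            rpc("aria2.removeDownloadResult", [dl["gid"]])  # purge it from the results list

# run it from cron or a loop, e.g. once a minute:
# while True: clear_stalled(); time.sleep(60)
```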

Any suggestions would be appreciated.

  • DryPhilosopher8168@alien.topB · 11 months ago

    Pretty sure that’s not a limitation of aria2. What system are you using? There are multiple options:

    1. Just build a queue (see the sketch after this list)
    2. Check whether aria2 locks up because of system limitations: max open files, max concurrent connections (there will be a practical limit no matter what you do)
    3. Build a queue with a cluster to overcome such limitations
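
    For option 1, a minimal sketch of what I mean: read the URL list from a file and only hand work to aria2 when its waiting queue has room. It assumes aria2c --enable-rpc on the default port with no RPC secret, and a urls.txt with one URL per line (both are just placeholders for your setup):

    ```python
    import json
    import time
    import urllib.request

    RPC_URL = "http://localhost:6800/jsonrpc"
    MAX_WAITING = 1000  # keep aria2's waiting queue below this

    def rpc(method, params=None):
        payload = json.dumps({"jsonrpc": "2.0", "id": "feeder",
                              "method": method, "params": params or []}).encode()
        req = urllib.request.Request(RPC_URL, data=payload,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["result"]

    with open("urls.txt") as fh:
        for url in (line.strip() for line in fh if line.strip()):
            # back off whenever the queue is full enough
            while int(rpc("aria2.getGlobalStat")["numWaiting"]) >= MAX_WAITING:
                time.sleep(5)
            rpc("aria2.addUri", [[url]])
    ```
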
  • tetris11@lemmy.ml · 11 months ago

    Interesting. Source on the 20,000-file limit? It could just be that you need to increase the number of allowed file descriptors on your OS.
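
    Something like this shows the current cap on Linux/macOS and bumps the soft limit for the process it runs in (so you’d want to launch aria2c from the same shell or script). Raising it permanently is usually ulimit -n, limits.conf, or systemd’s LimitNOFILE:

    ```python
    import resource

    # current per-process open-file limits (soft can be raised up to hard)
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft}, hard={hard}")

    # raise the soft limit to the hard limit for this process and its children
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    ```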