Are there any good products for batch scanning and OCR of documents? Or services?
Fujitsu ScanSnap. Any model.
Do these do OCR?
It’s been a while since I used mine, but worst case, your PDF app does the OCR, whereas this machine scans large batches both quickly and reliably.
(I stopped using mine because I switched from Mac OS to Linux. I got 10 happy years from it.)
If it’s all pretty much letter/legal size (or international equivalent), the Canon imageFORMULA DR-C230 (and similar models) do a great job. The scanning software it comes with will scan to PDF and includes OCR.
Does it put out a txt or Word doc or something?
If you want OCR, it outputs a PDF.
It can also output image formats like PNG, but images don’t carry OCR text.
There are lots of options in the software, but the defaults work for 90% of regular documents.
It’s been a minute, but when I was looking at this type of software (not the specific one mentioned), the programs could create a PDF with the scanned images and the OCR text overlaid on top of the scan.
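To make the overlay idea concrete, here is a minimal sketch using OCRmyPDF, an open-source command-line tool that is not one of the products named in this thread. It assumes `ocrmypdf` is installed and on your PATH; the function names and folder layout are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Return the OCRmyPDF command line for one input file."""
    # --skip-text leaves pages that already have a text layer untouched.
    return ["ocrmypdf", "--skip-text", "-l", language, str(src), str(dst)]


def ocr_batch(in_dir: Path, out_dir: Path, language: str = "eng") -> list[Path]:
    """Run OCRmyPDF over every PDF in in_dir and return the files written."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        # check=True stops the batch on the first file that fails.
        subprocess.run(build_ocr_command(src, dst, language), check=True)
        written.append(dst)
    return written
```

Calling `ocr_batch(Path("scans"), Path("searchable"))` would leave the originals alone and write searchable copies into the second folder.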
Just use Claude on Bedrock or GPT. I’ve found they both do that really well.
Came here to say the same thing. You can also self-host your own model; even Gemma was very good in my experience.
I can recommend NAPS2 as a scanning application. But you still need either a lot of time or a proper document scanner.
So, that’s surprisingly easy to set up.
https://docs.paperless-ngx.com/
Use Docker. I recommend just getting a dedicated Synology NAS and using Container Manager (that’s what they call Docker in the OS). Just get the two NVMe drives and some extra RAM.
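Once it is running, paperless-ngx picks up files dropped into its consumption folder, so a small script can hand over whatever the scanner produced. This is only a sketch: the function name is made up, and both folder paths are assumptions that depend on how your container volumes are mapped.

```python
import shutil
from pathlib import Path


def queue_for_paperless(scan_dir: Path, consume_dir: Path) -> list[Path]:
    """Move finished PDF scans into the folder paperless-ngx watches."""
    moved = []
    for src in sorted(scan_dir.glob("*.pdf")):
        dst = consume_dir / src.name
        if dst.exists():
            # Do not overwrite a file that is still waiting to be consumed.
            continue
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Something like `queue_for_paperless(Path("/volume1/scans"), Path("/volume1/paperless/consume"))` could run from a scheduled task; those paths are placeholders, not defaults.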