Note: As far as I know, Calibre does not do OCR, so a PDF with only scanned cont...

ggpsv · on Nov 18, 2023

I've had good results using Tesseract [0] on scanned PDFs. If you're not CLI-inclined, several GUIs are available for it [1]. For example, I've downloaded scanned PDFs from archive.org and run them through Tesseract without trouble.

I didn't know Calibre could do this; I'd been opening each file and searching it individually.

[0]: https://github.com/tesseract-ocr/tesseract
[1]: https://www.opait.com/tessstudio/
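Tesseract itself reads images rather than PDFs, so a scanned PDF has to be rasterized first. Below is a minimal Python sketch of that two-step pipeline, assuming poppler's `pdftoppm` and `tesseract` are both on your PATH. The function names and the 300 DPI default are illustrative choices, not something from the comment above.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf: Path, out_prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm renders each page to <out_prefix>-<N>.png (N may be zero-padded).
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def tesseract_cmd(image: Path, lang: str = "eng") -> list[str]:
    # Tesseract adds the extension itself: output base "page-1" -> page-1.pdf.
    # The trailing "pdf" config selects searchable-PDF output.
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang, "pdf"]


def page_number(image: Path) -> int:
    # "page-07" -> 7; sort numerically so page-10 doesn't land before page-2.
    return int(image.stem.rsplit("-", 1)[1])


def ocr_scanned_pdf(pdf: Path, workdir: Path, lang: str = "eng") -> list[Path]:
    """Rasterize `pdf`, OCR each page, and return the per-page PDFs in order."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, workdir / "page"), check=True)
    outputs = []
    for image in sorted(workdir.glob("page-*.png"), key=page_number):
        subprocess.run(tesseract_cmd(image, lang), check=True)
        outputs.append(image.with_suffix(".pdf"))
    return outputs
```

This yields one searchable PDF per page, which you would still need to merge into a single file with another tool. That extra bookkeeping is one reason a PDF-specific wrapper around Tesseract can be more convenient.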

kristofferR · on Nov 18, 2023

OCRmyPDF is a tool built on Tesseract and designed specifically for PDFs. I'd recommend it over plain Tesseract.

https://github.com/ocrmypdf/OCRmyPDF
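For a folder of downloaded scans, a small batch wrapper around the `ocrmypdf` command line might look like the sketch below. It assumes `ocrmypdf` is installed and on PATH, and it uses only the `-l` (language) and `--skip-text` options. The `.ocr.pdf` output naming and the function names are hypothetical choices for illustration.

```python
import subprocess
from pathlib import Path


def ocrmypdf_cmd(src: Path, dst: Path, lang: str = "eng") -> list[str]:
    # --skip-text leaves pages that already have a text layer untouched
    # instead of aborting on them.
    return ["ocrmypdf", "-l", lang, "--skip-text", str(src), str(dst)]


def output_path(src: Path) -> Path:
    # Hypothetical naming scheme: book.pdf -> book.ocr.pdf in the same folder.
    return src.with_name(src.stem + ".ocr.pdf")


def ocr_folder(folder: Path, lang: str = "eng") -> list[Path]:
    """OCR every *.pdf in `folder`, writing an .ocr.pdf copy next to each."""
    done = []
    for src in sorted(folder.glob("*.pdf")):
        if src.name.endswith(".ocr.pdf"):
            continue  # don't re-process our own output on a second run
        dst = output_path(src)
        subprocess.run(ocrmypdf_cmd(src, dst, lang), check=True)
        done.append(dst)
    return done
```

Usage would be something like `ocr_folder(Path("~/Downloads/scans").expanduser())`, where the folder path is just an example. Writing to a separate file rather than overwriting keeps the original scan around in case the OCR result is poor.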

kristofferR · on Nov 18, 2023

I recommend running any such PDFs through OCRmyPDF.

https://github.com/ocrmypdf/OCRmyPDF