I have a rather bad habit of taking screenshots of random things I want to remember, such as a product I want to buy and compare prices for later. On my Mac, I have a Desktop folder full of these screenshots. And of course it is almost impossible to find anything there weeks or months later among hundreds of files.
When I needed to find something once again, I thought it would be nice to have a way to search these images by text. I spent a few moments searching for a tool that could do this, found nothing, and decided to write some code instead. I suspect I didn’t search thoroughly enough and was just looking for an excuse to code that evening, though 😄, especially since I didn’t have any experience with OCR (Optical Character Recognition) libraries and it was a good opportunity to learn something new.
I did some quick research and picked the EasyOCR library. It is lightweight, easy to use, supports multiple languages, and provides quite good results according to various benchmarks. Since it is a Python library, it also determined my choice of programming language for this task.
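To give an idea of how little code the OCR step needs, here is a minimal sketch. It is not the code from the repository. `easyocr.Reader` and `readtext` are EasyOCR's own entry points, while the helper names `make_reader` and `extract_text` are made up for this example.

```python
from pathlib import Path


def make_reader(languages):
    """Create an EasyOCR reader for the given language codes, e.g. ["en", "de"]."""
    # Imported here so extract_text can be used without EasyOCR installed.
    import easyocr

    # gpu=False keeps the sketch usable on machines without CUDA.
    return easyocr.Reader(languages, gpu=False)


def extract_text(reader, image_path):
    """Run OCR on one image and return all recognized text as a single string."""
    # detail=0 makes readtext return plain strings instead of
    # (bounding box, text, confidence) tuples.
    fragments = reader.readtext(str(Path(image_path)), detail=0)
    return " ".join(fragment.strip() for fragment in fragments if fragment.strip())
```

Typical usage would be `reader = make_reader(["en"])` once, followed by `extract_text(reader, "screenshot.png")` for each file. Reusing one reader matters because creating it loads the recognition models, which is the slow part.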
I ended up with a CLI app that has two subcommands:
load_and_index
Scans a folder for image files and processes each file with OCR. The file path and extracted text are saved to a PostgreSQL database. During subsequent runs, only new files are processed and added to the database. A list of languages to use for OCR can be specified in the config file.
search
Searches the database for a specified query and returns a list of files containing the query.
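The two subcommands can be sketched roughly as follows. This is an illustration under assumptions, not the repository code: it uses SQLite from the standard library as a stand-in for PostgreSQL so that it runs without a database server, the table layout and the `IMAGE_EXTENSIONS` set are hypothetical, and `ocr` is any function that takes a path and returns text.

```python
import sqlite3
from pathlib import Path

# Assumed set of extensions; adjust to whatever your screenshots use.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def open_index(db_path=":memory:"):
    """Open the index database and create the table if it is missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn


def load_and_index(conn, folder, ocr):
    """OCR every image in `folder` that is not indexed yet; return the number added."""
    known = {row[0] for row in conn.execute("SELECT path FROM images")}
    added = 0
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        if str(path) in known:
            continue  # processed during an earlier run
        conn.execute(
            "INSERT INTO images (path, text) VALUES (?, ?)", (str(path), ocr(path))
        )
        added += 1
    conn.commit()
    return added


def search(conn, query):
    """Return the paths of all images whose extracted text contains `query`."""
    # SQLite's LIKE ignores case for ASCII letters, similar to ILIKE in PostgreSQL.
    rows = conn.execute(
        "SELECT path FROM images WHERE text LIKE ? ORDER BY path", (f"%{query}%",)
    )
    return [row[0] for row in rows]
```

Skipping paths that are already in the table is what keeps subsequent runs fast: only new screenshots go through OCR. Note that `%` and `_` in the query act as wildcards here because the sketch does not escape them.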
On my laptop, it takes around 3 seconds to process a single image file, so the initial run can take a while.
I’m quite happy with the results; it is good enough for my needs. I may implement actual full-text search in the future instead of a simple match using the ILIKE operator. That sounds like a good reason to try Meilisearch, which I have wanted to play with for a while.
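To make the difference concrete, here is a small sketch of both query styles. The table and column names are hypothetical, `%s` is the psycopg-style placeholder, and the full-text variant uses PostgreSQL's built-in `to_tsvector` and `plainto_tsquery` rather than Meilisearch, simply because it needs no extra service.

```python
def ilike_pattern(query):
    """Wrap `query` in % wildcards, escaping characters that ILIKE treats specially."""
    escaped = (
        query.replace("\\", "\\\\")  # backslash is the default LIKE escape character
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return f"%{escaped}%"


# Substring match: finds the query anywhere in the text, ignoring case.
ILIKE_SEARCH = "SELECT path FROM images WHERE text ILIKE %s"

# Full-text match: splits the text into words and requires every query word
# to be present. The 'simple' configuration does no language-specific stemming.
FULL_TEXT_SEARCH = (
    "SELECT path FROM images "
    "WHERE to_tsvector('simple', text) @@ plainto_tsquery('simple', %s)"
)
```

With a database cursor, these would be run as `cursor.execute(ILIKE_SEARCH, (ilike_pattern(query),))` and `cursor.execute(FULL_TEXT_SEARCH, (query,))`. The full-text version matches whole words in any order, while ILIKE matches the exact substring.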
The code along with instructions on how to use it can be found in this repository on GitHub. Feel free to use it or modify it to your needs.