Powerful PDF+Image Parsing — Mistral OCR

#mistral #ocr #genai #ai

Mistral AI recently released Mistral OCR, a powerful OCR model. Their tagline for the model is that 1000 pages can be parsed per dollar. Mistral OCR is described as multilingual and multimodal, and it can parse complex documents supplied as PDFs or images.

Example Scenarios

Let us look at a few example scenarios with different documents as inputs and see how the model performs.

For the first case, I took a sample image from the internet containing some handwritten notes in English. Below is the image that I used.

When I passed this image to Mistral's OCR model, it parsed the handwriting accurately and described what is in the image. Below is the output from the model.

This shows how well the model handles handwritten images (even though the handwriting in this one is not very awesome :D). But let's not stop there; let's try another scenario with a different language.

As my mother tongue is Tamizh (Tamil), I chose to try a document in that language. The document is about the Indian Constitution, covering some amendments and welfare measures of the Indian government. It is a 21-page document of around 4 MB. Below is the link to the document for reference.

Link : https://raw.githubusercontent.com/amrs-tech/storepdf/refs/heads/main/part5_compressed.pdf

The OCR model has truly spoken from the bottom of its heart 😁😄 Just kidding! The output below shows that it also works very well for a language other than English, which supports the claim that it is multilingual.

To play with these scenarios, I did not write any complex Python code. I just used the cookbook example from Mistral's GitHub: https://github.com/mistralai/cookbook/tree/main/mistral/ocr

Pre-Post-Disclaimer 🙂: You need an API key from the Mistral platform to run this.
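If you would rather see the core call without opening the notebook, here is a minimal sketch of what such a request can look like with the `mistralai` Python client. It is not the cookbook code itself. The helper names (`build_document`, `pages_to_markdown`, `run_ocr`), the list of image extensions, and the `MISTRAL_API_KEY` environment variable are my own assumptions for illustration; the model name `mistral-ocr-latest` and the `document_url` / `image_url` payload shapes follow Mistral's published OCR examples as I understand them, so please check the current docs before relying on them.

```python
import os
from urllib.parse import urlparse

# Assumed list of image extensions; adjust to whatever you actually send.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp", ".gif")


def build_document(url: str) -> dict:
    """Build the `document` payload: image URLs and PDF URLs use different keys."""
    path = urlparse(url).path.lower()  # ignore any query string when checking the extension
    if path.endswith(IMAGE_EXTENSIONS):
        return {"type": "image_url", "image_url": url}
    return {"type": "document_url", "document_url": url}


def pages_to_markdown(pages) -> str:
    """Join the per-page markdown of an OCR response into one string."""
    return "\n\n".join(page.markdown for page in pages)


def run_ocr(url: str) -> str:
    """Send one document URL to Mistral OCR and return the parsed markdown."""
    # Imported here so the helpers above work without the SDK installed.
    from mistralai import Mistral  # pip install mistralai

    # Assumes the API key is exported as MISTRAL_API_KEY (hypothetical variable name).
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    response = client.ocr.process(
        model="mistral-ocr-latest",
        document=build_document(url),
    )
    return pages_to_markdown(response.pages)
```

With the key set, calling `print(run_ocr("https://raw.githubusercontent.com/amrs-tech/storepdf/refs/heads/main/part5_compressed.pdf"))` should print the parsed text of the Tamil document page by page. Local files need a different path (upload or base64), which the cookbook covers.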

Happy Learning !!
