Introduction
In this blog post, you will be guided through the steps for parsing scanned invoices with the state-of-the-art "Gemini Pro Vision" large language model. You will be amazed at how capable LLMs are at parsing invoices and extracting structured information from them.
Hands-on
Now for the most exciting part: getting your hands dirty with invoice OCR. Please follow the steps below.
- Log in to Google MakerSuite: https://makersuite.google.com/
- Create a new Free Form Prompt.
- Choose 'Gemini Pro Vision' as the model.
- In the prompt editor, enter the following set of prompt instructions so the model can parse the invoice images effectively.
Prompt 1: Identify metadata like invoice number, date, currency
Prompt 2: Extract supplier details like name, address, contact info
Prompt 3: Identify customer name and billing address
Prompt 4: Classify invoice type as products, services, rentals
Prompt 5: Parse out line items table from document
Prompt 6: Split line items into individual entries
Prompt 7: Extract item description from each line entry
Prompt 8: Identify units or quantity billed per line item
Prompt 9: Define rate/price per unit per line entry
Prompt 10: Calculate subtotal for each line item based on rate*quantity
Prompt 11: Sum all line item subtotals for grand total amount
Prompt 12: Extract total taxes for summed tax amounts
Prompt 13: Classify extracted information into schema
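If you would rather keep the prompt in a script than retype it every time, the instructions above can be assembled programmatically. The Python sketch below is one way to do it, assuming you paste its printed output into the Free Form Prompt editor yourself; `INSTRUCTIONS`, `JSON_STATEMENT`, and `build_invoice_prompt` are hypothetical names, and the script does not talk to MakerSuite or the Gemini API. It also appends the JSON statement described in the next step.

```python
from collections.abc import Sequence

# The 13 extraction instructions, in the same order as the prompt above.
INSTRUCTIONS = (
    "Identify metadata like invoice number, date, currency",
    "Extract supplier details like name, address, contact info",
    "Identify customer name and billing address",
    "Classify invoice type as products, services, rentals",
    "Parse out line items table from document",
    "Split line items into individual entries",
    "Extract item description from each line entry",
    "Identify units or quantity billed per line item",
    "Define rate/price per unit per line entry",
    "Calculate subtotal for each line item based on rate*quantity",
    "Sum all line item subtotals for grand total amount",
    "Extract total taxes for summed tax amounts",
    "Classify extracted information into schema",
)

# Hypothetical constant: the output-format statement that must come last.
JSON_STATEMENT = "Convert the response to JSON format"


def build_invoice_prompt(instructions: Sequence[str] = INSTRUCTIONS) -> str:
    """Number each instruction as 'Prompt N: ...' and append the JSON statement."""
    lines = [f"Prompt {n}: {text}" for n, text in enumerate(instructions, start=1)]
    lines.append(JSON_STATEMENT)
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste this text into the editor, then paste the invoice image below it.
    print(build_invoice_prompt())
```

Keeping the instructions in a tuple makes it easy to add, drop, or reorder a step and regenerate the numbering when experimenting with which instructions improve the extraction.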
- Add the following statement so that the model returns its response in JSON format:
Convert the response to JSON format
- Paste the invoice image you want to process into the Free Form Prompt editor, just below the "Convert the response to JSON format" statement.
- Run the prompt to see the invoice's structured information in JSON format, for example:
{
  "invoice_number": "52148",
  "invoice_date": "2020-01-02",
  "currency": "USD",
  "supplier_name": "Brand Name",
  "supplier_address": "24 Dummy Street Area, Location, Lorem Ipsum, 570x55x",
  "customer_name": "Dwayne Clark",
  "customer_address": "24 Dummy Street Area, Location, Lorem Ipsum, 570x55x",
  "invoice_type": "products",
  "line_items": [
    {
      "item_description": "Lorem Ipsum Dolor",
      "quantity": 1,
      "rate": 50.00,
      "subtotal": 50.00
    },
    {
      "item_description": "Pellentesque id neque ligula",
      "quantity": 3,
      "rate": 20.00,
      "subtotal": 60.00
    },
    {
      "item_description": "Interdum et malesuada fames",
      "quantity": 1,
      "rate": 10.00,
      "subtotal": 20.00
    },
    {
      "item_description": "Vivamus volutpat facibus",
      "quantity": 1,
      "rate": 90.00,
      "subtotal": 90.00
    }
  ],
  "subtotal": 220.00,
  "taxes": 0.00,
  "total": 220.00
}
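Because the model performs the arithmetic from Prompts 10 and 11 itself, its numbers are worth re-checking: in the output above, the third line item lists a quantity of 1 at a rate of 10.00 but a subtotal of 20.00. The Python sketch below is one way to catch this, assuming you copy the model's JSON reply into a string. `parse_response`, `check_totals`, and `Issue` are hypothetical helper names, the fence-stripping step assumes the reply may be wrapped in a markdown code fence, and the rule total = subtotal + taxes is an assumption that the prompts do not state.

```python
import json
import re
from dataclasses import dataclass
from decimal import Decimal

# Assumption: the reply may be wrapped in a markdown code fence
# (three backticks, optionally followed by "json"); strip it if present.
FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)

# Line items and totals copied from the sample output above (other fields omitted).
SAMPLE_RESPONSE = """
{"line_items": [
  {"item_description": "Lorem Ipsum Dolor", "quantity": 1, "rate": 50.00, "subtotal": 50.00},
  {"item_description": "Pellentesque id neque ligula", "quantity": 3, "rate": 20.00, "subtotal": 60.00},
  {"item_description": "Interdum et malesuada fames", "quantity": 1, "rate": 10.00, "subtotal": 20.00},
  {"item_description": "Vivamus volutpat facibus", "quantity": 1, "rate": 90.00, "subtotal": 90.00}
 ],
 "subtotal": 220.00, "taxes": 0.00, "total": 220.00}
"""


@dataclass
class Issue:
    where: str
    expected: Decimal
    reported: Decimal


def parse_response(text: str) -> dict:
    """Parse the reply as JSON, reading numbers like 20.00 as exact Decimals."""
    text = text.strip()
    match = FENCE.match(text)
    if match:
        text = match.group(1)
    return json.loads(text, parse_float=Decimal)


def check_totals(invoice: dict) -> list[Issue]:
    """Recompute the arithmetic requested in Prompts 10 and 11 and collect mismatches."""
    issues: list[Issue] = []
    items = invoice.get("line_items", [])
    # Prompt 10: each line subtotal should equal rate * quantity.
    for number, item in enumerate(items, start=1):
        expected = Decimal(item["quantity"]) * Decimal(item["rate"])
        reported = Decimal(item["subtotal"])
        if expected != reported:
            issues.append(Issue(f"line item {number}", expected, reported))
    # Prompt 11: the invoice subtotal should equal the sum of the line subtotals.
    summed = sum((Decimal(item["subtotal"]) for item in items), Decimal("0"))
    if summed != Decimal(invoice["subtotal"]):
        issues.append(Issue("subtotal", summed, Decimal(invoice["subtotal"])))
    # Assumption: total = subtotal + taxes.
    expected_total = Decimal(invoice["subtotal"]) + Decimal(invoice.get("taxes", 0))
    if expected_total != Decimal(invoice["total"]):
        issues.append(Issue("total", expected_total, Decimal(invoice["total"])))
    return issues


if __name__ == "__main__":
    for issue in check_totals(parse_response(SAMPLE_RESPONSE)):
        print(f"{issue.where}: expected {issue.expected}, model reported {issue.reported}")
```

Run against the sample, it prints `line item 3: expected 10.00, model reported 20.00`. The reported subtotal of 220.00 passes because it is the sum of the model's own line subtotals, while quantity times rate across the four lines adds up to 210.00. A check like this only shows that the numbers disagree, not which field was misread, so the invoice image remains the source of truth.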
Conclusion
I hope you have learned the art of processing scanned invoices with a large language model and carefully crafted prompt instructions. Please be careful when using LLMs. This blog post is for educational purposes only, so do not process sensitive documents that contain personal information. Please make sure you understand Google's terms and conditions for using MakerSuite.