Paul Riviera

Posted on Feb 7, 2023 • Edited on Feb 23, 2023 • Originally published at paulriviera.com

Converting PayStub to SQL with Azure Form Recognizer

#announcement #devto #web3 #blockchain

Why

Like many people, I set a series of goals at the beginning of the year. For 2023, one goal was to measure aspects of my life more accurately, in this case my finances. I have tried numerous financial tracking apps and have never been pleased with any of them; if anyone has a recommendation, please let me know. I decided to take a different approach and build my own solution. To start, I am putting the input (income from my job) into a SQL database so I can begin to analyze it. The code outlined here is the first step in that process; you can find the complete codebase on my GitHub.

Begin by provisioning Azure resources

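The scripts below provision the following Azure resources: a resource group, an Azure Key Vault, an Azure Form Recognizer account, and an Azure SQL Server with a database.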

NOTE: I chose to use Azure Key Vault to store my Azure Form Recognizer key and endpoint, but you can use any key management method you prefer.

Create Azure Form Recognizer with Azure Key Vault

I began by provisioning the Azure Form Recognizer resource with the script below.

Param(
    [Parameter(Mandatory = $true)]
    [String]
    $ResourceGroupName,
    [Parameter(Mandatory = $true)]
    [String]
    $FormRecognizerName,
    [Parameter(Mandatory = $true)]
    [String]
    $KeyVaultName,
    [Parameter(Mandatory = $true)]
    [String]
    $Location
)

# ------------------------------------------------------------------------------
# Variables
# ------------------------------------------------------------------------------
$RESOURCE_GROUP_NAME = $ResourceGroupName
$FORM_RECOGNIZER_ACCOUNT = $FormRecognizerName
$KEY_VAULT_NAME = $KeyVaultName

# ------------------------------------------------------------------------------
# Provision Resource Group
# ------------------------------------------------------------------------------
az group create `
    --name $RESOURCE_GROUP_NAME `
    --location $LOCATION

# ------------------------------------------------------------------------------
# Provision Azure Key Vault
# ------------------------------------------------------------------------------

az keyvault create `
    --name $KEY_VAULT_NAME `
    --resource-group $RESOURCE_GROUP_NAME `
    --location $LOCATION

# ------------------------------------------------------------------------------
# Provision Azure Form Recognizer
# ------------------------------------------------------------------------------

$FORM_RECOGNIZER_ACCOUNT_ENDPOINT = az cognitiveservices account create `
    --kind "FormRecognizer" `
    --name $FORM_RECOGNIZER_ACCOUNT `
    --resource-group $RESOURCE_GROUP_NAME `
    --location $LOCATION `
    --sku "S0" `
    --assign-identity `
    --yes `
    --query "properties.endpoint" `
    --output tsv

$FORM_RECOGNIZER_ACCOUNT_KEY = az cognitiveservices account keys list `
    --name $FORM_RECOGNIZER_ACCOUNT `
    --resource-group $RESOURCE_GROUP_NAME `
    --query "key1" `
    --output tsv

# ------------------------------------------------------------------------------
# Store Azure Form Recognizer Keys in Vault
# ------------------------------------------------------------------------------

az keyvault secret set `
    --vault-name $KEY_VAULT_NAME `
    --name "FormRecognizerEndpoint" `
    --value $FORM_RECOGNIZER_ACCOUNT_ENDPOINT

az keyvault secret set `
    --vault-name $KEY_VAULT_NAME `
    --name "FormRecognizerKey" `
    --value $FORM_RECOGNIZER_ACCOUNT_KEY
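
To confirm that the script stored both values, you could read a secret back from the vault. The helper below is a hypothetical sketch, not part of the original script: the function name Get-VaultSecretValue is my own, it wraps the az keyvault secret show command, and it assumes your signed-in account is allowed to read secrets in the vault.

```powershell
# Hypothetical helper (not part of the original script): reads one secret value
# back from Key Vault so you can confirm the provisioning script stored it.
function Get-VaultSecretValue {
    param(
        [Parameter(Mandatory = $true)]
        [String]
        $VaultName,
        [Parameter(Mandatory = $true)]
        [String]
        $SecretName
    )

    # Ask the Azure CLI for just the secret's value as plain text.
    $value = az keyvault secret show `
        --vault-name $VaultName `
        --name $SecretName `
        --query "value" `
        --output tsv

    # Fail loudly if the CLI call failed or returned nothing.
    if ($LASTEXITCODE -ne 0 -or [string]::IsNullOrWhiteSpace($value)) {
        throw "Secret '$SecretName' could not be read from vault '$VaultName'."
    }

    return $value
}
```

For example, Get-VaultSecretValue -VaultName $KEY_VAULT_NAME -SecretName "FormRecognizerEndpoint" should return the endpoint URL. Avoid echoing the key itself into shared logs.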

Create Azure SQL Server and Database

The next step is to provision the SQL database using the script below. The script sets the currently signed-in user as the server's external admin, so if you copy it and run it in a pipeline (GitHub Actions, for example), you will need to update the external admin on the SQL Server. Similarly, the client IP range should be added only when you run the script locally, not when it runs in a pipeline. Feel free to modify the script to suit your needs.

Param(
    [Parameter(Mandatory = $true)]
    [String]
    $ResourceGroupName,
    [Parameter(Mandatory = $true)]
    [String]
    $SqlServerName,
    [Parameter(Mandatory = $true)]
    [String]
    $SqlDatabaseName,
    [Parameter(Mandatory = $true)]
    [String]
    $ClientIPStart,
    [Parameter(Mandatory = $true)]
    [String]
    $ClientIPEnd,
    [Parameter(Mandatory = $true)]
    [String]
    $Location
)

# ------------------------------------------------------------------------------
# Variables
# ------------------------------------------------------------------------------

$RESOURCE_GROUP_NAME = $ResourceGroupName
$LOCATION = $Location

$SQL_SERVER_NAME = $SqlServerName
$SQL_DATABASE_NAME = $SqlDatabaseName

$START_IP = $ClientIPStart
$END_IP = $ClientIPEnd

# ------------------------------------------------------------------------------
# Provision Resource Group
# ------------------------------------------------------------------------------
az group create `
    --name $RESOURCE_GROUP_NAME `
    --location $LOCATION

# ------------------------------------------------------------------------------
# Provision Server (for current signed-in user)
# ------------------------------------------------------------------------------
$SQL_ADMIN_NAME = az ad signed-in-user show `
    --query displayName `
    --output tsv

$SQL_ADMIN_USER_OBJECT_ID = az ad signed-in-user show `
    --query id `
    --output tsv

az sql server create `
    --name $SQL_SERVER_NAME `
    --resource-group $RESOURCE_GROUP_NAME `
    --location $LOCATION `
    --enable-ad-only-auth `
    --external-admin-principal-type User `
    --external-admin-name $SQL_ADMIN_NAME `
    --external-admin-sid $SQL_ADMIN_USER_OBJECT_ID

# ------------------------------------------------------------------------------
# Configure a server-based firewall rule
# ------------------------------------------------------------------------------
az sql server firewall-rule create `
    --resource-group $RESOURCE_GROUP_NAME `
    --server $SQL_SERVER_NAME `
    --name AllowMyIp `
    --start-ip-address $START_IP `
    --end-ip-address $END_IP

# ------------------------------------------------------------------------------
# Create a database
# ------------------------------------------------------------------------------
az sql db create `
    --resource-group $RESOURCE_GROUP_NAME `
    --server $SQL_SERVER_NAME `
    --name $SQL_DATABASE_NAME `
    --edition GeneralPurpose `
    --compute-model Serverless `
    --family Gen5 `
    --capacity 2
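
As noted before the script, the client IP firewall rule only makes sense for local runs. One way to handle that inside the script is sketched below. This is a minimal sketch under assumptions, not part of the original script: the helper name Test-ShouldAddClientFirewallRule is hypothetical, and it relies on the GITHUB_ACTIONS environment variable, which GitHub Actions sets to true on its runners. Other pipelines would need their own check.

```powershell
# Hypothetical helper (not in the original script): returns $true only when
# running outside GitHub Actions AND both values parse as IP addresses.
function Test-ShouldAddClientFirewallRule {
    param(
        [String]
        $ClientIPStart,
        [String]
        $ClientIPEnd
    )

    # Assumption: GitHub Actions is the pipeline; it sets GITHUB_ACTIONS=true.
    $inPipeline = $env:GITHUB_ACTIONS -eq 'true'

    # TryParse returns $false for empty or malformed input instead of throwing.
    $parsed = $null
    $validRange = [System.Net.IPAddress]::TryParse($ClientIPStart, [ref] $parsed) -and
        [System.Net.IPAddress]::TryParse($ClientIPEnd, [ref] $parsed)

    return ((-not $inPipeline) -and $validRange)
}

# Reuses the variables defined in the script above; the rule is skipped in a
# pipeline or when no usable IP range was supplied.
if (Test-ShouldAddClientFirewallRule -ClientIPStart $START_IP -ClientIPEnd $END_IP) {
    az sql server firewall-rule create `
        --resource-group $RESOURCE_GROUP_NAME `
        --server $SQL_SERVER_NAME `
        --name AllowMyIp `
        --start-ip-address $START_IP `
        --end-ip-address $END_IP
}
```

With this approach the two IP parameters would also need to stop being mandatory, so that a pipeline run can omit them.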

Now that the resources are provisioned, we can begin to build the application.

Building the application

I chose to build a console app because it will be easy to integrate into my home automation, but in the future I would like to move it to an Azure Function.

Let's review the goal again: to pass in a PDF (because that's what I have) and have the application extract the data and insert it into the SQL database.

Set up the project

I created a new console app using the .NET CLI, but you can also create one with Visual Studio or Visual Studio Code.

dotnet new console --name "<Name of your project>"

NOTE: I suggest also adding a .gitignore for dotnet at the repository root as this command will not create one for you.
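
One way to follow the note above is the gitignore template that ships with the .NET SDK. This is a small sketch and assumes you run it from the repository root, where no .gitignore exists yet.

```powershell
# Run from the repository root: generates a .gitignore suited to .NET projects.
dotnet new gitignore
```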

Add NuGet packages

dotnet add package Azure.Identity
dotnet add package Azure.AI.FormRecognizer
dotnet add package Azure.Security.KeyVault.Secrets
dotnet add package Microsoft.Data.SqlClient
dotnet add package AzFormRecognizer.Table.ToSQL

Before we begin to write code, let's talk through the packages.

Azure.Identity: Used to authenticate to Azure services. In the code below I use DefaultAzureCredential, which requires you to be signed in to Azure through one of several supported methods, such as the Azure CLI.
Azure.AI.FormRecognizer: Used to interact with the Form Recognizer service.
Azure.Security.KeyVault.Secrets: Used to interact with the Key Vault service.
Microsoft.Data.SqlClient: Used to interact with the Azure SQL Server instance.
AzFormRecognizer.Table.ToSQL: This is a custom package I created to help with the conversion of the Form Recognizer output to SQL. The logic is currently minimal but I plan to extend the functionality with time. Feel free to explore the package and contribute on GitHub.

The Code

I break the code into five core sections to make it easier to follow. To keep things simple, I place these sections one after another in the Program.cs file. Feel free to split them into functions or separate files as you see fit; after all, in its current form the code would be painful to unit test.

You can remove the Console.WriteLine("Hello, World!"); line as it is not needed.