AI to Categorize Bank & Credit Card Transactions


If you have ever manually categorized a large batch of bank or credit card transactions, you know how tedious the process can be. Typically, you need to get a digital copy of the statement, load it into a program like Excel or Google Sheets, and then review and categorize each transaction one by one. Each transaction might need to be tagged to a department, or flagged as personal or business related. Whatever the reason for categorizing, the process is time-consuming and prone to error.

Automatically Extracting & Categorizing Transactions

Recently, a client asked whether we could add functionality to their internal management system to extract expense items from credit card statements and automatically tag them with the correct categories, so that the transactions could be processed further based on those categories. After a quick discovery round, we outlined a solution in which their existing system communicates with a dedicated microservice that processes the documents and predicts a category for each transaction.

As shown in the above diagram, the workflow begins when the microservice receives the document and ends when the categorized transactions are stored in a database. The architecture diagram below illustrates how the existing management system was kept in place, while a couple of new microservices running outside the main portal handle the processing and tagging of transactions. The main platform then reads that data and includes it in the organization’s regular reporting workflows through the main portal.
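
To make the workflow concrete, here is a minimal Python sketch of the receive, extract, categorize, and store steps. It is an illustration under stated assumptions, not the client's implementation: the `Transaction` fields, the `process_statement` function, the table layout, and the use of SQLite are all hypothetical, and the extraction and categorization steps are passed in as plain functions so that any OCR or model backend could be plugged in.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Transaction:
    # Hypothetical fields; a real statement may carry more (currency, card, etc.).
    date: str
    description: str
    amount: float
    category: str = "uncategorized"


def process_statement(
    document_text: str,
    extract: Callable[[str], Iterable[Transaction]],
    categorize: Callable[[str], str],
    conn: sqlite3.Connection,
) -> int:
    """Extract transactions, categorize each one, store them; return the row count."""
    # Assumed table layout, created on first use.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(date TEXT, description TEXT, amount REAL, category TEXT)"
    )
    rows = []
    for tx in extract(document_text):
        tx.category = categorize(tx.description)
        rows.append((tx.date, tx.description, tx.amount, tx.category))
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Keeping extraction and categorization behind simple function interfaces mirrors the idea of separate services: the main system only needs to read the resulting table for its reporting.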

Technologies Used for Solution

To implement this solution, we created two microservices that handle the document processing tasks and perform the actual categorization of each transaction.

  1. Optical Character Recognition (OCR): we used the open source Tesseract engine (originally developed at Hewlett-Packard, with later development sponsored by Google) to perform OCR-related tasks. Since the data was in English and in a relatively straightforward format, we did not need any special fine-tuning to achieve acceptable results from the base library.
  2. Natural Language Processing (NLP): we took a state-of-the-art (SOTA) language model and fine-tuned it as a sentence classifier, where each "sentence" is a credit card transaction. Once the model has encoded a transaction, it predicts the relevant expense category based on the labeled samples it was trained on.
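
The two steps above can be sketched in Python as follows. This is only an illustration: the line layout matched by `LINE_RE`, the function names, and the category labels are assumptions, and `keyword_categorize` is a deliberately simple keyword stand-in for the fine-tuned model, which is not shown here. In practice the input text would come from the OCR step, for example from `pytesseract.image_to_string(image)` if the pytesseract wrapper for Tesseract were used.

```python
import re
from typing import Callable, Optional

# Assumed statement line layout: "MM/DD  DESCRIPTION  AMOUNT",
# e.g. "03/14 UBER TRIP 23.50". Real statements vary by issuer.
LINE_RE = re.compile(r"^(\d{2}/\d{2})\s+(.+?)\s+(-?\$?[\d,]+\.\d{2})$")


def parse_line(line: str) -> Optional[dict]:
    """Return a transaction dict, or None for lines that are not transactions."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    date, description, amount = match.groups()
    value = float(amount.replace("$", "").replace(",", ""))
    return {"date": date, "description": description, "amount": value}


# Hypothetical keyword table standing in for the fine-tuned classifier.
KEYWORDS = {"UBER": "Travel", "STAPLES": "Office Supplies"}


def keyword_categorize(description: str) -> str:
    upper = description.upper()
    for keyword, category in KEYWORDS.items():
        if keyword in upper:
            return category
    return "Uncategorized"


def categorize_ocr_text(
    text: str, categorize: Callable[[str], str] = keyword_categorize
) -> list:
    """Parse OCR output line by line and attach a predicted category."""
    results = []
    for line in text.splitlines():
        tx = parse_line(line)
        if tx is not None:
            tx["category"] = categorize(tx["description"])
            results.append(tx)
    return results
```

Because `categorize` is just a callable that maps a description to a label, the keyword stand-in could be swapped for a call to a trained model without changing the parsing code.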

Next Steps

If your organization needs to extract and process information from unstructured documents like bank or credit card statements, feel free to contact Wired Solutions to discuss how we can help.