Best strategies to OCR scanned documents for SharePoint Online

April 12th 2023

We often get asked about the best strategies for OCRing scanned documents for SharePoint Online. To be as transparent as possible, we want to share our answers with you here.

Firstly, what is OCR? It stands for Optical Character Recognition, a process that converts images (such as scanned documents) into machine-readable documents by adding a text layer. This is hugely valuable because SharePoint indexes this text layer, making the contents of the document searchable. This enables enterprise-wide discovery and the extraction of data when you need it.

When deciding on the best OCR strategy for scanned documents, it boils down to two KPIs that you’re aiming for:

  1. Quality of the OCR results
  2. Cost efficiency of the process

Quality of the OCR results

If you don’t do your research, you may end up with an option that doesn’t hit the mark. However, among top providers such as Encodian, comparing quality is like ‘splitting hairs’ (if you’re familiar with the saying).

Look for the capabilities you expect, such as:

  • image cleanup
  • deskew
  • despeckle
  • adjust brightness
  • adjust contrast
  • remove border
  • rotate
  • rotate confidence level

Top providers are likely to have integrated the same or similar technology engines into their solutions behind the scenes.

Our advice is to go out there and validate the results yourself. That’s why Encodian provides free trials for Indxr and Flowr. We also offer free plans across most of our products!

Unfortunately, there is a limit to the success you can achieve with poor-quality images or handwriting. Some solutions, such as Microsoft Syntex, may yield improved results. Just note that you’ll be journeying into a realm of higher costs, which will need justification. If this is a must for your OCR projects, reach out to us.

Cost efficiency of the process

OCR is an expensive process to run in the cloud compared with other document manipulation processes such as conversion, merging and resizing. That’s why, despite Encodian having Flowr, the market-leading document process automation Power Automate connector, which can automatically OCR documents added to a SharePoint library, we still decided to build Indxr and to build native OCR capabilities into our Filer product.

Why does Encodian offer three OCR products?

Indxr is our flagship OCR solution for high volumes of documents. We built it purely to offer you a cost-efficient route to achieve this. We did this by designing it to run locally on a desktop, laptop or VM, which removes the need to send documents to the cloud and cuts out that additional expense. This way, we provide you with a fixed price for unlimited users and usage to OCR as many documents in SharePoint Online as you wish.

Often the strategy for companies during a large document scanning or file migration project is to procure Indxr for a chosen period (a few months or a year) to get through the initial high-volume requirement. From there, you can judge whether you can transition to a solution like Flowr to OCR as you go. Flowr automates the OCR of lower volumes of documents as you upload them to your SharePoint environment or when you trigger it for specific documents.
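To make the "fixed price versus pay as you go" judgement concrete, here is a minimal sketch of the break-even arithmetic. The function name and both inputs are hypothetical, and the figures in the usage note are made up for illustration; they are not Encodian prices. It assumes a single fixed price for the period and a flat cost per document for the pay-as-you-go route.

```python
import math


def break_even_documents(fixed_price: float, cost_per_document: float) -> int:
    """Smallest number of documents at which a fixed-price licence
    becomes cheaper than paying per document.

    Both arguments are illustrative inputs you would supply from
    your own quotes; they are assumptions, not published prices.
    """
    if fixed_price < 0 or cost_per_document <= 0:
        raise ValueError("fixed_price must be >= 0 and cost_per_document > 0")
    # Pay-as-you-go costs n * cost_per_document, so the fixed price wins
    # once n is strictly greater than fixed_price / cost_per_document.
    return math.floor(fixed_price / cost_per_document) + 1
```

For example, with a made-up fixed price of 1000 and a made-up per-document cost of 0.5, `break_even_documents(1000, 0.5)` returns 2001: below that volume, paying per document would be the cheaper route for the period.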

We have helped you to reduce costs with Flowr too. Our ‘Get PDF Text Layer’ action can be used to first check whether a text layer already exists, so the work is not repeated, and our various ‘Split’ actions can then split documents so that you only OCR the specific pages you need, which saves Actions.
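The "check first, then only OCR what you need" logic can be sketched in a few lines. This is not Flowr code and does not call any Encodian action; it assumes you already have the extracted text for each page as a list of strings (how you obtain it is outside this sketch), and the function names are hypothetical.

```python
from typing import List, Tuple


def pages_needing_ocr(page_texts: List[str], min_chars: int = 1) -> List[int]:
    """Return 1-based page numbers whose existing text layer is
    empty (or shorter than min_chars once whitespace is stripped)."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


def contiguous_ranges(pages: List[int]) -> List[Tuple[int, int]]:
    """Group sorted page numbers into (first, last) ranges, which is a
    convenient shape for deciding where to split a document."""
    ranges: List[Tuple[int, int]] = []
    for page in pages:
        if ranges and page == ranges[-1][1] + 1:
            # Extend the current range by one page.
            ranges[-1] = (ranges[-1][0], page)
        else:
            # Start a new range at this page.
            ranges.append((page, page))
    return ranges
```

If every page already has text, `pages_needing_ocr` returns an empty list and the document can skip OCR entirely. Otherwise, the ranges tell you which pages to split out and send for OCR, rather than spending Actions on the whole file.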

Filer customers get the perk that all documents are automatically OCR’d upon ingestion.

We hope this blog gave you a helpful steer.

We talk about OCR all the time, so if you’d like to discuss your requirements with us further, please do reach out!

Author
Dan Kong

Sales Director
