Many organisations must redact sensitive data from documents, a task that is often manual and time-consuming. By combining the Power Platform’s AI Builder with Encodian Flowr’s Redact PDF action, you can automate this process in a few easy steps, generating significant business value. So, if you need to redact PDF documents with Power Automate, you’re in the right place!
This post provides a detailed example of detecting payment details in a PDF invoice and then redacting those sensitive details using Encodian’s Redact PDF action.
AI Builder has many prebuilt AI models available within the Power Platform, including the ‘Invoice Processing’ model. Using a prebuilt model means you don’t need to provide and tag example documents because the model is already trained; however, you are limited to the tags supplied by the model. Although we are processing invoices, the prebuilt ‘Invoice Processing’ model does not detect payment information. For this example, we’ll use a custom ‘Document Processing’ model that allows the generation of our own tags.
When you create a custom model, you have to select the type of documents your model will process. For this example, we’ll choose the ‘Invoices’ type:
Next, you need to provide the information you want to tag. Because I have selected the invoice document type, invoice-related fields are already added to the model by default. I added these fields to detect the payment information:
See the table below for custom information:
Field name | Field type
CardholderName | Text field
SecurityCode | Number field
CardNumber | Number field
ExpirationDate | Date field
This is the full list of the different field types you can detect:
After defining the tags for the information to be extracted, we need to provide example documents. With document processing, a single model can be trained on more than one type of document. Each document type is called a collection. For this blog, I have two collections, which are invoices in different formats.
The more example documents you provide at this stage, the better your model will be once trained. The minimum number of examples needed for each document type is 5.
After uploading your documents, you have to tag the pieces of custom information:
Once all the example documents have been tagged, the last step is to click ‘Train’!
It will take a few minutes for the model to train. Once it has finished, you will be provided with an accuracy score, and you can test the model before using it.
Before using the model, it must be published. The ‘Publish’ button will appear below the Accuracy score section (my model is already published, so I can see the ‘Use model’ button instead).
Now that we have a custom AI Builder model, we can use it in Power Automate!
For this example, we’ll create a simple 4-step flow showcasing how to use the custom AI Builder model to identify sensitive data, which is then redacted by the Encodian Redact PDF action. As we are dealing with documents, we’ve opted for the OneDrive ‘When a file is created’ trigger:
The second action uses the custom AI Builder model to process the document added to OneDrive, extracting any contained payment information. The file type needs to be PDF, JPEG or PNG. If your file needs converting to PDF first, you could add the Encodian Convert to PDF action before the AI Builder action.
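The file type check described above can be sketched in a few lines of Python. This is only an illustration of the decision, not part of the flow itself: the function name `needs_conversion` and the example filenames are hypothetical, and the extension list simply mirrors the PDF, JPEG and PNG types mentioned above.

```python
from pathlib import PurePath

# Extensions for the file types the post lists: PDF, JPEG and PNG.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def needs_conversion(filename: str) -> bool:
    """Return True when the file should be converted to PDF before extraction."""
    return PurePath(filename).suffix.lower() not in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("invoice-001.pdf", "invoice-002.DOCX", "scan.jpeg"):
        step = "convert to PDF first" if needs_conversion(name) else "send to the model"
        print(f"{name}: {step}")
```

In a flow, the same decision would typically be a Condition on the file name ahead of the AI Builder action.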
If your model isn’t showing in the dropdown list, double-check that you have published it. Only published models can be used in apps and flows.
The output of the AI model provides the value and the text for each piece of information it has been trained to detect. As we can see below, for ExpirationDate the value is in Date Time format because it is a Date field, whereas the text is exactly as it appears in the document. We will use these text values as the inputs to the redaction step, because the redaction action searches the document for the exact text provided. The value (in this case in Date Time format) does not appear in the document, so it would not be redacted.
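To make the difference between the value and the text concrete, here is a minimal Python sketch. The dictionary shape, the sample date and the sample document text are all assumptions made for illustration; they are not the actual AI Builder output schema.

```python
from datetime import datetime

# Hypothetical shape of one extracted field: a typed value plus the raw text.
expiration_date = {
    "value": datetime(2026, 8, 1),  # parsed because ExpirationDate is a Date field
    "text": "08/26",                # exactly as printed on the invoice
}

# Made-up snippet of the invoice text that the redaction step would search.
document_text = "Card expires 08/26. Security code 123."


def appears_in_document(term: str, text: str) -> bool:
    """An exact-match search, which is how the post describes the redaction step."""
    return term in text


value_as_string = expiration_date["value"].isoformat()

print(appears_in_document(value_as_string, document_text))
print(appears_in_document(expiration_date["text"], document_text))
```

The formatted value never occurs in the page text, so searching for it finds nothing, while the text is found because it was read from the page in the first place.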
To use the Redact PDF action, you can enter the details item by item or switch to array inputs. Each item that needs redacting will need to be added to the action using the dynamic content text values from the previous step.
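If you switch to array inputs, the array needs one entry per text value. The Python sketch below shows one way such an array could be assembled. The key name `searchText`, the input dictionary shape and the sample values are hypothetical; check the action’s documentation for the schema it actually expects.

```python
import json


def build_redactions(extracted: dict) -> list:
    """Build one redaction entry per non-empty text value.

    `extracted` maps field names to {"text": ...} dictionaries (assumed shape).
    """
    redactions = []
    for field in extracted.values():
        text = (field.get("text") or "").strip()
        if text:  # skip fields the model did not find on this document
            redactions.append({"searchText": text})  # hypothetical key name
    return redactions


if __name__ == "__main__":
    sample = {
        "CardholderName": {"text": "Jane Example"},
        "CardNumber": {"text": "4111 1111 1111 1111"},
        "SecurityCode": {"text": "123"},
        "ExpirationDate": {"text": ""},  # not detected on this document
    }
    print(json.dumps(build_redactions(sample), indent=2))
```

Skipping empty values matters because a field the model failed to detect would otherwise produce an empty search term.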
Find out more information about Flowr’s Redact PDF action here:
Once the redaction is complete, a new redacted file is saved back to OneDrive. Ensure the redacted file location differs from the location used in the trigger, or you will enter a continuous loop!
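The loop risk can be expressed as a simple path check. The following Python sketch is illustrative only: the function name, the folder paths and the `include_subfolders` flag are assumptions, and the comparison ignores case on the assumption that the storage location treats paths case-insensitively.

```python
from pathlib import PurePosixPath


def would_retrigger(trigger_folder: str, output_path: str,
                    include_subfolders: bool = False) -> bool:
    """Return True if saving to output_path would fire the trigger again."""
    watched = PurePosixPath(trigger_folder.lower())
    target_folder = PurePosixPath(output_path.lower()).parent
    if target_folder == watched:
        return True
    # If the trigger also watches subfolders, any folder beneath it is unsafe.
    return include_subfolders and watched in target_folder.parents


if __name__ == "__main__":
    print(would_retrigger("/Invoices/Incoming", "/Invoices/Incoming/inv-001.pdf"))
    print(would_retrigger("/Invoices/Incoming", "/Invoices/Redacted/inv-001.pdf"))
```

Saving to a sibling folder such as the hypothetical `/Invoices/Redacted` keeps the output outside the watched location.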
Document Type 1:
Document Type 2:
As sensitive data is involved, it is best practice to check redacted documents before passing them on, internally or externally, to ensure all the sensitive data has been redacted as expected.
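Part of that check can be automated. The Python sketch below assumes you have already extracted the text of the redacted PDF by some other means and simply looks for any sensitive term that survived. The function name and sample strings are hypothetical, and a clean result does not replace a visual check of the document.

```python
def find_leaks(redacted_text: str, sensitive_terms: list) -> list:
    """Return the sensitive terms that still appear in the redacted text."""
    return [term for term in sensitive_terms if term and term in redacted_text]


if __name__ == "__main__":
    # Made-up text standing in for what was extracted from the redacted PDF.
    redacted_text = "Cardholder: [REDACTED]  Expires: 08/26"
    terms = ["Jane Example", "4111 1111 1111 1111", "08/26"]

    leaks = find_leaks(redacted_text, terms)
    if leaks:
        print("Still visible:", ", ".join(leaks))
    else:
        print("No sensitive terms found in the extracted text.")
```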
UPDATE: We’re excited to announce some significant updates to Flowr for Power Automate! As of October 2024, we’ve updated action names and split Flowr’s central Power Automate connector into nine specialized connectors. These changes will make your workflow faster, smoother, and more efficient. The new action names are more precise and intuitive, saving you time, while the focused connectors enhance performance and flexibility. This update also helps future-proof the platform for even more powerful features. Check out our updated action names blog.
Technical Evangelist