Clever redaction with AI

CIB doXiview is an AI-supported application that automates some typical PDF processing steps: it blacks out sensitive data, creates ZUGFeRD invoices from simple PDFs and makes scanned documents machine-readable.

Ms. Andrea Trinkwalder, editor at c't - Magazin für Computertechnik, has tested our AI-assisted redaction in CIB doXiview:

The Munich-based CIB Group develops digitization solutions for large customers and advertises high data protection standards and GDPR (DSGVO) compliance. Private users can use some of its standard PDF applications for the browser free of charge, including the simple editor CIB pdf standalone and its AI-supported companion CIB doXiview, which automatically processes PDF content. With the former, documents remain local to the user's device.

The latter uploads them to the cloud because the layout and character recognition (OCR) does not run locally; it mainly supports the blackening of sensitive data and helps to create and process forms and invoices faster. All content is deleted from the server after processing. From 2024, the manufacturer plans to introduce subscription models for private users and small businesses. Until then, both private individuals and businesses can try out the software free of charge.


Anonymization is very user-friendly: A click on "Anonymise data" starts the search for sensitive information, whereupon CIB doXiview marks suitable places in the document and lists the hits in the toolbar on the right. There you can deactivate them individually or collectively.

Helpful: The results are sorted by category. So if you don't want to black out names and addresses, but you do want to black out account data, you can make this decision for the entire document with just one click. The AI is also trained for ID cards and passports, but not (yet) for national insurance numbers.

In our tests, the function recognised account and address data quite reliably, but every now and then a name or a company name fell through the cracks. In any case, you should check the redaction suggestion carefully. Text passages that the AI has overlooked can be marked manually for redaction. In addition, it would be desirable to have a function that allows you to define your own search patterns - for national insurance numbers or file numbers, for example.

The OCR acts very cooperatively. It marks uncertain candidates in yellow and presents them for review in a separate dialogue box: There you can move quickly from word to word, and the original texts are shown enlarged during this correction loop.
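To make the wish for user-defined search patterns concrete, here is a minimal Python sketch of how such a feature could work in principle. It is not part of CIB doXiview and says nothing about its internals. The pattern names, the simplified national insurance number format (two letters, six digits, a suffix letter A to D) and the file number format "AZ 123/2023" are assumptions chosen purely for illustration.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    category: str
    start: int
    end: int
    text: str


# Hypothetical, simplified patterns; real formats need stricter rules.
CUSTOM_PATTERNS = {
    "national_insurance_number": re.compile(
        r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"
    ),
    "file_number": re.compile(r"\bAZ\s?\d{1,5}/\d{4}\b"),
}


def find_hits(text, patterns=CUSTOM_PATTERNS):
    """Collect all pattern matches, sorted by their position in the text."""
    hits = [
        Hit(category, m.start(), m.end(), m.group())
        for category, pattern in patterns.items()
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda hit: hit.start)


def redact(text, hits, disabled_categories=frozenset()):
    """Overwrite every enabled hit with block characters of equal length."""
    chars = list(text)
    for hit in hits:
        if hit.category in disabled_categories:
            continue  # the reviewer switched this category off
        chars[hit.start:hit.end] = "█" * (hit.end - hit.start)
    return "".join(chars)


sample = "Insurance no. QQ 12 34 56 C, file AZ 123/2023."
hits = find_hits(sample)
# Redact insurance numbers only; file numbers stay visible.
redacted = redact(sample, hits, disabled_categories={"file_number"})
print(redacted)
```

Grouping hits by category mirrors the per-category switch described in the review: a whole class of hits can be excluded from redaction in one step, while every remaining hit still needs a human check.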

The "Create Invoice" function analyses the content of simple PDF invoices; it also marks and extracts the invoice data and creates a ZUGFeRD invoice with structured XML from it.

The recognition rate varies greatly; in the test, the software found more items in scanned documents than in native PDFs. However, missing items can be quickly transferred to the corresponding fields using a marker pen. The technology also simplifies payment by extracting transfer information from image or PDF documents and converting it into a GiroCode or SEPA XML.
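For readers unfamiliar with the GiroCode: it is a QR code whose text payload follows the EPC QR code guideline of the European Payments Council. The Python sketch below only assembles that payload from transfer data that has already been extracted. The function name and the sample values are assumptions for illustration, and the sketch does not reflect how CIB doXiview implements this step. Rendering the payload as a QR image would be left to a QR library.

```python
from decimal import Decimal


def build_girocode_payload(name: str, iban: str, amount_eur: Decimal,
                           remittance_text: str, bic: str = "") -> str:
    """Return the text payload of an EPC QR code (GiroCode), version 002."""
    if not Decimal("0.01") <= amount_eur <= Decimal("999999999.99"):
        raise ValueError("amount out of range for a SEPA credit transfer")
    fields = [
        "BCD",                          # service tag
        "002",                          # version (BIC is optional here)
        "1",                            # character set: 1 = UTF-8
        "SCT",                          # SEPA credit transfer
        bic,                            # BIC of the beneficiary bank
        name[:70],                      # beneficiary name, max. 70 chars
        iban.replace(" ", "").upper(),  # IBAN without spaces
        f"EUR{amount_eur:.2f}",         # amount with two decimals
        "",                             # purpose code (not used here)
        "",                             # structured reference (not used)
        remittance_text[:140],          # unstructured remittance text
    ]
    return "\n".join(fields)


# Sample values are made up; the IBAN is a commonly used example number.
payload = build_girocode_payload(
    name="Example GmbH",
    iban="DE89 3704 0044 0532 0130 00",
    amount_eur=Decimal("12.5"),
    remittance_text="Invoice 2023-001",
)
print(payload)
```

The sketch does not validate the IBAN checksum; a production tool would do that before generating the code.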

CIB doXiview automates repetitive work on PDF documents and thus relieves the user of tedious routines. However, especially after redacting sensitive data, a human must check the result at the end. The software supports this with well-prepared information and practical correction aids. According to the privacy policy, the material is processed on the provider's servers, at least within Europe, and then deleted.

This article was published in c't 18/2023.

Andrea Trinkwalder

Writer / c't - magazine for computer technology