In spring 2019, the German Federal Ministry of Education and Research (BMBF) gave the green light for the PoP (Protect our Privacy) research project by approving the application for research funding under the “SME Innovation Offensive ICT” funding measure. Today, we can already present exciting research results.
The PoP research project deals with the protection of personal data. In concrete terms, this means that together with the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, CIB is developing an AI-based software tool for the automatic anonymisation and pseudonymisation of personal data in digitised documents.
Approach and motivation
To build this software solution, PoP is researching methods and tools that automatically identify, label and mask content worthy of protection in digitised documents. The Fraunhofer Institute IAIS focuses on identifying sensitive data, including where it is located in the document. CIB itself implements the anonymisation and pseudonymisation. A particular focus is on obtaining training and test data that complies with data protection requirements, so that further AI projects can be driven forward.
Anonymisation and pseudonymisation
Anonymisation will already be familiar to some users: text passages are blacked out. With pseudonymisation, we go one step further. Here, sensitive data in documents is not removed or blacked out, but replaced by fictitious information. The aim is to eliminate the reference to a person while keeping the actual meaning of the content intact. The fictitious information and documents should appear realistic so that they are suitable for AI training. When pseudonymising names, for example, care is taken to retain the gender and word length of the original name.
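To make the name example concrete, here is a minimal sketch in Python of how a first name could be swapped for a fictitious one of the same gender and similar length. It is an illustration only, not the PoP implementation: the name pools, the function name pseudonymise_name and the assumption that the gender is already known (for example from an upstream recognition step) are all hypothetical.

```python
import hashlib

# Hypothetical pools of fictitious first names, grouped by gender.
# A real system would need far larger, locale-aware lists.
FICTITIOUS_NAMES = {
    "f": ["Ida", "Lena", "Clara", "Sophie", "Johanna", "Franziska"],
    "m": ["Jan", "Paul", "Felix", "Moritz", "Matthias", "Sebastian"],
}


def pseudonymise_name(name: str, gender: str) -> str:
    """Return a fictitious name of the same gender and closest length.

    Assumes `gender` ("f" or "m") has already been determined elsewhere.
    """
    # Never hand back the original name itself.
    pool = [c for c in FICTITIOUS_NAMES[gender] if c.lower() != name.lower()]

    # Keep only the candidates whose length is closest to the original.
    best_gap = min(abs(len(c) - len(name)) for c in pool)
    candidates = sorted(c for c in pool if abs(len(c) - len(name)) == best_gap)

    # Pick deterministically so the same original always gets the same
    # replacement within a document. The hash is used for repeatability
    # only; it is not a security measure.
    digest = hashlib.sha256(name.lower().encode("utf-8")).digest()
    return candidates[digest[0] % len(candidates)]


if __name__ == "__main__":
    print(pseudonymise_name("Anna", "f"))
    print(pseudonymise_name("Peter", "m"))
```

Mapping the same original to the same replacement keeps a document internally consistent, which matters if the pseudonymised text is to remain realistic enough for AI training.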
Current developments and goals
On the CIB side, there are already some interesting results. With our web-based multifunctional viewer CIB doXiview, users can already black out text passages and make other areas permanently unrecognisable. Further steps are planned for the future: the AI system developed in the project will be used to recognise entities, and the user guidance will be expanded. New features will also be added, such as the removal of text with realistic reconstruction of the background.
So things remain exciting, and we are looking forward to even more data protection with AI.