blackbar

Anonymization and Pseudonymization

Author

www.bnosac.be

Published

June 6, 2025

Project blackbar has been set up for hospitals that need to anonymize and/or pseudonymize their text records, either to comply with the GDPR or to share the clinically relevant information in those texts with researchers, external collaborators or third parties without disclosing personally sensitive information.

In a typical hospital setting, a lot of information is available as plain text records. Common health records are:

notes about the entrance of the patient in the hospital
internal communication about a patient between nurses
reports on the health condition of patients
communication between the hospital and external doctors who follow up
pharmacy related notes
consultation messages and notes
letters when patients leave the hospital
…

Furthermore, in such a regulated environment, a lot of information is already digitized. The information that is already available in the hospital's databases can therefore be used to remove the Personally Identifiable Information (PII) from the health records as thoroughly as possible.

Project blackbar uses a hybrid approach that combines deep learning techniques with more advanced lookup-based techniques to locate this Personally Identifiable Information (PII), and it allows you to run these models offline on your own infrastructure.
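To illustrate the lookup side of this hybrid approach, the sketch below finds a misspelled variant of a known patient name in free text using Smith-Waterman local alignment. It is a minimal illustration, not blackbar's implementation: the function name `smith_waterman`, the scoring values (`match`, `mismatch`, `gap`) and the example sentence are all assumptions chosen for readability.

```python
def smith_waterman(pattern, text, match=2, mismatch=-1, gap=-1):
    """Best local alignment of `pattern` inside `text`.

    Returns (score, start, end) so that text[start:end] is the matched span.
    Comparison is case-insensitive; scoring values are illustrative only.
    """
    p, t = pattern.lower(), text.lower()
    rows, cols = len(p) + 1, len(t) + 1
    # H[i][j] = best score of an alignment ending at p[i-1] and t[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if p[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero
    i, j = best_pos
    end = j
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if p[i - 1] == t[j - 1] else mismatch)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1          # character of the pattern missing in the text
        else:
            j -= 1          # extra character in the text
    return best, j, end


# Hypothetical example: the database holds "Jan Janssens",
# the note contains the variant "Jan Jansens".
note = "Seen today: Mr. Jan Jansens, stable."
score, start, end = smith_waterman("Jan Janssens", note)
print(score, repr(note[start:end]))
```

In practice, a score threshold relative to the length of the name would decide whether the matched span is close enough to be tagged as PII; that threshold is a tuning choice and is not specified here.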

The setup allows you to

Manually tag plain text with personally identifiable information
- allowing to have validated anonymized documents
Automatically identify personally identifiable information
- using BiLSTM/CNN/Transformer-based (BERT) named entity recognition deep learning models
- by looking up variants of names / addresses of the patient linked to the health records using the local alignment technique Smith-Waterman
Perform the pseudonymization
- replacing the detected information with fake names / addresses / … by patient
- replacing the dates and timestamps with time-shifted dates
The result of the anonymization/pseudonymization
- stores information about the exact locations in the text of the personally identifiable information
- creates a new text which looks exactly the same as the original text, except that the personally identifiable information is replaced by pseudo names; the mapping between the two is kept for traceability
The PII tags can be
- names of patients and health personnel, addresses of patients and health personnel, communes/locations, zip codes
- general dates, birth dates, ages
- IDs, social security numbers, email addresses, professions and organisations
Explore the local use of AI tooling on the pseudonymized texts
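The pseudonymization step in the list above (fake names per patient, time-shifted dates, and a stored mapping of exact text locations) can be sketched as follows. This is a minimal sketch under stated assumptions, not blackbar's API: the function `pseudonymize`, the entity labels `NAME` and `DATE`, the `dd/mm/YYYY` date format, the per-patient `shift_days` value and the fake name are all hypothetical.

```python
from datetime import datetime, timedelta


def pseudonymize(text, entities, fake_names, shift_days):
    """Replace detected PII spans and record where each replacement landed.

    entities   : non-overlapping (start, end, label) spans in `text`
    fake_names : per-patient lookup from original string to replacement
    shift_days : per-patient date offset (assumed fixed for one patient)
    """
    pieces, mapping = [], []
    cursor = 0      # position in the original text
    new_pos = 0     # position in the pseudonymized text
    for start, end, label in sorted(entities):
        original = text[start:end]
        if label == "DATE":
            # Assumes dd/mm/YYYY; real records need more robust date parsing
            shifted = datetime.strptime(original, "%d/%m/%Y") + timedelta(days=shift_days)
            replacement = shifted.strftime("%d/%m/%Y")
        else:
            # Unknown strings fall back to a bracketed label such as [NAME]
            replacement = fake_names.get(original, "[" + label + "]")
        pieces.append(text[cursor:start])
        new_pos += start - cursor
        mapping.append({
            "label": label,
            "original": original, "start": start, "end": end,
            "replacement": replacement,
            "new_start": new_pos, "new_end": new_pos + len(replacement),
        })
        pieces.append(replacement)
        new_pos += len(replacement)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


# Hypothetical record and detections
record = "Jan Janssens was admitted on 28/02/2024."
date_start = record.index("28/02/2024")
detected = [(0, 12, "NAME"), (date_start, date_start + 10, "DATE")]
new_text, trace = pseudonymize(record, detected, {"Jan Janssens": "Piet Peeters"}, shift_days=3)
print(new_text)
for entry in trace:
    print(entry)
```

Keeping both the original and the new offsets in the mapping is what makes the result traceable: each replacement in the new text can be linked back to the exact span it came from.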

The project is developed as part of Spectre HD and implements the anonymization and pseudonymization requirements to enable the secondary use of textual health records as defined in the European EHDS regulation.