blackbar
Anonymization and Pseudonymization
Project blackbar is intended for hospitals that need to anonymize and/or pseudonymize their text records, either to comply with the GDPR or to share the clinically relevant information in those texts with researchers, external collaborators or third parties without disclosing personally sensitive information.
In a typical hospital setting, a lot of information is available as plain text records. Common health records are:
- notes about the admission of the patient to the hospital
- internal communication about a patient between nurses
- reports on the health condition of patients
- communication between the hospital and external doctors who follow up
- pharmacy related notes
- consultation messages and notes
- letters written when patients leave the hospital
- …
Furthermore, in such a regulated environment, a lot of information is already digitized. To remove the Personally Identifiable Information (PII) from the health records, we can make use of the information that is already available in the databases of the hospital.
Project blackbar uses a hybrid approach that combines deep learning techniques with advanced lookup-based techniques to locate this PII, and it allows you to run these models offline on your own infrastructure.
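To illustrate the lookup-based side of this approach, the sketch below matches a name known from the hospital database against a text with Smith-Waterman local alignment, so that a misspelled variant is still found. It is a minimal Python illustration, not the project's implementation: the function names `smith_waterman` and `find_name_variant`, the scoring values and the 0.75 threshold are all assumptions made for this example.

```python
def smith_waterman(query, text, match=2, mismatch=-1, gap=-1):
    """Best local alignment of `query` inside `text`.

    Returns (score, start, end) so that text[start:end] is the aligned span.
    The scoring values are illustrative assumptions. Matching is
    case-insensitive; lowercasing is assumed to preserve string length
    (true for ASCII input).
    """
    q, t = query.lower(), text.lower()
    rows, cols = len(q) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if q[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)

    # Trace back from the best cell to find where the alignment starts.
    i, j = best_cell
    end = j
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if q[i - 1] == t[j - 1] else mismatch)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, end


def find_name_variant(name, text, threshold=0.75, match=2):
    """Return (start, end) of a likely variant of `name` in `text`, or None.

    The score is compared with the score of a perfect match of `name`;
    the threshold is a hypothetical tuning parameter.
    """
    if not name:
        return None
    score, start, end = smith_waterman(name, text, match=match)
    if score / (match * len(name)) >= threshold:
        return start, end
    return None


note = "Patient Jansens was admitted on Monday."
span = find_name_variant("Janssens", note)  # name as stored in the database
if span is not None:
    print(note[span[0]:span[1]])  # prints "Jansens"
```

Because the alignment is local, the surrounding text does not lower the score, which is why a short name can be located inside a long record.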
The setup allows you to
- Manually tag plain text with personally identifiable information
  - this makes it possible to have validated anonymized documents
- Automatically identify personally identifiable information
  - using BiLSTM/CNN/Transformer-based (BERT) named entity recognition deep learning models
  - by looking up variants of names / addresses of the patient linked to the health records using the local alignment technique Smith-Waterman
- Perform the pseudonymization
  - replacing the detected information with fake names / addresses / … by patient
  - replacing the dates and timestamps with time-shifted dates
- The result of the anonymization/pseudonymization
  - stores the exact locations of the personally identifiable information in the text
  - is a new text which looks exactly the same as the original text, in which the personally identifiable information is replaced by pseudo names; the mapping between the two is kept for traceability
- The PII tags can be
  - names of patients and health personnel, addresses of patients and health personnel, communes/locations, zip codes
  - general dates, birth dates, ages
  - IDs, social security numbers, email addresses, professions and organisations
- Explore the local use of AI tooling on the pseudonymized texts
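The pseudonymization step in the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's implementation: the function `pseudonymize`, the labels `"NAME"` and `"DATE"`, the ISO date format, the fake name and the shift of -10 days are all hypothetical. It replaces detected spans with fake values, shifts dates by a fixed number of days, and records the location of every replacement in both the original and the new text so the mapping stays traceable.

```python
from datetime import datetime, timedelta


def pseudonymize(text, entities, fake_values, shift_days):
    """Replace detected PII spans and keep a traceable mapping.

    entities    : non-overlapping (start, end, label) tuples in `text`
    fake_values : original string -> fake replacement for this patient
    shift_days  : number of days by which dates are shifted
    Dates are assumed to be written as YYYY-MM-DD (an assumption of this sketch).
    """
    pieces, mapping = [], []
    cursor = 0   # position in the original text
    new_pos = 0  # position in the text being built

    for start, end, label in sorted(entities):
        original = text[start:end]
        if label == "DATE":
            shifted = datetime.strptime(original, "%Y-%m-%d") + timedelta(days=shift_days)
            replacement = shifted.strftime("%Y-%m-%d")
        else:
            replacement = fake_values[original]

        # Copy the untouched text before this entity, then the replacement.
        unchanged = text[cursor:start]
        pieces.append(unchanged)
        new_pos += len(unchanged)
        pieces.append(replacement)

        mapping.append({
            "label": label,
            "original": original, "start": start, "end": end,
            "replacement": replacement,
            "new_start": new_pos, "new_end": new_pos + len(replacement),
        })
        new_pos += len(replacement)
        cursor = end

    pieces.append(text[cursor:])
    return "".join(pieces), mapping


record = "Jan Janssens was admitted on 2023-03-14."
detected = [(0, 12, "NAME"), (29, 39, "DATE")]
new_text, mapping = pseudonymize(
    record, detected, {"Jan Janssens": "Tom Maes"}, shift_days=-10
)
print(new_text)  # prints "Tom Maes was admitted on 2023-03-04."
```

Using the same `fake_values` and `shift_days` for all records of one patient keeps names consistent and preserves the intervals between dates, which is one way to read the "by patient" replacement described above.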
The project is developed as part of Spectre HD and implements the anonymization and pseudonymization requirements that enable the secondary use of textual health records as defined in the European Health Data Space (EHDS) regulation.