Get started - pseudonymization

This section continues from the previous one and shows how to pseudonymize texts.

7. Pseudonymization

To show how the pseudonymization of texts works, first extract the entities with the anonymization functions, then find suitable replacements for a given locale (e.g. nl_BE, fr_BE). The pseudo_replacements function generates these replacements for each patient; they can then be inserted at the right positions in the texts.

from blackbar import Blackbar, blackbar_s3_download
info = blackbar_s3_download(name = "deid_v2", bucket = "blackbar-models")
deid = Blackbar(info)

Example of the pseudonymization logic

from blackbar import PseudoGenerator, pseudo_replacements, blackbar_example
import pandas as pd
text = blackbar_example(type = "document")
anno = deid.anonymize(text, type = "X", as_data_frame = True)

# Create a pseudo generator and create text which will be used as replacements
pseudo   = PseudoGenerator(locale = "nl_BE")
entities = pd.DataFrame({
    "doc_id"     : "doc_XYZ", 
    "patient_id" : "ABCD", 
    "entity_type": anno["entities"]["label_"], 
    "entity_text": anno["entities"]["term_detected"],
    "start"      : anno["entities"]["start"], 
    "end"        : anno["entities"]["end"], 
    "length"     : anno["entities"]["length"], 
    "model_name" : "example",
    "model"      : "example"})
replacements = pseudo_replacements(entities, pseudonimizer = pseudo, dateshift = {"years": -3, "months": -3, "weeks": 4, "days": 23})
replacements["pseudo"]
   entity_type  entity            entity_replacement  count  blackbar_comment
0  03_Datum     09/04             02/02/2022          1      {'patient_gender': 'unknown'}
1  03_Datum     10/4/2021         03/02/2018          1      {'patient_gender': 'unknown'}
2  03_Datum     30/03             23/01/2022          1      {'patient_gender': 'unknown'}
3  01_Naam      Dr. Jan Janssens  dr Jimmy Bossuyt    2      {'patient_gender': 'unknown'}
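
The count column above shows that Dr. Jan Janssens occurred twice and received a single replacement. The idea of keeping replacements consistent per patient can be sketched without blackbar: cache one replacement per (patient, entity type, entity text). The ReplacementCache class and the candidate name list below are hypothetical and only illustrate the principle; PseudoGenerator has its own logic, which is not shown here.

```python
import random

class ReplacementCache:
    """Hypothetical helper: remembers one replacement per patient and entity."""

    def __init__(self, candidates, seed=0):
        self._candidates = list(candidates)
        self._rng = random.Random(seed)  # seeded so the example is reproducible
        self._cache = {}

    def replace(self, patient_id, entity_type, entity_text):
        key = (patient_id, entity_type, entity_text)
        if key not in self._cache:
            # First time this entity is seen for this patient: draw a replacement
            self._cache[key] = self._rng.choice(self._candidates)
        return self._cache[key]

# Illustrative candidate names, not taken from any blackbar resource
cache = ReplacementCache(["Jimmy Bossuyt", "Hugo Reynaert", "Hanne Hendrickx"])
first = cache.replace("ABCD", "01_Naam", "Dr. Jan Janssens")
second = cache.replace("ABCD", "01_Naam", "Dr. Jan Janssens")
print(first == second)  # True: repeated mentions map to the same replacement
```

Keying the cache on the patient identifier means the same name can receive different replacements for different patients, while staying stable within one patient's documents.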

Example for multiple documents

  • Example for several documents where the anonymization is stored in the database
  • Perform the anonymization, get the detected entities
from blackbar import BlackbarDB, Blackbar, blackbar_s3_download, deid_anonymize, deid_anonymization_entities
from blackbar import PseudoGenerator, pseudo_replacements, deid_pseudonymize
from rlike import *
db   = BlackbarDB('test')
docs = db.read_documents(ids = [1, 2, 3], type = "deid")
info = blackbar_s3_download(name = "deid_v2", bucket = "blackbar-models")
deid = Blackbar(info)
anno = deid_anonymize(deid, docs, type = "_", extended = True)
db.update_anonymization(anno, status = 1, type = "anonymization")
  • You can get a data frame with all detected entities
ents = deid_anonymization_entities(anno)
ents
    doc_id  patient_id       model_name               model           entity_type       entity_text          start  end  length  comments
0   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          10/4/2021            10     18   9       {'patient_gender': 'None'}
1   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          30/03                93     97   5       {'patient_gender': 'None'}
2   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          09/04                242    246  5       {'patient_gender': 'None'}
3   1       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     343    358  16      {'patient_gender': 'None'}
4   1       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     384    399  16      {'patient_gender': 'None'}
5   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          10/4/2021            10     18   9       {'patient_gender': 'None'}
6   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          30/03                93     97   5       {'patient_gender': 'None'}
7   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          09/04                242    246  5       {'patient_gender': 'None'}
8   2       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     343    358  16      {'patient_gender': 'None'}
9   2       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     384    399  16      {'patient_gender': 'None'}
10  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             01_Naam           DR. GAME KEEPERS     7      22   16      {'patient_gender': 'None'}
11  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             03_Datum          22 januari 1999      51     65   15      {'patient_gender': 'None'}
12  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam_Dokter    JANSSENS Jean        86     99   14      {'patient_gender': 'None'}
13  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  ID_Patient        A948023ZRXYZ98M      126    140  15      {'patient_gender': 'None'}
14  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  02_Adres_Locatie  BRUSSELSESTEENWEG 1  143    161  19      {'patient_gender': 'None'}
15  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  02_Adres_Locatie  1785 MERCHTEM        190    202  13      {'patient_gender': 'None'}
16  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           Louis Jean JANSSENS  247    265  19      {'patient_gender': 'None'}
17  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             03_Datum          18/01/1998           301    310  10      {'patient_gender': 'None'}
18  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam_Dokter    Jean JANSSENS        784    796  13      {'patient_gender': 'None'}
19  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           DR. RODEL BAAN       898    911  14      {'patient_gender': 'None'}
20  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           DR. DE GROTE TEEN    961    977  17      {'patient_gender': 'None'}
  • If you have stored the results in the database, you can read the anonymization back, either for a set of documents or for all documents of certain patients
# Read back by document identifier
anno = db.read_anonymization(ids = anno["doc_id"])
ents = deid_anonymization_entities(anno)
# Or read back all documents of the same patients
anno = db.read_anonymization(patientids = anno["patient_id"])
ents = deid_anonymization_entities(anno)
ents
    doc_id  patient_id       model_name               model           entity_type       entity_text          start  end  length  comments
0   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          10/4/2021            10     18   9       {'patient_gender': 'None'}
1   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          30/03                93     97   5       {'patient_gender': 'None'}
2   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          09/04                242    246  5       {'patient_gender': 'None'}
3   1       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     343    358  16      {'patient_gender': 'None'}
4   1       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     384    399  16      {'patient_gender': 'None'}
5   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          10/4/2021            10     18   9       {'patient_gender': 'None'}
6   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          30/03                93     97   5       {'patient_gender': 'None'}
7   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          09/04                242    246  5       {'patient_gender': 'None'}
8   2       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     343    358  16      {'patient_gender': 'None'}
9   2       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     384    399  16      {'patient_gender': 'None'}
10  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             01_Naam           DR. GAME KEEPERS     7      22   16      {'patient_gender': 'None'}
11  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             03_Datum          22 januari 1999      51     65   15      {'patient_gender': 'None'}
12  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam_Dokter    JANSSENS Jean        86     99   14      {'patient_gender': 'None'}
13  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  ID_Patient        A948023ZRXYZ98M      126    140  15      {'patient_gender': 'None'}
14  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  02_Adres_Locatie  BRUSSELSESTEENWEG 1  143    161  19      {'patient_gender': 'None'}
15  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  02_Adres_Locatie  1785 MERCHTEM        190    202  13      {'patient_gender': 'None'}
16  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           Louis Jean JANSSENS  247    265  19      {'patient_gender': 'None'}
17  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             03_Datum          18/01/1998           301    310  10      {'patient_gender': 'None'}
18  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam_Dokter    Jean JANSSENS        784    796  13      {'patient_gender': 'None'}
19  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           DR. RODEL BAAN       898    911  14      {'patient_gender': 'None'}
20  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           DR. DE GROTE TEEN    961    977  17      {'patient_gender': 'None'}
  • You can generate the pseudo text that will replace each detected entity, and keep track of the replacements so that you can inspect them
pseudo       = PseudoGenerator(locale = "nl_BE")
replacements = pseudo_replacements(ents, pseudonimizer = pseudo, dateshift = {"years": -3, "months": -3, "weeks": 4, "days": 23})
replacements["entities"]
    doc_id  patient_id       model_name               model           entity_type       entity_text          start  end  length  entity_text_replacement    entity_replacement
0   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          10/4/2021            10     18   9       03/02/2018                 03/02/2018
1   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          30/03                93     97   5       23/01/2022                 23/01/2022
2   1       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          09/04                242    246  5       02/02/2022                 02/02/2022
3   1       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     343    358  16      Prof Dr Hugo Reynaert      Prof dr Hugo Reynaert
4   1       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     384    399  16      Prof Dr Hugo Reynaert      Prof dr Hugo Reynaert
5   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          10/4/2021            10     18   9       03/02/2018                 03/02/2018
6   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          30/03                93     97   5       23/01/2022                 23/01/2022
7   2       Y871129di17X     blackbar-models/deid_v2  nlp             03_Datum          09/04                242    246  5       02/02/2022                 02/02/2022
8   2       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     343    358  16      Prof Dr Hugo Reynaert      Prof dr Hugo Reynaert
9   2       Y871129di17X     blackbar-models/deid_v2  both            01_Naam           Dr. Jan Janssens     384    399  16      Prof Dr Hugo Reynaert      Prof dr Hugo Reynaert
10  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             01_Naam           DR. GAME KEEPERS     7      22   16      DR HANNE HENDRICKX         DR HANNE HENDRICKX
11  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             03_Datum          22 januari 1999      51     65   15      17 november 1995           17 november 1995
12  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam_Dokter    JANSSENS Jean        86     99   14      Jill Van Camp              Jill Van Camp
13  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  ID_Patient        A948023ZRXYZ98M      126    140  15      A060206UL00F               A060206UL00F
14  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  02_Adres_Locatie  BRUSSELSESTEENWEG 1  143    161  19      PATRICIARING 1             PATRICIARING 1
15  3       A948023ZRXYZ98M  blackbar-models/deid_v2  smith_waterman  02_Adres_Locatie  1785 MERCHTEM        190    202  13      4746 NANDRIN               4746 NANDRIN
16  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           Louis Jean JANSSENS  247    265  19      Frank Bouckaert            Frank Bouckaert
17  3       A948023ZRXYZ98M  blackbar-models/deid_v2  nlp             03_Datum          18/01/1998           301    310  10      13/11/1994                 13/11/1994
18  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam_Dokter    Jean JANSSENS        784    796  13      Wendy Pieters              Wendy Pieters
19  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           DR. RODEL BAAN       898    911  14      DR HUBERT PEETERS          DR HUBERT PEETERS
20  3       A948023ZRXYZ98M  blackbar-models/deid_v2  both            01_Naam           DR. DE GROTE TEEN    961    977  17      PROF. DR. KARIMA DEMUYNCK  PROF. DR. KARIMA DEMUYNCK
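
The start and end columns above appear to be zero-based character offsets with an inclusive end (for example 10 to 18 for the 9-character 10/4/2021). Under that assumption, inserting replacements "at the right positions" can be sketched as follows. The apply_replacements function is a hypothetical helper for illustration, not part of blackbar, and the sample text is made up.

```python
def apply_replacements(text, entities):
    """Replace each entity span by its replacement.

    `entities` is a list of dicts with `start`, `end` (assumed inclusive),
    `entity_text` and `replacement`, mirroring the columns shown above.
    """
    # Work from the last span to the first so that earlier offsets stay valid
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, stop = ent["start"], ent["end"] + 1
        if text[start:stop] != ent["entity_text"]:
            raise ValueError(f"offsets do not match {ent['entity_text']!r}")
        text = text[:start] + ent["replacement"] + text[stop:]
    return text

text = "dagnota : 10/4/2021, gezien door Dr. Jan Janssens"
entities = [
    {"start": 10, "end": 18, "entity_text": "10/4/2021",
     "replacement": "03/02/2018"},
    {"start": 33, "end": 48, "entity_text": "Dr. Jan Janssens",
     "replacement": "Prof Dr Hugo Reynaert"},
]
print(apply_replacements(text, entities))
# dagnota : 03/02/2018, gezien door Prof Dr Hugo Reynaert
```

Replacing from right to left matters because a replacement can be longer or shorter than the original entity, which would shift every offset that comes after it.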

Results

If you have several documents, you can apply deid_pseudonymize to the anonymized data to get the pseudonymized text records as well as the mapping between the original texts and their replacements.

pseudo  = PseudoGenerator(locale = "nl_BE")
results = deid_pseudonymize(anno, pseudo = pseudo, dateshift = {"years": -3, "months": -3, "weeks": 4, "days": 23})
results["documents"]
   doc_id  patient_id       text                                               text_anonymized                                    text_pseudonymized                                 blackbar_model           blackbar_entities                                  blackbar_options
0  1       Y871129di17X     dagnota : 10/4/2021\n\nOpname D12\nRvO/ SAB op...  dagnota : _________\n\nOpname D12\nRvO/ SAB op...  dagnota : 03/02/2018\n\nOpname D12\nRvO/ SAB o...  blackbar-models/deid_v2  {"doc_id": ["1", "1", "1", "1", "1"], "patient...  {"locale": "nl_BE", "dateshift": {"years": -3,...
1  2       Y871129di17X     dagnota : 10/4/2021\n\nOpname D12\nRvO/ SAB op...  dagnota : _________\n\nOpname D12\nRvO/ SAB op...  dagnota : 03/02/2018\n\nOpname D12\nRvO/ SAB o...  blackbar-models/deid_v2  {"doc_id": ["2", "2", "2", "2", "2"], "patient...  {"locale": "nl_BE", "dateshift": {"years": -3,...
2  3       A948023ZRXYZ98M  \n\n\n\n\n\n\nDR. GAME KEEPERS\n\nElektronisch...  \n\n\n\n\n\n\n________________\n\nElektronisch...  \n\n\n\n\n\n\nDR. HANNE HENDRICKX\n\nElektroni...  blackbar-models/deid_v2  {"doc_id": ["3", "3", "3", "3", "3", "3", "3",...  {"locale": "nl_BE", "dateshift": {"years": -3,...
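
The blackbar_options column records the dateshift that was requested. How blackbar combines the dateshift entries is not shown in this tutorial. The following is a sketch of one plausible reading, calendar years and months first and then weeks and days, using only the standard library. The shift_date function is a hypothetical helper, not part of blackbar, and it is not meant to reproduce the replacement dates in the output above.

```python
import calendar
from datetime import date, timedelta

def shift_date(d, years=0, months=0, weeks=0, days=0):
    """Shift a date by calendar years/months first, then by weeks and days."""
    # Calendar part: move whole months, clamping the day to the month's length
    month_index = d.year * 12 + (d.month - 1) + years * 12 + months
    year, month = divmod(month_index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    # Fixed-length part: weeks and days are plain day counts
    return date(year, month, day) + timedelta(weeks=weeks, days=days)

print(shift_date(date(2021, 4, 10), years=-3, months=-3))  # 2018-01-10
print(shift_date(date(2021, 5, 31), months=-3))            # 2021-02-28 (day clamped)
```

Applying the same shift to every date of a patient keeps the intervals between events roughly intact, which is usually what makes shifted dates still useful for analysis.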

The mapping between the original entities and their replacements is available for auditing.

results["deid"]
    patient_id       entity_type       entity_text          entity_text_replacement  blackbar_comment
0   Y871129di17X     03_Datum          10/4/2021            03/02/2018               {'patient_gender': 'None'}
1   Y871129di17X     03_Datum          30/03                23/01/2022               {'patient_gender': 'None'}
2   Y871129di17X     03_Datum          09/04                02/02/2022               {'patient_gender': 'None'}
3   Y871129di17X     01_Naam           Dr. Jan Janssens     Dr Hugo Reynaert         {'patient_gender': 'None'}
4   A948023ZRXYZ98M  01_Naam           DR. GAME KEEPERS     DR. HANNE HENDRICKX      {'patient_gender': 'None'}
5   A948023ZRXYZ98M  03_Datum          22 januari 1999      17 november 1995         {'patient_gender': 'None'}
6   A948023ZRXYZ98M  01_Naam_Dokter    JANSSENS Jean        Jill Van Camp            {'patient_gender': 'None'}
7   A948023ZRXYZ98M  ID_Patient        A948023ZRXYZ98M      A060206UL00F             {'patient_gender': 'None'}
8   A948023ZRXYZ98M  02_Adres_Locatie  BRUSSELSESTEENWEG 1  PATRICIARING 1           {'patient_gender': 'None'}
9   A948023ZRXYZ98M  02_Adres_Locatie  1785 MERCHTEM        4746 NANDRIN             {'patient_gender': 'None'}
10  A948023ZRXYZ98M  01_Naam           Louis Jean JANSSENS  Frank Bouckaert          {'patient_gender': 'None'}
11  A948023ZRXYZ98M  03_Datum          18/01/1998           13/11/1994               {'patient_gender': 'None'}
12  A948023ZRXYZ98M  01_Naam_Dokter    Jean JANSSENS        Wendy Pieters            {'patient_gender': 'None'}
13  A948023ZRXYZ98M  01_Naam           DR. RODEL BAAN       DR HUBERT PEETERS        {'patient_gender': 'None'}
14  A948023ZRXYZ98M  01_Naam           DR. DE GROTE TEEN    DR. KARIMA DEMUYNCK      {'patient_gender': 'None'}
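
One simple way to use such a mapping for auditing is to check that none of the original entity texts still occurs in a pseudonymized text. The sketch below assumes the mapping is available as (original, replacement) pairs; find_leaks is a hypothetical helper and the two sentences are shortened examples, not blackbar output.

```python
def find_leaks(pseudonymized_text, mapping):
    """Return the original entity texts that still occur in the text."""
    lowered = pseudonymized_text.lower()
    # Case-insensitive substring search: crude, but enough for a first check
    return [original for original, _ in mapping if original.lower() in lowered]

mapping = [
    ("Louis Jean JANSSENS", "Frank Bouckaert"),
    ("18/01/1998", "13/11/1994"),
]
clean = "Uw patiënt, de heer Frank Bouckaert, werd gezien op 13/11/1994."
leaky = "Uw patiënt, de heer Frank Bouckaert, werd gezien op 18/01/1998."

print(find_leaks(clean, mapping))  # []
print(find_leaks(leaky, mapping))  # ['18/01/1998']
```

A check like this only finds entities that were detected in the first place; entities the model missed are not in the mapping and need a separate review.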

The results are anonymized and pseudonymized texts which can be stored in the database or shared with peers.

example = results["documents"].iloc[2]
print(example.text_pseudonymized)
print(example.text_anonymized)
print(example.text)
  • The pseudonymized text resembles the original text, but the personally identifiable information has been replaced, so this is the version you can share.
example = results["documents"].iloc[2]
print(example.text_pseudonymized)







DR. HANNE HENDRICKX

Elektronisch adres







17 november 1995



BETREFT:         Jill Van Camp

                        A060206UL00F

PATRICIARING 111

                        4746 NANDRIN



Geachte collega



Uw patiënt, de heer Frank Bouckaert, werd op de raadpleging gezien op 13/11/1994.



Reden van raadpleging:
 6 weken na IHP rechts.
 Gaat goed, stapt vlot uit de wachtkamer met kruk in de hand.
 Zegt vooral laatste weken sterke verbetering beterschap te voelen.

Bevindingen:
 ROM: flexie 100° full extension.
Exo en endo goed en pijnloos.
 Goede straight leg.
 Goede kracht abductoren.

Bijkomende onderzoeken:
 RX: goede stand van de prothese.

Houding:
 Kinesitherapie verder te zetten: gangrevalidatie en spierversterking

Met de patiënt, de heer Wendy Pieters, werd afgesproken hem terug te zien 6 maanden postoperatief met rx .



Met collegiale groeten





DR HUBERT PEETERS                                                 DR. KARIMA DEMUYNCK

Kliniekhoofd                                                     Assistente
  • The anonymized text clearly shows where entities were detected: each one is masked with underscores.
print(example.text_anonymized)







________________

Elektronisch adres







_______________



BETREFT:        ______________

                        _______________

___________________11

                        _____________



Geachte collega



Uw patiënt, de heer ___________________, werd op de raadpleging gezien op __________.



Reden van raadpleging:
 6 weken na IHP rechts.
 Gaat goed, stapt vlot uit de wachtkamer met kruk in de hand.
 Zegt vooral laatste weken sterke verbetering beterschap te voelen.

Bevindingen:
 ROM: flexie 100° full extension.
Exo en endo goed en pijnloos.
 Goede straight leg.
 Goede kracht abductoren.

Bijkomende onderzoeken:
 RX: goede stand van de prothese.

Houding:
 Kinesitherapie verder te zetten: gangrevalidatie en spierversterking

Met de patiënt, de heer _____________, werd afgesproken hem terug te zien 6 maanden postoperatief met rx .



Met collegiale groeten





______________                                                 _________________

Kliniekhoofd                                                     Assistente
  • And the original text
print(example.text)







DR. GAME KEEPERS

Elektronisch adres







22 januari 1999



BETREFT:         JANSSENS Jean

                        A948023ZRXYZ98M

BRUSSELSESTEENWEG 111

                        1785 MERCHTEM



Geachte collega



Uw patiënt, de heer Louis Jean JANSSENS, werd op de raadpleging gezien op 18/01/1998.



Reden van raadpleging:
 6 weken na IHP rechts.
 Gaat goed, stapt vlot uit de wachtkamer met kruk in de hand.
 Zegt vooral laatste weken sterke verbetering beterschap te voelen.

Bevindingen:
 ROM: flexie 100° full extension.
Exo en endo goed en pijnloos.
 Goede straight leg.
 Goede kracht abductoren.

Bijkomende onderzoeken:
 RX: goede stand van de prothese.

Houding:
 Kinesitherapie verder te zetten: gangrevalidatie en spierversterking

Met de patiënt, de heer Jean JANSSENS, werd afgesproken hem terug te zien 6 maanden postoperatief met rx .



Met collegiale groeten





DR. RODEL BAAN                                                 DR. DE GROTE TEEN

Kliniekhoofd                                                     Assistente
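
Comparing the three versions shows that the anonymized text keeps its length: each detected character is overwritten with an underscore. It also explains the trailing 11 after the masked street name, since only BRUSSELSESTEENWEG 1 was detected as an entity. A minimal sketch of such masking, assuming inclusive end offsets as in the entity tables; mask_spans is a hypothetical helper, not the blackbar implementation.

```python
def mask_spans(text, spans, mask_char="_"):
    """Overwrite each (start, end) span, end inclusive, with `mask_char`."""
    chars = list(text)
    for start, end in spans:
        for i in range(start, end + 1):
            chars[i] = mask_char
    return "".join(chars)

text = "BRUSSELSESTEENWEG 111"
# Only the first 19 characters ("BRUSSELSESTEENWEG 1") were detected
masked = mask_spans(text, [(0, 18)])
print(masked)  # 19 underscores followed by "11"
```

Because masking does not change the length of the text, the start and end offsets of all other entities remain valid afterwards.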

If all of this works for you, you can start deploying the anonymization, the pseudonymization, the API and the apps as described in the tutorial on containers.