This section continues from the previous one and shows how to run the anonymization. Make sure you have set the correct paths and credentials for your database and the S3 bucket, and that you have downloaded a model.
Once the model is downloaded, load it with Blackbar and anonymize text by replacing each detected entity with _, X or the entity label.
The result contains the raw text, the anonymized text and the exact locations of the detected entities. In this step, detection is based on the deep learning model only.
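from blackbar import Blackbar, blackbar_example, blackbar_s3_download

# info is the model downloaded in the previous step, e.g. with
# info = blackbar_s3_download(name="deid_v2", bucket="blackbar-models")
deid = Blackbar(info)
text = blackbar_example(type="document")
print(text)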
dagnota : 10/4/2021
Opname D12
RvO/ SAB op onderliggend aneurysma a. communicans anterior - 30/03 coiling gehad.
A/
Geen bijzonderheden.
O/
114/60 - 36.1C - 60/min - 98% zonder O2
Tele zonder bijzonderheden
Labo/
Geen bijzonderheden
TCD 09/04: geen vaatspasmen
P/
- Monitoring tot en met maandag (14 dagen), dan naar A480
| Uitvoerder: Dr. Jan Janssens |
| Verantwoordelijke: Dr. Jan Janssens |
dagnota : XXXXXXXXX
Opname D12
RvO/ SAB op onderliggend aneurysma a. communicans anterior - XXXXX coiling gehad.
A/
Geen bijzonderheden.
O/
114/60 - 36.1C - 60/min - 98% zonder O2
Tele zonder bijzonderheden
Labo/
Geen bijzonderheden
TCD XXXXX: geen vaatspasmen
P/
- Monitoring tot en met maandag (14 dagen), dan naar A480
| Uitvoerder: XXXXXXXXXXXXXXXX |
| Verantwoordelijke: XXXXXXXXXXXXXXXX |
anno["entities"]
   doc_id    label_  term_search     term_detected  similarity  start  end  length
0    None  03_Datum         None         10/4/2021        None     10   18       9
1    None  03_Datum         None             30/03        None     93   97       5
2    None  03_Datum         None             09/04        None    242  246       5
3    None   01_Naam         None  Dr. Jan Janssens        None    343  358      16
4    None   01_Naam         None  Dr. Jan Janssens        None    384  399      16
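In the entity output, length always equals end - start + 1 (for example 18 - 10 + 1 = 9 for 10/4/2021), so end appears to be an inclusive offset. The pure-Python sketch below shows how such offsets can be used to blank out entities with _, X or the label. It is not blackbar's own code: the function name mask_entities and the "[label]" replacement format are hypothetical. Spans are processed from last to first so that a replacement of a different length (label mode) does not shift the offsets of entities still to be replaced.

```python
def mask_entities(text, entities, mode="X"):
    """Replace each entity span in `text` (hypothetical helper, not blackbar's API).

    `entities` is a list of dicts with keys "label_", "start" and "end",
    where `end` is inclusive (length == end - start + 1).
    mode "_" or "X": overwrite every character of the span with that character.
    mode "label": replace the whole span with "[<label>]" (assumed format).
    """
    if mode not in ("_", "X", "label"):
        raise ValueError("mode must be '_', 'X' or 'label'")
    # Last span first, so earlier offsets stay valid when lengths change.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, stop = ent["start"], ent["end"] + 1  # inclusive end -> slice bound
        if mode == "label":
            replacement = "[" + ent["label_"] + "]"
        else:
            replacement = mode * (stop - start)  # keep the original length
        text = text[:start] + replacement + text[stop:]
    return text


note = "dagnota : 10/4/2021"
ents = [{"label_": "03_Datum", "start": 10, "end": 18}]
print(mask_entities(note, ents, "X"))      # dagnota : XXXXXXXXX
print(mask_entities(note, ents, "label"))  # dagnota : [03_Datum]
```

Keeping the original length in the _ and X modes preserves all character offsets in the anonymized text, which matches the X-masked output shown earlier in this section.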
You can also anonymize a pandas DataFrame containing one document. The DataFrame needs at least the columns doc_id and text.
If you have several documents, use deid_anonymize to anonymize the whole set at once. The results are stored in a new column called textCvt, which contains a JSON string with all detected entities and the anonymized text.
import pandas as pd
from blackbar import deid_anonymize

# If you have a data frame with more than one record, use deid_anonymize to anonymize these
docs = pd.DataFrame({"doc_id": [1, 2], "text": [text, text]})
anno = deid_anonymize(deid, docs, type="_", extended=False)
anno
   doc_id                                               text                                            textCvt
0       1  dagnota : 10/4/2021\n\nOpname D12\nRvO/ SAB op...  {"text_raw": "dagnota : 10/4/2021\n\nOpname D1...
1       2  dagnota : 10/4/2021\n\nOpname D12\nRvO/ SAB op...  {"text_raw": "dagnota : 10/4/2021\n\nOpname D1...
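Because textCvt holds a JSON string per document, it has to be parsed before its fields can be used. The sketch below does this with json.loads; only the key text_raw is visible in the truncated output, so the helper name expand_textcvt and the one-row demo frame are hypothetical, and the remaining keys should be inspected on your own result before relying on them.

```python
import json

import pandas as pd


def expand_textcvt(anno):
    """Return a copy of `anno` with the textCvt JSON parsed (hypothetical helper).

    Adds `textCvt_parsed` (the full dict) and `text_raw`. Other keys are not
    visible in the truncated output, so inspect them before use.
    """
    out = anno.copy()  # leave the original frame untouched
    out["textCvt_parsed"] = out["textCvt"].map(json.loads)
    out["text_raw"] = out["textCvt_parsed"].map(lambda d: d.get("text_raw"))
    return out


# Hypothetical one-row frame standing in for a deid_anonymize result.
demo = pd.DataFrame({
    "doc_id": [1],
    "text": ["dagnota : 10/4/2021"],
    "textCvt": [json.dumps({"text_raw": "dagnota : 10/4/2021"})],
})
parsed = expand_textcvt(demo)
print(sorted(parsed.loc[0, "textCvt_parsed"].keys()))  # ['text_raw'] for this demo
```

On a real result, printing the sorted keys of the first parsed row is a quick way to find where the anonymized text and the entities are stored.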
6. Anonymize your text in the database
If your data is in the format explained in the database tutorial, you can also extend the anonymization with a detection of the known names and addresses linked to these documents, based on Smith-Waterman alignment. The deep learning model is then applied together with the Smith-Waterman alignment. This hybrid approach improves the detection of the entities, because the names and addresses already known for the patient and the physicians are searched for explicitly in the text.
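Smith-Waterman is a local alignment algorithm: it finds the region of a text that best matches a search term while tolerating mismatches and gaps, so a slightly misspelled name can still be located. The minimal character-level sketch below illustrates the idea. It is not blackbar's implementation; the scoring values (match 2, mismatch -1, gap -1), the case-folding and the similarity normalization (score divided by the best possible score for the query) are assumptions chosen for illustration. It returns an inclusive end offset, like the end column of the entity output.

```python
def smith_waterman(query, text, match=2, mismatch=-1, gap=-1):
    """Locate the best local alignment of `query` inside `text`.

    Returns (similarity, start, end): text[start:end + 1] is the aligned
    region (end is inclusive) and similarity is the alignment score divided
    by the best possible score for `query` (a hypothetical normalization).
    Returns (0.0, -1, -1) when nothing aligns.
    """
    q, t = query.lower(), text.lower()  # case-insensitive comparison (assumption)
    if not q or not t:
        return 0.0, -1, -1
    # H[i][j]: best score of a local alignment ending at q[i-1] and t[j-1].
    H = [[0] * (len(t) + 1) for _ in range(len(q) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(q) + 1):
        for j in range(1, len(t) + 1):
            sub = match if q[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    if best == 0:
        return 0.0, -1, -1
    # Trace back from the best cell until the score drops to 0 to find the start.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if q[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # query character aligned to a gap in the text
        else:
            j -= 1  # text character aligned to a gap in the query
    return best / (match * len(q)), j, best_j - 1


print(smith_waterman("Jan Janssens", "Uitvoerder: Dr. Jan Janssens"))  # (1.0, 16, 27)
print(smith_waterman("Jan Jansens", "Dr. Jan Janssens"))  # misspelled: similarity < 1.0
```

The misspelled query still aligns to the full name in the text, only with a lower similarity, which is why a threshold on the similarity is the natural way to decide whether a hit counts as a detection.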
Example with a query on the database:
- Connect to the database.
- Get the documents and the physicians the patient came into contact with (read_documents with type deid).
- Store the results in the textCvt column of the database, with textCvtStatus set to 1:
db.update_anonimisation(anno, status=1, type="anonymization")
Example with test data, showing how the available information is integrated.
The resulting dataset contains:
- the text
- the known names/addresses of the patient
- the known names/addresses of all physicians the patient has ever been in contact with
from blackbar import BlackbarDB, Blackbar, blackbar_s3_download, blackbar_example, deid_anonymize, deid_enrich_identifiers
from rlike import *

## Get example documents & the physicians the patient came into contact with
docs = blackbar_example('documents')
docs = deid_enrich_identifiers(docs, type="patients")
physician = blackbar_example('patients_physicians')
physician = deid_enrich_identifiers(physician, type="patients_physicians")
docs = docs.merge(physician, how="left", left_on="patientId", right_on="pat_dos_nr")

## Get the model + anonymize the documents as follows
info = blackbar_s3_download(name="deid_v2", bucket="blackbar-models")
deid = Blackbar(info)
anno = deid_anonymize(deid, docs, type="_", extended=True)
anno
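With the hybrid approach, the same text receives detections from two sources, the deep learning model and the Smith-Waterman search on known names, and these can overlap (for example, the model finding only part of a name that the alignment matches in full). The sketch below shows one hypothetical way to reconcile them: take the union and, where spans overlap, keep the longest. This merge rule and the name merge_entities are assumptions for illustration, not necessarily what deid_anonymize does with extended=True.

```python
def merge_entities(model_ents, sw_ents):
    """Union of two entity lists; overlapping spans collapse to the longest one.

    Hypothetical merge rule. Each entity is a dict with inclusive
    "start"/"end" offsets, as in the entity output of this section.
    """
    merged = []
    # Longest spans first (ties broken by position), so a full name wins
    # over a partial hit inside it.
    candidates = sorted(model_ents + sw_ents,
                        key=lambda e: (-(e["end"] - e["start"]), e["start"]))
    for ent in candidates:
        # Two inclusive spans overlap if each starts before the other ends.
        overlaps = any(ent["start"] <= kept["end"] and kept["start"] <= ent["end"]
                       for kept in merged)
        if not overlaps:
            merged.append(ent)
    return sorted(merged, key=lambda e: e["start"])


# Hypothetical detections: the model finds only "Jan Janssens",
# the alignment finds "Dr. Jan Janssens".
model = [{"label_": "03_Datum", "start": 10, "end": 18},
         {"label_": "01_Naam", "start": 347, "end": 358}]
sw = [{"label_": "01_Naam", "start": 343, "end": 358}]
print([(e["start"], e["end"]) for e in merge_entities(model, sw)])  # [(10, 18), (343, 358)]
```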