Database

In order to be able to use the application in order to anonymize and pseudonymize texts of health records, you will need a database which contains these texts.

The database which you can use for this is a relational database. The project has been set up with the use of InterSystems IRIS in mind but it can be any database (SQLite, PostgreSQL, MySQL, MariaDB, Oracle, MS SQL Server).

Anonymization

Your database should have a specific structure with the following information in order to perform the anonymization

PATIENT_DETAILS: records with and id of the patient alongside his/her name/address
PHYSICIAN_DETAILS: records with and id of the physician/doctor alongside his/her name/address
PATIENT_PHYSICIAN: records indicating the patient had a relation with a physician (e.g. was treated by, is his personal doctor, fysiotherapist, oncologist, paediatrician, dentist, …)
Document: health care records with plain texts with a document id, the text and
- the patientId which is the same as PATIENT_DETAILS.pat_dos_nr, the name of the performing and responsible physician, when the records was created or updated, optionally the medServiceId indicating a typical group of services which would generate different types of texts.
- textCvt/textCvtStatus/textCvtModDate which store the anonymized text as well as the exact location of the detected personally identifiable information in the texts
- make sure you set an index on Document.textCvtStatus and on Document.patientId

CREATE TABLE "PATIENT_DETAILS" (
    "pat_dos_nr"    VARCHAR(50) NOT NULL,
    "pat_vnaam"     VARCHAR(100),
    "pat_fam_naam"  VARCHAR(100),
    "pat_adres"     VARCHAR(250),
    "pat_post_nr"   VARCHAR(10),
    "pat_gemeente"  VARCHAR(100),
    "pat_geslacht"  VARCHAR(10),
    PRIMARY KEY("pat_dos_nr")
);
CREATE TABLE "PHYSICIAN_DETAILS" (
    "hcact_ognr"     VARCHAR(50) NOT NULL,
    "hcact_vnaam"    VARCHAR(100),
    "hcact_fam_naam" VARCHAR(100),
    "hcact_adres"    VARCHAR(250),
    "hcact_post_nr"  VARCHAR(10),
    "hcact_gemeente" VARCHAR(100),
    PRIMARY KEY("hcact_ognr")
);
CREATE TABLE "PATIENT_PHYSICIAN" (
    "pat_dos_nr"    VARCHAR(50) NOT NULL,
    "hcact_ognr"    VARCHAR(50) NOT NULL,
    "relation"      VARCHAR(100),
    FOREIGN KEY("pat_dos_nr") REFERENCES "PATIENT_DETAILS"("pat_dos_nr"),
    FOREIGN KEY("hcact_ognr") REFERENCES "PHYSICIAN_DETAILS"("hcact_ognr")
);
CREATE TABLE "Document" (
    "ID"                     BIGINT NOT NULL UNIQUE,
    "text"                   LONGVARCHAR,
    "patientId"              VARCHAR(50),
    "performingPhysicianId"  VARCHAR(200),
    "responsiblePhysicianId" VARCHAR(200),
    "creDate"                TIMESTAMP,
    "modDate"                TIMESTAMP,
    "medServiceId"           VARCHAR(200),
    "textCvt"                LONGVARCHAR,
    "textCvtStatus"          INTEGER,
    "textCvtModDate"         TIMESTAMP,
    PRIMARY KEY("ID"),
    FOREIGN KEY("patientId") REFERENCES "PATIENT_DETAILS"("pat_dos_nr")
);
CREATE INDEX idx_Document_textCvtStatus ON "Document"("textCvtStatus");

Access rights

The tables PATIENT_DETAILS, PHYSICIAN_DETAILS, PATIENT_PHYSICIAN can be read-only or can be implemented as views in your database. The database user which connects to project blackbar should be allowed to read data from the Document table and should be able to write data to the columns textCvt, textCvtStatus, textCvtModDate.

Pseudonymization

The following 2 pseudonymization tables allow you to store the pseudonymized text records. It allows to store the anonymized and pseudonymized text in table blackbar_document. This pseudonymized text can be different for each project_id depending on the chosen pseudonymization settings. The mapping between the original chunks of text and the replacement is stored in table blackbar_deid.

By patient

The replacement of the entities between the original text chunks and the replaced text chunks is kept for reasons of traceability. The pseudonymization is done by patient such that the replacement is as consistent as possible across all the documents of the same patient.

CREATE TABLE "blackbar_document" (
    "doc_id"              BIGINT NOT NULL,
    "project_id"          INTEGER NOT NULL DEFAULT 0,
    "creation_date"       TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    "patient_id"          VARCHAR(50) NOT NULL,
    "text"                LONGVARCHAR,
    "text_anonymized"     LONGVARCHAR,
    "text_pseudonymized"  LONGVARCHAR,
    "blackbar_model"      VARCHAR(200),
    "blackbar_entities"   LONGVARCHAR,
    "blackbar_options"    LONGVARCHAR,
    PRIMARY KEY ("doc_id", "project_id")
);
CREATE TABLE "blackbar_deid" (
    "deid"                     BIGINT AUTO_INCREMENT PRIMARY KEY, 
    "project_id"               INTEGER NOT NULL DEFAULT 0,
    "patient_id"               VARCHAR(50) NOT NULL,
    "entity_type"              VARCHAR(100) NOT NULL,
    "entity_text"              VARCHAR(4000) NOT NULL,  
    "entity_text_replacement"  VARCHAR(4000) NOT NULL,
    "blackbar_comment"         LONGVARCHAR
);

Make sure you set the following indexes to speed up internal processes.

CREATE INDEX idx_blackbar_document_doc_id ON "blackbar_document"("doc_id");
CREATE INDEX idx_blackbar_document_patient_id ON "blackbar_document"("patient_id");
CREATE INDEX idx_blackbar_document_project_id ON "blackbar_document"("project_id");
CREATE INDEX idx_blackbar_deid_patient_id ON "blackbar_deid"("patient_id");
CREATE INDEX idx_blackbar_deid_project_id ON "blackbar_deid"("project_id");
CREATE INDEX idx_blackbar_deid_patient_id_project_id ON "blackbar_deid"("patient_id", "project_id");

Footnotes

Schema made with drawdb.vercel.app↩︎