Get started - documents & models

This section shows you how to get started and become familiar with the setup. It is aimed at admins and developers who want to test connectivity and access. First of all, request access to the software here.

1. Test the setup by installing the different software packages

Use the provided token to install the Python packages

export BLACKBAR_GITHUB_PAT=YOUR_CREDENTIALS
python3 -m pip install git+https://${BLACKBAR_GITHUB_PAT}@github.com/bnosac/rlike.git@main#egg=rlike
python3 -m pip install git+https://${BLACKBAR_GITHUB_PAT}@github.com/bnosac/textalignment.git@main#egg=textalignment
python3 -m pip install git+https://${BLACKBAR_GITHUB_PAT}@github.com/bnosac/blackbar-py.git@main#egg=blackbar
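To confirm that the installation succeeded, you can check that the packages are visible to your Python interpreter. The following is a minimal sketch using only the standard library. The helper check_packages is hypothetical (it is not part of blackbar), and the import names are assumed to match the #egg= names in the install commands.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumption: the import names equal the #egg= names used in the pip commands.
status = check_packages(["rlike", "textalignment", "blackbar"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

If a package is reported as MISSING, rerun the corresponding pip command and check that BLACKBAR_GITHUB_PAT is set in the same shell session.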

2. Test that you can connect to the database with documents

  • The connection to the database is made through the Python package jaydebeapi. Make sure you have Java installed so that this connection can be established.
  • The structure of the database is explained in the database tutorial
  • To test connectivity, an example SQLite database is included in the blackbar package
from blackbar import BlackbarDB
db = BlackbarDB('test')
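Because the connection depends on Java, it can help to check for a Java installation before debugging anything else. The sketch below is one possible check, not something blackbar provides: the helper find_java is hypothetical, and it assumes that Java is discoverable either through the JAVA_HOME environment variable or through PATH.

```python
import os
import shutil


def find_java(env=None):
    """Return the path of a java executable, or None if none is found.

    Looks in JAVA_HOME/bin first, then falls back to searching PATH.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        # On Windows the executable is java.exe, elsewhere it is java.
        for exe in ("java", "java.exe"):
            candidate = os.path.join(java_home, "bin", exe)
            if os.path.isfile(candidate):
                return candidate
    return shutil.which("java", path=env.get("PATH"))


java = find_java()
print("Java found at:" if java else "Java not found", java or "")
```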

You can get documents stored in the database

x = db.read_documents(ids = [1, 2, 3])
x
   ID  text
0  1   dagnota : 10/4/2021\n\nOpname D12\nRvO/ SAB op...
1  2   dagnota : 10/4/2021\n\nOpname D12\nRvO/ SAB op...
2  3   \n\n\n\n\n\n\nDR. GAME KEEPERS\n\nElektronisch...

The example database also contains the tables as shown in the database tutorial

x = db.read("select * from PHYSICIAN_DETAILS")
x
   hcact_ognr         hcact_vnaam  hcact_fam_naam  hcact_adres               hcact_post_nr  hcact_gemeente
0  ID-physician-1     Jan          Janssens        Stormy Daneelsstraat 125  1000           Brussel
1  ID-physician-Z     Piet         Peeters         Stationsstraat 321        1090           Jette
2  ID-physician-teen  Teen         De Groote       BRUSSELSESTEENWEG 123     1785           Merchtem
3  ID-dr-baan         Baan         Rodel           BRUSSELSESTEENWEG 321     1785           Merchtem

3. Test the connectivity to your own database

  • We use jaydebeapi to connect to the database
  • Get the JDBC driver for your database (e.g. InterSystems IRIS) and specify the URL of your database, your user and your password.
  • Make sure the jars of your driver are in your CLASSPATH environment variable
  • Test if you can get data out of your database
from blackbar import BlackbarDB
from rlike import *
environ = {
    "BLACKBAR_DB": "InterSystems IRIS",
    "BLACKBAR_DB_URL": "jdbc:IRIS://SERVER:PORT/DOMAIN", 
    "BLACKBAR_DB_USER": "XXXXXXXXXX",
    "BLACKBAR_DB_PASSWORD": "XXXXXXXXXX",
    "BLACKBAR_DB_TABLES": "default",
    "BLACKBAR_DB_JAR": "/path/to/jdbc/driver/intersystems-jdbc-3.6.1.jar",
    # CLASSPATH entries are separated by ';' on Windows and by ':' on Linux/macOS
    "CLASSPATH": "/path/to/jdbc/driver/intersystems-jdbc-3.6.1.jar;/more/dependencies/mydriver-deps-0.1.1.jar"}
Sys_setenv(environ)
db = BlackbarDB()
x = db.read_documents(ids = [1, 2, 3])
x = db.read('select * from PHYSICIAN_DETAILS')
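When the driver needs more than one jar, the CLASSPATH value has to use the path separator of your operating system. A small sketch of how the value could be assembled portably with the standard library follows. The helper build_classpath is hypothetical, and the jar paths are the placeholders from the example above.

```python
import os


def build_classpath(jars):
    """Join jar paths with the separator of the current OS (';' on Windows, ':' elsewhere)."""
    return os.pathsep.join(jars)


# Placeholder paths taken from the example above; replace them with your own jars.
jars = [
    "/path/to/jdbc/driver/intersystems-jdbc-3.6.1.jar",
    "/more/dependencies/mydriver-deps-0.1.1.jar",
]
classpath = build_classpath(jars)
print(classpath)
```

The resulting string can be used as the value of "CLASSPATH" in the environ dictionary before calling Sys_setenv.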

If your tables have different names than the default ones, you can set BLACKBAR_DB_TABLES to a JSON string in which you replace the default table names with your own. These are the defaults for the InterSystems IRIS database.

'{"documents": "Data.Document", "patients": "dbo.PATIENT_DETAILS", "physicians": "dbo.PHYSICIAN_DETAILS", "patient_physician_relation": "dbo.PATIENT_PHYSICIAN", "pseudonymization": "dbo.blackbar_document", "pseudonymization_entities": "dbo.blackbar_deid"}'
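Writing this JSON string by hand is error-prone, so one option is to generate it from a Python dictionary. The sketch below starts from the defaults listed above and overrides only the entries that differ. The helper tables_json and the table name myschema.MY_DOCUMENTS are hypothetical and serve only as an illustration.

```python
import json

# Defaults for InterSystems IRIS, copied from the JSON string above.
DEFAULT_TABLES = {
    "documents": "Data.Document",
    "patients": "dbo.PATIENT_DETAILS",
    "physicians": "dbo.PHYSICIAN_DETAILS",
    "patient_physician_relation": "dbo.PATIENT_PHYSICIAN",
    "pseudonymization": "dbo.blackbar_document",
    "pseudonymization_entities": "dbo.blackbar_deid",
}


def tables_json(overrides):
    """Return a JSON string of the default table names with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULT_TABLES)
    if unknown:
        # Catch misspelled keys early instead of silently adding new entries.
        raise KeyError(f"Unknown table keys: {sorted(unknown)}")
    return json.dumps({**DEFAULT_TABLES, **overrides})


# Hypothetical example: only the documents table has a non-default name.
value = tables_json({"documents": "myschema.MY_DOCUMENTS"})
print(value)
```

The printed string can then be used as the value of "BLACKBAR_DB_TABLES" instead of "default".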

4. Get a pretrained model

  • The software allows you to build your own models, but it also provides example models for evaluation and testing
  • You can download a model by setting the following environment variables and then specifying the bucket and the name of the model on that bucket
    • BLACKBAR_S3_ENDPOINT: the URL where S3/Minio is hosted
    • BLACKBAR_S3_ACCESS_KEY_ID: the provided access key to the S3/Minio instance
    • BLACKBAR_S3_SECRET_ACCESS_KEY: the secret key (password) that belongs to the access key for the connection to S3/Minio
from blackbar import blackbar_s3_download
from rlike import *
environ = {
    "BLACKBAR_S3_ENDPOINT": "blackbar.datatailor.be",
    "BLACKBAR_S3_ACCESS_KEY_ID": "XXXXXXXXXX", 
    "BLACKBAR_S3_SECRET_ACCESS_KEY": "XXXXXXXXXX"}
Sys_setenv(environ)
info = blackbar_s3_download(name = "deid_v2", bucket = "blackbar-models")
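A download can fail in confusing ways when one of the three settings is missing or empty, so it may be worth checking them first. The following is a minimal sketch of such a check. The helper missing_settings is hypothetical and not part of blackbar. It only inspects the environment and does not contact S3/Minio.

```python
import os

# The three variables named in the list above.
REQUIRED_S3_SETTINGS = (
    "BLACKBAR_S3_ENDPOINT",
    "BLACKBAR_S3_ACCESS_KEY_ID",
    "BLACKBAR_S3_SECRET_ACCESS_KEY",
)


def missing_settings(env=None, required=REQUIRED_S3_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_settings()
if missing:
    print("Set these variables before downloading a model:", ", ".join(missing))
else:
    print("All S3 settings are present.")
```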