Quarkus Presidio

This extension provides a REST Client to interact with Presidio Analyzer and Anonymizer services.

Apart from providing a REST Client, this extension integrates with Quarkus Health Check extension registering a readness probe against Presidio /health endpoint.

Moreover, it provides DevServices integration starting an Analyzer and an Anonymizer service, and configuring the URL pointing to these instances.

Installation

If you want to use this extension, you need to add the io.quarkiverse.presidio:quarkus-presidio extension first to your build file.

For instance, with Maven, add the following dependency to your POM file:

<dependency>
    <groupId>io.quarkiverse.presidio</groupId>
    <artifactId>quarkus-presidio</artifactId>
    <version>0.3.0</version>
</dependency>

Usage

This extension provides an integration with Presidio implementing a Microprofile Rest Client, so everything that is valid there like configuration properties are also valid for this extension.

For example the Rest Client config key for the analyzer is presidio-analyzer while for the anonymizer is presidio-anonymizer. Hence, to configure the URL for analyzer and anonymizer you should add the following properties to the application.properties file:

quarkus.rest-client.presidio-analyzer.url=http://www.example.org/analyzer
quarkus.rest-client.presidio-anonymizer.url=http://www.example.org/anonymizer

Then you can inject the io.quarkiverse.presidio.runtime.Analyzer to analyze a text in the following way:

@RestClient
Analyzer analyzer;

In the same way, you can inject the anonymizer io.quarkiverse.presidio.runtime.Anonymizer:

@RestClient
Anonymizer analyzer;

To understand the required objects o run it, we recommend you taking a look at the Presidio OpenAPI document, as there is a direct relationship between objects and the API.

Example

Presidio works by first analyzing and identifying patterns in a text and then run the anonymizer configuring what to anonymize and how.

Analyzing

To analyze a text:

AnalyzeRequest analyzeRequest = new AnalyzeRequest();
analyzeRequest.text("John Smith drivers license is AC432223");
analyzeRequest.language("en");

List<RecognizerResultWithAnaysisExplanation> explanations = analyzer.analyzePost(analyzeRequest);

Then you can iterate over the list to get all the information about the detected entities that could be anonymized.

Anonymizer

To anonymize you need to create a complex object which:

  1. Defines how to anonymize the content, for example replace the content with ANONYMIZED or if it is a phone number with 4 *.

  2. Add the results from the analyze call.

  3. Run anonymizer.

List<RecognizerResultWithAnaysisExplanation> recognizerResults = ...;

// Defines how to anonymize the content

static Replace REPLACE = new Replace("ANONYMIZED");
static Mask PHONE = new Mask("*", 4, true);

// Registers text to anonymize and the Anonymizers

AnonymizeRequest anonymizeRequest = new AnonymizeRequest();
anonymizeRequest.setText(text);

// By default replaces content
anonymizeRequest.putAnonymizersItem("DEFAULT", REPLACE);

// If detects a phone number it masks it
anonymizeRequest.putAnonymizersItem("PHONE_NUMBER", PHONE);

// Sets the analyzer results. You need to cast the objects for this reason, use Collections object to do it automatically.
anonymizeRequest.analyzerResults(
    Collections.unmodifiableList(recognizerResults)
);

// Anonymize elements
AnonymizeResponse anonymizeResponse = this.presidioService.anonymize(anonymizeRequest);
anonymizeResponse.getText();

You can check an example in the integration-tests module in the repo of the extension.

Types of Anonymizers

This extension supports the default anonymizers defined by Presidio:

  • io.quarkiverse.presidio.runtime.model.Redact

  • io.quarkiverse.presidio.runtime.model.Replace

  • io.quarkiverse.presidio.runtime.model.Mash

  • io.quarkiverse.presidio.runtime.model.Hash

  • io.quarkiverse.presidio.runtime.model.Encrypt

  • io.quarkiverse.presidio.runtime.model.Decrypt

If your Presidio instance registers other anonymizers, you can implement the class extending io.quarkiverse.presidio.runtime.model.Operator abstract class.

These classes are POJOs marshalled using the Jackson mapper, check any other of the provided classes to see how they are implemented.

Health Check integration

If your project imports Quarkus Health Check extension then this extension registers two readiness health checks pinging the analyzer and anonymizer instances. You can disable this by setting quarkus.presidio.health.enabled to false.

Dev Services

When starting Quarkus in Dev or Test mode, it will start a contaienr containing an instance of Presidio analyzer and another container with the Presidio anonymizer. Moreover, it will configure the application to connect to these instances.

You can disable dev services individually using quarkus.presidio.devservices.{analyzer/anonymizer}.enabled=false.

Also you can override the default container images using quarkus.presidio.devservices.{analyzer/anonymizer}.image=newImage

Extension Configuration Reference

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property

Type

Default

Whether an analyzer devservice should start

Environment variable: QUARKUS_PRESIDIO_DEVSERVICES_ANALYZER_ENABLED

boolean

true

Whether an anonymizer devservice should start

Environment variable: QUARKUS_PRESIDIO_DEVSERVICES_ANONYMIZER_ENABLED

boolean

true

Set specific anonymizer devservice container image

Environment variable: QUARKUS_PRESIDIO_DEVSERVICES_ANONYMIZER_IMAGE

string

quay.io/lordofthejars/presidio-anonymizer:latest

Set specific analyzer devservice container image

Environment variable: QUARKUS_PRESIDIO_DEVSERVICES_ANALYZER_IMAGE

string

quay.io/lordofthejars/presidio-analyzer:latest

Sets reusable anonymizer container

Environment variable: QUARKUS_PRESIDIO_DEVSERVICES_ANONYMIZER_REUSE

boolean

true

Sets reusable analyzer container

Environment variable: QUARKUS_PRESIDIO_DEVSERVICES_ANALYZER_REUSE

boolean

true

Whether a health check is published in case the smallrye-health extension is present.

Environment variable: QUARKUS_PRESIDIO_HEALTH_ENABLED

boolean

true