Quarkus Presidio
This extension provides a REST Client to interact with Presidio Analyzer and Anonymizer services.
Apart from providing a REST Client, this extension integrates with the Quarkus Health Check extension by registering a readiness probe against the Presidio /health
endpoint.
It also provides Dev Services integration, which starts an Analyzer and an Anonymizer service and configures the URLs
pointing to these instances.
Installation
To use this extension, first add the io.quarkiverse.presidio:quarkus-presidio extension to your build file.
For instance, with Maven, add the following dependency to your POM file:
<dependency>
    <groupId>io.quarkiverse.presidio</groupId>
    <artifactId>quarkus-presidio</artifactId>
    <version>0.3.0</version>
</dependency>
Usage
This extension integrates with Presidio through a MicroProfile Rest Client, so everything that is valid there, such as configuration properties, is also valid for this extension.
For example, the Rest Client config key for the analyzer is presidio-analyzer, while for the anonymizer it is presidio-anonymizer.
To configure the URLs of the analyzer and the anonymizer, add the following properties to the application.properties file:
quarkus.rest-client.presidio-analyzer.url=http://www.example.org/analyzer
quarkus.rest-client.presidio-anonymizer.url=http://www.example.org/anonymizer
Then you can inject io.quarkiverse.presidio.runtime.Analyzer to analyze a text in the following way:
@RestClient
Analyzer analyzer;
In the same way, you can inject the anonymizer, io.quarkiverse.presidio.runtime.Anonymizer:
@RestClient
Anonymizer anonymizer;
To understand the objects required to run it, we recommend taking a look at the Presidio OpenAPI document, as there is a direct relationship between these objects and the API.
Example
Presidio works by first analyzing a text to identify patterns and then running the anonymizer, configured with what to anonymize and how.
Analyzing
To analyze a text:
AnalyzeRequest analyzeRequest = new AnalyzeRequest();
analyzeRequest.text("John Smith drivers license is AC432223");
analyzeRequest.language("en");
List<RecognizerResultWithAnaysisExplanation> explanations = analyzer.analyzePost(analyzeRequest);
Then you can iterate over the list to get all the information about the detected entities that could be anonymized.
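As a rough illustration of that iteration, the sketch below walks over a list of detections and extracts the matched text for each one. The `Detection` record is a hypothetical stand-in for the extension's result class, assuming each result carries an entity type, start and end offsets, and a score as in the Presidio API; the offsets and scores are made-up sample values, not real analyzer output.

```java
import java.util.ArrayList;
import java.util.List;

class DetectionSummary {
    // Hypothetical stand-in for one analyzer result (not the extension's model class).
    record Detection(String entityType, int start, int end, double score) {}

    // Builds one line per detection: entity type, matched text, and score.
    static List<String> describe(String text, List<Detection> detections) {
        List<String> lines = new ArrayList<>();
        for (Detection d : detections) {
            // Offsets are treated as [start, end) positions in the analyzed text.
            String match = text.substring(d.start(), d.end());
            lines.add(d.entityType() + " -> \"" + match + "\" (score " + d.score() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "John Smith drivers license is AC432223";
        // Sample values for illustration only.
        List<Detection> detections = List.of(
                new Detection("PERSON", 0, 10, 0.85),
                new Detection("US_DRIVER_LICENSE", 30, 38, 0.65));
        describe(text, detections).forEach(System.out::println);
    }
}
```

With the real client, the same loop would presumably read these values through the getters of the returned model objects.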
Anonymizing
To anonymize, you need to build a request object in three steps:
- Define how to anonymize the content, for example replacing it with ANONYMIZED or, for a phone number, masking 4 of its characters with *.
- Add the results from the analyze call.
- Run the anonymizer.
List<RecognizerResultWithAnaysisExplanation> recognizerResults = ...;
// Defines how to anonymize the content
static Replace REPLACE = new Replace("ANONYMIZED");
static Mask PHONE = new Mask("*", 4, true);
// Registers text to anonymize and the Anonymizers
AnonymizeRequest anonymizeRequest = new AnonymizeRequest();
anonymizeRequest.setText(text);
// By default replaces content
anonymizeRequest.putAnonymizersItem("DEFAULT", REPLACE);
// If detects a phone number it masks it
anonymizeRequest.putAnonymizersItem("PHONE_NUMBER", PHONE);
// Sets the analyzer results. Collections.unmodifiableList adapts the list's element type, so no explicit cast is needed.
anonymizeRequest.analyzerResults(
    Collections.unmodifiableList(recognizerResults)
);
// Anonymize elements
AnonymizeResponse anonymizeResponse = this.presidioService.anonymize(anonymizeRequest);
anonymizeResponse.getText();
You can find a complete example in the integration-tests module of the extension's repository.
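To make the effect of the two operators easier to picture, here is a purely local sketch of what a replace and a mask do to detected spans. It does not call Presidio or use the extension's classes: `Span`, `replace`, `mask`, and `anonymize` are hypothetical helpers, and it assumes that the arguments of `Mask("*", 4, true)` correspond to Presidio's masking character, number of characters to mask, and from-end flag. It also assumes the spans do not overlap and that a "DEFAULT" operator is registered.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class LocalAnonymizeSketch {
    // Hypothetical stand-in for an analyzer result: entity type plus [start, end) offsets.
    record Span(String entityType, int start, int end) {}

    // Replaces the whole match with a fixed value.
    static UnaryOperator<String> replace(String newValue) {
        return match -> newValue;
    }

    // Masks up to `count` characters of the match, from the end when fromEnd is true.
    static UnaryOperator<String> mask(char maskingChar, int count, boolean fromEnd) {
        return match -> {
            int n = Math.min(count, match.length());
            String masked = String.valueOf(maskingChar).repeat(n);
            return fromEnd
                    ? match.substring(0, match.length() - n) + masked
                    : masked + match.substring(n);
        };
    }

    // Applies the operator registered for each entity type, falling back to "DEFAULT".
    static String anonymize(String text, List<Span> spans,
                            Map<String, UnaryOperator<String>> operators) {
        StringBuilder out = new StringBuilder(text);
        // Process spans from right to left so that earlier offsets stay valid.
        spans.stream()
                .sorted(Comparator.comparingInt(Span::start).reversed())
                .forEach(s -> {
                    UnaryOperator<String> op =
                            operators.getOrDefault(s.entityType(), operators.get("DEFAULT"));
                    out.replace(s.start(), s.end(), op.apply(text.substring(s.start(), s.end())));
                });
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<String>> operators = Map.of(
                "DEFAULT", replace("ANONYMIZED"),
                "PHONE_NUMBER", mask('*', 4, true));
        List<Span> spans = List.of(
                new Span("PERSON", 5, 15),
                new Span("PHONE_NUMBER", 19, 31));
        // prints "Call ANONYMIZED at 212-555-****"
        System.out.println(anonymize("Call John Smith at 212-555-5555", spans, operators));
    }
}
```

In the real flow the Presidio anonymizer service performs this work; the sketch only mirrors the idea of mapping entity types to operators with a default fallback.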
Types of Anonymizers
This extension supports the default anonymizers defined by Presidio:
- io.quarkiverse.presidio.runtime.model.Redact
- io.quarkiverse.presidio.runtime.model.Replace
- io.quarkiverse.presidio.runtime.model.Mask
- io.quarkiverse.presidio.runtime.model.Hash
- io.quarkiverse.presidio.runtime.model.Encrypt
- io.quarkiverse.presidio.runtime.model.Decrypt
If your Presidio instance registers other anonymizers, you can support them by extending the io.quarkiverse.presidio.runtime.model.Operator abstract class.
These classes are POJOs marshalled with the Jackson mapper; check any of the provided classes to see how they are implemented.
Health Check integration
If your project includes the Quarkus Health Check extension, this extension registers two readiness health checks that ping the analyzer and anonymizer instances.
You can disable them by setting quarkus.presidio.health.enabled to false.
Dev Services
When Quarkus starts in Dev or Test mode, it starts a container with an instance of the Presidio analyzer and another container with the Presidio anonymizer. It also configures the application to connect to these instances.
You can disable dev services individually using quarkus.presidio.devservices.{analyzer/anonymizer}.enabled=false.
You can also override the default container images using quarkus.presidio.devservices.{analyzer/anonymizer}.image=newImage.
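As a sketch, the placeholders above expand to entries like the following in application.properties. The image name is only an example value (the public Presidio anonymizer image on the Microsoft Container Registry) and is not necessarily the extension's default; choose whichever image and tag suit your setup.

```properties
# Disable only the analyzer Dev Service
quarkus.presidio.devservices.analyzer.enabled=false
# Override the container image of the anonymizer Dev Service (example value)
quarkus.presidio.devservices.anonymizer.image=mcr.microsoft.com/presidio-anonymizer:latest
```

With the analyzer Dev Service disabled, the analyzer URL presumably has to be set explicitly, for example through quarkus.rest-client.presidio-analyzer.url.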
Extension Configuration Reference
Configuration property fixed at build time - All other configuration properties are overridable at runtime
Configuration property | Type | Default
---|---|---
Whether an analyzer devservice should start | boolean |
Whether an anonymizer devservice should start | boolean |
Set specific anonymizer devservice container image | string |
Set specific analyzer devservice container image | string |
Sets reusable anonymizer container | boolean |
Sets reusable analyzer container | boolean |
Whether a health check is published in case the smallrye-health extension is present | boolean |