Quarkus logging splunk
Introduction
Splunk is a middleware solution that receives, stores, and indexes the logs of an application, and finally lets you exploit them.
This Quarkus extension integrates the official Splunk client library to index log events through the HTTP Event Collector (HEC) provided by Splunk Enterprise.
Installation
If you want to use this extension, you need to add the quarkus-logging-splunk extension first.
In your pom.xml file, add:
<dependency>
    <groupId>io.quarkiverse.logging.splunk</groupId>
    <artifactId>quarkus-logging-splunk</artifactId>
    <version>{project-version}</version>
</dependency>
Features
The extension can be used transparently with any log frontend used by Quarkus (Log4j, SLF4J, … ).
Log message formatting
In all cases, the log message format defaults to the one used by the Quarkus console handler:
quarkus.log.handler.splunk.format="%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] (%t) %s%e%n"
This can be adapted in order to avoid duplication with metadata that are passed in a structured way.
Log event metadata
The type of metadata depends on the serialization format.
If quarkus.log.handler.splunk.raw is enabled or quarkus.log.handler.splunk.serialization is raw, there is no per-event metadata.
Only a few global metadata, shared between all events of a batch, are sent via HTTP headers and query parameters.
In other cases, the extension uses structured logging, via JSON serialization. There are two supported structured formats:
- The nested serialization is the default format of the Splunk HEC Java client and defines the names of some pre-defined metadata. Combined with quarkus.log.handler.splunk.format=%s%e, it also supports log messages that are themselves JSON.
- The flat serialization is a simpler and more generic format, also used by the OpenTelemetry Splunk HEC exporter.
Some metadata can be indexed by Splunk, see indexed fields.
The default _json source type indexes metadata passed in the fields object.
The extension supports the resolution of MDC scoped properties, as defined in JBoss supported formatters.
Serialization format | nested | flat |
---|---|---|
HEC metadata | | |
Pre-defined metadata | Only | Only |
MDC properties | Passed via | Passed via |
Static metadata | Passed via | |
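A structured request to Splunk HEC looks like the following, where events.json contains one of the two payloads shown after the command (nested serialization first, then flat serialization):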
curl -k -v -X POST https://localhost:8080/services/collector/event/1.0 -H "Content-type: application/json; profile=\"urn:splunk:event:1.0\"; charset=utf-8" -H "Authorization: Splunk 29fe2838-cab6-4d17-a392-37b7b8f41f75" -d@events.json
{
  "time": "1673001538.042",
  "host": "hostname",
  "source": "mysource",
  "sourcetype": "_json",
  "index": "main",
  "event": {
    "message": "2023-01-06 ERROR The log message",
    "logger": "com.acme.MyClass",
    "severity": "ERROR",
    "exception": "java.lang.NullPointerException",
    "properties": {
      "mdc-key": "mdc-value"
    }
  },
  "fields": {
    "key": "static-value"
  }
}

{
  "time": "1673001538.042",
  "host": "hostname",
  "source": "mysource",
  "index": "main",
  "event": "2023-01-06 ERROR The log message",
  "fields": {
    "severity": "ERROR",
    "mdc-key": "mdc-value",
    "key": "static-value"
  }
}
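As a rough illustration of the curl command above, the following sketch builds (without sending) an equivalent request with the JDK's java.net.http API. The class HecRequestSketch and the method buildEventRequest are hypothetical names used only for this example; the extension itself relies on the official Splunk client library, not on this code.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the extension.
final class HecRequestSketch {

    // Builds a POST request for the HEC JSON event endpoint; nothing is sent here.
    static HttpRequest buildEventRequest(String baseUrl, String token, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/services/collector/event/1.0"))
                .header("Content-Type", "application/json; profile=\"urn:splunk:event:1.0\"; charset=utf-8")
                .header("Authorization", "Splunk " + token) // HEC token authentication
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Minimal flat-style payload: the message as a string, metadata in "fields".
        String flatEvent = "{\"event\":\"2023-01-06 ERROR The log message\","
                + "\"fields\":{\"severity\":\"ERROR\"}}";
        HttpRequest request = buildEventRequest("https://localhost:8080",
                "29fe2838-cab6-4d17-a392-37b7b8f41f75", flatEvent);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would additionally require an HttpClient configured to trust the certificate of the HEC endpoint.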
Connectivity failures
Batched events that cannot be sent to the Splunk indexer will be logged to stdout:
- Formatted using console handler settings if the console handler is enabled
- Formatted using splunk handler settings otherwise
In any case, the root cause of the failure is always logged to stderr.
Asynchronous handler
By default, the log handler is synchronous and only the HTTP requests to the HEC endpoint are done asynchronously.
This can be an issue because the Splunk library #send is synchronized, so any preprocessing of the batch HTTP request itself happens on the application thread of the log event that caused the batch to become full (either by reaching quarkus.log.handler.splunk.batch-size-count or quarkus.log.handler.splunk.batch-size-bytes).
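To make the two batching thresholds concrete, here is a simplified sketch of a batch that flushes as soon as either the event count or the byte size limit is reached. It is not the Splunk library's code: BatchSketch and its members are hypothetical names. The point it illustrates is that the flush runs on whichever thread adds the event that fills the batch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of count/size based batching.
final class BatchSketch {
    private final int maxCount;
    private final long maxBytes;
    private final Consumer<List<String>> sender;
    private final List<String> events = new ArrayList<>();
    private long bytes;

    BatchSketch(int maxCount, long maxBytes, Consumer<List<String>> sender) {
        this.maxCount = maxCount;
        this.maxBytes = maxBytes;
        this.sender = sender;
    }

    // Synchronized: the calling thread that fills the batch also pays for the flush.
    synchronized void add(String event) {
        events.add(event);
        bytes += event.getBytes(StandardCharsets.UTF_8).length;
        if (events.size() >= maxCount || bytes >= maxBytes) {
            flush();
        }
    }

    // Hands a copy of the pending events to the sender and resets the counters.
    synchronized void flush() {
        if (events.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(events));
        events.clear();
        bytes = 0;
    }
}
```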
By enabling quarkus.log.handler.splunk.async=true, an intermediate event queue is used, which decouples the flushing of the batch from any application thread.
By default quarkus.log.handler.splunk.async.overflow=block, so application threads will block once the number of queued events reaches quarkus.log.handler.splunk.async.queue-length.
There’s no link between quarkus.log.handler.splunk.async.queue-length and quarkus.log.handler.splunk.batch-size-count.
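The difference between blocking and dropping on a full queue can be sketched with a bounded java.util.concurrent queue. This is only an illustration of the general technique, not the handler's implementation; OverflowSketch and the enum constants BLOCK and DROP are hypothetical names chosen for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of the two overflow strategies on a bounded queue.
final class OverflowSketch {
    enum Overflow { BLOCK, DROP }

    private final BlockingQueue<String> queue;
    private final Overflow overflow;

    OverflowSketch(int queueLength, Overflow overflow) {
        this.queue = new ArrayBlockingQueue<>(queueLength);
        this.overflow = overflow;
    }

    // Returns true if the event was queued, false if it was dropped.
    boolean publish(String event) {
        if (overflow == Overflow.DROP) {
            return queue.offer(event); // returns false immediately when the queue is full
        }
        try {
            queue.put(event); // waits until a consumer frees a slot
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Non-blocking read used by a consumer; returns null when the queue is empty.
    String poll() {
        return queue.poll();
    }
}
```

With BLOCK, a slow consumer slows down the publishing threads; with DROP, publishers never wait but events can be lost.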
Sequential and parallel modes
The number of events kept in memory for batching purposes is not limited.
After tuning quarkus.log.handler.splunk.batch-size-count and quarkus.log.handler.splunk.batch-size-bytes, in case the HEC endpoint cannot keep up with the batch throughput, using multiple HTTP connections might help to reduce memory usage on the client.
By setting quarkus.log.handler.splunk.send-mode=parallel, multiple batches will be sent over the wire in parallel, potentially increasing throughput with the HEC endpoint.
Named Splunk log handlers
A named log handler can be configured to manage multiple Splunk configurations for particular log emissions. As with core Quarkus handlers (console, file or syslog), named Splunk handlers follow the same configuration pattern:
# Global configuration
quarkus.log.handler.splunk.token=12345678-1234-1234-1234-1234567890AB
quarkus.log.handler.splunk.metadata-index=mylogindex
# Splunk named handler configuration, named here MONITORING
quarkus.log.handler.splunk."MONITORING".token=12345678-0000-0000-0000-1234567890AB
quarkus.log.handler.splunk."MONITORING".metadata-index=mystatsindex
# Registration of the custom handler through Quarkus core category management, here monitoring as the logging category
quarkus.log.category."monitoring".handlers=MONITORING
quarkus.log.category."monitoring".use-parent-handlers=false
To use such a logger in application code, you can rely on either an annotation or a factory:
- With an annotation:
@LoggerName("monitoring")
Logger monitoringLogger;
- With a factory:
static final Logger monitoringLogger = Logger.getLogger("monitoring");
Some important considerations
- Every handler is isolated and uses a separate Splunk client and connection pool, which means it has a cost.
- The configuration of the root handler is not inherited by named handlers.
- Setting quarkus.log.category."named-handler".use-parent-handlers=false is required if you do not want the root handler to also receive log events already sent to named handlers.
Developer experience
To enhance the developer experience, the extension integrates with the development mode of Quarkus.
Dev service
The extension provides a Dev Service that starts a Splunk container in the background. It is deactivated by default, to remain compatible with disabling the Splunk extension at runtime.
Activation
To activate the Dev Service, set the following property:
quarkus.log.handler.splunk.devservices.enabled=true
In "normal" mode (not dev, not test), this has no effect. The Quarkus Dev Services framework picks up the configuration and starts the Splunk container. Once the container is considered started, some configuration is injected to expose the random port on which Splunk is listening. The following is also injected:
quarkus.log.handler.splunk.token=local-dev-token
quarkus.log.handler.splunk.disable-certificate-validation=true
quarkus.log.handler.splunk.enabled=true
Namely:
- The HEC token configured in the Splunk container at boot.
- Disabled certificate validation, because Splunk enforces HTTPS on its endpoints but with self-signed certificates.
- Forced activation of the quarkus-logging-splunk extension when its Dev Service has been activated at build time.
Support of named handlers
Named handlers are supported through additional build time configuration. Example:
quarkus.log.handler.splunk.devservices.plug-named-handlers.<myhandler>=true
See the corresponding entry in the configuration property table.
As a result, the following configuration is overridden for that named handler:
- the HEC endpoint
- the token
- certificate validation is disabled
- the Splunk log handler is force-enabled
Usage in tests
The quarkus-logging-splunk-test-utils module provides a test framework layer that gives access to the Splunk API URL.
This is useful to launch searches after the test run and make assertions on the logs eventually sent to Splunk.
It should not be used in all tests, as it noticeably lengthens the test run time: starting Splunk alone takes about 30s.
Some delay also needs to be added after the test run to make sure log entries have been properly propagated to Splunk.
The search in Splunk can be done using its API.
We propose a custom QuarkusTestResourceLifecycleManager to inject the URL of the Splunk API (for compatibility with QuarkusIntegrationTest, where microprofile-config injection is disallowed):
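import static org.hamcrest.Matchers.containsString;

import org.junit.jupiter.api.Test;

import io.quarkus.test.common.QuarkusTestResource;
import io.quarkus.test.junit.QuarkusTest;
import io.restassured.RestAssured;
// LoggingSplunkInjectionTestResource and LoggingSplunkApiUrl are provided by
// the quarkus-logging-splunk-test-utils module and must be imported as well.

@QuarkusTest
@QuarkusTestResource(LoggingSplunkInjectionTestResource.class) // (1)
class MyQuarkusTest {

    @LoggingSplunkApiUrl // (2)
    String splunkApiUrl;

    @Test
    void test() {
        // Run a "oneshot" search and assert on the returned results.
        RestAssured.given()
                .request()
                .formParam("search", "search \"hello splunk\"")
                .formParam("exec_mode", "oneshot")
                .relaxedHTTPSValidation()
                .auth()
                .basic("admin", "admin123") // (3)
                .log()
                .ifValidationFails()
                .post(splunkApiUrl + "/services/search/jobs") // (4)
                .then()
                .statusCode(200)
                .body(containsString("hello splunk"), containsString("mdc-value"));
    }
}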
1 | The QuarkusTestResource to declare, which will also be picked up by any IT test extending this QuarkusTest class. |
2 | The annotation to use on a String field, where the Splunk API URL will be injected. |
3 | The default credentials configured in the Splunk container (user admin, password admin123). |
4 | The injected URL has the https://host:port pattern, so it needs to be completed with the actual service path you want to access. |
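Instead of a fixed delay before searching, one option is to poll until the expected log entry shows up or a timeout expires. The helper below is a generic sketch using only the JDK; PollingSketch and waitUntil are hypothetical names and are not provided by the test-utils module.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for tests.
final class PollingSketch {

    // Re-evaluates the condition until it holds or the timeout elapses.
    // Returns true if the condition became true in time, false otherwise.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a test such as the one above, the condition could wrap the search call and return true once the response body contains the expected message.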
Additional configuration
To customize the Splunk container that is started, a few configuration options are available:
- Customize the container image used with quarkus.log.handler.splunk.devservices.image-name.
- Control whether the Splunk instance is shared between runs of microservices in dev mode with quarkus.log.handler.splunk.devservices.shared (boolean). The default is true (shared).
- Add or customize environment variables with quarkus.log.handler.splunk.devservices.container-env (map of key/value pairs).
Extension Configuration Reference
This extension follows the log handlers configuration domain defined by Quarkus: every configuration property of this extension belongs to the configuration root quarkus.log.handler.splunk.
When present, this extension is enabled by default, meaning the client expects a valid connection to a Splunk indexer and prints an error message for every log created by the application.
In a local environment, the log handler can therefore be disabled with the following property:
quarkus.log.handler.splunk.enabled=false
Every configuration property of the extension is overridable at runtime.
Configuration property fixed at build time - All other configuration properties are overridable at runtime
Configuration property | Type | Default |
---|---|---|
whether to activate dev services or not Environment variable: | boolean | |
Override the docker image used for the Splunk dev service Environment variable: | string | |
Whether the instance of splunk can be shared between runs in DEV mode. Environment variable: | boolean | |
Determine whether to enable the handler Environment variable: | boolean | |
The splunk handler log level. By default, it is no more strict than the root handler level. Environment variable: | | |
Splunk HEC endpoint base url. With raw events, the endpoint targeted is /services/collector/raw. With flat or nested JSON events, the endpoint targeted is /services/collector/event/1.0. Environment variable: | string | |
Disable TLS certificate validation with HEC endpoint Environment variable: | boolean | |
The application token to authenticate with HEC, the token is mandatory if the extension is enabled https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#HEC_token Environment variable: | string | |
The strategy to send events to HEC. In sequential mode, there is only one HTTP connection to HEC and the order of events is preserved, but performance is lower. In parallel mode, event batches are sent asynchronously over multiple HTTP connections, and events with the same timestamp (that has 1 millisecond resolution) may be indexed out of order by Splunk. Environment variable: | | |
A GUID to identify an HEC client and guarantee isolation at HEC level in case of slow clients. https://docs.splunk.com/Documentation/Splunk/latest/Data/AboutHECIDXAck#About_channels_and_sending_data Environment variable: | string | |
Batching delay before sending a group of events. If 0, the events are sent immediately. Environment variable: | | |
Maximum number of events in a batch. By default 10, if 0 no batching. Environment variable: | long | |
Maximum total size in bytes of events in a batch. By default 10KB, if 0 no batching. Environment variable: | long | |
Maximum number of retries in case of I/O exceptions with HEC connection. Environment variable: | long | |
A middleware to customize the behavior of sending events to Splunk. Environment variable: | string | |
The log format, defining which metadata are inlined inside the log main payload. Specific metadata (hostname, category, thread name, …), as well as MDC key/value map, can also be sent in a structured way. Environment variable: | string | |
Whether to send the thrown exception message as a structured metadata of the log event (as opposed to %e in a formatted message, it does not include the exception name or stacktrace). Only applicable to 'nested' serialization. Environment variable: | boolean | |
Whether to send the logger name as a structured metadata of the log event (equivalent of %c in a formatted message). Only applicable to 'nested' serialization. Environment variable: | boolean | |
Whether to send the thread name as a structured metadata of the log event (equivalent of %t in a formatted message). Only applicable to 'nested' serialization. Environment variable: | boolean | |
Overrides the host name metadata value. Environment variable: | string | |
The source value to assign to the event data. For example, if you’re sending data from an app you’re developing, you could set this key to the name of the app. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata Environment variable: | string | |
The optional format of the events, to enable some parsing on Splunk side. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata A given source type may have indexed fields extraction enabled, which is the case of the built-in _json used for nested serialization. Environment variable: | string | |
The optional name of the index by which the event data is to be stored. If set, it must be within the list of allowed indexes of the token (if it has the indexes parameter set). https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata Environment variable: | string | |
The name of the key used to convey the severity / log level in the metadata fields. Only applicable to 'flat' serialization. With 'nested' serialization, there is already a 'severity' field. Environment variable: | string | |
The format of the payload. Environment variable: | | |
The name of the named filter to link to the splunk handler. Environment variable: | string | |
Indicates whether to log asynchronously Environment variable: | boolean | |
The queue length to use before flushing writing Environment variable: | int | |
Determine whether to block the publisher (rather than drop the message) when the queue is full Environment variable: | | |
The API URL the splunk dev service listens on. Environment variable: | string | |
Additional environment variables to inject. Environment variable: | | |
Map that allows to tell to plug the following named handlers to the dev service It is necessary as we do not have access to runtime configuration when starting the Splunk container. Environment variable: | | |
Optional static key/value pairs to populate the "fields" key of event metadata. This isn’t applicable to raw serialization. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata Environment variable: | | |
Determine whether to enable the handler Environment variable: | boolean | |
The splunk handler log level. By default, it is no more strict than the root handler level. Environment variable: | | |
Splunk HEC endpoint base url. With raw events, the endpoint targeted is /services/collector/raw. With flat or nested JSON events, the endpoint targeted is /services/collector/event/1.0. Environment variable: | string | |
Disable TLS certificate validation with HEC endpoint Environment variable: | boolean | |
The application token to authenticate with HEC, the token is mandatory if the extension is enabled https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#HEC_token Environment variable: | string | |
The strategy to send events to HEC. In sequential mode, there is only one HTTP connection to HEC and the order of events is preserved, but performance is lower. In parallel mode, event batches are sent asynchronously over multiple HTTP connections, and events with the same timestamp (that has 1 millisecond resolution) may be indexed out of order by Splunk. Environment variable: | | |
A GUID to identify an HEC client and guarantee isolation at HEC level in case of slow clients. https://docs.splunk.com/Documentation/Splunk/latest/Data/AboutHECIDXAck#About_channels_and_sending_data Environment variable: | string | |
Batching delay before sending a group of events. If 0, the events are sent immediately. Environment variable: | | |
Maximum number of events in a batch. By default 10, if 0 no batching. Environment variable: | long | |
Maximum total size in bytes of events in a batch. By default 10KB, if 0 no batching. Environment variable: | long | |
Maximum number of retries in case of I/O exceptions with HEC connection. Environment variable: | long | |
A middleware to customize the behavior of sending events to Splunk. Environment variable: | string | |
The log format, defining which metadata are inlined inside the log main payload. Specific metadata (hostname, category, thread name, …), as well as MDC key/value map, can also be sent in a structured way. Environment variable: | string | |
Whether to send the thrown exception message as a structured metadata of the log event (as opposed to %e in a formatted message, it does not include the exception name or stacktrace). Only applicable to 'nested' serialization. Environment variable: | boolean | |
Whether to send the logger name as a structured metadata of the log event (equivalent of %c in a formatted message). Only applicable to 'nested' serialization. Environment variable: | boolean | |
Whether to send the thread name as a structured metadata of the log event (equivalent of %t in a formatted message). Only applicable to 'nested' serialization. Environment variable: | boolean | |
Overrides the host name metadata value. Environment variable: | string | |
The source value to assign to the event data. For example, if you’re sending data from an app you’re developing, you could set this key to the name of the app. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata Environment variable: | string | |
The optional format of the events, to enable some parsing on Splunk side. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata A given source type may have indexed fields extraction enabled, which is the case of the built-in _json used for nested serialization. Environment variable: | string | |
The optional name of the index by which the event data is to be stored. If set, it must be within the list of allowed indexes of the token (if it has the indexes parameter set). https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata Environment variable: | string | |
Optional static key/value pairs to populate the "fields" key of event metadata. This isn’t applicable to raw serialization. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata Environment variable: | | |
The name of the key used to convey the severity / log level in the metadata fields. Only applicable to 'flat' serialization. With 'nested' serialization, there is already a 'severity' field. Environment variable: | string | |
The format of the payload. Environment variable: | | |
The name of the named filter to link to the splunk handler. Environment variable: | string | |
Indicates whether to log asynchronously Environment variable: | boolean | |
The queue length to use before flushing writing Environment variable: | int | |
Determine whether to block the publisher (rather than drop the message) when the queue is full Environment variable: | | |
Sets the default connect timeout for new connections in milliseconds. Environment variable: | long | |
Sets the default timeout for complete calls in milliseconds. Environment variable: | long | |
Sets the default read timeout for new connections in milliseconds. Environment variable: | long | |
Sets the default write timeout for new connections in milliseconds. Environment variable: | long | |
Sets the default termination timeout during a flush in milliseconds. Environment variable: | long | |
About the Duration format
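To write duration values, use the standard java.time.Duration format. You can also use a simplified format, starting with a number:
- If the value is only a number, it represents time in seconds.
- If the value is a number followed by ms, it represents time in milliseconds.
In other cases, the simplified format is translated to the java.time.Duration format for parsing:
- If the value is a number followed by h, m, or s, it is prefixed with PT.
- If the value is a number followed by d, it is prefixed with P.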
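As an illustration of the simplified duration format, the sketch below re-implements the usual Quarkus conversion rules with the JDK only: a plain number is read as seconds, a number followed by ms as milliseconds, a number followed by h, m, or s is prefixed with PT, and a number followed by d is prefixed with P. DurationFormatSketch is a hypothetical class written for this example; Quarkus applies its own converter, not this code.

```java
import java.time.Duration;

// Hypothetical re-implementation of the simplified duration rules.
final class DurationFormatSketch {

    static Duration parse(String value) {
        String v = value.trim();
        if (v.matches("\\d+")) {
            return Duration.ofSeconds(Long.parseLong(v)); // plain number: seconds
        }
        if (v.matches("\\d+ms")) {
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (v.matches("\\d+[hmsHMS]")) {
            return Duration.parse("PT" + v); // e.g. 5m becomes PT5m
        }
        if (v.matches("\\d+[dD]")) {
            return Duration.parse("P" + v); // e.g. 2d becomes P2d
        }
        return Duration.parse(v); // already in the standard ISO-8601 form
    }

    public static void main(String[] args) {
        System.out.println(parse("10"));
        System.out.println(parse("250ms"));
        System.out.println(parse("5m"));
    }
}
```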