Class ElasticsearchCommitter
- All Implemented Interfaces:
IBatchConsumer, ICommitter, IXMLConfigurable, AutoCloseable
Commits documents to Elasticsearch. This committer relies on the Elasticsearch REST API.
"_id" field
Elasticsearch expects a field named "_id" that uniquely identifies each document. You can provide that field yourself in the documents you submit. If you do not specify an "_id" field, this committer will create one for you, using the document reference as the identifier value.
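The fallback rule can be pictured with a minimal sketch, assuming document fields are held in a plain Map of field names to value lists. The IdResolver class and its resolveId method are hypothetical illustrations, not the committer's own code.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the "_id" fallback; not the committer's own code. */
final class IdResolver {

    static final String ID_FIELD = "_id";

    private IdResolver() {
    }

    /**
     * Returns the first "_id" value when it is present and non-empty,
     * or the document reference otherwise.
     */
    static String resolveId(Map<String, List<String>> fields, String reference) {
        List<String> values = fields.get(ID_FIELD);
        if (values != null && !values.isEmpty()) {
            String id = values.get(0);
            if (id != null && !id.isEmpty()) {
                return id;
            }
        }
        return reference;
    }

    public static void main(String[] args) {
        String reference = "https://example.com/page.html";
        // No "_id" field: the reference becomes the ID.
        System.out.println(resolveId(Map.of(), reference));
        // Explicit "_id" field: its value is used instead.
        System.out.println(resolveId(Map.of(ID_FIELD, List.of("doc-42")), reference));
    }
}
```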
"content" field
By default the "body" of a document is read as an input stream
and stored in a "content" field. You can change that target field name
with setTargetContentField(String). If you set the target
content field to null, it will effectively skip storing
the content stream.
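As a rough sketch of this behavior, assuming the body is UTF-8 text and fields are kept in a Map, the hypothetical ContentFieldSketch class below reads the stream into the chosen field and skips it entirely when the target field is null. It is an illustration only, not how the committer is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of storing a document body under a configurable field. */
final class ContentFieldSketch {

    static final String DEFAULT_CONTENT_FIELD = "content";

    private ContentFieldSketch() {
    }

    /**
     * Returns a copy of the fields with the body stored under the target field.
     * When the target field (or the body) is null, the body is neither read
     * nor stored. The body is assumed to be UTF-8 text.
     */
    static Map<String, Object> withContent(
            Map<String, Object> fields, InputStream body, String targetContentField) {
        Map<String, Object> out = new LinkedHashMap<>(fields);
        if (targetContentField == null || body == null) {
            return out;
        }
        try {
            out.put(targetContentField,
                    new String(body.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        InputStream body = new ByteArrayInputStream(
                "Hello world".getBytes(StandardCharsets.UTF_8));
        Map<String, Object> fields = Map.of("title", "Greeting");
        System.out.println(withContent(fields, body, DEFAULT_CONTENT_FIELD));
    }
}
```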
Dots (.) in field names
Your Elasticsearch installation may interpret dots in field names
as denoting "objects", which may not always be what you want.
If dots are causing you issues, either avoid submitting field names
containing dots, or use setDotReplacement(String) to replace dots
with a character of your choice (e.g., underscore).
If your dot represents a nested object, keep reading.
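Conceptually, dot replacement is a rename of field names before submission. A minimal sketch, where the DotReplacer class is hypothetical and not the committer's implementation; note that two names can collide once dots are replaced, which this sketch resolves by letting the last entry win.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of dot replacement in field names; not the committer's own code. */
final class DotReplacer {

    private DotReplacer() {
    }

    /**
     * Returns a copy of the fields where every "." in a field name is replaced.
     * A null replacement leaves names untouched. If two names become identical
     * after replacement, the entry iterated last wins in this sketch.
     */
    static <V> Map<String, V> replaceDots(Map<String, V> fields, String replacement) {
        Map<String, V> out = new LinkedHashMap<>();
        for (Map.Entry<String, V> e : fields.entrySet()) {
            String name = replacement == null
                    ? e.getKey() : e.getKey().replace(".", replacement);
            out.put(name, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("dc.title", "Annual report");
        fields.put("author", "Jane");
        System.out.println(replaceDots(fields, "_"));
    }
}
```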
JSON Objects
It is possible to provide a regular expression
that identifies one or more fields containing a JSON object rather
than a regular string (setJsonFieldsPattern(String)). This is, for
example, a useful way to store nested objects. While very flexible,
this approach requires you to build the JSON structure of those field
values yourself, which can be challenging. You may want to consider
custom code.
For this to work properly, make sure you define your Elasticsearch
field mappings on your index beforehand.
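The idea of embedding matching fields as raw JSON can be sketched as follows, assuming the pattern must match the whole field name (whether the committer uses a full or partial match is not stated here). The JsonFieldsSketch class is hypothetical; it trusts matching values to already be valid JSON and quotes everything else as a JSON string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: values of fields whose names fully match the pattern
 * are embedded as raw JSON; all other values are written as JSON strings.
 */
final class JsonFieldsSketch {

    private JsonFieldsSketch() {
    }

    static String toJson(Map<String, String> fields, String jsonFieldsPattern) {
        Pattern pattern = jsonFieldsPattern == null
                ? null : Pattern.compile(jsonFieldsPattern);
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':');
            if (pattern != null && pattern.matcher(e.getKey()).matches()) {
                // Trusted to already be valid JSON: no validation is done here.
                sb.append(e.getValue());
            } else {
                sb.append(quote(e.getValue()));
            }
        }
        return sb.append('}').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Report");
        fields.put("author_json", "{\"name\":\"Jane\",\"role\":\"editor\"}");
        System.out.println(toJson(fields, ".*_json"));
    }
}
```

The field name suffix "_json" is only an example convention; any pattern matching your own field names would do.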
Elasticsearch ID limitations
As of this writing, Elasticsearch 5 and higher limit the "_id" field
to 512 bytes.
By default, trying to submit documents with an invalid ID (such as one
exceeding that limit) results in an error from Elasticsearch. You can
get around this by setting setFixBadIds(boolean) to true. It will
truncate references that are too long and append a hash code
representing the truncated part. This approach is not 100%
collision-free (uniqueness is not guaranteed), but it should safely
cover the vast majority of cases.
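A minimal sketch of the truncate-and-hash idea follows. The "!" separator, the use of String.hashCode(), and the fixed 8-hex-digit suffix are all assumptions made for illustration; the hypothetical IdFixer class is not the committer's actual scheme. It measures the limit in UTF-8 bytes and cuts only on character boundaries so multi-byte characters are never split.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of fixing IDs longer than 512 UTF-8 bytes: truncate on a
 * character boundary and append a fixed-length hash of the removed part.
 * The "!" separator and String.hashCode() are assumptions, not the
 * committer's actual scheme.
 */
final class IdFixer {

    static final int MAX_ID_BYTES = 512;
    // "!" plus 8 hex digits: always 9 ASCII bytes.
    private static final int SUFFIX_BYTES = 9;

    private IdFixer() {
    }

    static String fixId(String id) {
        if (id.getBytes(StandardCharsets.UTF_8).length <= MAX_ID_BYTES) {
            return id;
        }
        int budget = MAX_ID_BYTES - SUFFIX_BYTES;
        int cut = 0;
        int bytes = 0;
        // Walk code points so multi-byte characters are never split.
        while (cut < id.length()) {
            int cp = id.codePointAt(cut);
            int cpBytes = utf8Bytes(cp);
            if (bytes + cpBytes > budget) {
                break;
            }
            bytes += cpBytes;
            cut += Character.charCount(cp);
        }
        String removed = id.substring(cut);
        return id.substring(0, cut) + "!" + String.format("%08x", removed.hashCode());
    }

    /** Number of bytes a code point occupies when encoded as UTF-8. */
    private static int utf8Bytes(int codePoint) {
        if (codePoint < 0x80) {
            return 1;
        } else if (codePoint < 0x800) {
            return 2;
        } else if (codePoint < 0x10000) {
            return 3;
        }
        return 4;
    }

    public static void main(String[] args) {
        String longRef = "https://example.com/" + "x".repeat(600);
        String fixed = fixId(longRef);
        System.out.println(fixed.getBytes(StandardCharsets.UTF_8).length); // prints 512
    }
}
```

Because only the removed tail is hashed, two long references sharing the same 503-byte prefix and whose tails happen to have equal hash codes would collide, which is the residual risk mentioned above.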
Type Name
As of Elasticsearch 7.0, the index type has been deprecated.
If you are using Elasticsearch 7.0 or higher, do not configure the
typeName. Doing so may cause errors.
The typeName is available only for backward compatibility
for those using this Committer with older versions of Elasticsearch.
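To show where a type name would appear on the wire, here is a minimal sketch of an Elasticsearch Bulk API "index" action line. It assumes documents are sent through the Bulk API, which this page does not state, and the BulkActionLine class is hypothetical. The "_type" entry is emitted only when a type name is configured, mirroring the advice to leave typeName unset on Elasticsearch 7.0 or higher.

```java
/**
 * Hypothetical sketch of a Bulk API "index" action line. "_type" is only
 * emitted when a type name is configured (pre-7.0 Elasticsearch only).
 */
final class BulkActionLine {

    private BulkActionLine() {
    }

    static String indexAction(String index, String typeName, String id) {
        StringBuilder sb = new StringBuilder("{\"index\":{\"_index\":")
                .append(quote(index));
        if (typeName != null && !typeName.isEmpty()) {
            sb.append(",\"_type\":").append(quote(typeName));
        }
        return sb.append(",\"_id\":").append(quote(id)).append("}}").toString();
    }

    // Escapes only backslashes and double quotes; control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Elasticsearch 7.0 or higher: no type name configured.
        System.out.println(indexAction("some_index", null, "doc-1"));
        // Older Elasticsearch: legacy type name.
        System.out.println(indexAction("some_index", "doc", "doc-1"));
    }
}
```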
Authentication
Basic authentication is supported for password-protected clusters.
Alternatively, API Key authentication can be used by providing the
encoded API key value via the setApiKey(String) method
or the <apiKey> XML configuration element.
When an API key is set, it takes precedence over basic credentials.
The API key value should be the Base64-encoded string as provided
by Elasticsearch (i.e., the value sent in the
Authorization: ApiKey ... header).
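The precedence rule can be sketched as building the Authorization header value. In this hypothetical AuthHeaderSketch class, a non-blank API key is sent as-is after "ApiKey ", since it is already Base64-encoded; otherwise, basic credentials are encoded as Base64 of "username:password". How the committer wires this into its HTTP client is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical sketch of the documented precedence: an API key, when set,
 * wins over basic credentials. Not the committer's own code.
 */
final class AuthHeaderSketch {

    private AuthHeaderSketch() {
    }

    /** Returns the Authorization header value, or null when no auth is configured. */
    static String authorization(String apiKey, String username, String password) {
        if (apiKey != null && !apiKey.isBlank()) {
            // Already Base64-encoded as provided by Elasticsearch: sent as-is.
            return "ApiKey " + apiKey;
        }
        if (username != null) {
            String pair = username + ":" + (password == null ? "" : password);
            return "Basic " + Base64.getEncoder()
                    .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(authorization(null, "user", "pass")); // prints Basic dXNlcjpwYXNz
        // "example-api-key" is a placeholder, not a real key.
        System.out.println(authorization("example-api-key", "user", "pass"));
    }
}
```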
Timeouts
You can specify connection and socket timeout values (in milliseconds) used when this committer sends documents to Elasticsearch.
XML configuration usage:
<committer
class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
<nodes>
(Comma-separated list of Elasticsearch node URLs.
Defaults to http://localhost:9200)
</nodes>
<indexName>(Name of the index to use)</indexName>
<typeName>
(Name of the type to use. Deprecated since Elasticsearch v7.)
</typeName>
<ignoreResponseErrors>[false|true]</ignoreResponseErrors>
<discoverNodes>[false|true]</discoverNodes>
<dotReplacement>
(Optional value replacing dots in field names)
</dotReplacement>
<jsonFieldsPattern>
(Optional regular expression to identify fields containing JSON
objects instead of regular strings)
</jsonFieldsPattern>
<connectionTimeout>(milliseconds)</connectionTimeout>
<socketTimeout>(milliseconds)</socketTimeout>
<fixBadIds>
[false|true] (Forces references to fit into the Elasticsearch _id field.)
</fixBadIds>
<!-- Use "credentials" for basic auth, or "apiKey" for API Key auth. -->
<credentials/>
<apiKey>
(Base64-encoded API key for Elasticsearch API Key authentication.
When set, takes precedence over basic credentials.)
</apiKey>
<sourceIdField>
(Optional document field name containing the value that will be stored
in Elasticsearch "_id" field. Default is the document reference.)
</sourceIdField>
<targetContentField>
(Optional Elasticsearch field name to store the document
content/body. Default is "content".)
</targetContentField>
</committer>
XML configuration entries expecting millisecond durations
can be provided in human-readable format (English only), as per
DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
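For intuition on how a compact duration maps to milliseconds, here is a much-simplified, hypothetical CompactDuration class. It is not DurationParser itself: it only accepts bare millisecond values and compact forms such as "5m30s", and does not handle English phrases like "5 minutes and 30 seconds".

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, much-simplified stand-in for human-readable durations. It only
 * accepts plain milliseconds ("30000") and compact forms such as "5m30s".
 * It is not DurationParser itself.
 */
final class CompactDuration {

    // "ms" is listed before "m" so "250ms" is read as milliseconds, not minutes.
    private static final Pattern PART = Pattern.compile("(\\d+)(ms|d|h|m|s)");
    private static final Map<String, Long> UNIT_MILLIS = Map.of(
            "ms", 1L, "s", 1_000L, "m", 60_000L, "h", 3_600_000L, "d", 86_400_000L);

    private CompactDuration() {
    }

    static long toMillis(String text) {
        String s = text.trim();
        if (s.matches("\\d+")) {
            return Long.parseLong(s); // a bare number is already in milliseconds
        }
        Matcher m = PART.matcher(s);
        long total = 0;
        int end = 0;
        while (m.find()) {
            if (m.start() != end) { // reject anything between recognized parts
                throw new IllegalArgumentException("Unsupported duration: " + text);
            }
            total += Long.parseLong(m.group(1)) * UNIT_MILLIS.get(m.group(2));
            end = m.end();
        }
        if (end == 0 || end != s.length()) {
            throw new IllegalArgumentException("Unsupported duration: " + text);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toMillis("5m30s")); // prints 330000
        System.out.println(toMillis("30000")); // prints 30000
    }
}
```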
XML usage example:
<committer
class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
<indexName>some_index</indexName>
</committer>
The above example uses the minimum required settings, with Elasticsearch running on the local host (default node: http://localhost:9200).
- Author:
- Pascal Essiembre
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description:
protected void closeBatchCommitter
protected void commitBatch
protected org.elasticsearch.client.RestClient createRestClient()
protected org.elasticsearch.client.sniff.Sniffer createSniffer(org.elasticsearch.client.RestClient client)
boolean equals(Object)
String getApiKey() - Gets the API key for Elasticsearch API Key authentication.
int getConnectionTimeout() - Gets Elasticsearch connection timeout.
Credentials getCredentials() - Gets Elasticsearch authentication credentials.
String getDotReplacement() - Gets the character used to replace dots in field names.
String getIndexName() - Gets the index name.
String getJsonFieldsPattern() - Gets the regular expression matching fields that contain a JSON object for their value (as opposed to a regular string).
List<String> getNodes() - Gets an unmodifiable list of Elasticsearch cluster node URLs.
int getSocketTimeout() - Gets Elasticsearch socket timeout.
String getSourceIdField() - Gets the document field name containing the value to be stored in the Elasticsearch "_id" field.
String getTargetContentField() - Gets the name of the Elasticsearch field where content will be stored.
String getTypeName() - Gets the type name.
int hashCode()
protected void initBatchCommitter
boolean isDiscoverNodes() - Whether automatic discovery of Elasticsearch cluster nodes should be enabled.
boolean isFixBadIds() - Gets whether to fix IDs that are too long for the Elasticsearch ID limitation (512 bytes max).
boolean isIgnoreResponseErrors() - Whether to ignore response errors.
protected void loadBatchCommitterFromXML
protected void saveBatchCommitterToXML
void setApiKey(String apiKey) - Sets the API key for Elasticsearch API Key authentication.
void setConnectionTimeout(int connectionTimeout) - Sets Elasticsearch connection timeout.
void setCredentials(Credentials credentials) - Sets Elasticsearch authentication credentials.
void setDiscoverNodes(boolean discoverNodes) - Sets whether automatic discovery of Elasticsearch cluster nodes should be enabled.
void setDotReplacement(String dotReplacement) - Sets the character used to replace dots in field names.
void setFixBadIds(boolean fixBadIds) - Sets whether to fix IDs that are too long for the Elasticsearch ID limitation (512 bytes max).
void setIgnoreResponseErrors(boolean ignoreResponseErrors) - Sets whether to ignore response errors.
void setIndexName(String indexName) - Sets the index name.
void setJsonFieldsPattern(String jsonFieldsPattern) - Sets the regular expression matching fields that contain a JSON object for their value (as opposed to a regular string).
void setNodes (two overloads) - Sets cluster node URLs.
void setSocketTimeout(int socketTimeout) - Sets Elasticsearch socket timeout.
void setSourceIdField(String sourceIdField) - Sets the document field name containing the value to be stored in the Elasticsearch "_id" field.
void setTargetContentField(String targetContentField) - Sets the name of the Elasticsearch field where content will be stored.
void setTypeName(String typeName) - Sets the type name.
String toString()
Methods inherited from class com.norconex.committer.core3.batch.AbstractBatchCommitter
consume, doClean, doClose, doDelete, doInit, doUpsert, getCommitterQueue, loadCommitterFromXML, saveCommitterToXML, setCommitterQueue
Methods inherited from class com.norconex.committer.core3.AbstractCommitter
accept, addRestriction, addRestrictions, applyFieldMappings, clean, clearFieldMappings, clearRestrictions, close, delete, fireDebug, fireDebug, fireError, fireError, fireInfo, fireInfo, getCommitterContext, getFieldMappings, getRestrictions, init, loadFromXML, removeFieldMapping, removeRestriction, removeRestriction, saveToXML, setFieldMapping, setFieldMappings, upsert
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface com.norconex.commons.lang.xml.IXMLConfigurable
loadFromXML, saveToXML
-
Field Details
-
ELASTICSEARCH_ID_FIELD
- See Also:
-
DEFAULT_ELASTICSEARCH_CONTENT_FIELD
- See Also:
-
DEFAULT_NODE
- See Also:
-
DEFAULT_CONNECTION_TIMEOUT
public static final int DEFAULT_CONNECTION_TIMEOUT
- See Also:
-
DEFAULT_SOCKET_TIMEOUT
public static final int DEFAULT_SOCKET_TIMEOUT
- See Also:
-
-
Constructor Details
-
ElasticsearchCommitter
public ElasticsearchCommitter()
-
-
Method Details
-
getNodes
Gets an unmodifiable list of Elasticsearch cluster node URLs. Defaults to "http://localhost:9200".
- Returns:
- Elasticsearch nodes
-
setNodes
Sets cluster node URLs. Node URLs with no port are assumed to be using port 80.
- Parameters:
nodes - Elasticsearch cluster nodes
-
setNodes
Sets cluster node URLs. Node URLs with no port are assumed to be using port 80.
- Parameters:
nodes - Elasticsearch cluster nodes
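The port rule for node URLs can be sketched with java.net.URI, whose getPort() returns -1 when no port is given. The NodeUrls class is hypothetical and applies the documented rule literally, whatever the URL scheme.

```java
import java.net.URI;
import java.util.List;

/**
 * Hypothetical sketch of the documented rule that a node URL without an
 * explicit port uses port 80. The rule is applied literally, whatever the scheme.
 */
final class NodeUrls {

    static final int DEFAULT_PORT = 80;

    private NodeUrls() {
    }

    static int effectivePort(String nodeUrl) {
        int port = URI.create(nodeUrl).getPort(); // -1 when no port is given
        return port == -1 ? DEFAULT_PORT : port;
    }

    public static void main(String[] args) {
        for (String url : List.of("http://localhost:9200", "http://es.example.com")) {
            System.out.println(url + " -> port " + effectivePort(url));
        }
    }
}
```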
-
getTargetContentField
Gets the name of the Elasticsearch field where content will be stored. Default is "content".
- Returns:
- field name
-
setTargetContentField
Sets the name of the Elasticsearch field where content will be stored. Specifying a null value will disable storing the content.
- Parameters:
targetContentField - field name
-
getSourceIdField
Gets the document field name containing the value to be stored in the Elasticsearch "_id" field. By default, no field is used; the document reference is used instead.
- Returns:
- name of field containing id value
-
setSourceIdField
Sets the document field name containing the value to be stored in the Elasticsearch "_id" field. Set null to use the document reference instead of a field (default).
- Parameters:
sourceIdField - name of field containing id value, or null
-
getIndexName
Gets the index name.
- Returns:
- index name
-
setIndexName
Sets the index name.
- Parameters:
indexName - the index name
-
getTypeName
Gets the type name. Type name is deprecated if you are using Elasticsearch 7.0 or higher and should be null.
- Returns:
- type name
-
setTypeName
Sets the type name. Type name is deprecated if you are using Elasticsearch 7.0 or higher and should be null.
- Parameters:
typeName - type name
-
getJsonFieldsPattern
Gets the regular expression matching fields that contain a JSON object for their value (as opposed to a regular string). Default is null.
- Returns:
- regular expression
- Since:
- 4.1.0
-
setJsonFieldsPattern
Sets the regular expression matching fields that contain a JSON object for their value (as opposed to a regular string).
- Parameters:
jsonFieldsPattern - regular expression
- Since:
- 4.1.0
-
isIgnoreResponseErrors
public boolean isIgnoreResponseErrors()
Whether to ignore response errors. By default, an exception is thrown if the Elasticsearch response contains an error. When true, the errors are logged instead.
- Returns:
true when ignoring response errors
-
setIgnoreResponseErrors
public void setIgnoreResponseErrors(boolean ignoreResponseErrors)
Sets whether to ignore response errors. When false, an exception is thrown if the Elasticsearch response contains an error. When true, the errors are logged instead.
- Parameters:
ignoreResponseErrors - true when ignoring response errors
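The throw-or-log switch can be sketched as below. The hypothetical ResponseErrorPolicy class takes an already-computed "has errors" flag (how errors are detected in the response is assumed, not shown), uses IllegalStateException as a stand-in for the committer's own exception type, and logs through java.util.logging.

```java
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the ignoreResponseErrors switch. Error detection,
 * the exception type and the logger are assumptions for illustration.
 */
final class ResponseErrorPolicy {

    private static final Logger LOG =
            Logger.getLogger(ResponseErrorPolicy.class.getName());

    private ResponseErrorPolicy() {
    }

    /**
     * Returns true when the response has no errors. Otherwise logs and returns
     * false when ignoring errors, or throws when not ignoring them.
     */
    static boolean handle(boolean hasErrors, String body, boolean ignoreResponseErrors) {
        if (!hasErrors) {
            return true;
        }
        if (ignoreResponseErrors) {
            LOG.severe("Elasticsearch response contains errors: " + body);
            return false;
        }
        throw new IllegalStateException("Elasticsearch response contains errors: " + body);
    }

    public static void main(String[] args) {
        System.out.println(handle(false, "{\"errors\":false}", false)); // prints true
        System.out.println(handle(true, "{\"errors\":true}", true)); // logs, then prints false
    }
}
```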
-
isDiscoverNodes
public boolean isDiscoverNodes()
Whether automatic discovery of Elasticsearch cluster nodes should be enabled.
- Returns:
true if enabled
-
setDiscoverNodes
public void setDiscoverNodes(boolean discoverNodes)
Sets whether automatic discovery of Elasticsearch cluster nodes should be enabled.
- Parameters:
discoverNodes - true if enabled
-
getCredentials
Gets Elasticsearch authentication credentials.
- Returns:
- credentials
- Since:
- 5.0.0
-
setCredentials
Sets Elasticsearch authentication credentials.
- Parameters:
credentials - the credentials
- Since:
- 5.0.0
-
getApiKey
Gets the API key for Elasticsearch API Key authentication.
- Returns:
- the Base64-encoded API key, or null
- Since:
- 5.0.0
-
setApiKey
Sets the API key for Elasticsearch API Key authentication. When set, this takes precedence over basic credentials. The value should be the Base64-encoded API key as provided by Elasticsearch.
- Parameters:
apiKey - the Base64-encoded API key
- Since:
- 5.0.0
-
getDotReplacement
Gets the character used to replace dots in field names. Default is null (does not replace dots).
- Returns:
- replacement character or null
-
setDotReplacement
Sets the character used to replace dots in field names.
- Parameters:
dotReplacement - replacement character or null
-
getConnectionTimeout
public int getConnectionTimeout()
Gets Elasticsearch connection timeout.
- Returns:
- milliseconds
- Since:
- 4.1.0
-
setConnectionTimeout
public void setConnectionTimeout(int connectionTimeout)
Sets Elasticsearch connection timeout.
- Parameters:
connectionTimeout - milliseconds
- Since:
- 4.1.0
-
getSocketTimeout
public int getSocketTimeout()
Gets Elasticsearch socket timeout.
- Returns:
- milliseconds
- Since:
- 4.1.0
-
setSocketTimeout
public void setSocketTimeout(int socketTimeout)
Sets Elasticsearch socket timeout.
- Parameters:
socketTimeout - milliseconds
- Since:
- 4.1.0
-
isFixBadIds
public boolean isFixBadIds()
Gets whether to fix IDs that are too long for the Elasticsearch ID limitation (512 bytes max). If true, long IDs will be truncated and a hash code representing the truncated part will be appended.
- Returns:
true to fix IDs that are too long
- Since:
- 4.1.0
-
setFixBadIds
public void setFixBadIds(boolean fixBadIds)
Sets whether to fix IDs that are too long for the Elasticsearch ID limitation (512 bytes max). If true, long IDs will be truncated and a hash code representing the truncated part will be appended.
- Parameters:
fixBadIds - true to fix IDs that are too long
- Since:
- 4.1.0
-
initBatchCommitter
- Overrides:
initBatchCommitter in class AbstractBatchCommitter
- Throws:
CommitterException
-
commitBatch
- Specified by:
commitBatch in class AbstractBatchCommitter
- Throws:
CommitterException
-
closeBatchCommitter
- Overrides:
closeBatchCommitter in class AbstractBatchCommitter
- Throws:
CommitterException
-
createRestClient
protected org.elasticsearch.client.RestClient createRestClient()
-
createSniffer
protected org.elasticsearch.client.sniff.Sniffer createSniffer(org.elasticsearch.client.RestClient client)
-
saveBatchCommitterToXML
- Specified by:
saveBatchCommitterToXML in class AbstractBatchCommitter
-
loadBatchCommitterFromXML
- Specified by:
loadBatchCommitterFromXML in class AbstractBatchCommitter
-
equals
- Overrides:
equals in class AbstractBatchCommitter
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractBatchCommitter
-
toString
- Overrides:
toString in class AbstractBatchCommitter
-