(Quick Personal work note)
Elasticsearch offers a variety of field types in its mappings to support different use cases and to index data for search.
In this note, I will write briefly about one field type called join, which is probably used less often than the others, along with its use cases and a few things that are good to know.
What is the join field type?
The join field type is a field that forms a parent/child relationship between records in the same index.
It is defined by setting the field's type to join in the schema/mapping and listing, under relations, the names that identify the parent and the child.
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "document_join_field": {
        "type": "join",
        "relations": {
          "documentSet": "document"
        }
      }
    }
  }
}
The above mapping gives every record in the index a property called "document_join_field". Its value is an object with a name key that is either documentSet (parent) or document (child); a child record additionally carries a parent key holding the ID of its parent. Note that _id is not declared under properties: it is a metadata field that Elasticsearch manages itself.
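For readers who create indices from code, here is a minimal Python sketch that builds a mapping body like the one above as a plain dict. The helper name build_join_mapping is made up for this note; it only produces the JSON body and does not talk to Elasticsearch.

```python
import json


def build_join_mapping(join_field: str, parent: str, child: str) -> dict:
    """Build an index-creation body with one join field (hypothetical helper)."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "keyword"},
                join_field: {
                    "type": "join",
                    # relations maps each parent name to its child name
                    "relations": {parent: child},
                },
            }
        }
    }


mapping = build_join_mapping("document_join_field", "documentSet", "document")
print(json.dumps(mapping, indent=2))
```

The printed JSON can be sent as the body of the index-creation request with whatever client you already use.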
Indexing a Document Set (Parent)
PUT /example-index/_doc/1?routing=1
{
  "name": "Document folderA",
  "document_join_field": {
    "name": "documentSet"
  }
}
Indexing a Document (Child)
PUT /example-index/_doc/2?routing=1
{
  "name": "Document Name A",
  "document_join_field": {
    "name": "document",
    "parent": "1"
  }
}
Notice the routing parameter on both requests. A child record must be indexed on the same shard as its parent, so the child is routed with its parent's ID (routing=1 here); for child records the routing value is mandatory. The child also names its parent through the parent key of the join field and gets its own ID (2), otherwise it would overwrite the parent record. The document ID comes from the URL, so it is not repeated in the body: _id is a metadata field and Elasticsearch rejects it inside a document.
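To make the routing rule concrete, here is a small Python sketch that builds the request path and body for a parent or a child record. The helper build_index_request is hypothetical, and it assumes the join field is named document_join_field as in the mapping, that a child gets its own ID, and that a child points at its parent through a parent key. It builds the requests only and sends nothing.

```python
from typing import Optional, Tuple


def build_index_request(index: str, doc_id: str, name: str, relation: str,
                        parent_id: Optional[str] = None,
                        join_field: str = "document_join_field") -> Tuple[str, dict]:
    """Return (path, body) for indexing a parent or child record."""
    join_value = {"name": relation}
    # A parent routes on its own ID; a child routes on its parent's ID
    routing = doc_id
    if parent_id is not None:
        join_value["parent"] = parent_id
        routing = parent_id
    path = f"/{index}/_doc/{doc_id}?routing={routing}"
    body = {"name": name, join_field: join_value}
    return path, body


parent_path, parent_body = build_index_request(
    "example-index", "1", "Document folderA", "documentSet")
child_path, child_body = build_index_request(
    "example-index", "2", "Document Name A", "document", parent_id="1")
print("PUT", parent_path, parent_body)
print("PUT", child_path, child_body)
```

Deriving the routing value from the parent ID in one place makes it harder to index a child onto the wrong shard by accident.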
Use case and consideration
The join field type can be considered over the nested field type when the answer to either of these questions is yes:
- Will each record have a large number of child records?
- Does the number of child records need to grow at runtime? (E.g. a consumer adds a new child record at the app level)
If your use case needs to cover the points above, I suggest using join over the nested field type.
Usage and challenges of the join field type
A plain search on the index returns both child and parent records, since a child is also a record in the index. To search or filter across the relationship, we can use the has_child or has_parent query, which lets us target exactly the records we want.
has_child query
The has_child query returns parent records whose child records match the inner query. Elasticsearch runs the search/filter you define against the children of the given type and returns the matching parents; with inner_hits, the matching children are included in the response as well.
GET /example-index/_search
{
  "query": {
    "has_child": {
      "type": "document",
      "query": {
        ...YOUR_SEARCH_QUERY_HERE
      },
      "inner_hits": {}
    }
  }
}
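As one possible way to fill in the placeholder, the Python sketch below builds a has_child body around a concrete inner query. The helper name has_child_query and the term query on name are illustrative choices for this note, not something the placeholder above prescribes.

```python
import json


def has_child_query(child_type: str, inner_query: dict,
                    with_inner_hits: bool = True) -> dict:
    """Build a search body that returns parents whose children match inner_query."""
    clause = {"type": child_type, "query": inner_query}
    if with_inner_hits:
        # an empty inner_hits object asks for the matching children too
        clause["inner_hits"] = {}
    return {"query": {"has_child": clause}}


# Parents that have a child whose name is exactly "Document Name A"
body = has_child_query("document", {"term": {"name": "Document Name A"}})
print(json.dumps(body, indent=2))
```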
has_parent query
The has_parent query is the mirror image: it returns child records whose parent record matches the inner query.
GET /example-index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "documentSet",
      "query": {
        ...YOUR_SEARCH_QUERY_HERE
      }
    }
  }
}
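In addition, we can put a has_child query inside a has_parent query. This returns child records whose parent has at least one child matching the inner query, which is a way to search on the data of sibling records.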
GET /example-index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "documentSet",
      "query": {
        "has_child": {
          "type": "document",
          "query": {
            ...YOUR_SEARCH_QUERY_HERE
          },
          "inner_hits": {}
        }
      }
    }
  }
}
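The nested form above (has_child inside has_parent) gets hard to read quickly, so it can help to assemble it from small pieces. Here is a hedged Python sketch; the function names are invented for this note and the term query is only an example of an inner query.

```python
import json


def has_child_clause(child_type: str, inner_query: dict) -> dict:
    """Clause matching parents that have a child satisfying inner_query."""
    return {"has_child": {"type": child_type, "query": inner_query, "inner_hits": {}}}


def has_parent_clause(parent_type: str, inner_query: dict) -> dict:
    """Clause matching children whose parent satisfies inner_query."""
    return {"has_parent": {"parent_type": parent_type, "query": inner_query}}


def sibling_search(parent_type: str, child_type: str, inner_query: dict) -> dict:
    """Children whose parent has at least one child matching inner_query."""
    return {"query": has_parent_clause(parent_type,
                                       has_child_clause(child_type, inner_query))}


body = sibling_search("documentSet", "document",
                      {"term": {"name": "Document Name A"}})
print(json.dumps(body, indent=2))
```

Keeping each level in its own function keeps the nesting visible in the code instead of in a wall of braces.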
The has_parent and has_child queries cover most search and filter needs. One thing to be careful about is query size: try not to let search queries grow massive. (Queries on a join field can get ugly and hard to read.)
Sorting (challenge)
Sorting is quite a challenge for the join field type. With the nested field type, a sort clause can point at the nested path (the nested option, or nested_path in older versions), but there is no equivalent for a join field.
One useful approach to make sorting work (kind of) is to use script_score in the search query, so that the value you want to sort on becomes the score, and then sort by _score. For has_child, score_mode controls how the scores of the matching children are combined into the parent's score.
This may or may not fit your case, and the challenge still exists today; people keep working around it (unless Elasticsearch provides an update on this).
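As a rough illustration of the script_score idea, the Python sketch below builds a has_child search whose parents are ordered by a numeric field on their children. The field page_count is hypothetical (it is not in the mapping shown earlier), the helper name is made up, and the sketch assumes the field is present and non-negative on every child, since script scores cannot be negative.

```python
import json


def parents_sorted_by_child_field(child_type: str, field: str,
                                  score_mode: str = "max") -> dict:
    """Build a search body that ranks parents by a numeric child field."""
    scored_children = {
        "function_score": {
            # the child's field value becomes the child's score
            "script_score": {"script": {"source": f"doc['{field}'].value"}},
            # use the script result as the score instead of multiplying it in
            "boost_mode": "replace",
        }
    }
    return {
        "query": {
            "has_child": {
                "type": child_type,
                # how child scores are combined into the parent's score
                "score_mode": score_mode,
                "query": scored_children,
            }
        },
        "sort": [{"_score": {"order": "desc"}}],
    }


body = parents_sorted_by_child_field("document", "page_count")
print(json.dumps(body, indent=2))
```

With score_mode set to max, each parent is ranked by its highest child value; min, sum, or avg give other orderings. Because the score is now taken up by the sort value, relevance ranking is lost, which is one reason this only "kind of" works.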
So is this better than the nested field type?
Many people will ask "Can't we just use the nested field type then?" and my answer will generally be yes. We can just use the nested field type, and it should be faster than the join field type for searching.
- Nested fields are generally faster on read/search queries.
- Nested fields may be slower on write/update queries. (This depends on the data size, but updating one nested object reindexes the whole document, including every other nested object.) A join field only needs to reindex the child/parent record the user is updating.
- Nested fields are subject to a limit on the number of nested objects per document (index.mapping.nested_objects.limit, 10000 by default). A join field does not need to worry about this.
- Join field search queries can become massive. (They may become hard to read or maintain unless you are an Elasticsearch expert.)
- Join fields have the sorting challenge described above, which still needs to be solved.
- Join field data is easier to maintain than nested field data.
Again, both nested and join fields have pros and cons, and their use cases are quite different. But if you are working with massive nested data, then the join field is generally preferred over the nested field.