Elasticsearch Series: Several Advanced Features

Overview

This article gives a brief introduction to search templates, mapping templates, highlighted search, and geolocation queries.

Standard search templates

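Search templates are one of Elasticsearch's advanced features. They let us turn common searches into templates: to use an existing template we only pass in the required parameters, which avoids writing the same query over and over. Frequently used functionality can be wrapped in a template to make it simpler to use.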

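This is similar to interface encapsulation in programming: the details are hidden behind an interface for others to call, and callers only need to care about the parameters and the response, which improves code reuse.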

Let's look at the most basic usages.

Parameter substitution

GET /music/children/_search/template
{
  "source": {
    "query": {
      "match": { "{{field}}": "{{value}}" }
    }
  },
  "params": {
    "field": "name",
    "value": "bye-bye"
  }
}

After rendering, this search template is equivalent to:

GET /music/children/_search
{
  "query": {
    "match": { "name": "bye-bye" }
  }
}
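
To make the substitution step concrete, here is a minimal Python sketch of what rendering does to the template above. It is only an illustration: the `render` helper is hypothetical and handles nothing but plain `{{name}}` placeholders, whereas Elasticsearch renders templates with a real Mustache engine that also escapes values.

```python
import json
import re


def render(template: str, params: dict) -> str:
    """Replace each {{name}} placeholder with the matching value from params."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), template)


# The "source" part of the request, written as a string for this sketch.
source = '{"query": {"match": {"{{field}}": "{{value}}"}}}'
rendered = render(source, {"field": "name", "value": "bye-bye"})
query = json.loads(rendered)
print(query)
```

The parsed result is the same query body as the rendered request shown above.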

Conditions in JSON format

Inside a {{#toJson}} block you can pass a slightly more complex condition as an object:

GET /music/children/_search/template
{
  "source": "{\"query\":{\"match\":{{#toJson}}condition{{/toJson}}}}",
  "params": {
    "condition": { "name": "bye-bye" }
  }
}

After rendering, this search template is equivalent to:

GET /music/children/_search
{
  "query": {
    "match": { "name": "bye-bye" }
  }
}
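
As a rough picture of what the toJson block does, the sketch below serializes the named parameter to JSON and splices it into the source string. The `render_to_json` helper is hypothetical and only understands this single form; it is not how Elasticsearch implements the function.

```python
import json
import re

TO_JSON = re.compile(r"\{\{#toJson\}\}(\w+)\{\{/toJson\}\}")


def render_to_json(template: str, params: dict) -> str:
    """Replace {{#toJson}}name{{/toJson}} with the JSON text of params[name]."""
    return TO_JSON.sub(lambda m: json.dumps(params[m.group(1)]), template)


source = '{"query":{"match":{{#toJson}}condition{{/toJson}}}}'
rendered = render_to_json(source, {"condition": {"name": "bye-bye"}})
query = json.loads(rendered)
print(query)
```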

The join syntax

The names parameter used inside join can hold several values:

GET /music/children/_search/template
{
  "source": {
    "query": {
      "match": {
        "name": "{{#join delimiter=' '}}names{{/join delimiter=' '}}"
      }
    }
  },
  "params": {
    "names": ["gymbo", "you are my sunshine", "bye-bye"]
  }
}

After rendering, this search template is equivalent to:

GET /music/children/_search
{
  "query": {
    "match": { "name": "gymbo you are my sunshine bye-bye" }
  }
}
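
The effect of join is simply to concatenate the array values with the delimiter. A small Python sketch of that behavior follows; the `render_join` helper is a hypothetical stand-in written for this example, not the actual template engine.

```python
import json
import re

# Matches {{#join delimiter='X'}}name{{/join delimiter='X'}}
JOIN = re.compile(r"\{\{#join delimiter='(.*?)'\}\}(\w+)\{\{/join delimiter='\1'\}\}")


def render_join(template: str, params: dict) -> str:
    """Join the list stored under the named parameter with the given delimiter."""
    return JOIN.sub(lambda m: m.group(1).join(params[m.group(2)]), template)


source = """{"query": {"match": {"name": "{{#join delimiter=' '}}names{{/join delimiter=' '}}"}}}"""
params = {"names": ["gymbo", "you are my sunshine", "bye-bye"]}
query = json.loads(render_join(source, params))
print(query["query"]["match"]["name"])
```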
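
Default values in search templates

A search template can define default values. For example, {{^end}}500{{/end}} means that if the end parameter is empty, the default value 500 is used:

GET /music/children/_search/template
{
  "source": {
    "query": {
      "range": {
        "likes": {
          "gte": "{{start}}",
          "lte": "{{end}}{{^end}}500{{/end}}"
        }
      }
    }
  },
  "params": {
    "start": 1,
    "end": 300
  }
}

After rendering, this search template is equivalent to:

GET /music/children/_search
{
  "query": {
    "range": {
      "likes": { "gte": 1, "lte": 300 }
    }
  }
}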

Conditionals

Mustache has no if/else construct, but you can define a section that is skipped when its variable is false or undefined:

{{#param1}} "This section is skipped if param1 is null or false" {{/param1}}

Example: create a stored Mustache script

POST _scripts/condition
{
  "script": {
    "lang": "mustache",
    "source": """
    {
      "query": {
        "bool": {
          "must": { "match": { "name": "{{name}}" } },
          "filter": {
            {{#isLike}}
            "range": {
              "likes": {
                {{#start}} "gte": "{{start}}" {{#end}},{{/end}} {{/start}}
                {{#end}} "lte": "{{end}}" {{/end}}
              }
            }
            {{/isLike}}
          }
        }
      }
    }
    """
  }
}

Search with the stored Mustache template:

GET _search/template
{
  "id": "condition",
  "params": {
    "name": "gymbo",
    "isLike": true,
    "start": 1,
    "end": 500
  }
}

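Those are the commonly used kinds of search templates. In large projects that have dedicated Elasticsearch engineers, common functionality is often turned into templates, and the developers of business systems only need to use them.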
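
The two section forms used above, {{#x}}...{{/x}} for conditionals and {{^x}}...{{/x}} for defaults, can be imitated with a very small renderer. The Python sketch below is a toy written for this article, not a Mustache implementation: the `render` function is hypothetical, it treats any Python-falsy value (including 0) as missing, and it does not support a section nested inside another section of the same name.

```python
import json
import re

SECTION = re.compile(r"\{\{([#^])(\w+)\}\}(.*?)\{\{/\2\}\}", re.DOTALL)
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, params: dict) -> str:
    def section(m):
        kind, name, body = m.groups()
        present = bool(params.get(name))
        # "#" keeps the body when the value is set, "^" keeps it when it is not.
        keep = present if kind == "#" else not present
        return render(body, params) if keep else ""

    out = SECTION.sub(section, template)
    # A missing variable renders as an empty string.
    return VARIABLE.sub(lambda m: str(params.get(m.group(1), "")), out)


TEMPLATE = """
{
  "query": {
    "bool": {
      "must": { "match": { "name": "{{name}}" } },
      "filter": {
        {{#isLike}}
        "range": {
          "likes": {
            {{#start}} "gte": "{{start}}" {{#end}},{{/end}} {{/start}}
            {{#end}} "lte": "{{end}}" {{/end}}
          }
        }
        {{/isLike}}
      }
    }
  }
}
"""

full = json.loads(render(TEMPLATE, {"name": "gymbo", "isLike": True, "start": 1, "end": 500}))
print(full["query"]["bool"]["filter"])

# Default value: the inverted section supplies 500 only when "end" is absent.
print(render("{{end}}{{^end}}500{{/end}}", {"end": 300}))
print(render("{{end}}{{^end}}500{{/end}}", {}))
```

Note that when isLike is not set, this template renders an empty filter object, so a production template may want to wrap the whole filter clause in the section instead.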

Custom mapping templates

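Elasticsearch has its own rules for mapping the types of inserted data. For example, 10 is automatically mapped to long, and "10" is automatically mapped to text with a built-in keyword sub-field. This is convenient, but sometimes these types are not what we want: we may expect the integer value 10 to be an integer and "10" to be a keyword. In that case we can define a template in advance, so that when data is inserted the matching fields follow our predefined rules, which decide each field's type.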

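Note that in real projects the coding standards are usually stricter: every document's types are defined before any data is inserted, and even a field added later gets its mapping defined first.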

Still, custom dynamic mapping templates are worth understanding.

Default dynamic mapping behavior

Try inserting a document:

PUT /test_index/type/1
{
  "test_string": "hello kitty",
  "test_number": 10
}

View the mapping:

GET /test_index/_mapping/type

The response:

{
  "test_index": {
    "mappings": {
      "type": {
        "properties": {
          "test_number": { "type": "long" },
          "test_string": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          }
        }
      }
    }
  }
}

The default dynamic mapping rules may not be what we want.

For example, we may want numbers to default to integer, and strings to default to text with a built-in sub-field named raw instead of keyword, with ignore_above set to 128.

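Dynamic mapping templates

There are two approaches:

* Match a predefined template by the data type that Elasticsearch detects by default for the newly added field
* Match a predefined template by the name of the newly added field, either against a predefined name or against a predefined wildcard pattern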

Matching by data type

PUT /test_index
{
  "mappings": {
    "type": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": { "type": "integer" }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": { "type": "keyword", "ignore_above": 128 }
              }
            }
          }
        }
      ]
    }
  }
}

After deleting the old index, creating it with this mapping, and inserting the data again, the mapping looks like this:

{
  "test_index": {
    "mappings": {
      "type": {
        "dynamic_templates": [
          {
            "integers": {
              "match_mapping_type": "long",
              "mapping": { "type": "integer" }
            }
          },
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": { "ignore_above": 128, "type": "keyword" }
                },
                "type": "text"
              }
            }
          }
        ],
        "properties": {
          "test_number": { "type": "integer" },
          "test_string": {
            "type": "text",
            "fields": {
              "raw": { "type": "keyword", "ignore_above": 128 }
            }
          }
        }
      }
    }
  }
}

The fields are mapped to the types we defined, as expected.

Matching by field name

* Fields whose names start with "long_" and that would otherwise be long are mapped to integer
* Fields whose names start with "string_" and that would otherwise be strings are mapped to text with a raw keyword sub-field
* Fields whose names end with "_text" and that would otherwise be strings are left unchanged

PUT /test_index
{
  "mappings": {
    "type": {
      "dynamic_templates": [
        {
          "long_as_integer": {
            "match_mapping_type": "long",
            "match": "long_*",
            "mapping": { "type": "integer" }
          }
        },
        {
          "string_as_raw": {
            "match_mapping_type": "string",
            "match": "string_*",
            "unmatch": "*_text",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": { "type": "keyword", "ignore_above": 128 }
              }
            }
          }
        }
      ]
    }
  }
}

Insert a document:

PUT /test_index/type/1
{
  "string_test": "hello kitty",
  "long_test": 10,
  "title_text": "Hello everyone"
}

View the mapping:

{
  "test_index": {
    "mappings": {
      "type": {
        "dynamic_templates": [
          {
            "long_as_integer": {
              "match": "long_*",
              "match_mapping_type": "long",
              "mapping": { "type": "integer" }
            }
          },
          {
            "string_as_raw": {
              "match": "string_*",
              "unmatch": "*_text",
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": { "ignore_above": 128, "type": "keyword" }
                },
                "type": "text"
              }
            }
          }
        ],
        "properties": {
          "long_test": { "type": "integer" },
          "string_test": {
            "type": "text",
            "fields": {
              "raw": { "type": "keyword", "ignore_above": 128 }
            }
          },
          "title_text": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword", "ignore_above": 256 }
            }
          }
        }
      }
    }
  }
}

The result is as expected.
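
The matching rules can be summarized as: the first template whose conditions all hold wins, and a field that matches no template falls back to the default dynamic mapping. The Python sketch below imitates that selection for the two templates above. It is an assumption-laden illustration: `pick_template` is a hypothetical helper, and `fnmatchcase` is used as a stand-in for Elasticsearch's simple `*` wildcard matching (it also gives `?` and `[` special meaning, which the real matcher does not).

```python
from fnmatch import fnmatchcase

DYNAMIC_TEMPLATES = [
    {"long_as_integer": {
        "match_mapping_type": "long",
        "match": "long_*",
        "mapping": {"type": "integer"},
    }},
    {"string_as_raw": {
        "match_mapping_type": "string",
        "match": "string_*",
        "unmatch": "*_text",
        "mapping": {
            "type": "text",
            "fields": {"raw": {"type": "keyword", "ignore_above": 128}},
        },
    }},
]


def pick_template(field, detected_type):
    """Return the name of the first template whose conditions all hold, else None."""
    for entry in DYNAMIC_TEMPLATES:
        for name, rule in entry.items():
            if rule.get("match_mapping_type", detected_type) != detected_type:
                continue
            if "match" in rule and not fnmatchcase(field, rule["match"]):
                continue
            if "unmatch" in rule and fnmatchcase(field, rule["unmatch"]):
                continue
            return name
    return None  # no template applies: the default dynamic mapping is used


print(pick_template("long_test", "long"))
print(pick_template("string_test", "string"))
print(pick_template("title_text", "string"))
```

This mirrors the mapping shown above: long_test and string_test pick up the custom templates, while title_text keeps the default text plus keyword mapping.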

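In some log management scenarios, we can define the type up front and create one index per day, named by date. Creating such indices is a good use of mapping templates, which can carry all of the mapping rules we have defined.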

Highlighted search

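When we search for text in a browser, the keywords we typed are highlighted in the results, and the HTML source shows that the highlighted parts are wrapped in an HTML tag. Elasticsearch supports highlighted search as well: it automatically wraps the matches in the returned documents in <em> tags, which work in HTML5 pages.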

Basic highlight syntax

Using the music site as our example again, let's run a highlighted search:

GET /music/children/_search
{
  "query": {
    "match": { "content": "love" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

The highlight section holds the highlighting options; here it names content as the field to highlight. In the response, the matched word love is wrapped in highlight tags (<em> by default), which a page can then style, for example in red. In other words, if a specified field contains the search term, the term is highlighted within that field's text.

{
  "took": 35,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "music",
        "_type": "children",
        "_id": "5",
        "_score": 0.2876821,
        "_source": {
          "id": "1740e61c-63da-474f-9058-c2ab3c4f0b0a",
          "author_first_name": "Jean",
          "author_last_name": "Ritchie",
          "author": "Jean Ritchie",
          "name": "love somebody",
          "content": "love somebody, yes I do",
          "language": "english",
          "tags": "love",
          "length": 38,
          "likes": 3,
          "isRelease": true,
          "releaseDate": "2019-12-22"
        },
        "highlight": {
          "content": [ "<em>love</em> somebody, yes I do" ]
        }
      }
    ]
  }
}
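
Conceptually, highlighting wraps each matched term in a pre tag and a post tag. The Python sketch below shows that idea on the content value from the response; it is a toy that matches whole words with a regular expression, and the `highlight` function is hypothetical. The real highlighters work on analyzed tokens and offsets rather than on regular expressions.

```python
import re


def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of term in the tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("love somebody, yes I do", "love"))
print(highlight("Love somebody", "love", pre_tag="<b>", post_tag="</b>"))
```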

Several fields can be listed under highlight, so that keywords matched in any of them are highlighted. For example:

GET /music/children/_search
{
  "query": {
    "match": { "content": "love" }
  },
  "highlight": {
    "fields": {
      "name": {},
      "content": {}
    }
  }
}

Three highlighters

There are three highlighters:

* plain highlighter: uses the standard Lucene highlighter and handles simple queries very well.
* unified highlighter: the default. It uses the Lucene Unified Highlighter, splits the text into sentences, and scores them with BM25. It supports both exact and fuzzy queries.
* fast vector highlighter: uses the Lucene Fast Vector Highlighter and is very powerful. It requires the field's mapping to enable term_vector with with_positions_offsets, and it has a performance advantage on very long text (larger than 1 MB).

For example:

PUT /music
{
  "mappings": {
    "children": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}

In most cases the default unified highlighter or the plain highlighter is enough, and neither needs extra mapping settings.
If a field's values are very large, over 1 MB, consider the fast vector highlighter.

Custom highlight HTML tags

The default highlight tag is <em>. You can define your own tag and then style it however you like:

GET /music/children/_search
{
  "query": {
    "match": { "content": "Love" }
  },
  "highlight": {
    "pre_tags": ["<tag1>"],
    "post_tags": ["</tag1>"],
    "fields": {
      "content": { "type": "plain" }
    }
  }
}
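
Highlight fragment settings

Very long text cannot be shown in full on a page; we only need to show the context around the keywords. The fragment settings handle this:

GET /_search
{
  "query": {
    "match": { "content": "friend" }
  },
  "highlight": {
    "fields": {
      "content": {
        "fragment_size": 150,
        "number_of_fragments": 3,
        "no_match_size": 150
      }
    }
  }
}

fragment_size: the size of each highlighted fragment in characters. The default is 100.

number_of_fragments: a field may produce several highlighted fragments; this sets the maximum number to return.

no_match_size: the amount of text to return from the beginning of the field when there is no matching fragment to highlight.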

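Geolocation

Location-based apps are everywhere these days, and plenty of components support geolocation. Elasticsearch is no exception, and it can combine geolocation with full-text search, structured search, and analytics. Let's take a look.

The geo_point data type

For location-based search, Elasticsearch has a dedicated geo_point type that stores a location (longitude and latitude), and it provides some basic queries such as geo_bounding_box.

Create a mapping with a geo_point field:

PUT /location
{
  "mappings": {
    "hotels": {
      "properties": {
        "location": { "type": "geo_point" },
        "content": { "type": "text" }
      }
    }
  }
}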
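
Inserting data

The following form is recommended:

# lat is the latitude, lon is the longitude
PUT /location/hotels/1
{
  "content": "7days hotel",
  "location": {
    "lon": 113.928619,
    "lat": 22.528091
  }
}

There are two other ways to insert a location, but it is very easy to mix up the order of latitude and longitude with them, so they are not recommended:

# In the array form, longitude comes first and latitude second
PUT /location/hotels/2
{
  "content": "7days hotel",
  "location": [113.923567, 22.523988]
}

# In the string form, latitude comes first and longitude second
PUT /location/hotels/3
{
  "text": "7days hotel Orient Sunseed Hotel",
  "location": "22.521184, 113.914578"
}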

Query methods

The geo_bounding_box query finds the points that fall inside a rectangular area:

GET /location/hotels/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": { "lon": 112, "lat": 23 },
        "bottom_right": { "lon": 114, "lat": 21 }
      }
    }
  }
}
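
The check behind a bounding box is a pair of range comparisons. The Python sketch below applies it to the three hotel locations inserted earlier. It is a simplified illustration: `in_bounding_box` is a hypothetical helper, and it assumes the box does not cross the 180° meridian.

```python
def in_bounding_box(point, top_left, bottom_right):
    """True when the point lies inside the rectangle (edges included)."""
    return (
        bottom_right["lat"] <= point["lat"] <= top_left["lat"]
        and top_left["lon"] <= point["lon"] <= bottom_right["lon"]
    )


hotels = {
    "1": {"lon": 113.928619, "lat": 22.528091},
    "2": {"lon": 113.923567, "lat": 22.523988},
    "3": {"lon": 113.914578, "lat": 22.521184},
}
top_left = {"lon": 112, "lat": 23}
bottom_right = {"lon": 114, "lat": 21}

matches = [hid for hid, p in hotels.items() if in_bounding_box(p, top_left, bottom_right)]
print(matches)
```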

Common query scenarios

The geo_bounding_box approach

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ],
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": { "lon": 112, "lat": 23 },
            "bottom_right": { "lon": 114, "lat": 21 }
          }
        }
      }
    }
  }
}

The geo_polygon approach, with a polygon (here a triangle) defined by three points

Polygons are supported, but this filter is expensive, so use it sparingly.

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ],
      "filter": {
        "geo_polygon": {
          "location": {
            "points": [
              { "lon": 115, "lat": 23 },
              { "lon": 113, "lat": 25 },
              { "lon": 112, "lat": 21 }
            ]
          }
        }
      }
    }
  }
}
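
For intuition about what a polygon filter has to decide, here is a ray-casting point-in-polygon test in Python, applied to the triangle above. It is a sketch under a flat-plane assumption (longitude and latitude treated as x and y), and `point_in_polygon` is a hypothetical helper; it says nothing about how Elasticsearch evaluates the filter internally.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


triangle = [(115, 23), (113, 25), (112, 21)]  # (lon, lat) pairs from the query
print(point_in_polygon(113.914578, 22.521184, triangle))  # hotel 3
print(point_in_polygon(116.0, 22.5, triangle))            # a point east of the triangle
```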
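
The geo_distance approach

Search by distance from the current position, which is very practical. A distance given as a bare number, as here, is interpreted in meters:

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ],
      "filter": {
        "geo_distance": {
          "distance": 500,
          "location": { "lon": 113.911231, "lat": 22.523375 }
        }
      }
    }
  }
}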

Sorting by distance

A very common scenario is to search around the current position with an upper distance limit, say 2 km or 5 km, show each result's distance from the current position (in a unit of your choice), and sort the results from nearest to farthest.

Example request:

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ],
      "filter": {
        "geo_distance": {
          "distance": 2000,
          "location": { "lon": 113.911231, "lat": 22.523375 }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": { "lon": 113.911231, "lat": 22.523375 },
        "order": "asc",
        "unit": "m",
        "distance_type": "plane"
      }
    }
  ]
}

* filter.geo_distance.distance: the maximum distance, here 2000 m
* _geo_distance: a fixed keyword; the entry under it gives the longitude and latitude of the reference position
* order: the sort order, asc or desc
* unit: the unit of distance; m and km both work
* distance_type: how the distance is computed: arc (more accurate) or plane (fastest). Older versions also offered sloppy_arc as the default, while recent versions default to arc.

An excerpt of the response:

"hits": [
  {
    "_index": "location",
    "_type": "hotels",
    "_id": "3",
    "_score": null,
    "_source": {
      "text": "7days hotel Orient Sunseed Hotel",
      "location": "22.521184, 113.914578"
    },
    "sort": [ 421.35435857277366 ]
  },
  {
    "_index": "location",
    "_type": "hotels",
    "_id": "2",
    "_score": null,
    "_source": {
      "content": "7days hotel",
      "location": [ 113.923567, 22.523988 ]
    },
    "sort": [ 1268.8952707727062 ]
  }
]

The sort value is the ground distance from the current position, in meters.
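
As a rough cross-check of these sort values, the Python sketch below filters and sorts the three hotel documents inserted earlier by great-circle distance. It is an approximation written for this article: it uses the haversine formula with an assumed mean Earth radius, so the numbers differ slightly from the plane calculation requested in the query, and the `nearby` helper is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# (lat, lon) of the three hotel documents
hotels = {
    "1": (22.528091, 113.928619),
    "2": (22.523988, 113.923567),
    "3": (22.521184, 113.914578),
}
origin = (22.523375, 113.911231)


def nearby(max_m):
    """Hotels within max_m meters of origin, nearest first, as (distance, id)."""
    dists = [(haversine_m(*origin, lat, lon), hid) for hid, (lat, lon) in hotels.items()]
    return sorted((d, hid) for d, hid in dists if d <= max_m)


for d, hid in nearby(2000):
    print(hid, round(d))
```

Hotels 3 and 2 come out at roughly 421 m and 1269 m, in line with the sort values in the response excerpt.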


Counting hotels within several distance ranges of the current position

unit is the unit of distance; mi and km are the common choices.

distance_type is how the distance is computed: arc (more accurate) or plane (fastest); older versions also offered sloppy_arc as the default.

GET /location/hotels/_search
{
  "size": 0,
  "aggs": {
    "group_by_distance": {
      "geo_distance": {
        "field": "location",
        "origin": { "lon": 113.911231, "lat": 22.523375 },
        "unit": "mi",
        "distance_type": "arc",
        "ranges": [
          { "from": 0, "to": 500 },
          { "from": 500, "to": 1500 },
          { "from": 1500, "to": 2000 }
        ]
      }
    }
  }
}
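
Summary

This article briefly introduced search templates, mapping templates, highlighted search, and geolocation queries. In projects that use Elasticsearch in depth, search templates and mapping templates are very useful. Highlighted search is mostly seen in browser search engines, and geolocation is an interesting feature that can also be used in location-based apps.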
