Elasticsearch Basics: Getting Started

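Elasticsearch (ES for short) is a highly scalable, open-source full-text search and analytics engine. It can store, search, and analyze very large volumes of data in near real time.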

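Use cases

* Product search in online stores
* Log analysis systems (ELK)
* Workloads that need to quickly investigate and analyze large data sets (tens of millions of records) and visualize the results

Installing and running ES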

Installing Java

Elasticsearch needs a Java 8 environment. If Java is not yet installed on your machine, see this Java installation guide:
<https://www.cnblogs.com/benjamin77/p/8460030.html>

Installing Elasticsearch

With Java in place, install Elasticsearch using the steps below, or follow the official documentation:
<https://www.elastic.co/guide/cn/elasticsearch/guide/current/running-elasticsearch.html>
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
unzip elasticsearch-5.5.1.zip
cd elasticsearch-5.5.1/
From the extracted directory, run the following command to start Elasticsearch:
./bin/elasticsearch
If startup fails at this point, it is probably with one of the following errors.

Error 1
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to
increase from one, then you should configure the number of parallel GC threads
appropriately using -XX:ParallelGCThreads=N
Open elasticsearch-5.5.1/config/jvm.options and append this line at the end:
-XX:-AssumeMP
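Error 2
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000085330000,
2060255232, 0) failed; error='Cannot allocate memory' (errno=12)
The JVM could not reserve the roughly 2 GB of memory it asked for. First run the following, which raises the kernel memory-map limit that Elasticsearch checks at startup:
sysctl -w vm.max_map_count=262144
Then open elasticsearch-5.5.1/config/jvm.options and lower the heap size by setting the -Xms and -Xmx lines to:
-Xms512m
-Xmx512m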
Error 3
[2019-06-27T15:01:43,165][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler]
[] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can
not run elasticsearch as root
Cause: for security reasons, Elasticsearch 5 and later refuse to run as the root user.

Fix: create a regular user, give that user ownership of the Elasticsearch installation directory, and run Elasticsearch as that user.
useradd elk
chown -R elk:elk /usr/local/share/applications/elasticsearch-5.5.1
su - elk
cd /usr/local/share/applications/elasticsearch-5.5.1
Start it again:
./bin/elasticsearch
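
Two of the conditions above (the root user restriction and the vm.max_map_count setting) can be checked before launching. The following is a minimal sketch rather than part of the original setup: the function names map_count_ok, uid_ok, and preflight are hypothetical, and reading /proc/sys/vm/max_map_count assumes a Linux host.

```shell
#!/bin/sh
# Pre-flight sketch; all function names here are hypothetical helpers.

# Elasticsearch's startup check expects vm.max_map_count >= 262144.
map_count_ok() {
    [ "$1" -ge 262144 ]
}

# Elasticsearch 5 and later refuse to start as root (uid 0).
uid_ok() {
    [ "$1" -ne 0 ]
}

# Report every problem found and return non-zero if there was one.
preflight() {
    status=0
    if ! uid_ok "$(id -u)"; then
        echo "running as root: switch to a regular user such as elk" >&2
        status=1
    fi
    # This file exists on Linux only; the check is skipped elsewhere.
    if [ -r /proc/sys/vm/max_map_count ]; then
        current=$(cat /proc/sys/vm/max_map_count)
        if ! map_count_ok "$current"; then
            echo "vm.max_map_count is $current: run sysctl -w vm.max_map_count=262144" >&2
            status=1
        fi
    fi
    return "$status"
}
```

One way to use it is `preflight && ./bin/elasticsearch`, so that the server is only started when both checks pass.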
If everything is working, Elasticsearch listens on the default port 9200. Open another terminal window and query that port to get basic information about the node.
$ curl 'localhost:9200'
{
  "name" : "cWyaT72",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "A7akNm1SRw2Gm-BdSBkdaw",
  "version" : {
    "number" : "5.5.1",
    "build_hash" : "19c13d0",
    "build_date" : "2017-07-18T20:44:24.823Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
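Access configuration

By default, Elasticsearch accepts connections only from the local machine. To allow remote access, edit config/elasticsearch.yml, uncomment network.host, set its value to 0.0.0.0, and restart Elasticsearch.
network.host: 0.0.0.0
Setting the value to 0.0.0.0 lets any host that can reach the machine connect. Do not use this setting in production; set a specific IP address instead.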

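Basic concepts

Node and Cluster

Elasticsearch is essentially a distributed database: multiple servers can work together, and each server can run multiple Elasticsearch instances.

A single Elasticsearch instance is called a node. A group of nodes forms a cluster.

Check cluster health:
curl -X GET 'http://localhost:9200/_cat/health?v'
List all nodes in the cluster:
curl -X GET 'http://localhost:9200/_cat/nodes?v'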
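Index

Elasticsearch indexes every field and, after processing, writes the result into an inverted index. Searches look data up directly in that index. (An Index is comparable to a database in a traditional relational database system: it is the place where related documents are stored.)

The top-level unit of data management in Elasticsearch is therefore called an Index. It is the counterpart of a single database. The name of every Index (that is, every database) must be lowercase.

The following command lists all Indexes on the current node.
curl -X GET 'http://localhost:9200/_cat/indices?v'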
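
The lowercase rule for Index names is easy to check in a script before sending a request. This is a minimal sketch: index_name_ok is a hypothetical helper, and it covers only the lowercase rule stated above plus a non-empty check. Elasticsearch enforces further naming restrictions that are not covered here.

```shell
#!/bin/sh
# Return 0 if the name is non-empty and contains no uppercase letters.
# index_name_ok is a hypothetical helper, not an Elasticsearch command.
index_name_ok() {
    case "$1" in
        "")            return 1 ;;  # empty name
        *[[:upper:]]*) return 1 ;;  # contains an uppercase letter
        *)             return 0 ;;
    esac
}
```

For example, `index_name_ok goods_list` succeeds, while `index_name_ok Goods_List` fails.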
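Document

A single record in an Index is called a Document; an Index is made up of many Documents.

Documents are represented as JSON, for example:
{ "goods_name": "空调", "category_name": "家电分类", "price": "3999.00" }
Documents in the same Index are not required to share the same structure (schema), but keeping them consistent is recommended because it helps search efficiency.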

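Type

Documents can be grouped. In the goods_list Index, for example, they could be grouped by category (home appliances, clothing) or by price (above 1000, below 1000). Such a group is called a Type. It is a virtual, logical grouping used to filter Documents.

List the Types under each Index:
curl 'http://localhost:9200/_mapping?pretty=true'
According to the roadmap at
<https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch>
Elasticsearch 6.x allows only one Type per Index, and 7.x was planned to remove Types entirely. (As released, 7.x deprecated Types in the APIs and 8.x removed them.)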

Index operations

Create Index

To create an Index, send a PUT request directly to the Elasticsearch server. The following example creates an Index named goods_list.
curl -X PUT 'http://localhost:9200/goods_list'
The server returns a JSON object whose acknowledged field indicates that the operation succeeded.
{ "acknowledged": true, "shards_acknowledged": true }
Delete Index
curl -X DELETE 'http://localhost:9200/goods_list'
{ "acknowledged": true }
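
In a script, the acknowledged field can be checked instead of reading the response by eye. The sketch below is one way to do that with grep. The ES_URL variable and the function names is_acknowledged and create_index are hypothetical, and the pattern match is a lightweight stand-in for a real JSON parser.

```shell
#!/bin/sh
# ES_URL is a hypothetical variable; it defaults to the local node used in this article.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeed if the JSON text in $1 contains "acknowledged": true.
# The leading quote in the pattern keeps "shards_acknowledged" from matching.
is_acknowledged() {
    printf '%s' "$1" | grep -Eq '"acknowledged"[[:space:]]*:[[:space:]]*true'
}

# Create the Index named $1 and report whether the server acknowledged it.
create_index() {
    name=$1
    response=$(curl -s -X PUT "$ES_URL/$name") || return 1
    if is_acknowledged "$response"; then
        echo "created $name"
    else
        echo "failed to create $name: $response" >&2
        return 1
    fi
}
```

Calling `create_index goods_list` against a running node prints a confirmation or the raw error response.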
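Working with data

The sections above covered the basic concepts of Index and Type and the basic Index operations. Now create a complete Index structure and work with its data.

Create the Index structure
curl -X PUT 'localhost:9200/goods_list' -d '
{
  "mappings": {
    "goods_info": {
      "properties": {
        "goods_name": { "type": "keyword" },
        "category_name": { "type": "keyword" },
        "price": { "type": "float" }
      }
    }
  }
}'
{ "acknowledged": true }
Run the command above to create the Index again, this time with a mapping.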

Adding a record

Send a PUT request to a path under /Index/Type to add a record to the Index. For example, a request under /goods_list/goods_info adds a product record.
curl -X PUT 'localhost:9200/goods_list/goods_info/1' -d '
{
  "goods_name": "华为笔记本",
  "category_name": "计算机",
  "price": "1000"
}'
The JSON object returned by the server includes the Index, Type, Id, Version, and other information:
{
  "_index": "goods_list",
  "_type": "goods_info",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": true
}
Note the trailing 1 in /goods_list/goods_info/1: it is the Id of the record, and it can be any string.

A record can also be added without specifying an Id; in that case, use a POST request instead.
curl -X POST 'localhost:9200/goods_list/goods_info' -d '
{
  "goods_name": "洗衣机",
  "category_name": "家电",
  "price": "899.99"
}'
If no Id is given, Elasticsearch generates a random string to use as the Id.
{
  "_index": "goods_list",
  "_type": "goods_info",
  "_id": "AWub5f7FFq1D5epJJhqT",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": true
}
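
When the Id is generated by the server, a script usually needs to read it back from the response. The following sketch pulls the _id value out with sed. The function name extract_id is hypothetical, and the approach assumes the response contains a single "_id" field, as in the responses shown above; a JSON parser would be more robust.

```shell
#!/bin/sh
# Print the value of the "_id" field found in the JSON text passed as $1.
# extract_id is a hypothetical helper; it assumes exactly one "_id" field.
extract_id() {
    printf '%s' "$1" | tr -d '\n' |
        sed -n 's/.*"_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

A possible use is `new_id=$(extract_id "$(curl -s -X POST 'localhost:9200/goods_list/goods_info' -d "$body")")`, where body holds the JSON document.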
Viewing a record
curl 'localhost:9200/goods_list/goods_info/1?pretty=true'
The request above fetches the record /goods_list/goods_info/1. The URL parameter pretty=true asks for the response in a human-readable format.

In the returned data, the found field indicates that the lookup succeeded, and the _source field holds the original record:
{
  "_index" : "goods_list",
  "_type" : "goods_info",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "goods_name" : "华为笔记本",
    "category_name" : "计算机",
    "price" : "1000"
  }
}
If the Id does not match any record, nothing is found and the found field is false.
curl 'localhost:9200/goods_list/goods_info/2?pretty=true'
No record with Id 2 exists, so the result is:
{
  "_index" : "goods_list",
  "_type" : "goods_info",
  "_id" : "2",
  "found" : false
}
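
Besides the found field, the HTTP status code also tells the two cases apart: Elasticsearch answers a lookup with 200 when the record exists and 404 when it does not. The sketch below relies on that. ES_URL, status_to_verdict, and doc_verdict are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# ES_URL is a hypothetical variable; it defaults to the local node used in this article.
ES_URL="${ES_URL:-http://localhost:9200}"

# Translate an HTTP status code into a short verdict.
status_to_verdict() {
    case "$1" in
        200) echo found ;;
        404) echo missing ;;
        *)   echo error ;;   # includes 000, which curl reports when it cannot connect
    esac
}

# Look up /Index/Type/Id ($1/$2/$3) and print found, missing, or error.
doc_verdict() {
    code=$(curl -s -o /dev/null -w '%{http_code}' "$ES_URL/$1/$2/$3")
    status_to_verdict "$code"
}
```

With the data from this article, `doc_verdict goods_list goods_info 1` should print found and `doc_verdict goods_list goods_info 2` should print missing.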
Deleting a record
curl -X DELETE 'localhost:9200/goods_list/goods_info/1'
Note: do not delete this record yet; it is used again below.

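Updating a record
curl -X PUT 'localhost:9200/goods_list/goods_info/1' -d '
{
  "user": "华为笔记本",
  "title": "计算机",
  "desc": "5000"
}'

To update a record, send the data again with a PUT request. The new body replaces the stored document as a whole, so after this request record 1 contains only the user, title, and desc fields.
{
  "_index": "goods_list",
  "_type": "goods_info",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": false
}

Several fields in the response have changed:
"_version" : 2, "result" : "updated", "created" : false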

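Because a PUT replaces the whole stored document, changing a single field this way means resending every field. Elasticsearch also offers an _update endpoint that merges a partial document into the stored one. The sketch below targets the 5.x URL layout used in this article (/Index/Type/Id/_update); later versions without Types use a different path. ES_URL, partial_update_body, and update_field are hypothetical names, and the body builder assumes the field name and value contain no double quotes or backslashes.

```shell
#!/bin/sh
# ES_URL is a hypothetical variable; it defaults to the local node used in this article.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a partial-update body of the form {"doc":{"<field>":"<value>"}}.
# Assumes $1 and $2 contain no double quotes or backslashes.
partial_update_body() {
    printf '{"doc":{"%s":"%s"}}' "$1" "$2"
}

# Set field $2 to value $3 on record $1 of goods_list/goods_info.
# The Content-Type header is optional on 5.x and required from 6.0 on.
update_field() {
    curl -s -X POST "$ES_URL/goods_list/goods_info/$1/_update" \
        -H 'Content-Type: application/json' \
        -d "$(partial_update_body "$2" "$3")"
}
```

For instance, `update_field 1 price 5000` would change only the price field of record 1 and leave the other fields in place.
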
Querying data

Return all records
curl 'localhost:9200/goods_list/goods_info/_search'
{
  "took": 127,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "goods_list",
        "_type": "goods_info",
        "_id": "AWub5f7FFq1D5epJJhqT",
        "_score": 1,
        "_source": { "goods_name": "洗衣机", "category_name": "家电", "price": "899.99" }
      },
      {
        "_index": "goods_list",
        "_type": "goods_info",
        "_id": "1",
        "_score": 1,
        "_source": { "user": "华为笔记本", "title": "计算机", "desc": "5000" }
      }
    ]
  }
}
In the response above, the took field is the time the operation took (in milliseconds), the timed_out field says whether the request timed out, and the hits field holds the matching records. Its subfields mean the following:

* total: the number of matching records, 2 in this example.
* max_score: the highest relevance score, 1.0 in this example.
* hits: an array of the returned records.

Each returned record has a _score field that expresses how well it matches. By default, results are sorted by this field in descending order.
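
To use the hit count in a script, the total value can be read out of the response. The sketch below does so with sed and assumes the 5.x response shape shown above, where hits.total is a plain number directly after the opening brace of hits (later versions return an object there). The function name hits_total is hypothetical, and a JSON parser would be a more robust choice where one is available.

```shell
#!/bin/sh
# Print hits.total from a 5.x search response passed as $1.
# hits_total is a hypothetical helper; it anchors on "hits": { "total": N
# so that the "total" inside "_shards" is not picked up by mistake.
hits_total() {
    printf '%s' "$1" | tr -d '\n' |
        sed -n 's/.*"hits"[[:space:]]*:[[:space:]]*[{][[:space:]]*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

A possible use is `hits_total "$(curl -s 'localhost:9200/goods_list/goods_info/_search')"`.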

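Summary

This article covered installing Elasticsearch, its basic concepts, and basic data operations. The next chapter covers tokenization, full-text search, and related techniques.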

Original article

<https://github.com/WilburXu/blog/blob/master/ElasticSearch/ElasticSearch%E5%9F%BA%E7%A1%80.md>
