Brokerage Platform Service: ElasticSearch

ElasticSearch Overview
Installing ElasticSearch

Installing ElasticSearch on CentOS

Installing ElasticSearch
Installing the ElasticSearch Load Balancer
Environment Settings
Running as a Service

Installing ElasticSearch on Windows

ElasticSearch Configuration

ElasticSearch Folder Structure
elasticsearch.yml Settings
Schema Design

Module and Service
Plugin
Java Development Environment

ElasticSearch Java Environment
Lucene Java Environment
Arirang Java Environment

REST API
JAVA API

Client
index java api
get java api

Administrator Manual

Dictionary Configuration
elasticsearch.yml
Error Handling

References

This page summarizes ElasticSearch, a distributed search engine built on Lucene.

Homepage : http://www.elasticsearch.org/, http://elasticsearch.kr/
- Manual : http://www.elasticsearch.org/guide/
- Korean morphological analyzer : https://github.com/chanil1218/elasticsearch-analysis-korean
- Korean user community : https://www.facebook.com/groups/elasticsearch.kr/
Download : http://www.elasticsearch.org/download/, https://github.com/elasticsearch/elasticsearch/
- Arirang download : https://lucenekorean.svn.sourceforge.net/svnroot/lucenekorean/
License : Apache 2.0
Platform : Java

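ElasticSearch Overview

[[Lucene|Lucene]] is a widely used open-source search engine library written in [[Java|Java]]. It is used in many places, but because it is only a library it is inconvenient to use directly, and because it does not support distributed environments a new alternative became necessary in the [[BigData|BigData]] era. In the open-source community, Solr and [http://www.elasticsearch.org/ ElasticSearch] were both built on Lucene to support distributed environments. Because [http://www.elasticsearch.org/ ElasticSearch] exposes a RESTful API, it can be used from many different environments, which makes it a convenient distributed search engine.

ElasticSearch Features

Real-time search and analysis
Distributed configuration and parallel processing
Handles many kinds of documents using index (Database) and Type (Table)
RESTful API that uses JSON
Functionality extended through Plugins

ElasticSearch Terminology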

{| cellspacing="0" cellpadding="2" border="1" width="100%" bgcolor="#FFFFFF" align="center"
|-
| width="30%" align="center" valign="middle" style="background-color: rgb(204, 204, 204);" | Term
| align="center" valign="middle" style="background-color: rgb(204, 204, 204);" | Description
|-
| align="center" valign="middle" | Cluster
|
A set of nodes identified by a unique name
|-
| align="center" valign="middle" | Node
|
A physical server that is part of a Cluster
|-
| align="center" valign="middle" | Index
(indices)
|
*A collection of documents with similar characteristics; similar to a '''database''' in a DBMS
*Consists of Term, Count, and Docs
|-
| align="center" valign="middle" | Shard
|
*A subset of an Index, built with Lucene
*Stores the actual data and the index, and is classified as a Primary Shard or a Replica Shard
*Primary Shard : the base index that makes up a Shard
*Replica Shard : a copy of a Primary Shard stored on a different node
**Keeps the service available when a failure occurs
|-
| align="center" valign="middle" | Type
(Document Type)
|
*The kind of data (Document); a logical category/partition within an index
*Similar to a '''table''' in a DBMS
|-
| align="center" valign="middle" | Mapping
|
Similar to a '''table schema''' in a DBMS
|-
| align="center" valign="middle" | Route
|
*A value that corresponds to a unique key among the indexed fields is designated as the routing path. The path is then used to select the shard for indexing and searching, which can improve performance.
*Routing Field : set the store option to yes and index to not_analyzed
|-
| align="center" valign="middle" | Document
|
*The basic unit of data (information) managed by ElasticSearch
*Expressed in JSON (JavaScript Object Notation)
*Similar to a '''record''' in a DBMS
|-
| align="center" valign="middle" | Field
|
*An item that makes up a Document, consisting of a name and a value
*Similar to a '''column''' in a DBMS
|-
| align="center" valign="middle" | Gateway
|
Stores information such as the Cluster state and Index settings
|-
| align="center" valign="middle" | Query
|
A search query
|-
| align="center" valign="middle" | TermQuery
|
A kind of search query
|-
| align="center" valign="middle" | Term
|
An item within a search query
|-
| align="center" valign="middle" | Token
|
An element that makes up an item of a search query
|}
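The Route entry in the table can be illustrated with a small sketch: a routing value is hashed and reduced to a shard number, so the same value always reaches the same shard. This is only an illustration under stated assumptions. `String.hashCode()` stands in for ElasticSearch's internal hash function, and the class name, method name, and shard count are hypothetical.

```java
// Sketch of routing: the same routing value always maps to the same shard.
final class RoutingSketch {

    // Assumption: String.hashCode() stands in for ElasticSearch's internal hash function.
    static int shardFor(String routing, int primaryShards) {
        if (primaryShards <= 0) {
            throw new IllegalArgumentException("primaryShards must be positive");
        }
        // floorMod keeps the result in [0, primaryShards) even for negative hash codes.
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 5; // hypothetical number of primary shards
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> shard " + shardFor(key, primaryShards));
        }
    }
}
```

Because the shard is derived from the routing value alone, a search that supplies the same value only needs to visit one shard instead of all of them.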

'''Conceptual diagram of ElasticSearch'''

[[File:ElasticSearch.png|700px|ElasticSearch.png]]
[[File:LuceneIndex01.png]]

ElasticSearch Architecture

[[File:ElasticSearchArchitecture.png|700px|ElasticSearchArchitecture.png]]

_index : index name
_type : type name
_id : Document ID
_score
_source : the stored Document
properties
- field name (field)
  - type : string

'''Open-source projects related to ElasticSearch'''

[[File:ElasticSearch Environment.png|700px|ElasticSearch Environment.png]]

Installing ElasticSearch

Installing ElasticSearch on CentOS

Installing ElasticSearch

Install ElasticSearch
- Requires JDK 1.7 or later

 cd install
 wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.tar.gz
 tar -xvzf elasticsearch-1.3.2.tar.gz
 chown -R hduser:hdgroup elasticsearch-1.3.2
 mv elasticsearch-1.3.2 /nas/appl/elasticsearch

Environment setup
- vi ~hduser/.bash_profile

 ### ----------------------------------------------------------------------------
 ###     ELASTICSEARCH settings
 ### ----------------------------------------------------------------------------
 export ELASTICSEARCH_HOME=/nas/appl/elasticsearch
 export PATH=$PATH:$ELASTICSEARCH_HOME/bin

ElasticSearch environment setup
- Create the data and log folders

 cd /nas/appl/elasticsearch
 mkdir data
 mkdir logs
 chown hduser:hdgroup data logs

vi /nas/appl/elasticsearch/config/elasticsearch.yml

 cluster.name: elasticsearch
 node.name: "node201"
 path.data: /nas/appl/elasticsearch/data
 path.logs: /nas/appl/elasticsearch/logs
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: ["node201:9300"]     #--- Transport port (default 9300), not the HTTP port 9200
 bootstrap.mlockall: true

Start the service and verify

 su - hduser
 elasticsearch                                 #--- Run in the foreground
 elasticsearch -d                              #--- Run as a daemon
 
 curl localhost:9200                           #--- Check the service
 http://node201.hadoop.com:9200/               #--- Check the service
 http://node201.hadoop.com:9200/_status
 http://node201.hadoop.com:9200/_plugin/head/  #--- When the elasticsearch-head plugin is installed

Installing the ElasticSearch Load Balancer

ElasticSearch load balancer settings
- vi /nas/appl/elasticsearch/config/elasticsearch.yml

 node.master: false
 node.data: false
 network.bind_host: 192.168.0.1

Install plugins for the load balancer

 plugin -install mobz/elasticsearch-head
 plugin -install lukas-vlcek/bigdesk

Environment Settings

Environment variables
- Environment variables read by bin/elasticsearch
- JAVA_OPTS
- ES_JAVA_OPTS, ES_HEAP_SIZE
- ES_MIN_MEM=256m, ES_MAX_MEM=1g
How to apply settings
- Set in the configuration file

 vi /nas/appl/elasticsearch/config/elasticsearch.yml
 index:
   store:
     type: memory

Set with a command-line option

 elasticsearch -Des.index.store.type=memory

Set through the REST API

 curl -XPUT 'node201.hadoop.com:9200/customer/' -d '
 index:
   store:
     type: memory
 '
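The three methods above name the same setting in two notations: nested YAML keys (index, store, type) and a dotted key (index.store.type, prefixed with es. on the command line). The sketch below flattens a nested map into dotted keys to show that correspondence. It is a plain-Java illustration, not ElasticSearch's own settings loader, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SettingsFlattenSketch {

    // Recursively turns nested maps into dotted keys, e.g. index -> store -> type becomes index.store.type.
    static Map<String, String> flatten(String prefix, Map<String, Object> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : nested.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                flat.putAll(flatten(key, child));
            } else {
                flat.put(key, String.valueOf(e.getValue()));
            }
        }
        return flat;
    }

    // Builds the nested form of the elasticsearch.yml example (index: store: type: memory).
    static Map<String, Object> example() {
        Map<String, Object> store = new LinkedHashMap<>();
        store.put("type", "memory");
        Map<String, Object> index = new LinkedHashMap<>();
        index.put("store", store);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("index", index);
        return root;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : flatten("", example()).entrySet()) {
            // Prints the command-line form of each setting.
            System.out.println("-Des." + e.getKey() + "=" + e.getValue());
        }
    }
}
```

Running the sketch prints the same option that is passed to elasticsearch in the command-line example.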

Check file descriptors
- max_file_descriptors

 curl 'node201.hadoop.com:9200/_nodes/process?pretty'

memory settings : disable swap
- Apply once

 swapoff -a

Apply permanently

 vi /etc/fstab
 #--- Comment out the swap entry

Handle through ElasticSearch settings

 ulimit -l unlimited            #--- Run as the root user
 mkdir /tmp/tmpJna
 vi config/elasticsearch.yml
  bootstrap.mlockall: true
 elasticsearch -Djna.tmpdir=/tmp/tmpJna

Running as a Service

Configuration variables
- ES_USER, ES_GROUP
- ES_HEAP_SIZE, ES_HEAP_NEWSIZE, ES_DIRECT_SIZE
- MAX_OPEN_FILES
- MAX_LOCKED_MEMORY, MAX_MAP_COUNT
- LOG_DIR, DATA_DIR, WORK_DIR
- CONF_DIR, CONF_FILE
- ES_JAVA_OPTS, RESTART_ON_UPGRADE

 #--- /etc/init.d/elasticsearch
 #--- /etc/sysconfig/elasticsearch
 /sbin/chkconfig --add elasticsearch

Installing ElasticSearch on Windows

'''Installing ElasticSearch'''

ElasticSearch is a Java application that runs on JDK 7 or later. It needs no separate installation step: download the distribution and run it. Download the latest version (elasticsearch-1.3.2.zip) of ElasticSearch from the download site (http://www.elasticsearch.org/download/) and extract it into the c:/appl/elasticsearch/ folder.

'''Run and verify'''

Run the elasticsearch.bat file in the bin/ folder.

Open http://localhost:9200/ in a browser to verify.

http://www.jopenbusiness.com/mediawiki/images/e/e1/ElasticSearch_Install_Windows_001.png

Check Cluster information

http://localhost:9200/_cluster/health?pretty=true

Check Node information

ElasticSearch Configuration

ElasticSearch Folder Structure

{| border="1" cellspacing="0" cellpadding="2" style="width: 100%;"
|-
| style="text-align: center; background-color: rgb(204, 204, 204);" | Folder
| style="text-align: center; background-color: rgb(204, 204, 204);" | Setting
| style="text-align: center; background-color: rgb(204, 204, 204);" | Details
|-
| style="text-align: center;" | bin
| style="text-align: center;" |
|
Executables for Windows
*elasticsearch.bat : starts ElasticSearch
*service.bat : runs ElasticSearch as a Service
**service.bat install | remove | start | stop | manager SERVICE_ID
*plugin.bat : plugin installer (runs org.elasticsearch.plugins.PluginManager)
Executables for Linux
*elasticsearch : starts ElasticSearch
*plugin : plugin installer (runs org.elasticsearch.plugins.PluginManager)
plugin-name/ : the bin/ folder of an installed Plugin is moved here
|-
| style="text-align: center;" | config
| style="text-align: center;" |
|
*elasticsearch.yml : ElasticSearch configuration file
**path.plugins : plugin installation folder (Default. plugins/)
*logging.yml : logging configuration file
plugin-name/ : the config/ folder of an installed Plugin is moved here
|-
| style="text-align: center;" | data
| style="text-align: center;" |
|
path.data: /path/to/data1,/path/to/data2
|-
| style="text-align: center;" | lib
| style="text-align: center;" |
|
Libraries for ElasticSearch
*Lucene search engine library
*Sigar library : monitors CPU, Memory, Disk, and so on
|-
| style="text-align: center;" | plugins
| style="text-align: center;" |
|
_site/ : served at the URL http://node111.jopenbusiness.com:9200/_plugin/head/
|}

path.home : the setting that specifies the folder where ElasticSearch is installed

elasticsearch.yml Settings

Settings in the elasticsearch.yml file are written in YAML syntax.

{| border="1" cellspacing="0" cellpadding="2" style="width: 100%;"
|-
| style="text-align: center; background-color: rgb(204, 204, 204);" | Setting
| style="text-align: center; background-color: rgb(204, 204, 204);" | Default
| style="text-align: center; background-color: rgb(204, 204, 204);" | Description
|-
| style="text-align: center;" |
node.master

node.data

node.client
| style="text-align: center;" |
true

true

false
|
Node types
*Master node
**node.master: true
**Manages the state of the Cluster and its Nodes
**Coordinates Indexes and Shards
*Data node
**node.data: true
**Stores indexed data
*Load Balance node
**node.master: false, node.data: false
**Receives search requests and distributes them
*Client node
**node.client: true, node.master: false
**Use when the node should act as a Client node rather than a Master node
|-
| style="text-align: center;" |
| style="text-align: center;" |
| Number of Shards
|-
| style="text-align: center;" |
| style="text-align: center;" |
| Number of Replicas
|-
| style="text-align: center;" |
| style="text-align: center;" |
| Enables the http service.
|-
| style="text-align: center;" |
| style="text-align: center;" |
| Port used by the http service
|-
| style="text-align: center;" |
| style="text-align: center;" |
| Port used by the netty Transport
|-
| style="text-align: center;" |
| style="text-align: center;" |
| If true, compression is allowed on the Transport
|-
| style="text-align: center;" |
| style="text-align: center;" |
| IP address on which Client requests are accepted
|-
| style="text-align: center;" |
| style="text-align: center;" |
| IP address of the ElasticSearch Node
|-
| style="text-align: center;" |
| style="text-align: center;" |
|
Specifies where the Cluster metadata, Index settings, Mapping information, and so on are managed

Gateway types
*local
*shared fs
*hadoop
*s3
|-
| style="text-align: center;" |
| style="text-align: center;" |
| Minimum number of Master nodes (2 or more recommended)
|-
| style="text-align: center;" |
| style="text-align: center;" |
| Servers and ports to probe when Unicast is used, e.g. "host2:port"
|}
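The node types and the minimum Master node count in the table can be expressed as two small rules. The sketch below classifies a node from its node.master and node.data flags, and computes a majority quorum of master-eligible nodes with the commonly recommended formula (n / 2) + 1. The formula is an addition here; the table itself only says that two or more is recommended. The class and method names are hypothetical.

```java
final class ClusterPlanningSketch {

    // Classifies a node from the node.master / node.data flags described in the table.
    static String role(boolean master, boolean data) {
        if (master && data) {
            return "master-eligible data node"; // the default: both flags are true
        }
        if (master) {
            return "dedicated master node";
        }
        if (data) {
            return "data node";
        }
        return "load balancer node"; // neither master nor data
    }

    // Majority quorum of master-eligible nodes: floor(n / 2) + 1.
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("at least one master-eligible node is required");
        }
        return masterEligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(role(false, false));
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " master-eligible node(s) -> minimum " + minimumMasterNodes(n));
        }
    }
}
```

With three master-eligible nodes the quorum is two, which is why a majority rule tolerates the loss of one master-eligible node without letting two halves of a split cluster each elect a master.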

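Schema Design

 curl -XPUT "http://localhost:9200/aaa?pretty=true" -d @aaa.json

{| border="1" cellspacing="0" cellpadding="2" style="width: 100%;"
|-
| style="text-align: center; background-color: rgb(241, 241, 241);" | Step
| style="text-align: center; background-color: rgb(241, 241, 241);" | Details
|-
| style="text-align: center;" | Entity definition
|
*Define Entities and Entity Fields
*Define the relationships between Entities
|-
| style="text-align: center;" | Field definition
|
*Define search fields
*Define unified search fields
*Define sort fields
*Define facet fields
*Define highlight fields
|}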
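The curl command above sends a file named aaa.json, whose contents this page does not show. As a rough sketch of what such a create-index body could contain in ElasticSearch 1.x, the code below renders index settings plus one type mapping. The type name post, the fields title and created, and the shard and replica counts are all hypothetical values chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class IndexBodySketch {

    // Renders a create-index body: index settings plus the mapping of one type.
    static String indexBody(int shards, int replicas, String typeName, Map<String, String> fieldTypes) {
        StringJoiner props = new StringJoiner(", ");
        for (Map.Entry<String, String> f : fieldTypes.entrySet()) {
            props.add("\"" + f.getKey() + "\": {\"type\": \"" + f.getValue() + "\"}");
        }
        return "{\"settings\": {\"number_of_shards\": " + shards
                + ", \"number_of_replicas\": " + replicas + "}, "
                + "\"mappings\": {\"" + typeName + "\": {\"properties\": {" + props + "}}}}";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "string");  // hypothetical search field
        fields.put("created", "date");  // hypothetical sort field
        // The printed JSON could be saved as aaa.json and sent with the curl command.
        System.out.println(indexBody(5, 1, "post", fields));
    }
}
```

String concatenation keeps the sketch free of dependencies; it does not escape special characters, so a JSON library is the better choice for real field names.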

Module and Service

'''ElasticSearch startup sequence'''

ElasticSearch
Bootstrap
org.elasticsearch.node.NodeBuilder
org.elasticsearch.node.internal.InternalNode : constructor -> start()
*Create PluginsService
**Load Site Plugins
**Register JVM Plugins
***Register onModule(AnyModule module) functions in onModuleReferences
*Register Modules : Modules can also be registered elsewhere, for example from a Service
**Uses the DI (Dependency Injection) of Google Guice
***org.elasticsearch.common.inject.Module
***org.elasticsearch.common.inject.AbstractModule
**Version, PageCacheRecyclerModule, CircuitBreakerModule, BigArraysModule, PluginsModule
**SettingsModule, NodeModule, NetworkModule, ScriptModule, EnvironmentModule
**NodeEnvironmentModule, ClusterNameModule, ThreadPoolModule, DiscoveryModule, ClusterModule
**RestModule, TransportModule, HttpServerModule, RiversModule, IndicesModule
**SearchModule, ActionModule, MonitorModule, GatewayModule, NodeClientModule
**BulkUdpModule, ShapeModule, PercolatorModule, ResourceWatcherModule, RepositoriesModule
**TribeModule, BenchmarkModule
*Start Services <- LifecycleComponent (start(), stop(), close(), lifecycleState())
**AllocationService, Discovery
**Start the Services registered by Plugins through PluginsService
**MappingUpdatedAction, IndicesService, IndexingMemoryController, IndicesClusterStateService, IndicesTTLService
**RiversManager, SnapshotsService, ClusterService, RoutingService, SearchService
**MonitorService, RestController, TransportService, DiscoveryService, GatewayService
**HttpServer, BulkUdpService, ResourceWatcherService, TribeService
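The Service step above relies on components that expose start(), stop(), close(), and lifecycleState(). The sketch below is a simplified stand-in for such a component, not the ElasticSearch class itself. The state names, the Component class, and the reverse-order shutdown are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {

    enum State { INITIALIZED, STARTED, STOPPED, CLOSED }

    // Simplified stand-in for a lifecycle component: start(), stop(), close(), lifecycleState().
    static class Component {
        private final String name;
        private State state = State.INITIALIZED;

        Component(String name) {
            this.name = name;
        }

        State lifecycleState() {
            return state;
        }

        void start() {
            if (state == State.CLOSED) {
                throw new IllegalStateException(name + " is closed");
            }
            state = State.STARTED;
        }

        void stop() {
            if (state == State.STARTED) {
                state = State.STOPPED;
            }
        }

        void close() {
            stop(); // a started component is stopped before it is closed
            state = State.CLOSED;
        }
    }

    // Creates and starts components in the order given.
    static List<Component> startAll(String... names) {
        List<Component> started = new ArrayList<>();
        for (String n : names) {
            Component c = new Component(n);
            c.start();
            started.add(c);
        }
        return started;
    }

    public static void main(String[] args) {
        List<Component> services = startAll("IndicesService", "ClusterService", "HttpServer");
        // Shut down in reverse order of startup (a choice made for this sketch).
        for (int i = services.size() - 1; i >= 0; i--) {
            services.get(i).close();
            System.out.println(services.get(i).name + " -> " + services.get(i).lifecycleState());
        }
    }
}
```

Giving every Service the same small set of lifecycle methods is what lets a node start and stop a long list of them in a uniform loop.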

Plugin

[[ElasticSearch - Plugin|ElasticSearch - Plugin]]

Java Development Environment

ElasticSearch Java Environment

Download the elasticsearch-1.2.1.zip file from the ElasticSearch download site.
- lib/elasticsearch-1.2.1.jar
Download the elasticsearch-master.zip file from the ElasticSearch github site.
- Use the source files under the src/main/java/ folder.

Lucene Java Environment

On the Lucene site, click the "DOWNLOAD" button and download the lucene-4.8.1.zip file.
- core/lucene-core-4.8.1.jar
On the Lucene site, click the "DOWNLOAD" button and download the lucene-4.8.1-src.tgz file.
- Use the source files under the core/src/java/ folder.

Arirang Java Environment

Download the source from the SVN repository.
- Get the arirang.morph source first and run mvn install
http://cafe.naver.com/korlucene/1102
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene4956/lucene/analysis/arirang/
Dictionary configuration and usage
- http://cafe.naver.com/korlucene/6
- http://cafe.naver.com/korlucene/877

[[File:Arirang 사전.zip|Arirang 사전.zip]] http://www.jopenbusiness.com/mediawiki/images/5/54/Arirang_사전.zip

REST API

[[ElasticSearch - REST API|ElasticSearch - REST API]]

JAVA API

Client

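 import static org.elasticsearch.node.NodeBuilder.nodeBuilder;
 
 import org.elasticsearch.client.Client;
 import org.elasticsearch.client.transport.TransportClient;
 import org.elasticsearch.common.settings.ImmutableSettings;
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.common.transport.InetSocketTransportAddress;
 import org.elasticsearch.node.Node;
 
 public class ElasticSearchClient {
 	//--- Example values; adjust them to the target cluster
 	private static final String CLUSTER_NAME = "elasticsearch";
 	private static final String HOST = "node201";
 	private static final int PORT = 9300;          //--- Transport port, not the HTTP port (9200)
 
 	private Client client = null;
 	private Node node = null;
 
 	//--- Connect remotely as a TransportClient without joining the cluster
 	private boolean getTransportClient() {
 		Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", CLUSTER_NAME).build();
 		client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(HOST, PORT));
 		return true;
 	}
 
 	//--- Start an embedded client node that holds no data
 	//--- CLUSTER_NAME must match cluster.name in elasticsearch.yml
 	//--- local(true) limits discovery to nodes inside the same JVM
 	private boolean getNodeClient() {
 		node = nodeBuilder().clusterName(CLUSTER_NAME).client(true).local(true).node();
 		client = node.client();
 		return true;
 	}
 }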

index java api

 import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
 
 import org.elasticsearch.action.index.IndexResponse;
 
 //--- Build the document source as a JSON string (jsonBuilder() can throw IOException)
 String json = jsonBuilder().startObject()
         .field("name", "value")
         .endObject().string();
 //--- Index the document as index/type/id and wait for the response
 IndexResponse res = client.prepareIndex("index", "type", "id").setSource(json).execute().actionGet();
 
 //--- UtilLogger and logCaller are not part of ElasticSearch; any logger can be used here
 UtilLogger.info.print(logCaller, "_index : " + res.getIndex());
 UtilLogger.info.print(logCaller, "_type : " + res.getType());
 UtilLogger.info.print(logCaller, "_id : " + res.getId());
 UtilLogger.info.print(logCaller, "_version : " + res.getVersion());

get java api

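Administrator Manual

Dictionary Configuration

Korean language dictionary
Update periodically from online dictionaries
Update periodically with nouns registered in the wiki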

elasticsearch.yml

index.query.bool.max_clause_count

Error Handling

When heap memory is insufficient
- vi /nas/appl/elasticsearch/bin/elasticsearch.in.sh

 #ES_MIN_MEM=256m
 #ES_MAX_MEM=1g
 
 ES_MIN_MEM=4g
 ES_MAX_MEM=4g

When many Clients connect and errors occur because the open file limit is exhausted
- Error message

 org.elasticsearch.common.netty.channel.ChannelException: Failed to create a selector.
 Caused by: java.io.IOException: Too many open files

Resolution

 ulimit -n
 vi  /etc/security/limits.conf
    hduser soft nofile 999999
    hduser hard nofile 999999
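After the limit is raised, it can help to confirm what a JVM actually sees, since the new value only applies to sessions started after the change. The sketch below reads the maximum and current open file descriptor counts through the JDK's platform MXBean. It assumes a Unix-like system and a JDK that provides com.sun.management.UnixOperatingSystemMXBean, and it returns -1 elsewhere. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class FileDescriptorCheck {

    // Returns the JVM's maximum file descriptor count, or -1 when the platform does not expose it.
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    // Returns the number of file descriptors currently open in this JVM, or -1 when unavailable.
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max file descriptors  : " + maxFileDescriptors());
        System.out.println("open file descriptors : " + openFileDescriptors());
    }
}
```

Run it as hduser in a new login session; the maximum should match the nofile value set in /etc/security/limits.conf.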

References

[[ElasticSearch - Plugin|ElasticSearch - Plugin]]
[[ElasticSearch - REST API|ElasticSearch - REST API]]
Korean Open Data Platform
RegExp
Nutch
[[Lucene|Lucene]] / Solr
Sigar
Kibana : builds dashboards from ElasticSearch data
fluentd
http://guruble.wordpress.com/tag/elasticsearch/
http://www.youtube.com/watch?v=6qpVJPNEkWc
https://www.found.no/tag/Elasticsearch/
- https://www.found.no/search/#search/query=ElasticSearch
- https://www.found.no/foundation/elasticsearch-internals/
- Guice
- Dependency injection
- Dependency Injection Demystified, 2006.03
http://jjeong.tistory.com/
- Elasticsearch: lucene arirang analyzer plugin, 2014.04
- lucene: building arirang with maven, 2014.04
- Elasticsearch: Plugins - building a site plugin and a custom analyzer plugin, 2013.04
- Understanding analysis with the elasticsearch Korean morphological analyzer, 2013.01
http://cafe.naver.com/korlucene
Helloworld naver
- http://helloworld.naver.com/helloworld/645609
- Building a log search system with elasticsearch, 2013.02
21st Century Sejong Project
Statistical Natural Language Processing
- POSTAG_SEJONG/K
http://elasticsearch-kr.github.io
https://github.com/imotov/elasticsearch-facet-script
https://www.found.no/foundation/writing-a-plugin/
http://en.wikipedia.org/wiki/ElasticSearch
Using Elasticsearch for log files, 2012.10
Installing elasticsearch (search engine) - applying a Korean morphological analyzer, 2012.12
Installing ElasticSearch and trying the samples, 2012.02
Install ElasticSearch on CentOS 6
Installing an elasticsearch cluster + Korean morphological analyzer, 2012.12
http://linuxism.tistory.com/1554
ElasticSearch (http://guruble.wordpress.com/tag/elasticsearch/)
- Elasticsearch - 1. Getting Started
- Elasticsearch - 2. Shard & Replica
- Elasticsearch - 3. Node Discovery
http://blog.naver.com/PostView.nhn?blogId=sung487&logNo=10164948506
MeCab (written in C++)
- https://bitbucket.org/eunjeon/mecab-ko-lucene-analyzer/raw/master/elasticsearch-analysis-mecab-ko/ (latest)
- https://github.com/bibreen/mecab-ko-lucene-analyzer (older version)
  - https://github.com/bibreen/mecab-ko-lucene-analyzer/tree/master/elasticsearch-analysis-mecab-ko
http://www.acornpub.co.kr/book/elasticsearch-server

[[Category:Search|Category:Search]]
Category: BigData

Last modified: 2024-09-30 12:26:18