
Testing the tokenization output of several Elasticsearch analyzers

zhangchap, 2023-04-01, Diary

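from elasticsearch import Elasticsearch

# Assumes a local node on the default port. Client 8.x requires an explicit URL;
# in 7.x a bare Elasticsearch() defaults to localhost:9200.
es = Elasticsearch("http://localhost:9200")

# Sample query, roughly "the best-reviewed car at around 100,000 yuan"
text = "10万左右口碑最好的车"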

# Analyze the text with Elasticsearch's built-in standard analyzer
tokens = es.indices.analyze(index="new_cars", body={'text': text, 'analyzer': 'standard'})

print("Tokens from the standard analyzer:")
for token in tokens['tokens']:
    print(token['token'])

# Analyze the text with ik_max_word (requires the IK analysis plugin)
tokens = es.indices.analyze(index="new_cars", body={'text': text, 'analyzer': 'ik_max_word'})

print("\nTokens from the ik_max_word analyzer:")
for token in tokens['tokens']:
    print(token['token'])

# Analyze the text with ik_smart (requires the IK analysis plugin)
tokens = es.indices.analyze(index="new_cars", body={'text': text, 'analyzer': 'ik_smart'})

print("\nTokens from the ik_smart analyzer:")
for token in tokens['tokens']:
    print(token['token'])
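
The three near-identical blocks above could be folded into a single loop. What follows is only a minimal sketch, not the author's code: `extract_tokens`, `compare_analyzers`, and `fake_analyze` are hypothetical names introduced here, and `analyze` is any callable returning a dict shaped like the `_analyze` response (a `tokens` list whose items carry a `token` field). The stub emits one token per character purely so the example runs without a cluster; it is not real analyzer output.

```python
from typing import Callable, Dict, Iterable, List


def extract_tokens(response: dict) -> List[str]:
    """Return the token strings from an _analyze-style response."""
    return [item["token"] for item in response["tokens"]]


def compare_analyzers(
    analyze: Callable[[str, str], dict],
    text: str,
    analyzers: Iterable[str],
) -> Dict[str, List[str]]:
    """Run `text` through each analyzer and collect the tokens per analyzer name."""
    return {name: extract_tokens(analyze(text, name)) for name in analyzers}


def fake_analyze(text: str, analyzer: str) -> dict:
    """Hypothetical stand-in for a cluster call: one token per character."""
    return {"tokens": [{"token": ch, "position": i} for i, ch in enumerate(text)]}


# Print the tokens side by side for each analyzer name.
for name, tokens in compare_analyzers(fake_analyze, "口碑", ["standard", "ik_smart"]).items():
    print(f"{name}: {tokens}")
```

Against a real cluster, the stub could be swapped for a small wrapper around the same call used above, for example `lambda text, analyzer: es.indices.analyze(index="new_cars", body={"text": text, "analyzer": analyzer})`.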


