Create or update documents in Elasticsearch using Python
Problem description:
I am using elasticsearch-py to work with Elasticsearch. I am trying to use elasticsearch.helpers.bulk to create or update multiple records.
from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch()
data = [
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 3,
        "doc": {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 4,
        "doc": {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 5,
        "doc": {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 6,
        "doc": {"name": "test"}
    },
]
print helpers.bulk(es, data)
Is there any way to do this? At the moment we can only set _op_type to create or update. If we use update and the record does not exist, it raises an error:
Traceback (most recent call last):
  File "/tmp/test.py", line 37, in <module>
    print helpers.bulk(es, data)
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}])
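Answer
According to the _bulk endpoint documentation, you can and should use the index action, provided your documents always have the same identifier: index creates the document if it does not exist and replaces it if it does. create is useful the first time a document is created, while update is better suited to partial and/or scripted updates.
You can also omit _op_type entirely, in which case index is used by default.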
Answer
I tried the solution suggested by @Val and it works like a charm.
from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch()
data = [
    {
        "_index": "customer",
        "_type": "external",
        "_id": 3,
        "doc": {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 4,
        "doc": {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 5,
        "doc": {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 6,
        "doc": {"name": "test"}
    },
]
print helpers.bulk(es, data)
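One detail may be worth checking in the snippet above. As far as I can tell from how helpers.bulk expands actions, with the index op type every key that is not metadata is sent as the document body (or the contents of _source, if that key is present), so the "doc" wrapper would itself end up as a field of the stored document. Below is a minimal sketch, under that assumption, of building the actions with _source instead, plus an update variant that uses the doc_as_upsert option for partial updates. The function names index_actions and upsert_actions are hypothetical, and the actual bulk call is left commented out because it needs a running cluster.

```python
def index_actions(index, doc_type, docs):
    """Yield 'index' actions: create each document or replace it entirely."""
    for doc_id, fields in docs:
        yield {
            "_op_type": "index",  # also the default when _op_type is omitted
            "_index": index,
            "_type": doc_type,    # kept from the question; newer Elasticsearch versions drop mapping types
            "_id": doc_id,
            "_source": fields,    # stored as the document body
        }


def upsert_actions(index, doc_type, docs):
    """Yield 'update' actions that insert the document when it does not exist yet."""
    for doc_id, fields in docs:
        yield {
            "_op_type": "update",
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
            "doc": fields,          # partial document merged into the existing one
            "doc_as_upsert": True,  # assumption: use "doc" as the new document if missing
        }


docs = [(doc_id, {"name": "test"}) for doc_id in (3, 4, 5, 6)]
actions = list(index_actions("customer", "external", docs))
upserts = list(upsert_actions("customer", "external", docs))

# Sending them requires a reachable cluster:
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch()
# print(helpers.bulk(es, actions))
```

The index variant overwrites the whole document each time, while the upsert variant keeps fields that are already stored and only merges in the ones supplied.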
Have you tried using 'index' as the op_type instead of 'create' and 'update'? – Val
@Val, according to the 'helpers.bulk' documentation we have to give 'index'. I also tried your solution, and it gives a 'ValidationError': 'elasticsearch.exceptions.TransportError: TransportError(500, u'ActionRequestValidationException[Validation Failed: 1: no requests added;]')' – Nilesh
That's strange... are you sure you have '"_op_type": "index"'? – Val