Elasticsearch partial update with Python

Problem description:

I have Elasticsearch documents in the following format. I need to partially update the "x" field and add a Python dict to it.

{
  "_index": "gdata34",
  "_type": "gdat",
  "_id": "328091-72341-118",
  "_version": 1,
  "_score": 1,
  "_source": {
    "d": {
      "Thursday": {
        "s": ""
      },
      "Email": {
        "s": ""
      },
      "Country": {
        "s": "US"
      }
    },
    "x": {
      "Geo": {
        "s": "45.335428,-118.057133",
        "g": [-118.057133, 45.335428]
      }
    }
  }
}

I tried to update it with the following code:

from elasticsearch import Elasticsearch, exceptions
import pprint


elasticsearch = Elasticsearch()  # connects to localhost:9200 by default
doc = elasticsearch.get(index='gdata34', doc_type='gdat', id='328091-72341-7')

# Scripted partial update: add the script parameter y to the "x" field.
# This needs dynamic (Groovy) scripting, which is what triggers the error below.
elasticsearch.update(index='gdata34', doc_type='gdat', id='328091-72341-7',
                     body={"script": "ctx._source.x += y",
                           "params": {"y": "z"}})
elasticsearch.indices.refresh(index='gdata34')
new_doc = elasticsearch.get(index='gdata34', doc_type='gdat', id='328091-72341-7')

I get this error:

elasticsearch.exceptions.RequestError: TransportError(400, u'ElasticsearchIllegalArgumentException[failed to execute script]; nested: ScriptException[dynamic scripting for [groovy] disabled]; ') 

What is the correct way to do a partial update in Elasticsearch with Python?


Which version of ES are you using? – 2015-04-06 10:42:05


@LukasGraf 1.4.4 – Anish 2015-04-06 10:43:59

For future reference, the following approach to partial updates works:

# Partial update via "doc": the given fields are merged into the existing _source.
elasticsearch.update(index='gdata34', doc_type='gdat', id='328091-72341-7',
                     body={
                         'doc': {'x': {'y': 'z'}}
                     })
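The Elasticsearch 1.x update docs describe a partial `doc` as being merged into the existing document: objects are merged recursively, while plain values and arrays are replaced. The sketch below is a purely local approximation of that rule (the function name `merge_partial` is hypothetical, and no Elasticsearch client is involved). It shows why sending `{'x': {'y': 'z'}}` adds `y` next to the existing `Geo` object instead of overwriting `x`.

```python
import copy


def merge_partial(source, partial):
    """Return a new dict with `partial` merged into `source`.

    Hypothetical local approximation of the partial-`doc` merge rule:
    nested dicts are merged recursively, while scalars and lists in
    `partial` replace the existing value outright.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        existing = merged.get(key)
        if isinstance(existing, dict) and isinstance(value, dict):
            # Both sides are objects: merge them field by field.
            merged[key] = merge_partial(existing, value)
        else:
            # Scalars, lists and new keys overwrite whatever was there.
            merged[key] = copy.deepcopy(value)
    return merged


source = {
    "d": {"Country": {"s": "US"}},
    "x": {
        "Geo": {
            "s": "45.335428,-118.057133",
            "g": [-118.057133, 45.335428],
        }
    },
}

updated = merge_partial(source, {"x": {"y": "z"}})
print(sorted(updated["x"]))  # ['Geo', 'y']
```

Because arrays are replaced rather than appended to, appending to a list such as `g` would still require a script (or sending the complete new list).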

From the Elasticsearch docs on scripting:

We recommend running Elasticsearch behind an application or proxy, which protects Elasticsearch from the outside world. If users are allowed to run dynamic scripts (even in a search request), then they have the same access to your box as the user that Elasticsearch is running as. For this reason dynamic scripting is allowed only for sandboxed languages by default.

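Recent ES versions also had a bug: a vulnerability in the Groovy scripting engine allowed scripts to escape the sandbox and execute shell commands as the user running the Elasticsearch Java VM. That is why the Groovy sandbox is disabled by default in recent versions, and with it the execution of Groovy scripts passed in the request body or stored in the .scripts index. With this default configuration, the only way to execute Groovy scripts is to place them in the config/scripts/ directory on the node.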

So you have two options:

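  • If your ES instance is not directly accessible and is secured behind a proxy, you can turn the Groovy sandbox back on by setting script.groovy.sandbox.enabled: true in config/elasticsearch.yml on your node(s).
  • If your ES instance is exposed, you can instead prepare your script, place it on the filesystem in your node's config/scripts directory, and then invoke it by name. See Running Groovy Scripts without Dynamic Scripting for details.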

Can we use the following instead? elasticsearch.update(index='gdata34', doc_type='gdat', id='328091-72341-7', body={'doc': {'x': {'y': 'z'}}}) – Anish 2015-04-06 11:02:21


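Yes, but be sure to read the [documentation](http://www.elastic.co/guide/en/elasticsearch/reference/1.4/docs-update.html): unless you specify "detect_noop": true, this will always cause the document to be updated, even if the merge process did not detect any change. – 2015-04-06 11:10:14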
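To make the comment concrete: with the client, the flag goes next to `doc` in the request body, e.g. `body={'doc': {'x': {'y': 'z'}}, 'detect_noop': True}`. The sketch below only models the decision locally and is an assumption, not Elasticsearch's implementation; `apply_partial_update` is a hypothetical helper that merges the partial document and reports whether an update would happen with and without noop detection.

```python
import copy


def apply_partial_update(source, partial, detect_noop=False):
    """Hypothetical model of noop detection for a partial `doc` update.

    Returns (new_source, updated). Without detect_noop the document is
    treated as updated unconditionally; with it, only when the merge
    actually changed something.
    """
    def merge(base, patch):
        # Recursive merge: dicts are merged, other values are replaced.
        out = copy.deepcopy(base)
        for key, value in patch.items():
            if isinstance(out.get(key), dict) and isinstance(value, dict):
                out[key] = merge(out[key], value)
            else:
                out[key] = copy.deepcopy(value)
        return out

    merged = merge(source, partial)
    if detect_noop and merged == source:
        # Nothing changed, so the update can be skipped.
        return source, False
    return merged, True


doc = {"x": {"y": "z"}}
print(apply_partial_update(doc, {"x": {"y": "z"}})[1])                    # True
print(apply_partial_update(doc, {"x": {"y": "z"}}, detect_noop=True)[1])  # False
```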


At least in Elasticsearch 2.3 (the latest version at the time of writing), "detect_noop" is enabled by default. – 2016-06-12 20:14:05