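How to replace words with Bing spell check suggestions in Python
Problem description:
I am using the Bing API from Python for spelling correction. I get back well-formed JSON containing the suggestions, but the original string is not replaced. I tried data.replace, but it does not work. Is there a simple way to replace the words in the original string with the suggested ones?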
import httplib, urllib, base64

headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '7fdf55a1a7e42d0a7890bab142343f8'
}

params = urllib.urlencode({
    # Request parameters
    'text': 'Lectures were really good. There were lot of people who came their without any Java knowledge and yet you were very suppor.',
    'mode': 'proof',
    'preContextText': '{string}',
    'postContextText': '{string}',
    'mkt': '{string}',
})

try:
    conn = httplib.HTTPSConnection('api.cognitive.microsoft.com')
    conn.request("GET", "/bing/v5.0/spellcheck/?%s" % params, "{body}", headers)
    response = conn.getresponse()
    data = response.read()
    print(data)
    conn.close()
except Exception as e:
    print("[Errno {0}] {1}".format(e.errno, e.strerror))
Output (pretty-printed):
{'_type': 'SpellCheck',
 'flaggedTokens': [{'offset': 61,
                    'suggestions': [{'score': 0.854956767552189,
                                     'suggestion': 'there'}],
                    'token': 'their',
                    'type': 'UnknownToken'},
                   {'offset': 116,
                    'suggestions': [{'score': 0.871971469417366,
                                     'suggestion': 'support'}],
                    'token': 'suppor',
                    'type': 'UnknownToken'}]}
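Answer
You have to replace the text yourself.
You can iterate over 'flaggedTokens' and, for each token, take its offset, pick the best-scoring suggestion, and replace the token with that suggestion: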
import operator

text = 'Lectures were really good. There were lot of people who came their without any Java knowledge and yet you were very suppor.'

data = {'_type': 'SpellCheck',
        'flaggedTokens': [{'offset': 61,
                           'suggestions': [{'score': 0.854956767552189,
                                            'suggestion': 'there'}],
                           'token': 'their',
                           'type': 'UnknownToken'},
                          {'offset': 116,
                           'suggestions': [{'score': 0.871971469417366,
                                            'suggestion': 'support'}],
                           'token': 'suppor',
                           'type': 'UnknownToken'}]}

shifting = 0
correct = text
for ft in data['flaggedTokens']:
    offset = ft['offset']
    suggestions = ft['suggestions']
    token = ft['token']

    # find the best suggestion
    suggestions.sort(key=operator.itemgetter('score'), reverse=True)
    substitute = suggestions[0]['suggestion']

    # replace the token by the suggestion
    before = correct[:offset + shifting]
    after = correct[offset + shifting + len(token):]
    correct = before + substitute + after
    shifting += len(substitute) - len(token)

print(correct)
You get: "Lectures were really good. There were lot of people who came there without any Java knowledge and yet you were very support."
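Note that in the question, data holds the raw response body, which is JSON text and not a dict, so it has to be parsed before 'flaggedTokens' can be read. The sketch below is one possible way to wrap the approach above in a function. It assumes the body has the same shape as the output shown in the question; apply_suggestions and sample_response are hypothetical names used only for illustration. Instead of tracking a running shift, it processes the tokens from the highest offset to the lowest, so that earlier offsets stay valid after each replacement.

```python
import json


def apply_suggestions(text, raw_response):
    """Return text with each flagged token replaced by its top-scoring suggestion.

    apply_suggestions is a hypothetical helper, not part of any Bing SDK.
    """
    # The HTTP body is JSON text, so parse it into a dict first.
    data = json.loads(raw_response)
    corrected = text
    # Work from the last token to the first so earlier offsets stay valid.
    tokens = sorted(data.get('flaggedTokens', []),
                    key=lambda ft: ft['offset'], reverse=True)
    for ft in tokens:
        if not ft['suggestions']:
            continue  # nothing to substitute for this token
        best = max(ft['suggestions'], key=lambda s: s['score'])
        start = ft['offset']
        end = start + len(ft['token'])
        corrected = corrected[:start] + best['suggestion'] + corrected[end:]
    return corrected


# Sample body copied from the response shown in the question (assumed shape).
sample_response = '''{"_type": "SpellCheck", "flaggedTokens": [
  {"offset": 61, "token": "their", "type": "UnknownToken",
   "suggestions": [{"suggestion": "there", "score": 0.854956767552189}]},
  {"offset": 116, "token": "suppor", "type": "UnknownToken",
   "suggestions": [{"suggestion": "support", "score": 0.871971469417366}]}]}'''

text = ('Lectures were really good. There were lot of people who came their '
        'without any Java knowledge and yet you were very suppor.')

print(apply_suggestions(text, sample_response))
```

With the live API, the value passed as raw_response would be the result of response.read() from the question's code. The function only applies whatever the service suggests, so a questionable suggestion such as 'support' for 'suppor' is substituted as-is.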