Extracting text from an image

Problem description:

I am working on extracting text from an image.

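Originally the image was colored, with the text placed on a white background. After further processing, the text appears in black and all other pixels are white (with some noise). Here is a sample: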

Now, when I try to OCR it with pytesseract (tesseract), I still do not get any text back.

Is there a way to extract text from colored images?

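Convert the color image to grayscale and apply a binary threshold so that everything is either black or white. You can try despeckling or removing noise, but if 'tesseract' on the command line still cannot extract the text, then I would recommend 'ocropy' from Google.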

Have you tried getting help from [Adrian Rosebrock's blog](http://www.pyimagesearch.com/2017/07/10/using-tesseract-ocr-python/)?

In principle it should be possible: your image works fine with Google OCR, and about half of it is recognized by ocr.space. I tested it at https://ocr.space/compare-ocr-software

Answer

from PIL import Image
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image from disk and display it
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# apply an "average" (box) blur with a 3x3 kernel to smooth out the noise
blurred = cv2.blur(image, (3, 3))
cv2.imshow("Blurred_image", blurred)

# OpenCV stores pixels as BGR while PIL expects RGB, so convert before OCR
img = Image.fromarray(cv2.cvtColor(blurred, cv2.COLOR_BGR2RGB))
text = pytesseract.image_to_string(img, lang='eng')
print(text)

# keep the preview windows open until a key is pressed
cv2.waitKey(0)

As a result I got: "Stay: In an Overwoter Bungalow $ 3. »"

How about using contours and removing the unnecessary blobs that way? It might work.

Thanks, I will give it a try and post the results.
