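Can I test tesseract OCR in the Windows command line?
Question:
I am new to tesseract OCR. I tried to convert an image to TIF and run tesseract on it from cmd in Windows to see the output, but I could not get it to work. Can you help me? What command should I use?
Here is my sample image:
Answer: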
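The simplest tesseract.exe syntax is: tesseract.exe inputimage output-text-file. This assumes that tesseract.exe has been added to the PATH environment variable. If your text is particularly hard to recognize, you can add the -psm N argument.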
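If you would rather drive the same command from a script, here is a minimal Python sketch of the syntax described above. The helper names build_tesseract_command and run_tesseract are hypothetical, made up for this illustration, and the -psm spelling is the one printed by the v3.02 usage text; newer releases spell it --psm, so check your installed version.

```python
import shutil
import subprocess


def build_tesseract_command(image, output_base, psm=None, exe="tesseract"):
    """Return the argument list for: tesseract imagename outputbase [-psm N]."""
    command = [exe, image, output_base]
    if psm is not None:
        # The usage text lists page segmentation modes 0 through 10.
        if not 0 <= psm <= 10:
            raise ValueError("psm must be between 0 and 10")
        # Assumption: -psm as in v3.02; newer releases use --psm instead.
        command += ["-psm", str(psm)]
    return command


def run_tesseract(image, output_base, psm=None):
    """Run tesseract if it is on PATH and return its exit code."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract was not found on PATH")
    command = build_tesseract_command(image, output_base, psm)
    return subprocess.run(command).returncode


# Only builds and prints the argument list; nothing is executed here.
print(build_tesseract_command("ECL8R.png", "out", psm=6))
```

The PATH check mirrors the assumption stated above: if shutil.which cannot find tesseract, the command would not work from cmd either.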
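I found that the normal syntax (without any -psm switch) works well enough for the image you attached, unless the level of accuracy is not good enough for you.
Note that non-English characters (for example, the symbol next to "prescription") are not recognized; my default installation contains only the English training data.
Here is the tesseract syntax description: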
C:\Users\vish\Desktop>tesseract.exe
Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.
Single options:
-v --version: version info
--list-langs: list available languages for tesseract engine
Here is the output for your image (note: when I downloaded it, it was converted to a PNG image):
C:\Users\vish\Desktop>tesseract.exe ECL8R.png out.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica
C:\Users\vish\Desktop>type out.txt.txt
1 Project Background
A prescription (R) is a written order by a physician or medical doctor to a pharmacist in the form of
medication instructions for an individual patient. You can't get prescription medicines unless someone
with authority prescribes them. Usually, this means a written prescription from your doctor. Dentists,
optometrists, midwives and nurse practitioners may also be authorized to prescribe medicines for you.
It can also be defined as an order to take certain medications.
A prescription has legal implications; this means the prescriber must assume his responsibility for the
clinical care ofthe patient.
Recently, the term "prescription” has known a wider usage being used for clinical assessments,
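One detail from the session above: tesseract appends .txt to the output base you give it, which is why passing out.txt produced a file that had to be read back as out.txt.txt. The Python sketch below shows how a script could locate and read the result; text_output_path and read_result are hypothetical helper names, and it assumes the default plain-text output, which tesseract writes as UTF-8.

```python
from pathlib import Path


def text_output_path(output_base):
    """Return the file name tesseract writes for a given output base."""
    # The .txt suffix is added even when the base already ends in .txt.
    return output_base + ".txt"


def read_result(output_base):
    """Return the recognized text, or None if no output file was created."""
    path = Path(text_output_path(output_base))
    if not path.exists():
        return None
    # Assumption: default plain-text output, encoded as UTF-8.
    return path.read_text(encoding="utf-8")


print(text_output_path("out.txt"))  # prints out.txt.txt
```

Passing a bare name such as out as the output base avoids the doubled extension.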
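Please explain in more detail what you have already tried. – Vish 2014-10-09 10:29:27
@Vish I installed the tesseract library from its website, and from cmd I tried to convert the text image with: tesseract imagename.tif output. But I could not get any output. – Akunar 2014-10-09 23:57:28
For the syntax you typed, the output is stored in the file output.txt. Have you checked whether this file was created? Also, can you upload your TIF file? If I have some time, I can check it with my tesseract installation. – Vish 2014-10-10 05:44:08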