文字识别中特征与相似度度量的研究

Li Jie (李杰); Fang Muyun (方木云)

Article Abstract

Research on Feature and Similarity Measurement in Character Recognition

Submitted: 2016-05-20

DOI: 10.16018/j.cnki.cn32-1650/n.201604009

Chinese keywords: 概率特征; 结构特征; 相似度; 文字识别

English keywords: probability feature; structure feature; similarity; character recognition

Funding: Natural Science Foundation of Anhui Province (1308085QF113)

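Author	Affiliation
Li Jie	School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243002
Fang Muyun	School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243002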

Chinese abstract:

在大样本测试集下国内现有成熟的OCR识别软件的首位识别准确率为95%~97%之间，在准确率和方法上仍有提升和改进的空间。提出一种基于概率特征和结构特征融合的自适应文字识别算法，模拟人类学习的模式，通过对训练样本的不断学习去构建汉字在测量空间的概率分布矩阵，然后比对原始图像和标准汉字库中汉字的概率分布矩阵的相似度来达到汉字分类的效果。其中相似度度量准则是从矩阵空间的结构和概率2个角度出发去构建的，充分考虑了结构模式识别和统计模式识别的优缺点。实验结果显示算法在训练样本下的首位识别正确率可以达到99.66%，在1 623张非训练样本文字图像下的首位识别正确率可以达到99.13%，在5 515张非训练样本文字图像下的首位识别正确率可以达到98.57%。可以证明提出的相似度度量方法在文字识别中的有效性。

English abstract:

On large test sets, mature OCR software currently available in China achieves a top-1 recognition accuracy of 95% to 97%, leaving room for improvement in both accuracy and method. This paper proposes an adaptive character recognition algorithm that fuses probability features with structure features. Mimicking the way humans learn, the algorithm builds a probability distribution matrix for each Chinese character in the measurement space by learning continually from training samples. It then classifies an input image by comparing its similarity with the probability distribution matrices of the characters in a standard Chinese character library. The similarity measure is constructed from two perspectives, the structure and the probability of the matrix space, and takes into account the strengths and weaknesses of both structural and statistical pattern recognition. In experiments, the algorithm achieves a top-1 recognition accuracy of 99.66% on the training samples, 99.13% on 1,623 character images outside the training set, and 98.57% on 5,515 character images outside the training set. These results support the effectiveness of the proposed similarity measure for character recognition.
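The abstract does not give the actual formulas, so the following is only a minimal sketch of the general idea: average binarized training samples into a per-cell probability matrix, then score an input image with a probability term and a structure term. Every name here (`build_probability_matrix`, `probability_score`, `structure_score`, the weight `alpha`) and both scoring rules are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence

Grid = List[List[int]]      # binary image: 1 = ink, 0 = background
Matrix = List[List[float]]  # per-cell probability that the cell is ink


def build_probability_matrix(samples: Sequence[Grid]) -> Matrix:
    """Average equally sized binary samples into per-cell ink probabilities."""
    rows, cols = len(samples[0]), len(samples[0][0])
    n = len(samples)
    return [[sum(s[r][c] for s in samples) / n for c in range(cols)]
            for r in range(rows)]


def probability_score(image: Grid, prob: Matrix) -> float:
    """Assumed probability term: p where the image has ink, 1 - p elsewhere."""
    cells = [p if px else 1.0 - p
             for img_row, prob_row in zip(image, prob)
             for px, p in zip(img_row, prob_row)]
    return sum(cells) / len(cells)


def _profiles(m: Sequence[Sequence[float]]) -> List[float]:
    """Row sums followed by column sums, a crude structural descriptor."""
    return [sum(row) for row in m] + [sum(col) for col in zip(*m)]


def structure_score(image: Grid, prob: Matrix) -> float:
    """Assumed structure term: 1 minus normalized L1 distance of the profiles."""
    a, b = _profiles(image), _profiles(prob)
    diff = sum(abs(x - y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 1.0 - diff / total


def similarity(image: Grid, prob: Matrix, alpha: float = 0.5) -> float:
    """Weighted blend of the two terms; alpha is a hypothetical weight."""
    return (alpha * probability_score(image, prob)
            + (1.0 - alpha) * structure_score(image, prob))


def classify(image: Grid, library: Dict[str, Matrix]) -> str:
    """Return the label whose probability matrix is most similar to the image."""
    return max(library, key=lambda label: similarity(image, library[label]))


# Toy 3x3 "characters": a horizontal bar and a vertical bar, each with a noisy copy.
H_BAR = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
H_NOISY = [[0, 0, 0], [1, 1, 1], [0, 0, 1]]
V_BAR = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
V_NOISY = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]

library = {
    "horizontal_bar": build_probability_matrix([H_BAR, H_NOISY]),
    "vertical_bar": build_probability_matrix([V_BAR, V_NOISY]),
}

unseen = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]  # horizontal bar missing one cell
print(classify(unseen, library))
```

Both terms are normalized to the range 0 to 1 so that they can be blended with a single weight. A real system would need size normalization and binarization before this step, and the structure term would likely be far richer than projection profiles.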
