Article Abstract
文字识别中特征与相似度度量的研究
Research on Feature and Similarity Measurement in Character Recognition
Submitted: 2016-05-20
DOI:10.16018/j.cnki.cn32-1650/n.201604009
Chinese keywords: 概率特征; 结构特征; 相似度; 文字识别
English keywords: probability feature; structure feature; similarity; character recognition
Funding: Natural Science Foundation of Anhui Province (1308085QF113)
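Authors and affiliations
李杰 (Li Jie), School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243002
方木云 (Fang Muyun), School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243002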
Chinese abstract:
      在大样本测试集下国内现有成熟的OCR识别软件的首位识别准确率为95%~97%之间,在准确率和方法上仍有提升和改进的空间。提出一种基于概率特征和结构特征融合的自适应文字识别算法,模拟人类学习的模式,通过对训练样本的不断学习去构建汉字在测量空间的概率分布矩阵,然后比对原始图像和标准汉字库中汉字的概率分布矩阵的相似度来达到汉字分类的效果。其中相似度度量准则是从矩阵空间的结构和概率2个角度出发去构建的,充分考虑了结构模式识别和统计模式识别的优缺点。实验结果显示算法在训练样本下的首位识别正确率可以达到99.66%,在1 623张非训练样本文字图像下的首位识别正确率可以达到99.13%,在5 515张非训练样本文字图像下的首位识别正确率可以达到98.57%。可以证明提出的相似度度量方法在文字识别中的有效性。
English abstract:
      On large test sets, mature OCR software currently available in China achieves a top-1 (first-candidate) recognition accuracy of 95% to 97%, which leaves room for improvement in both accuracy and method. This paper proposes an adaptive character recognition algorithm based on the fusion of probability features and structure features. Imitating the way humans learn, the algorithm learns continuously from training samples to build a probability distribution matrix for each Chinese character in the measurement space. It then classifies a character by measuring the similarity between the original image and the probability distribution matrix of each character in the standard Chinese character library. The similarity criterion is built from two perspectives on the matrix space, structure and probability, and takes into account the strengths and weaknesses of both structural and statistical pattern recognition. In the experiments, the algorithm reaches a top-1 recognition accuracy of 99.66% on the training samples, 99.13% on 1,623 non-training character images, and 98.57% on 5,515 non-training character images. These results indicate that the proposed similarity measure is effective for character recognition.
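The abstract does not give the formulas behind the probability distribution matrix or the similarity criterion, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes binary glyph images of equal size, estimates each matrix as the per-pixel ink frequency over the training samples, and combines a probability score with a crude structure score (row and column ink profiles) through an equal-weight sum. The function names, both scoring rules, and the `weight` parameter are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List

Image = List[List[int]]      # binary glyph image: 1 = ink, 0 = background
Matrix = List[List[float]]   # per-pixel ink probabilities


def learn_probability_matrix(samples: List[Image]) -> Matrix:
    """Estimate per-pixel ink probability as the mean over training samples."""
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[sum(s[r][c] for s in samples) / n for c in range(cols)]
            for r in range(rows)]


def probability_score(image: Image, matrix: Matrix) -> float:
    """Mean probability the matrix assigns to each observed pixel value."""
    total, count = 0.0, 0
    for img_row, p_row in zip(image, matrix):
        for pixel, p in zip(img_row, p_row):
            total += p if pixel == 1 else 1.0 - p
            count += 1
    return total / count


def structure_score(image: Image, matrix: Matrix) -> float:
    """Compare row and column ink profiles (an assumed stand-in for structure)."""
    def profiles(grid):
        return [sum(row) for row in grid] + [sum(col) for col in zip(*grid)]

    a, b = profiles(image), profiles(matrix)
    diff = sum(abs(x - y) for x, y in zip(a, b))
    scale = sum(a) + sum(b)
    return 1.0 - diff / scale if scale else 1.0


def similarity(image: Image, matrix: Matrix, weight: float = 0.5) -> float:
    """Weighted sum of the two scores; the weighting scheme is an assumption."""
    return (weight * probability_score(image, matrix)
            + (1.0 - weight) * structure_score(image, matrix))


def classify(image: Image, library: Dict[str, Matrix]) -> str:
    """Return the label whose learned matrix is most similar to the image."""
    return max(library, key=lambda label: similarity(image, library[label]))


# Toy 3x3 "characters": a horizontal and a vertical stroke, two samples each.
library = {
    "horizontal": learn_probability_matrix([
        [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
        [[0, 0, 0], [1, 1, 1], [0, 0, 1]],
    ]),
    "vertical": learn_probability_matrix([
        [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        [[0, 1, 0], [0, 1, 0], [1, 1, 0]],
    ]),
}

# A horizontal stroke with one pixel missing, not seen during training.
noisy = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(classify(noisy, library))
```

Adding further samples to `learn_probability_matrix` refines the per-pixel estimates, which loosely mirrors the continuous learning from training samples described in the abstract.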