The purpose of this study is to automatically recognize goods information in leaflet images and to build a database for recording and looking up leaflet information. Leaflet information is divided into content information (company name, goods name, and content amount) and the price information of the goods. We aimed to implement recognition of the content information, which had not yet been realized. Recognizing the content information requires recognizing characters against complex backgrounds, so characters were recognized using the OCR function of the Google Cloud Vision API. To correct misrecognitions automatically and to recognize the content information, we implemented recognition of the character color and background color, correction of coordinates using these colors, correction of misspellings using our own goods information database, and separation of the company name, goods name, and content amount. In the experiment, we used 154 pieces of content information, each consisting of a company name, a goods name, and a content amount. Although about half of these pieces contained misrecognitions, 92.9% of the content information was recognized correctly. These results show that the method is effective for recognizing content information.
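The misspelling correction and field separation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the goods database entries, the `parse_content_info` function, the amount pattern, and the similarity cutoff are all hypothetical, and fuzzy matching with Python's standard `difflib` is swapped in as one plausible way to map noisy OCR text onto a known entry. The OCR call, color recognition, and coordinate correction are omitted.

```python
import difflib
import re

# Hypothetical goods database of (company name, goods name) pairs.
# The real database used in the study is not described in the abstract.
GOODS_DB = [
    ("Meiji", "Milk Chocolate"),
    ("Kikkoman", "Soy Sauce"),
    ("Nissin", "Cup Noodle"),
]

# Assumed pattern: a trailing number followed by a unit, e.g. "500ml".
AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(ml|l|kg|g)\s*$", re.IGNORECASE)


def parse_content_info(ocr_text, cutoff=0.6):
    """Split OCR text into company, goods name, and content amount.

    Returns None when no database entry is similar enough.
    """
    text = ocr_text.strip()

    # Separate the content amount from the end of the string, if present.
    amount = None
    m = AMOUNT_RE.search(text)
    if m:
        amount = m.group(1) + m.group(2).lower()
        text = text[:m.start()].strip()

    # Correct misspellings by picking the closest known "company goods" string.
    candidates = {f"{c} {g}": (c, g) for c, g in GOODS_DB}
    match = difflib.get_close_matches(text, list(candidates), n=1, cutoff=cutoff)
    if not match:
        return None

    company, goods = candidates[match[0]]
    return {"company": company, "goods": goods, "amount": amount}


# Example: OCR output with two misrecognized words.
result = parse_content_info("Kikkornan Soy Sause 500ml")
```

Matching against the combined company and goods string means one lookup both corrects the spelling and separates the two fields. A production system would likely need language-specific handling for Japanese text and units.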
|Translated title of the contribution||Automatic recognition of goods information in leaflets: -Content information recognition-|
|Number of pages||40|
|Journal||Journal of the Japan Personal Computer Application Technology Society|
|Publication status||Published - 2021 Mar 27|