Publication details

Who is Selling to Whom – Feature Evaluation for Multi-block Classification in Invoice Information Extraction

Authors

HA Hien Thi, HORÁK Aleš

Year of publication 2021
Type Article in Proceedings
Conference SPECOM 2021: 23rd International Conference on Speech and Computer
MU Faculty or unit

Faculty of Informatics

Web https://link.springer.com/chapter/10.1007/978-3-030-87802-3_23
Doi http://dx.doi.org/10.1007/978-3-030-87802-3_23
Keywords OCR; Invoice; Block type classification; Seller; Buyer; Delivery address
Description The invoice information extraction task aims at unifying the automated processing of invoices supplied both in structured form and as scanned images. Recognizing pieces of information where a specific value is identified by a keyword (such as the invoice date) is a relatively well-managed task. On the other hand, identifying multi-block information on the invoice, such as distinguishing the seller, buyer, and delivery address, is much more challenging due to versatile invoice layouts. In this work, we present a new technique of feature extraction and classification to recognize the seller, buyer, and delivery address text blocks in scanned invoices, based on a combination of complex layout features and annotated text features. The method considers not only the positional features of each block but also the relations between blocks and the block contents at a higher level. The technique is implemented as a module of the OCRMiner system. We present a detailed evaluation and error analysis on a dataset of more than five hundred Czech invoices, reaching an overall macro-averaged F1-score of 94%.
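A macro-averaged F1-score computes F1 separately for each class and then takes the unweighted mean, so each block type (seller, buyer, delivery address) counts equally regardless of how often it occurs in the data. The following is a minimal Python sketch of that metric under stated assumptions: the label names ("seller", "buyer", "delivery"), the function name `macro_f1`, and the toy data are hypothetical illustrations, not the OCRMiner evaluation code.

```python
from collections import Counter

# Hypothetical label names for the three multi-block classes in the abstract.
LABELS = ("seller", "buyer", "delivery")


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (macro average)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # block wrongly assigned to class p
            fn[g] += 1          # block of class g was missed
    scores = []
    for label in labels:
        pred_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / pred_count if pred_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    # Each class contributes equally, independent of its frequency.
    return sum(scores) / len(scores)


# Toy example (made-up labels for four text blocks):
gold = ["seller", "buyer", "delivery", "seller"]
pred = ["seller", "buyer", "buyer", "seller"]
print(round(macro_f1(gold, pred), 4))
```

In the toy data, the seller class scores F1 = 1, the buyer class 2/3, and the delivery class 0, so the macro average is 5/9 (about 0.56) even though three of the four blocks are correct. This illustrates why a macro average penalizes errors on a rarer class more than plain accuracy does. As a cross-check, scikit-learn's `f1_score` with `average="macro"` and an explicit `labels` argument computes the same quantity.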
