Saikat Roy

PhD Student (MIC, DKFZ)

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks


Conference paper


Arindam Das, Saikat Roy, U. Bhattacharya
International Conference on Pattern Recognition, 2018

Cite

APA
Das, A., Roy, S., & Bhattacharya, U. (2018). Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks. International Conference on Pattern Recognition.


Chicago/Turabian
Das, Arindam, Saikat Roy, and U. Bhattacharya. “Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks.” International Conference on Pattern Recognition (2018).


MLA
Das, Arindam, et al. “Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks.” International Conference on Pattern Recognition, 2018.


BibTeX

@inproceedings{arindam2018a,
  title = {Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks},
  year = {2018},
  booktitle = {International Conference on Pattern Recognition},
  author = {Das, Arindam and Roy, Saikat and Bhattacharya, U.}
}

Abstract

In this article, a region-based Deep Convolutional Neural Network framework is presented for document structure learning. The contributions of this work are efficient training of region-based classifiers and effective ensembling for document image classification. A primary level of ‘inter-domain’ transfer learning is used: weights from a VGG16 architecture pre-trained on the ImageNet dataset are exported to train a document classifier on whole document images. Exploiting the nature of region-based influence modelling, a secondary level of ‘intra-domain’ transfer learning is used for rapid training of deep learning models on image segments. Finally, stacked-generalization-based ensembling is used to combine the predictions of the base deep neural network models. The proposed method achieves a state-of-the-art accuracy of 92.21% on the popular RVL-CDIP document image dataset, exceeding the benchmarks set by existing algorithms.
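The stacked generalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which meta-classifier is used, so a small softmax regression stands in as an assumption, and every name here (`stack_features`, `train_meta`, `predict_meta`, `BASE_OUTPUTS`) is hypothetical. The probability vectors are made-up toy values for two imaginary base models (for example, one on the whole page and one on a region), not results from the paper.

```python
import math

def stack_features(base_probs):
    """Concatenate each base model's class-probability vector into one meta-feature vector."""
    return [p for probs in base_probs for p in probs]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, b, x):
    """Linear score for each class given meta-features x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]

def train_meta(X, y, n_classes, lr=0.5, epochs=500):
    """Fit a softmax-regression meta-classifier with plain stochastic gradient descent.

    Assumption: the choice of meta-learner is illustrative only.
    """
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for c in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                b[c] -= lr * grad
                for j, xj in enumerate(x):
                    W[c][j] -= lr * grad * xj
    return W, b

def predict_meta(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy (made-up) outputs of two base models on four documents, two classes.
# Model A is wrong on the last document, model B on the second one.
BASE_OUTPUTS = [
    ([0.9, 0.1], [0.6, 0.4]),
    ([0.8, 0.2], [0.4, 0.6]),
    ([0.3, 0.7], [0.2, 0.8]),
    ([0.6, 0.4], [0.1, 0.9]),
]
LABELS = [0, 0, 1, 1]

X_META = [stack_features(pair) for pair in BASE_OUTPUTS]
W, b = train_meta(X_META, LABELS, n_classes=2)
print([predict_meta(W, b, x) for x in X_META])
```

In this toy setup each base model misclassifies one document on its own, while the meta-classifier learns to weigh both sets of predictions. In practice the meta-features would normally come from base-model predictions on held-out data, so that the meta-classifier does not simply learn the base models' training-set overconfidence.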

