A first step toward a fair comparison of evaluation protocols for text detection algorithms

Abstract

Text detection is an important topic in pattern recognition, but evaluating the reliability of such detection algorithms is challenging. While many evaluation protocols have been developed for that purpose, they often show dissimilar behaviors when applied in the same context. As a consequence, their usage may lead to misinterpretations, potentially yielding erroneous comparisons between detection algorithms or their incorrect parameters tuning. This paper is a first attempt to derive a methodology to perform the comparison of evaluation protocols. We then apply it on five state-of-the-art protocols, and exhibit that there indeed exist inconsistencies among their evaluation criteria. Our aim here is not to rank the investigated evaluation protocols, but rather raising awareness in the community that we should carefully reconsider them in order to converge to their optimal usage.