Dominique Li

Trie-based output itemset sampling

By Lamine Diop, Cheikh Talibouya Diop, Arnaud Giacometti, Dominique Li, Arnaud Soulet


In 2022 IEEE international conference on big data (big data)

Abstract Pattern sampling algorithms produce interesting patterns with a probability proportional to a given utility measure. Utility changes need quick re-preprocessing when sampling patterns from large databases. In this context, existing sampling techniques require storing all data in memory, which is costly. To tackle these issues, this work enriches D. Knuth’s trie structure, avoiding 1) the need to access the database to sample since patterns are drawn directly from the enriched trie and 2) the necessity to reprocess the whole dataset when the utility changes.

Continue reading