This page contains illustrations of the experiments on synthetic data.
Two examples of generated documents before (top) and after (bottom) the addition of uniform noise.
The clean (noise-free) setting shown above is already difficult: we, as humans, would hardly spot the patterns. The documents have low contrast because we generate only 40 observations per motif occurrence. Below, we show the motifs used to generate these documents.
Original motifs used to generate the documents.
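To make the generation setting concrete, here is a minimal sketch of how such a temporal document could be produced, assuming each motif is a table of word probabilities over relative time steps (rows = time, columns = words) and each occurrence places its observations starting at a random time. The document length (100) and the 40 observations per occurrence come from this page; the function name `generate_document`, the vocabulary size, the number of occurrences and the amount of uniform noise are hypothetical choices for illustration, not the authors' actual generator.

```python
import random

def generate_document(motifs, n_words, doc_length, n_occurrences,
                      obs_per_occurrence=40, noise_obs=0, seed=None):
    """Hypothetical synthetic document generator (a sketch, not the original code).

    motifs: list of motif tables; motifs[z][dt][w] is the probability of
            word w at relative time dt for motif z (each table sums to 1).
    Returns a doc_length x n_words matrix of observation counts.
    """
    rng = random.Random(seed)
    doc = [[0] * n_words for _ in range(doc_length)]
    for _ in range(n_occurrences):
        table = motifs[rng.randrange(len(motifs))]
        motif_len = len(table)
        # Choose a start time so that the whole occurrence fits in the document.
        start = rng.randrange(doc_length - motif_len + 1)
        # Sample (dt, w) cells in proportion to the motif's probability mass.
        cells = [(dt, w) for dt in range(motif_len) for w in range(n_words)]
        weights = [table[dt][w] for dt, w in cells]
        for dt, w in rng.choices(cells, weights=weights, k=obs_per_occurrence):
            doc[start + dt][w] += 1
    # Uniform noise: each noise observation lands on a uniformly random (t, w) cell.
    for _ in range(noise_obs):
        doc[rng.randrange(doc_length)][rng.randrange(n_words)] += 1
    return doc

# Usage with a single hypothetical "diagonal" motif of length 5 over 10 words.
n_words, motif_len = 10, 5
diag = [[1.0 / motif_len if w == dt else 0.0 for w in range(n_words)]
        for dt in range(motif_len)]
doc = generate_document([diag], n_words, doc_length=100, n_occurrences=3, seed=0)
print(sum(map(sum, doc)))  # 120 (3 occurrences x 40 observations)
```

With only 40 observations spread over a whole motif table, each cell receives very few counts, which is one way to see why the clean documents already have low contrast.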
We ran our algorithm on 10 such clean documents of length 100, seeking motifs of maximum length 10. As introduced in the article (see Fig. 5), thanks to our prior, the actual motifs are aligned to the left within the recovered motifs, leaving the first time step empty. Without the prior, we recover similar motifs, but the actual motif is positioned randomly within the recovered motif tables. Notice that the motifs are well recovered even though the sought motifs are longer than the actual ones.
Five longer motifs recovered from the clean documents.
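The alignment effect described above can be quantified by searching for the temporal offset at which an actual motif best fits inside a longer recovered motif table. The sketch below is a hypothetical evaluation helper (`best_offset` is not from the article): it normalizes both tables and scores each candidate offset by histogram intersection, i.e. the sum of element-wise minima, which equals 1.0 for a perfect match. With the prior, one would expect the selected offsets to be consistent across motifs; without it, they would vary.

```python
def best_offset(true_motif, recovered_motif):
    """Hypothetical helper: locate true_motif inside a (longer) recovered_motif.

    Both arguments are lists of rows (time steps) of non-negative word weights
    over the same vocabulary. Returns (offset, score), where score is the
    histogram intersection of the normalized tables at that offset.
    Ties are broken in favour of the smallest offset.
    """
    def normalize(table):
        total = sum(map(sum, table))
        return [[v / total for v in row] for row in table]

    t, r = normalize(true_motif), normalize(recovered_motif)
    n_words = len(t[0])
    best = (None, -1.0)
    for offset in range(len(r) - len(t) + 1):
        score = sum(min(t[dt][w], r[offset + dt][w])
                    for dt in range(len(t)) for w in range(n_words))
        if score > best[1]:
            best = (offset, score)
    return best

# Usage: a 3-step motif recovered inside a 5-step table whose first step is empty.
true_motif = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
recovered = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
offset, score = best_offset(true_motif, recovered)
print(offset)  # 1: the actual motif starts after one empty time step
```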
Under noisy conditions, the recovered motifs are slightly noisy, but most of the added noise is captured by dedicated motifs.
Five longer motifs recovered from the noisy documents (ranked by importance, i.e. by the number of associated observations).
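The ranking used in the last figure, and the idea that some motifs mostly absorb noise, can be sketched as follows, assuming the inference yields one motif assignment per observation. `rank_motifs` orders motifs by their number of assigned observations; `normalized_entropy` is a heuristic we add purely for illustration (it is not claimed to be part of the article): a motif table whose mass is spread almost uniformly scores close to 1.0, as expected for a motif that captures uniform noise, while a structured motif scores lower. Both function names are hypothetical.

```python
import math
from collections import Counter

def rank_motifs(assignments):
    """Rank motif ids by number of assigned observations (descending; ties by id)."""
    counts = Counter(assignments)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def normalized_entropy(table):
    """Entropy of a motif table divided by its maximum, log(number of cells).

    Heuristic (assumption): values near 1.0 indicate a near-uniform motif,
    e.g. one that absorbed uniform noise. Requires at least two cells.
    """
    cells = [v for row in table for v in row]
    total = sum(cells)
    h = -sum((v / total) * math.log(v / total) for v in cells if v > 0)
    return h / math.log(len(cells))

# Usage: hypothetical assignments of six observations to motifs 0, 1 and 2.
print(rank_motifs([2, 0, 2, 1, 2, 0]))  # [(2, 3), (0, 2), (1, 1)]
uniform = [[1] * 4 for _ in range(3)]
diagonal = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(normalized_entropy(uniform) > normalized_entropy(diagonal))  # True
```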