Probabilistic Models: temporal topic models and more

PLSM: introduction

PLSM stands for Probabilistic Latent Sequential Motif. It can be seen as a time-sensitive evolution of PLSA (Probabilistic Latent Semantic Analysis), the original probabilistic topic model. Like PLSA, PLSM is defined by a probabilistic generative model, and the parameters of the model can be learned using an EM (Expectation-Maximization) algorithm.

PLSM: understanding the model

PLSM can be represented as a graphical model, wherein nodes represent random variables and the absence of a link between nodes represents conditional independence. Here, we provide three equivalent views of the PLSM model.

[Figure: three equivalent graphical-model views of PLSM, over the variables z, ts, ta, tr, w and d, with plates D and Nd; the third view additionally shows the motifs φ inside a plate of size K.]


The PLSM model explains how the set of all observations is supposed to be generated. Each observation is a triple (d,w,ta) meaning that a word w occurred once at time ta in the document d. PLSM supposes that there exists a set of K motifs named φ (shown explicitly only in the third view). The generative process of each observation goes as follows:

  • draw the document d from a distribution p(d),
  • draw a pair (z,ts) made of a motif index and a starting time, drawn from a per-document starting distribution p(z,ts|d),
  • given this z, draw a pair (w,tr) of a word and a relative time, drawn from the corresponding motif defined as a distribution p(w,tr|z) (also written φz(w,tr)),
  • set the absolute time of the observation as the sum of the motif starting time and the drawn relative time: ta = ts + tr.
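The four steps above can be sketched as a small sampler. This is a minimal illustration, not an implementation from the PLSM papers: the dimensions (D documents, K motifs, W words, T_s starting times, T_r relative times) are toy values, and the parameter tables are drawn at random just so the sampler runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (assumptions, not from the text).
D, K, W = 3, 2, 5          # documents, motifs, vocabulary size
T_s, T_r = 8, 4            # number of starting times and relative times
N_d = 50                   # observations drawn per document

# Random parameters for illustration only:
# p(z, ts | d): per-document joint over motif index and starting time.
p_zts_d = rng.dirichlet(np.ones(K * T_s), size=D).reshape(D, K, T_s)
# p(w, tr | z): the motifs phi_z, joint distributions over (word, relative time).
phi = rng.dirichlet(np.ones(W * T_r), size=K).reshape(K, W, T_r)

def generate(d):
    """Draw one observation (d, w, ta) from document d."""
    # draw (z, ts) from p(z, ts | d)
    idx = rng.choice(K * T_s, p=p_zts_d[d].ravel())
    z, ts = divmod(idx, T_s)
    # draw (w, tr) from the motif phi_z
    idx = rng.choice(W * T_r, p=phi[z].ravel())
    w, tr = divmod(idx, T_r)
    # absolute time = starting time + relative time
    ta = ts + tr
    return (d, w, ta)

observations = [generate(d) for d in range(D) for _ in range(N_d)]
```

Note that the absolute times ta range over 0 .. T_s + T_r - 2, since both ts and tr are bounded.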

Given a set of observations, an Expectation-Maximization algorithm finds the most likely parameters. The set of parameters is made of the p(z,ts|d) distributions and the p(w,tr|z) distributions (φ in the third representation).
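To make the EM idea concrete, here is a sketch of one iteration, under assumptions of my own: observations are aggregated into a count array n(d, w, ta), the E-step computes the posterior over (z, ts) for each observed triple (summing over the starting times ts consistent with tr = ta - ts ∈ [0, T_r)), and the M-step redistributes the counts to re-estimate both parameter tables. The toy dimensions and loop structure are illustrative, not the reference implementation.

```python
import numpy as np

# Hypothetical toy dimensions (assumptions, not from the text).
D, K, W, T_s, T_r = 3, 2, 5, 8, 4

def em_step(counts, p_zts_d, phi):
    """One EM iteration on counts n(d, w, ta), a D x W x T array,
    where T = T_s + T_r - 1 covers all reachable absolute times."""
    new_p = np.zeros_like(p_zts_d)    # accumulator for p(z, ts | d)
    new_phi = np.zeros_like(phi)      # accumulator for p(w, tr | z)
    T = counts.shape[2]
    for d in range(D):
        for w in range(W):
            for ta in range(T):
                n = counts[d, w, ta]
                if n == 0:
                    continue
                # E-step: posterior p(z, ts | d, w, ta) proportional to
                # p(z, ts | d) * phi_z(w, ta - ts), for valid ts only.
                post = np.zeros((K, T_s))
                for ts in range(max(0, ta - T_r + 1), min(T_s, ta + 1)):
                    post[:, ts] = p_zts_d[d, :, ts] * phi[:, w, ta - ts]
                s = post.sum()
                if s == 0:
                    continue
                post /= s
                # M-step accumulation: redistribute the n observed counts.
                new_p[d] += n * post
                for ts in range(max(0, ta - T_r + 1), min(T_s, ta + 1)):
                    new_phi[:, w, ta - ts] += n * post[:, ts]
    # Normalize the accumulators into proper distributions.
    new_p /= new_p.sum(axis=(1, 2), keepdims=True)
    new_phi /= new_phi.sum(axis=(1, 2), keepdims=True)
    return new_p, new_phi
```

Iterating em_step until the parameters stabilize yields a local maximum of the likelihood, as with any EM procedure.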