.. meta::
:description: Orange3 Textable Prototypes documentation, Topic Models
widget
:keywords: Orange3, Textable, Prototypes, documentation, topic models,
widget
.. _Topic Models:
Topic Models
============
.. image:: figures/topic_models.png
Build topic models based on term-document matrices.
Author
------
Aris Xanthos
Signals
-------
Input:
* ``Textable crosstab``
A term-document matrix in Textable PivotCrosstab format
Outputs:
* ``Term-topic Textable table``
A table (in Textable PivotCrosstab format) showing the association between terms and topics
* ``Document-topic Textable table``
A table (in Textable PivotCrosstab format) showing the association between documents and topics
* ``Term-topic Orange table``
A table (in Orange format) showing the association between terms and topics
* ``Document-topic Orange table``
A table (in Orange format) showing the association between documents and topics
Description
-----------
This widget takes a term-document matrix as input (such as emitted by
Textable's **Count** widget) and applies one of several topic modelling methods
to these data in order to infer latent, fuzzy word and document categories.
Two of the underlying methods (Latent Dirichlet allocation and Latent semantic
indexing) are based on the `Gensim `_
third-party package, while the third method (Correspondence analysis) uses
Orange's internal implementation.
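
As an illustration of the kind of preprocessing involved, the sketch below converts a small term-document table into the bag-of-words representation that Gensim models consume (one list of ``(term_id, count)`` pairs per document, plus an id-to-term mapping). The function name ``crosstab_to_bow`` and the toy data are hypothetical, and this is not necessarily how the widget performs the conversion internally.

```python
def crosstab_to_bow(crosstab):
    """Convert {document: {term: count}} into (corpus, id2word)."""
    # Assign an integer id to every term, in sorted order for reproducibility.
    terms = sorted({term for counts in crosstab.values() for term in counts})
    term2id = {term: i for i, term in enumerate(terms)}
    id2word = {i: term for term, i in term2id.items()}
    # Build one bag-of-words list per document, skipping zero counts.
    corpus = [
        sorted(
            (term2id[term], count)
            for term, count in counts.items()
            if count > 0
        )
        for counts in crosstab.values()
    ]
    return corpus, id2word


# Hypothetical toy data: three documents, four terms.
crosstab = {
    "doc1": {"cat": 2, "dog": 1},
    "doc2": {"dog": 3, "fish": 1},
    "doc3": {"cat": 1, "bird": 0},
}
corpus, id2word = crosstab_to_bow(crosstab)
```

Assuming Gensim is installed, a corpus in this form could then be passed to a model such as ``gensim.models.LdaModel(corpus, id2word=id2word, num_topics=2)``.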
The widget outputs two pairs of tables (one in Textable format and one in
Orange format): term-topic tables show how strongly each topic is associated
with each term, and document-topic tables show how strongly each topic is
associated with each document.
In addition, the widget's interface shows the list of terms that are most
strongly associated with each topic. In the case of Latent semantic indexing
and Correspondence analysis, the displayed terms are those that are
either positively or negatively associated with each latent dimension
(or factor, or component), and an indication of the proportion of variance
(or inertia) explained by each topic is also given (see
:ref:`figure 1 <topic_models_fig1>` below).
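
The two quantities described above can be sketched as follows, under stated assumptions: term weights for one latent dimension are given as a plain dictionary, and the proportion of variance (or inertia) attributed to each dimension is taken to be its squared singular value divided by the sum of squared singular values over the retained dimensions. The function names and toy values are hypothetical, and the widget's actual computation may differ.

```python
def top_terms(weights, n=3):
    """Return the n most positive and n most negative terms of a dimension."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    # Strongest positive associations come first in the ranking.
    positive = [term for term, w in ranked[:n] if w > 0]
    # Strongest negative associations are at the end; list them first.
    negative = [term for term, w in reversed(ranked[-n:]) if w < 0]
    return positive, negative


def explained_proportions(singular_values):
    """Share of the total squared singular values held by each dimension."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [sq / total for sq in squared]


# Hypothetical term weights for a single latent dimension.
weights = {"cat": 0.8, "dog": 0.5, "fish": -0.7, "bird": -0.1, "tree": 0.05}
positive, negative = top_terms(weights, n=2)

# Hypothetical singular values for two retained dimensions.
proportions = explained_proportions([3.0, 4.0])
```

Note that the proportions here are relative to the retained dimensions only; relating them to the total variance of the input would require all singular values.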
Interface
~~~~~~~~~
The widget's interface requires little input from the user (see
:ref:`figure 1 <topic_models_fig1>` below): the desired topic modelling
**Method** (Latent Dirichlet allocation, Latent semantic indexing, or
Correspondence analysis) and the **Number of topics** to be computed.
.. _topic_models_fig1:
.. figure:: figures/topic_models_interface.png
:align: center
:alt: Interface of the Topic Models widget
Figure 1: **Topic Models** widget interface.
The **Info** section indicates that the input has been correctly processed, or
the reason why no output is emitted (no input, etc.).
The **Send** button triggers the computation and emission of term-topic and
document-topic tables to the output connection(s). When the
**Send automatically** checkbox is selected, the button is disabled and the
widget attempts to emit results automatically whenever its interface is
modified.
Messages
--------
Information
~~~~~~~~~~~
*Tables correctly sent to output.*
This confirms that the widget has operated properly.
Warnings
~~~~~~~~
*Settings were changed, please click 'Send' when ready.*
Settings have changed but the **Send automatically** checkbox
has not been selected, so the user is prompted to click the **Send**
button (or equivalently check the box) in order for computation and data
emission to proceed.
*Widget needs input*
A term-document matrix (in Textable PivotCrosstab format) should be provided
as input to the widget.