Topic Data
Topic data is the main abstraction in topicwizard. It holds the information about topical inference in a corpus that is needed to reproduce the interpretive visualizations in the web app and in individual figures.
This interface exists so that topicwizard can use data from other libraries or from your own topic models, and so that inference data can be persisted and used across different machines.
The TopicData type is a TypedDict. This means that TopicData is just a dictionary at runtime, and is therefore interoperable with anything else in Python, while static type checking is still available if you use a type checker in your editor, such as Pyright.
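To make the TypedDict point concrete, here is a minimal sketch that uses only the standard library. `MiniTopicData` is a hypothetical, cut-down stand-in with two of the documented keys. It is not the real `topicwizard.data.TopicData` class.

```python
from typing import List, TypedDict


class MiniTopicData(TypedDict):
    """Hypothetical stand-in for TopicData with only two of its keys."""

    corpus: List[str]
    topic_names: List[str]


# Calling a TypedDict class builds an ordinary dict.
data = MiniTopicData(
    corpus=["first document", "second document"],
    topic_names=["topic_0", "topic_1"],
)

# At runtime there is no special class involved, only a plain dict.
print(type(data) is dict)  # True

# Because it is a dict, ordinary dict operations work unchanged.
data["topic_names"] = ["sports", "politics"]
copy = dict(data)
```

A type checker would flag a missing key or a wrongly typed value in `MiniTopicData(...)`, but nothing is enforced when the code runs.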
All visualization utilities accept this object, at least optionally. If you have a TopicData object from some corpus and some topic model, you can therefore reproduce every visualization from this object.
import topicwizard
from topicwizard.figures import topic_map
# Usage with figures
topic_map(topic_data)
# Usage with web app
# Pass topic_data as a keyword argument, not positionally
topicwizard.visualize(topic_data=topic_data)
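Because topic data is a plain dictionary, it can in principle be written to disk with general-purpose serialization tools. The sketch below uses the standard library's `pickle` as one possible approach. It is an assumption made for illustration, not a persistence method prescribed by topicwizard. The toy dictionary is hypothetical, contains only some of the documented keys, and uses plain lists in place of the NumPy arrays that real topic data would hold.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical toy data; plain lists stand in for NumPy arrays.
topic_data = {
    "corpus": ["cats purr", "dogs bark"],
    "vocab": ["bark", "cats", "dogs", "purr"],
    "document_topic_matrix": [[0.9, 0.1], [0.2, 0.8]],
    "topic_names": ["cats", "dogs"],
    "transform": None,  # no transform function stored in this sketch
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "topic_data.pkl"
    # Write the dictionary to disk...
    with path.open("wb") as out_file:
        pickle.dump(topic_data, out_file)
    # ...and read it back, as one would on another machine.
    with path.open("rb") as in_file:
        restored = pickle.load(in_file)

# True if the round trip preserved every entry.
round_trip_ok = restored == topic_data
```

Two caveats apply to this approach. `pickle` stores functions by reference, so a function kept under `transform` has to be importable on the machine that loads the file. Pickle files should also only be loaded from sources you trust.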
API Reference
- class topicwizard.data.TopicData(*args, **kwargs)
Inference data used to produce visualizations in the application and figures.
- corpus
The corpus on which inference was run.
- Type:
list of str
- vocab
Array of all words in the vocabulary of the topic model.
- Type:
ndarray of shape (n_vocab,)
- document_term_matrix
Bag-of-words document representations. Elements of the matrix are word importances/frequencies for given documents.
- Type:
ndarray of shape (n_documents, n_vocab)
- document_topic_matrix
Topic importances for each document.
- Type:
ndarray of shape (n_documents, n_topics)
- topic_term_matrix
Importance of each term for each topic.
- Type:
ndarray of shape (n_topics, n_vocab)
- document_representation
Embedded representations for documents. Can also be a sparse BoW matrix for classical models.
- Type:
ndarray of shape (n_documents, n_dimensions)
- transform
Function that transforms documents to document-topic matrices. Can be None in the case of transductive models.
- Type:
(list[str]) -> ndarray, optional
- topic_names
Names or descriptions of the topics, as inferred by the model.
- Type:
list of str
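The shapes documented above constrain one another: every matrix indexed by documents needs one row per corpus entry, and so on. When assembling topic data by hand from another library, a small consistency check can catch mismatches early. The following is a minimal sketch. `shape_2d` and `check_shapes` are hypothetical helpers, not part of topicwizard, and the check assumes that `topic_names` contains exactly one entry per topic. Plain nested lists stand in for arrays so the example has no dependencies.

```python
def shape_2d(matrix):
    """Return (n_rows, n_cols) for a nested list or an array-like object."""
    shape = getattr(matrix, "shape", None)  # NumPy / SciPy objects
    if shape is not None:
        return tuple(shape)
    return len(matrix), len(matrix[0])


def check_shapes(topic_data):
    """Raise ValueError if the documented dimensions disagree."""
    n_documents = len(topic_data["corpus"])
    n_vocab = len(topic_data["vocab"])
    n_topics = len(topic_data["topic_names"])  # assumption: one name per topic
    expected = {
        "document_term_matrix": (n_documents, n_vocab),
        "document_topic_matrix": (n_documents, n_topics),
        "topic_term_matrix": (n_topics, n_vocab),
    }
    for key, shape in expected.items():
        actual = shape_2d(topic_data[key])
        if actual != shape:
            raise ValueError(f"{key}: expected {shape}, got {actual}")
    # n_dimensions is free, so only the number of rows is checked here.
    if shape_2d(topic_data["document_representation"])[0] != n_documents:
        raise ValueError("document_representation: wrong number of rows")


# Hypothetical toy data: 2 documents, 3 vocabulary items, 2 topics.
toy = {
    "corpus": ["a b", "b c"],
    "vocab": ["a", "b", "c"],
    "document_term_matrix": [[1, 1, 0], [0, 1, 1]],
    "document_topic_matrix": [[0.7, 0.3], [0.4, 0.6]],
    "topic_term_matrix": [[0.5, 0.4, 0.1], [0.1, 0.4, 0.5]],
    "document_representation": [[1, 1, 0], [0, 1, 1]],
    "transform": None,
    "topic_names": ["first", "second"],
}
check_shapes(toy)  # passes silently when all shapes agree
```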