Topic Data

Topic data is the main abstraction in topicwizard. It holds the results of topical inference on a corpus and contains everything needed to reproduce the interpretive visualizations, both in the web app and as individual figures.

This common interface lets topicwizard work with data from other libraries or from your own topic models. It also allows inference data to be persisted and reused across different machines.
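TopicData is a plain dictionary of Python and NumPy objects, so one way to persist it is the standard library's pickle module. This is a minimal sketch that assumes every value is picklable. A transform built from a lambda or a locally defined function is not, so it is set to None here. The toy values and the file name topic_data.pkl are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np

# Toy stand-in for a TopicData dict (hypothetical values; only a few keys shown).
topic_data = {
    "corpus": ["cats purr", "dogs bark"],
    "vocab": np.array(["bark", "cats", "dogs", "purr"]),
    "document_topic_matrix": np.array([[0.9, 0.1], [0.2, 0.8]]),
    "topic_names": ["felines", "canines"],
    # Lambdas and local functions cannot be pickled with the standard pickle
    # module, so the transform is left as None in this sketch.
    "transform": None,
}

path = os.path.join(tempfile.mkdtemp(), "topic_data.pkl")

# Save on one machine...
with open(path, "wb") as f:
    pickle.dump(topic_data, f)

# ...and load it back, e.g. after copying the file to another machine.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code.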

The TopicData type is a TypedDict. At runtime a TopicData object is just an ordinary dictionary, so it interoperates with anything else in Python. If your editor uses a static type checker such as Pyright, the checker can still verify its keys and value types.
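To illustrate what being a TypedDict means in practice, the sketch below declares a small TypedDict with a few of the TopicData fields. TopicDataSketch is a hypothetical stand-in, not topicwizard's own definition. An instance is an ordinary dict at runtime, and nothing validates the value types. That checking is left to a static type checker.

```python
from typing import List, TypedDict

import numpy as np


# Hypothetical mirror of a few TopicData fields; not topicwizard's real class.
class TopicDataSketch(TypedDict):
    corpus: List[str]
    vocab: np.ndarray
    topic_names: List[str]


data: TopicDataSketch = {
    "corpus": ["cats purr", "dogs bark"],
    "vocab": np.array(["bark", "cats", "dogs", "purr"]),
    "topic_names": ["felines", "canines"],
}

# At runtime this is just a dict; "calling" the TypedDict also returns a dict.
print(type(data) is dict)  # True
print(type(TopicDataSketch(corpus=[], vocab=np.array([]), topic_names=[])) is dict)  # True

# Ordinary dict operations work, and no runtime type validation takes place.
data["topic_names"] = ["Topic 0", "Topic 1"]
print(sorted(data.keys()))  # ['corpus', 'topic_names', 'vocab']
```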

All visualization utilities accept this object, at least as an optional argument. A single TopicData object computed for a corpus and a topic model is therefore enough to reproduce every visualization.

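import topicwizard
from topicwizard.figures import topic_map

# topic_data is assumed to be a TopicData object you have already
# produced from your corpus and topic model.

# Usage with figures
topic_map(topic_data)

# Usage with the web app
# Note: topic_data must be passed as a keyword argument
topicwizard.visualize(topic_data=topic_data)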

API Reference

class topicwizard.data.TopicData(*args, **kwargs)

Inference data used to produce visualizations in the application and figures.

corpus

The corpus on which inference was run.

Type:

list of str

vocab

Array of all words in the vocabulary of the topic model.

Type:

ndarray of shape (n_vocab,)

document_term_matrix

Bag-of-words document representations. Each element is the importance or frequency of a word in a given document.

Type:

ndarray of shape (n_documents, n_vocab)
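This minimal illustration of the shapes involved builds a raw-count document-term matrix for a toy corpus using only NumPy. Whitespace tokenization and plain counts are assumptions made for brevity. A real pipeline would typically use a vectorizer and may weight terms differently, for example with TF-IDF.

```python
import numpy as np

corpus = ["cats purr", "dogs bark", "cats and dogs"]

# Vocabulary: sorted unique whitespace-separated tokens (an assumption for this sketch).
vocab = np.array(sorted({word for doc in corpus for word in doc.split()}))
index = {str(word): i for i, word in enumerate(vocab)}

# Rows are documents, columns are vocabulary terms; entries are raw counts.
document_term_matrix = np.zeros((len(corpus), len(vocab)), dtype=int)
for row, doc in enumerate(corpus):
    for word in doc.split():
        document_term_matrix[row, index[word]] += 1

print(vocab)  # ['and' 'bark' 'cats' 'dogs' 'purr']
print(document_term_matrix.shape)  # (3, 5)
```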

document_topic_matrix

Topic importances for each document.

Type:

ndarray of shape (n_documents, n_topics)
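A common way to read document_topic_matrix is to pick, for each document, the topic with the highest importance. The sketch below does this with argmax on a hypothetical 3×2 matrix. Row-normalizing first is an optional step that lets the rows read as proportions. It does not change which topic is largest.

```python
import numpy as np

# Hypothetical importances: 3 documents x 2 topics.
document_topic_matrix = np.array([
    [0.9, 0.3],
    [0.1, 0.7],
    [0.4, 0.4],
])

# Optional: rescale each row to sum to 1 so rows read as proportions.
proportions = document_topic_matrix / document_topic_matrix.sum(axis=1, keepdims=True)

# Index of the most important topic per document (ties go to the lower index).
dominant_topic = proportions.argmax(axis=1)
print(dominant_topic)  # [0 1 0]
```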

topic_term_matrix

Importance of each term for each topic.

Type:

ndarray of shape (n_topics, n_vocab)
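Paired with vocab, the rows of topic_term_matrix give each topic's most important terms. The top_terms helper below is a hypothetical illustration, not part of topicwizard's API. It assumes that larger values mean more important terms.

```python
import numpy as np

vocab = np.array(["bark", "cats", "dogs", "fur", "purr"])

# Hypothetical importances: 2 topics x 5 terms.
topic_term_matrix = np.array([
    [0.0, 0.8, 0.1, 0.5, 0.9],
    [0.7, 0.1, 0.9, 0.4, 0.0],
])


def top_terms(topic_term_matrix, vocab, n=3):
    """Return the n highest-weighted terms for each topic, most important first."""
    # argsort sorts ascending, so negate the weights to get descending order.
    order = np.argsort(-topic_term_matrix, axis=1)[:, :n]
    return [[str(vocab[i]) for i in row] for row in order]


print(top_terms(topic_term_matrix, vocab))
# [['purr', 'cats', 'fur'], ['dogs', 'bark', 'fur']]
```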

document_representation

Embedded representations of documents. For classical models, this can also be a sparse bag-of-words matrix.

Type:

ndarray of shape (n_documents, n_dimensions)

transform

Function that maps a list of documents to a document-topic matrix. Can be None for transductive models, which cannot infer topics for unseen documents.

Type:

(list[str]) -> ndarray, optional
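The expected contract is a callable that takes a list of strings and returns an ndarray of shape (n_documents, n_topics). The toy_transform below is a hypothetical sketch of that contract only. Its scoring rule is made up for illustration and is not how any particular topic model performs inference.

```python
from typing import Callable, List, Optional

import numpy as np

vocab = ["bark", "cats", "dogs", "purr"]
index = {word: i for i, word in enumerate(vocab)}

# Hypothetical topic-term importances: 2 topics x 4 terms.
topic_term_matrix = np.array([
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.1, 0.8, 0.0],
])


def toy_transform(documents: List[str]) -> np.ndarray:
    """Map documents to an (n_documents, n_topics) matrix.

    Toy scoring for illustration only: each topic's score is the sum of its
    term weights over the document's known words, normalized to sum to 1.
    """
    counts = np.zeros((len(documents), len(vocab)))
    for row, doc in enumerate(documents):
        for word in doc.split():
            if word in index:  # ignore out-of-vocabulary words
                counts[row, index[word]] += 1
    scores = counts @ topic_term_matrix.T
    totals = scores.sum(axis=1, keepdims=True)
    # Documents with no known words get an all-zero row instead of dividing by zero.
    return np.divide(scores, totals, out=np.zeros_like(scores), where=totals > 0)


# For a transductive model, this field would be None instead.
transform: Optional[Callable[[List[str]], np.ndarray]] = toy_transform
result = transform(["cats purr", "dogs bark", "hello"])
print(result.shape)  # (3, 2)
```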

topic_names

Names or descriptions of the topics, as inferred by the model.

Type:

list of str
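The sketch below puts all the fields together. It assembles a dictionary with every key documented in this reference for a toy corpus, then checks that the array shapes agree with one another. All values are hypothetical, and naming topics by joining their top two terms is an assumption. check_shapes is an illustrative helper, not part of topicwizard.

```python
import numpy as np

corpus = ["cats purr", "dogs bark", "cats and dogs"]
vocab = np.array(["and", "bark", "cats", "dogs", "purr"])
document_term_matrix = np.array([
    [0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
])
# Hypothetical model output: 3 documents x 2 topics and 2 topics x 5 terms.
document_topic_matrix = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
topic_term_matrix = np.array([
    [0.1, 0.0, 0.9, 0.1, 0.8],
    [0.1, 0.9, 0.1, 0.8, 0.0],
])

# One simple naming scheme (an assumption): join each topic's top-2 terms.
top = np.argsort(-topic_term_matrix, axis=1)[:, :2]
topic_names = ["_".join(str(vocab[i]) for i in row) for row in top]

topic_data = {
    "corpus": corpus,
    "vocab": vocab,
    "document_term_matrix": document_term_matrix,
    "document_topic_matrix": document_topic_matrix,
    "topic_term_matrix": topic_term_matrix,
    # Classical models may use the bag-of-words matrix as the representation.
    "document_representation": document_term_matrix,
    "transform": None,  # as for a transductive model
    "topic_names": topic_names,
}


def check_shapes(data):
    """Hypothetical helper: raise ValueError if the array shapes disagree."""
    n_documents, n_vocab = len(data["corpus"]), len(data["vocab"])
    n_topics = len(data["topic_names"])
    expected = {
        "document_term_matrix": (n_documents, n_vocab),
        "document_topic_matrix": (n_documents, n_topics),
        "topic_term_matrix": (n_topics, n_vocab),
    }
    for key, shape in expected.items():
        if data[key].shape != shape:
            raise ValueError(f"{key} has shape {data[key].shape}, expected {shape}")
    if data["document_representation"].shape[0] != n_documents:
        raise ValueError("document_representation must have one row per document")


check_shapes(topic_data)
print(topic_names)  # ['cats_purr', 'bark_dogs']
```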