Individual Figures#

If you are preparing individual figures for a publication or report, you might need to modify the appearance or resolution of the figures you are producing.

It might also be the case that it’s only certain figures you’re interested in.

topicwizard comes with an interface that allows you to do just that. If you have a TopicData object, you can manually produce individual figures.

These figures are like any other interactive Plotly figure, so you can manipulate them as such and export them as HTML or in a number of image formats. For an extensive overview of how you can manipulate plots produced by topicwizard, consult Plotly’s documentation.
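As a minimal sketch of what such manipulation could look like, the helper below applies a few print-oriented layout settings to any Plotly figure. The helper name style_for_publication and the chosen sizes are hypothetical and not part of topicwizard; update_layout, write_html and write_image are regular Plotly figure methods.

```python
def style_for_publication(fig, width=800, height=600, font_size=14):
    """Apply print-oriented layout settings to a Plotly figure and return it.

    Hypothetical helper: it only relies on the figure having Plotly's
    `update_layout` method, so it works on any figure topicwizard returns.
    """
    fig.update_layout(
        width=width,                # figure width in pixels
        height=height,              # figure height in pixels
        font=dict(size=font_size),  # larger text for print
        template="plotly_white",    # plain white background
        margin=dict(l=40, r=40, t=60, b=40),
    )
    return fig


# Typical use, assuming `topic_data` already exists:
#   from topicwizard.figures import topic_map
#   fig = style_for_publication(topic_map(topic_data))
#   fig.write_html("topic_map.html")            # interactive export
#   fig.write_image("topic_map.png", scale=2)   # static export, needs kaleido
```

The scale argument of write_image multiplies the pixel resolution of the exported image, which is usually the simplest way to get a sharper static figure.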

Topic Map#

You can display a semantic map of topics in your model.

from topicwizard.figures import topic_map

topic_map(topic_data)
topicwizard.figures.topic_map(topic_data: TopicData) → Figure#

Plots topics on a scatter plot based on the UMAP projections of their parameters into 2D space.

Parameters:

topic_data (TopicData) – Inference data from topic modeling.

Word Barplots#

You can display a joint plot of all topics, where word importances are displayed on a bar chart.

from topicwizard.figures import topic_barcharts

topic_barcharts(topic_data)

If you find that too many words get displayed, you can reduce that with the top_n keyword.

topic_barcharts(topic_data, top_n=5)
topicwizard.figures.topic_barcharts(topic_data: TopicData, top_n: int = 5, n_columns: int = 4) → Figure#

Plots most relevant words as bar charts for every topic.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • top_n (int, default 5) – Specifies the number of words to show for each topic.

  • n_columns (int, default 4) – Number of columns in the subplot grid.
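To get a feeling for how n_columns shapes the joint plot, the sketch below computes the grid dimensions for a given number of topics. It assumes subplots are filled row by row, which is an assumption about the layout rather than something confirmed by topicwizard, and both function names are hypothetical.

```python
import math
from typing import Tuple


def grid_shape(n_topics: int, n_columns: int = 4) -> Tuple[int, int]:
    """Return (rows, columns) needed to fit one subplot per topic."""
    n_rows = math.ceil(n_topics / n_columns)
    return n_rows, n_columns


def grid_position(index: int, n_columns: int = 4) -> Tuple[int, int]:
    """Return the 1-based (row, column) of the subplot at a 0-based index,
    assuming the grid is filled row by row."""
    return index // n_columns + 1, index % n_columns + 1
```

With ten topics and the default four columns this gives three rows, so lowering n_columns makes the figure taller and narrower.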

Word Clouds#

You can produce a joint word cloud plot of all topics.

from topicwizard.figures import topic_wordclouds

topic_wordclouds(topic_data)
topicwizard.figures.topic_wordclouds(topic_data: TopicData, top_n: int = 30, n_columns: int = 4) → Figure#

Plots most relevant words as word clouds for every topic.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • top_n (int, default 30) – Specifies the number of words to show for each topic.

  • n_columns (int, default 4) – Number of columns in the subplot grid.

Word Map#

The word map that you can display with a dedicated function is slightly different from the one in the app as here you can’t select words to highlight.

Instead you can specify a cutoff in Z-values over which words will be labelled on the graph.

Words are also distinctively colored according to the most relevant topic as you cannot select the individual words for inspection.

You can either let UMAP discover the axes and project the words into 2D space, which is good for exploring words’ distances and relations to each other in the model, as well as potential clusters of words in the topic model.

from topicwizard.figures import word_map

word_map(topic_data)

Or you can display words with given topics as axes. This is especially useful for models like Semantic Signal Separation or Latent Semantic Analysis, where words with the lowest importance for a topic also carry information, as a topic is assumed to be an axis of semantic space.

from topicwizard.figures import word_map

word_map(
  topic_data,
  topic_axes=(
     "9_api_apis_register_automatedsarcasmgenerator",
     "4_study_studying_assessments_exams"
  )
)
topicwizard.figures.word_map(topic_data: TopicData, z_threshold: float = 2.0, topic_axes: Tuple[str | int, str | int] | None = None) → Figure#

Plots words on a scatter plot based on UMAP projections of their importances in topics into 2D space or by two topic axes.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • z_threshold (float, default 2.0) – Z-score frequency threshold over which words get labels on the plot. The default roughly corresponds to the 95th percentile if we assume a normal distribution for word frequencies (which is probably not the case, see Zipf’s law). If too few words have labels, lower this number; if there is too much clutter on your graph, raise it.

  • topic_axes (tuple of str|int, optional) – The topic axes along which the words should be displayed. If not specified, the axes on the graph are going to be UMAP projections’ dimensions.
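The effect of z_threshold can be illustrated with a small standalone sketch. It standardizes raw word frequencies and keeps the words whose Z-score exceeds the threshold. This is only an illustration of the idea; the function name and the toy frequencies are made up, and topicwizard's internal computation may differ.

```python
from statistics import mean, pstdev


def words_to_label(frequencies, z_threshold=2.0):
    """Return the words whose frequency Z-score is above z_threshold."""
    values = list(frequencies.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all words are equally frequent, nothing stands out
    return [
        word
        for word, freq in frequencies.items()
        if (freq - mu) / sigma > z_threshold
    ]


# Toy, Zipf-like frequencies: one very common word and a long tail.
freqs = {
    "the": 500, "court": 40, "law": 35, "judge": 30,
    "appeal": 25, "ruling": 20, "bench": 15, "writ": 10,
}
labelled = words_to_label(freqs, z_threshold=2.0)
```

On heavily skewed frequencies like these, only the most frequent word passes the default threshold, which is why lowering z_threshold is often needed to get more labels.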

Important Topics#

You can visualize the most relevant topics for a given set of words with bar charts. These behave virtually the same as in the app, but no associations are selected by default.

So for example if we would like to know which topics contain the words “supreme” and “court”, we can do so:

from topicwizard.figures import word_association_barchart

word_association_barchart(topic_data, ["supreme", "court"])
topicwizard.figures.word_association_barchart(topic_data: TopicData, words: List[str] | str, n_association: int = 0, top_n: int = 20)#

Plots bar chart of most important topics for the given words and their closest associations in topic space.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • words (list[str] or str) – Words you want to start the association from.

  • n_association (int, default 0) – Number of words to associate with the given words. None get displayed by default.

  • top_n (int, default 20) – Top N topics to display.
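One simple way to think about "most important topics for the given words" is to add up each topic's importance for the selected words and rank the topics by that total. The sketch below does exactly that on a plain nested list. The summing rule, the function name and the toy matrix are assumptions made for illustration and are not taken from topicwizard's implementation.

```python
def top_topics_for_words(topic_term, vocab, topic_names, words, top_n=20):
    """Rank topics by their summed importance for the given words.

    topic_term: one row per topic, one column per word in vocab.
    Words missing from the vocabulary are ignored.
    """
    columns = [vocab.index(word) for word in words if word in vocab]
    scores = {
        name: sum(row[col] for col in columns)
        for name, row in zip(topic_names, topic_term)
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Toy example with three topics and a three-word vocabulary.
vocab = ["supreme", "court", "exam"]
topic_names = ["law", "school", "misc"]
topic_term = [
    [0.5, 0.4, 0.0],  # law
    [0.1, 0.0, 0.9],  # school
    [0.2, 0.3, 0.1],  # misc
]
ranking = top_topics_for_words(
    topic_term, vocab, topic_names, ["supreme", "court"], top_n=2
)
```

Here top_n caps how many topics end up on the chart, mirroring the role of the parameter documented above.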

Document Map#

You can display a map of documents as a self-contained plot. This can be advantageous when you want to see how different documents relate to each other in your corpus, and to the underlying topics discovered by the model.

This plot is not entirely identical to the one in the app, as documents cannot be selected or searched for.

Different topics are clearly outlined with discrete colors.

from topicwizard.figures import document_map

document_map(topic_data)
topicwizard.figures.document_map(topic_data: TopicData, document_metadata: DataFrame | None = None) → Figure#

Projects documents into 2d space and displays them on a scatter plot.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • document_metadata (DataFrame, optional) – Metadata you want displayed when hovering over documents on the graph.

Topic Distribution#

You can display topic distributions for a given document or list of documents on a bar chart.

from topicwizard.figures import document_topic_distribution

document_topic_distribution(
    topic_data,
    "New cure against type 2 diabetes in development.",
)
topicwizard.figures.document_topic_distribution(topic_data: TopicData, documents: List[str] | str, top_n: int = 8) → Figure#

Displays topic distribution on a bar plot for a document or a set of documents.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • documents (list[str] or str) – Documents to display topic distribution for.

  • top_n (int, default 8) – Number of topics to display at most.

You can also display the topic distribution over time in a single document (or in an entire corpus, if you join the texts) on a line chart.

This works by taking windows of tokens from the document and running them through the pipeline. You can specify the window and step size in number of tokens if you find the resolution of the results too high or too low.

from topicwizard.figures import document_topic_timeline

document_topic_timeline(
    topic_data,
    "New cure against type 2 diabetes in development.",
)
topicwizard.figures.document_topic_timeline(topic_data: TopicData, document: str, window_size: int = 10, step_size: int = 1) → Figure#

Displays how the topic distribution changes over the course of a document on a line chart.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • document (str) – Document to display the timeline for.

  • window_size (int, default 10) – Size of the window, in tokens, over which topic inference is run.

  • step_size (int, default 1) – Size of the steps for the rolling window.
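The interplay of window_size and step_size can be sketched as follows. The function splits a document into overlapping windows of tokens; each window would then be passed through the pipeline to get one point on the timeline. Whitespace tokenization and the function name are simplifying assumptions, since the actual tokens depend on the vectorizer in your pipeline.

```python
def rolling_windows(tokens, window_size=10, step_size=1):
    """Return consecutive windows of window_size tokens, moved by step_size."""
    if len(tokens) <= window_size:
        return [tokens]  # the document is too short to slide over
    last_start = len(tokens) - window_size
    return [
        tokens[start:start + window_size]
        for start in range(0, last_start + 1, step_size)
    ]


# Whitespace tokenization is an assumption made for this sketch.
text = "a b c d e f g h i j k l"
tokens = text.split()
dense = rolling_windows(tokens, window_size=10, step_size=1)
coarse = rolling_windows(tokens, window_size=4, step_size=4)
```

A larger step_size yields fewer points on the line chart, while a larger window_size smooths the curve because each point summarizes more text.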

Group Map#

You can display the group map as a standalone plot, with the groups being colored according to dominant topic.

from topicwizard.figures import group_map

group_map(topic_data, group_labels)
topicwizard.figures.group_map(topic_data: TopicData, group_labels: List[str]) → Figure#

Projects groups into 2d space and displays them on a scatter plot.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • group_labels (list[str]) – Labels for each of the documents in the corpus.
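Since group_labels holds one label per document, a group's topic content can be thought of as the combined topic weights of its documents. The sketch below sums document-topic weights per group and picks the dominant topic, which is the kind of information the coloring relies on. Summation as the aggregation rule and all names in the example are assumptions for illustration only.

```python
from collections import defaultdict


def dominant_topic_per_group(document_topic, group_labels, topic_names):
    """Sum topic weights per group and return each group's strongest topic."""
    if len(document_topic) != len(group_labels):
        raise ValueError("group_labels needs exactly one label per document")
    totals = defaultdict(lambda: [0.0] * len(topic_names))
    for row, label in zip(document_topic, group_labels):
        for topic_index, weight in enumerate(row):
            totals[label][topic_index] += weight
    return {
        label: topic_names[max(range(len(sums)), key=sums.__getitem__)]
        for label, sums in totals.items()
    }


# Toy data: three documents, two topics, two groups.
document_topic = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
group_labels = ["a", "b", "a"]
dominant = dominant_topic_per_group(document_topic, group_labels, ["t0", "t1"])
```

The length check reflects the one requirement stated above: the list of labels has to line up with the documents in the corpus.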

Group Topic Barcharts#

You can create a joint plot of the topic content of all groups. These will be displayed as bar charts.

from topicwizard.figures import group_topic_barcharts

group_topic_barcharts(topic_data, group_labels, top_n=5)
topicwizard.figures.group_topic_barcharts(topic_data: TopicData, group_labels: List[str], top_n: int = 5, n_columns: int = 4)#

Displays the most important topics for each group.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • group_labels (list[str]) – Labels for each of the documents in the corpus.

  • top_n (int, default 5) – Maximum number of topics to display for each group.

  • n_columns (int, default 4) – Indicates how many columns the faceted plot should have.

Group Word Clouds#

You can create word clouds for each of the group labels. This will only take word counts into account and not relevance.

from topicwizard.figures import group_wordclouds

group_wordclouds(topic_data, group_labels)
topicwizard.figures.group_wordclouds(topic_data: TopicData, group_labels: List[str], top_n: int = 30, n_columns: int = 4) → Figure#

Plots wordclouds for each group.

Parameters:
  • topic_data (TopicData) – Inference data from topic modeling.

  • group_labels (list[str]) – Labels for each document in the corpus.

  • top_n (int, default 30) – Number of words to display for each group.

  • n_columns (int, default 4) – Number of columns the faceted plot should have.
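Because group word clouds are based on word counts rather than relevance, the underlying numbers amount to a per-group frequency table. A minimal sketch of such a table, assuming lowercased whitespace tokenization (your pipeline's vectorizer will tokenize differently) and a hypothetical function name:

```python
from collections import Counter, defaultdict


def top_words_per_group(corpus, group_labels, top_n=30):
    """Count words per group and keep the top_n most frequent ones."""
    counts = defaultdict(Counter)
    for text, label in zip(corpus, group_labels):
        # Whitespace tokenization is a simplification for this sketch.
        counts[label].update(text.lower().split())
    return {
        label: counter.most_common(top_n)
        for label, counter in counts.items()
    }


corpus = ["court ruling court", "exam study", "court appeal"]
group_labels = ["law", "school", "law"]
top_words = top_words_per_group(corpus, group_labels, top_n=1)
```

In a word cloud these counts would determine the size of each word, and top_n limits how many words are drawn for each group.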