
Multimodal Topic Modelling (BETA)

Note

Multimodal modelling is still a BETA feature in Turftopic; it is likely that we will add more features and change the interface in the near future.

Some corpora span multiple modalities; a good example would be news articles with images attached. Turftopic now supports multimodal modelling with a number of its models.

Multimodal Encoders

In order for images to be usable in Turftopic, you will need an embedding model that can encode both texts and images. You can use either models supported in SentenceTransformers or models that implement the MTEB multimodal encoder interface.

Use a multimodal encoder model

from turftopic import KeyNMF

multimodal_keynmf = KeyNMF(10, encoder="clip-ViT-B-32")

Tip

You can find current state-of-the-art embedding models and their capabilities on the Massive Image Embedding Benchmark leaderboard.

Use an MTEB-compatible encoder model

pip install "mteb<2.0.0"

from turftopic import KeyNMF
import mteb

encoder = mteb.get_model("kakaobrain/align-base")

multimodal_keynmf = KeyNMF(10, encoder=encoder)

Corpus Structure

Currently, every document must have exactly one image attached to it. This is a limitation that we will address in the future. Images can be represented either as file paths or as PIL.Image objects.

from PIL import Image

images: list[Image.Image] = [Image.open("file_path/something.jpeg"), ...]
texts: list[str] = [...]

# Every document must be paired with exactly one image
assert len(images) == len(texts)
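
Since images can also be given as file paths, you can let Turftopic load them for you. A minimal sketch (the file names below are placeholders):

from turftopic import KeyNMF

# Paths are loaded into PIL images internally; the file names here are placeholders
image_paths: list[str] = ["images/article_0.jpeg", "images/article_1.jpeg"]
texts: list[str] = ["First article text...", "Second article text..."]

model = KeyNMF(10, encoder="clip-ViT-B-32")
model.fit_multimodal(texts, images=image_paths)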

Basic Usage

All multimodal models have fit_multimodal() and fit_transform_multimodal() methods, which you can use to discover topics in multimodal corpora.

Fit a multimodal model on a corpus

KeyNMF:

from turftopic import KeyNMF

model = KeyNMF(12, encoder="clip-ViT-B-32")
model.fit_multimodal(texts, images=images)
model.plot_topics_with_images()

SemanticSignalSeparation:

from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(12, encoder="clip-ViT-B-32")
model.fit_multimodal(texts, images=images)
model.plot_topics_with_images()

ClusteringTopicModel:

from turftopic import ClusteringTopicModel

# BERTopic-style
model = ClusteringTopicModel(encoder="clip-ViT-B-32", feature_importance="c-tf-idf")
# Top2Vec-style
model = ClusteringTopicModel(encoder="clip-ViT-B-32", feature_importance="centroid")
model.fit_multimodal(texts, images=images)
model.plot_topics_with_images()

GMM:

from turftopic import GMM

model = GMM(12, encoder="clip-ViT-B-32")
model.fit_multimodal(texts, images=images)
model.plot_topics_with_images()

AutoEncodingTopicModel:

from turftopic import AutoEncodingTopicModel

# CombinedTM
model = AutoEncodingTopicModel(12, combined=True, encoder="clip-ViT-B-32")
# ZeroShotTM
model = AutoEncodingTopicModel(12, combined=False, encoder="clip-ViT-B-32")
model.fit_multimodal(texts, images=images)
model.plot_topics_with_images()
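
If you also need the document-topic matrix, use fit_transform_multimodal(). A minimal sketch:

import numpy as np
from turftopic import KeyNMF

model = KeyNMF(12, encoder="clip-ViT-B-32")
# Returns an array of shape (n_documents, n_topics)
document_topic_matrix = model.fit_transform_multimodal(texts, images=images)
# Most important topic for each document
dominant_topics = np.argmax(document_topic_matrix, axis=1)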

API reference

turftopic.multimodal.MultimodalModel

Base model for multimodal topic models.

Source code in turftopic/multimodal.py
class MultimodalModel:
    """Base model for multimodal topic models."""

    def encode_multimodal(
        self,
        sentences: list[str],
        images: list[ImageRepr],
    ) -> dict[str, np.ndarray]:
        """Produce multimodal embeddings of the documents passed to the model.

        Parameters
        ----------
        sentences: list[str]
            Textual documents to encode.
        images: list[ImageRepr]
            Corresponding images for each document.

        Returns
        -------
        MultimodalEmbeddings
            Text, image and joint document embeddings.

        """
        if len(sentences) != len(images):
            raise ValueError("Images and documents were not the same length.")
        if hasattr(self.encoder_, "get_text_embeddings"):
            text_embeddings = np.array(
                self.encoder_.get_text_embeddings(sentences)
            )
        else:
            text_embeddings = self.encoder_.encode(sentences)
        embedding_size = text_embeddings.shape[1]
        images = _load_images(images)
        if hasattr(self.encoder_, "get_image_embeddings"):
            image_embeddings = np.array(
                self.encoder_.get_image_embeddings(list(images))
            )
        else:
            image_embeddings = []
            for image in images:
                if image is not None:
                    image_embeddings.append(self.encoder_.encode(image))
                else:
                    image_embeddings.append(np.full(embedding_size, np.nan))
            image_embeddings = np.stack(image_embeddings)
            print(image_embeddings)
        if hasattr(self.encoder_, "get_fused_embeddings"):
            document_embeddings = np.array(
                self.encoder_.get_fused_embeddings(
                    texts=sentences,
                    images=list(images),
                )
            )
        else:
            document_embeddings = _naive_join_embeddings(
                text_embeddings, image_embeddings
            )

        return {
            "text_embeddings": text_embeddings,
            "image_embeddings": image_embeddings,
            "document_embeddings": document_embeddings,
        }

    @staticmethod
    def validate_embeddings(embeddings: Optional[MultimodalEmbeddings]):
        if embeddings is None:
            return
        try:
            document_embeddings = embeddings["document_embeddings"]
            image_embeddings = embeddings["image_embeddings"]
        except KeyError as e:
            raise TypeError(
                "embeddings do not contain document and image embeddings, can't be used for multimodal modelling."
            ) from e
        if document_embeddings.shape != image_embeddings.shape:
            raise ValueError(
                f"Shape mismatch between document_embeddings {document_embeddings.shape} and image_embeddings {image_embeddings.shape}"
            )

    def validate_encoder(self):
        if not hasattr(self.encoder_, "encode"):
            if not all(
                (
                    hasattr(self.encoder_, "get_text_embeddings"),
                    hasattr(self.encoder_, "get_image_embeddings"),
                ),
            ):
                raise TypeError(
                    "An encoder must either have an encode() method or a get_text_embeddings and get_image_embeddings method (optionally get_fused_embeddings)"
                )

    @abstractmethod
    def fit_transform_multimodal(
        self,
        raw_documents: list[str],
        images: list[ImageRepr],
        y=None,
        embeddings: Optional[MultimodalEmbeddings] = None,
    ) -> np.ndarray:
        """Fits topic model in a multimodal context and returns the document-topic matrix.

        Parameters
        ----------
        raw_documents: iterable of str
            Documents to fit the model on.
        images: list[ImageRepr]
            Images corresponding to each document.
        y: None
            Ignored, exists for sklearn compatibility.
        embeddings: MultimodalEmbeddings
            Precomputed multimodal embeddings.

        Returns
        -------
        ndarray of shape (n_documents, n_topics)
            Document-topic matrix.
        """
        pass

    def fit_multimodal(
        self,
        raw_documents: list[str],
        images: list[ImageRepr],
        y=None,
        embeddings: Optional[MultimodalEmbeddings] = None,
    ):
        """Fits topic model on a multimodal corpus.

        Parameters
        ----------
        raw_documents: iterable of str
            Documents to fit the model on.
        images: list[ImageRepr]
            Images corresponding to each document.
        y: None
            Ignored, exists for sklearn compatibility.
        embeddings: MultimodalEmbeddings
            Precomputed multimodal embeddings.

        Returns
        -------
        Self
            The fitted topic model
        """
        self.fit_transform_multimodal(raw_documents, images, y, embeddings)
        return self

    @staticmethod
    def collect_top_images(
        images: list[Image.Image],
        image_topic_matrix: np.ndarray,
        n_images: int = 20,
        negative: bool = False,
    ) -> list[list[Image.Image]]:
        top_images: list[list[Image.Image]] = []
        for image_topic_vector in image_topic_matrix.T:
            if negative:
                image_topic_vector = -image_topic_vector
            top_im_ind = np.argsort(-image_topic_vector)[:20]
            top_im = [images[i] for i in top_im_ind]
            top_images.append(top_im)
        return top_images

    @staticmethod
    def _image_grid(
        images: list[Image.Image],
        final_size=(1200, 1200),
        grid_size: tuple[int, int] = (4, 4),
    ):
        grid_img = Image.new("RGB", final_size, (255, 255, 255))
        cell_width = final_size[0] // grid_size[0]
        cell_height = final_size[1] // grid_size[1]
        n_rows, n_cols = grid_size
        for idx, img in enumerate(images[: n_rows * n_cols]):
            img = img.resize(
                (cell_width, cell_height), resample=Image.Resampling.LANCZOS
            )
            x_offset = (idx % grid_size[0]) * cell_width
            y_offset = (idx // grid_size[1]) * cell_height
            grid_img.paste(img, (x_offset, y_offset))
        return grid_img

    def plot_topics_with_images(self, n_cols: int = 3, grid_size: int = 4):
        """Plots the most important images for each topic, along with keywords.

        Note that you will need to `pip install plotly` to use plots in Turftopic.

        Parameters
        ----------
        n_cols: int, default 3
            Number of columns you want to have in the grid of topics.
        grid_size: int, default 4
            The square root of the number of images you want to display for a given topic.
            For instance if grid_size==4, all topics will have 16 images displayed,
            since the joint image will have 4 columns and 4 rows.

        Returns
        -------
        go.Figure
            Plotly figure containing top images and keywords for topics.
        """
        if not hasattr(self, "top_images"):
            raise ValueError(
                "Model either has not been fit or was fit without images. top_images property missing."
            )
        try:
            import plotly.graph_objects as go
        except (ImportError, ModuleNotFoundError) as e:
            raise ModuleNotFoundError(
                "Please install plotly if you intend to use plots in Turftopic."
            ) from e
        fig = go.Figure()
        width, height = 1200, 1200
        scale_factor = 0.25
        w, h = width * scale_factor, height * scale_factor
        padding = 10
        n_components = self.components_.shape[0]
        n_rows = n_components // n_cols + int(bool(n_components % n_cols))
        figure_height = (h + padding) * n_rows
        figure_width = (w + padding) * n_cols
        fig = fig.add_trace(
            go.Scatter(
                x=[0, figure_width],
                y=[0, figure_height],
                mode="markers",
                marker_opacity=0,
            )
        )
        vocab = self.get_vocab()
        for i, component in enumerate(self.components_):
            col = i % n_cols
            row = i // n_cols
            top_7 = vocab[np.argsort(-component)[:7]]
            images = self.top_images[i]
            image = self._image_grid(
                images, (width, height), grid_size=(grid_size, grid_size)
            )
            x0 = (w + padding) * col
            y0 = (h + padding) * (n_rows - row)
            fig = fig.add_layout_image(
                dict(
                    x=x0,
                    sizex=w,
                    y=y0,
                    sizey=h,
                    xref="x",
                    yref="y",
                    opacity=1.0,
                    layer="below",
                    sizing="stretch",
                    source=image,
                ),
            )
            fig.add_annotation(
                x=(w + padding) * col + (w / 2),
                y=(h + padding) * (n_rows - row) - (h / 2),
                text="<b> " + "<br> ".join(top_7),
                font=dict(
                    size=16,
                    family="Times New Roman",
                    color="white",
                ),
                bgcolor="rgba(0,0,0, 0.5)",
            )
        fig = fig.update_xaxes(visible=False, range=[0, figure_width])
        fig = fig.update_yaxes(
            visible=False,
            range=[0, figure_height],
            # the scaleanchor attribute ensures that the aspect ratio stays constant
            scaleanchor="x",
        )
        fig = fig.update_layout(
            width=figure_width,
            height=figure_height,
            margin={"l": 0, "r": 0, "t": 0, "b": 0},
        )
        return fig

encode_multimodal(sentences, images)

Produce multimodal embeddings of the documents passed to the model.

Parameters:

    sentences (list[str], required): Textual documents to encode.
    images (list[ImageRepr], required): Corresponding images for each document.

Returns:

    MultimodalEmbeddings: Text, image and joint document embeddings.

Source code in turftopic/multimodal.py
def encode_multimodal(
    self,
    sentences: list[str],
    images: list[ImageRepr],
) -> dict[str, np.ndarray]:
    """Produce multimodal embeddings of the documents passed to the model.

    Parameters
    ----------
    sentences: list[str]
        Textual documents to encode.
    images: list[ImageRepr]
        Corresponding images for each document.

    Returns
    -------
    MultimodalEmbeddings
        Text, image and joint document embeddings.

    """
    if len(sentences) != len(images):
        raise ValueError("Images and documents were not the same length.")
    if hasattr(self.encoder_, "get_text_embeddings"):
        text_embeddings = np.array(
            self.encoder_.get_text_embeddings(sentences)
        )
    else:
        text_embeddings = self.encoder_.encode(sentences)
    embedding_size = text_embeddings.shape[1]
    images = _load_images(images)
    if hasattr(self.encoder_, "get_image_embeddings"):
        image_embeddings = np.array(
            self.encoder_.get_image_embeddings(list(images))
        )
    else:
        image_embeddings = []
        for image in images:
            if image is not None:
                image_embeddings.append(self.encoder_.encode(image))
            else:
                image_embeddings.append(np.full(embedding_size, np.nan))
        image_embeddings = np.stack(image_embeddings)
        print(image_embeddings)
    if hasattr(self.encoder_, "get_fused_embeddings"):
        document_embeddings = np.array(
            self.encoder_.get_fused_embeddings(
                texts=sentences,
                images=list(images),
            )
        )
    else:
        document_embeddings = _naive_join_embeddings(
            text_embeddings, image_embeddings
        )

    return {
        "text_embeddings": text_embeddings,
        "image_embeddings": image_embeddings,
        "document_embeddings": document_embeddings,
    }
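
For illustration, a minimal sketch of calling encode_multimodal() directly, assuming the encoder is loaded when the model is constructed:

from turftopic import KeyNMF

model = KeyNMF(10, encoder="clip-ViT-B-32")
embeddings = model.encode_multimodal(texts, images=images)
# Text, image and joint document embeddings, one row per document
print(embeddings["text_embeddings"].shape)
print(embeddings["image_embeddings"].shape)
print(embeddings["document_embeddings"].shape)

The returned dictionary can then be passed to fit_multimodal()/fit_transform_multimodal() via the embeddings argument.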

fit_multimodal(raw_documents, images, y=None, embeddings=None)

Fits topic model on a multimodal corpus.

Parameters:

    raw_documents (list[str], required): Documents to fit the model on.
    images (list[ImageRepr], required): Images corresponding to each document.
    y (default None): Ignored, exists for sklearn compatibility.
    embeddings (Optional[MultimodalEmbeddings], default None): Precomputed multimodal embeddings.

Returns:

    Self: The fitted topic model.

Source code in turftopic/multimodal.py
def fit_multimodal(
    self,
    raw_documents: list[str],
    images: list[ImageRepr],
    y=None,
    embeddings: Optional[MultimodalEmbeddings] = None,
):
    """Fits topic model on a multimodal corpus.

    Parameters
    ----------
    raw_documents: iterable of str
        Documents to fit the model on.
    images: list[ImageRepr]
        Images corresponding to each document.
    y: None
        Ignored, exists for sklearn compatibility.
    embeddings: MultimodalEmbeddings
        Precomputed multimodal embeddings.

    Returns
    -------
    Self
        The fitted topic model
    """
    self.fit_transform_multimodal(raw_documents, images, y, embeddings)
    return self
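
A minimal sketch of fitting with precomputed embeddings, assuming they were obtained from encode_multimodal() with the same encoder:

from turftopic import KeyNMF, GMM

keynmf = KeyNMF(10, encoder="clip-ViT-B-32")
embeddings = keynmf.encode_multimodal(texts, images=images)
keynmf.fit_multimodal(texts, images=images, embeddings=embeddings)

# The same embeddings can be reused by another model that shares the encoder
gmm = GMM(10, encoder="clip-ViT-B-32")
gmm.fit_multimodal(texts, images=images, embeddings=embeddings)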

fit_transform_multimodal(raw_documents, images, y=None, embeddings=None) abstractmethod

Fits topic model in a multimodal context and returns the document-topic matrix.

Parameters:

    raw_documents (list[str], required): Documents to fit the model on.
    images (list[ImageRepr], required): Images corresponding to each document.
    y (default None): Ignored, exists for sklearn compatibility.
    embeddings (Optional[MultimodalEmbeddings], default None): Precomputed multimodal embeddings.

Returns:

    ndarray of shape (n_documents, n_topics): Document-topic matrix.

Source code in turftopic/multimodal.py
@abstractmethod
def fit_transform_multimodal(
    self,
    raw_documents: list[str],
    images: list[ImageRepr],
    y=None,
    embeddings: Optional[MultimodalEmbeddings] = None,
) -> np.ndarray:
    """Fits topic model in a multimodal context and returns the document-topic matrix.

    Parameters
    ----------
    raw_documents: iterable of str
        Documents to fit the model on.
    images: list[ImageRepr]
        Images corresponding to each document.
    y: None
        Ignored, exists for sklearn compatibility.
    embeddings: MultimodalEmbeddings
        Precomputed multimodal embeddings.

    Returns
    -------
    ndarray of shape (n_documents, n_topics)
        Document-topic matrix.
    """
    pass

plot_topics_with_images(n_cols=3, grid_size=4)

Plots the most important images for each topic, along with keywords.

Note that you will need to pip install plotly to use plots in Turftopic.

Parameters:

    n_cols (int, default 3): Number of columns you want to have in the grid of topics.
    grid_size (int, default 4): The square root of the number of images you want to display for a given topic. For instance, if grid_size==4, all topics will have 16 images displayed, since the joint image will have 4 columns and 4 rows.

Returns:

    go.Figure: Plotly figure containing top images and keywords for topics.

Source code in turftopic/multimodal.py
def plot_topics_with_images(self, n_cols: int = 3, grid_size: int = 4):
    """Plots the most important images for each topic, along with keywords.

    Note that you will need to `pip install plotly` to use plots in Turftopic.

    Parameters
    ----------
    n_cols: int, default 3
        Number of columns you want to have in the grid of topics.
    grid_size: int, default 4
        The square root of the number of images you want to display for a given topic.
        For instance if grid_size==4, all topics will have 16 images displayed,
        since the joint image will have 4 columns and 4 rows.

    Returns
    -------
    go.Figure
        Plotly figure containing top images and keywords for topics.
    """
    if not hasattr(self, "top_images"):
        raise ValueError(
            "Model either has not been fit or was fit without images. top_images property missing."
        )
    try:
        import plotly.graph_objects as go
    except (ImportError, ModuleNotFoundError) as e:
        raise ModuleNotFoundError(
            "Please install plotly if you intend to use plots in Turftopic."
        ) from e
    fig = go.Figure()
    width, height = 1200, 1200
    scale_factor = 0.25
    w, h = width * scale_factor, height * scale_factor
    padding = 10
    n_components = self.components_.shape[0]
    n_rows = n_components // n_cols + int(bool(n_components % n_cols))
    figure_height = (h + padding) * n_rows
    figure_width = (w + padding) * n_cols
    fig = fig.add_trace(
        go.Scatter(
            x=[0, figure_width],
            y=[0, figure_height],
            mode="markers",
            marker_opacity=0,
        )
    )
    vocab = self.get_vocab()
    for i, component in enumerate(self.components_):
        col = i % n_cols
        row = i // n_cols
        top_7 = vocab[np.argsort(-component)[:7]]
        images = self.top_images[i]
        image = self._image_grid(
            images, (width, height), grid_size=(grid_size, grid_size)
        )
        x0 = (w + padding) * col
        y0 = (h + padding) * (n_rows - row)
        fig = fig.add_layout_image(
            dict(
                x=x0,
                sizex=w,
                y=y0,
                sizey=h,
                xref="x",
                yref="y",
                opacity=1.0,
                layer="below",
                sizing="stretch",
                source=image,
            ),
        )
        fig.add_annotation(
            x=(w + padding) * col + (w / 2),
            y=(h + padding) * (n_rows - row) - (h / 2),
            text="<b> " + "<br> ".join(top_7),
            font=dict(
                size=16,
                family="Times New Roman",
                color="white",
            ),
            bgcolor="rgba(0,0,0, 0.5)",
        )
    fig = fig.update_xaxes(visible=False, range=[0, figure_width])
    fig = fig.update_yaxes(
        visible=False,
        range=[0, figure_height],
        # the scaleanchor attribute ensures that the aspect ratio stays constant
        scaleanchor="x",
    )
    fig = fig.update_layout(
        width=figure_width,
        height=figure_height,
        margin={"l": 0, "r": 0, "t": 0, "b": 0},
    )
    return fig
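
A typical call after a multimodal fit; the output file name below is a placeholder:

from turftopic import KeyNMF

model = KeyNMF(12, encoder="clip-ViT-B-32")
model.fit_multimodal(texts, images=images)
# Two columns of topics, 9 images (3x3) per topic
fig = model.plot_topics_with_images(n_cols=2, grid_size=3)
fig.write_html("topics_with_images.html")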

turftopic.encoders.multimodal.MultimodalEncoder

Bases: Protocol

Base class for external encoder models.

Source code in turftopic/encoders/multimodal.py
class MultimodalEncoder(Protocol):
    """Base class for external encoder models."""

    def get_text_embeddings(
        self,
        texts: list[str],
        *,
        batch_size: int = 8,
        **kwargs,
    ): ...

    def get_image_embeddings(
        self,
        images: list[Image.Image],
        *,
        batch_size: int = 8,
        **kwargs,
    ): ...

    def get_fused_embeddings(
        self,
        texts: list[str] = None,
        images: list[Image.Image] = None,
        batch_size: int = 8,
        **kwargs,
    ): ...
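
Any object implementing this protocol can be passed as the encoder. Below is a minimal sketch of a custom encoder wrapping a SentenceTransformers CLIP model; the class name and the naive fusion strategy are illustrative, not part of Turftopic:

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

from turftopic import KeyNMF


class ClipMultimodalEncoder:
    """Hypothetical encoder following the MultimodalEncoder protocol."""

    def __init__(self, model_name: str = "clip-ViT-B-32"):
        self.model = SentenceTransformer(model_name)

    def get_text_embeddings(self, texts: list[str], *, batch_size: int = 8, **kwargs):
        return self.model.encode(texts, batch_size=batch_size)

    def get_image_embeddings(self, images: list[Image.Image], *, batch_size: int = 8, **kwargs):
        # CLIP models in SentenceTransformers can encode PIL images directly
        return self.model.encode(images, batch_size=batch_size)

    def get_fused_embeddings(self, texts: list[str] = None, images: list[Image.Image] = None, batch_size: int = 8, **kwargs):
        # Naive fusion: average the text and image embeddings
        text_emb = self.get_text_embeddings(texts, batch_size=batch_size)
        image_emb = self.get_image_embeddings(images, batch_size=batch_size)
        return np.mean([text_emb, image_emb], axis=0)


model = KeyNMF(10, encoder=ClipMultimodalEncoder())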