Topic Analysis with LLMs

Topic analyzers are large language models that can interpret the contents of topics and give human-readable descriptions of them. This is especially useful when labelling and understanding topics by hand would require excessive manual labour.

The role of analyzers in topic modelling.

Analyzers can do the following tasks:

Summarize documents to make them easier for your topic model to consume.
Name topics in a sensible and human-readable way based on top documents and keywords.
Describe topics in a couple of sentences.

While smaller language models were previously unable to accomplish these tasks in a meaningful way, advances in the field now allow you to generate high-quality topic descriptions on your own laptop using small LLMs.

Warning

The namers API is now deprecated and will be removed in Turftopic 1.1.0. Analyzers have full feature parity with namers and can do considerably more.

Getting Started

There are multiple types of analyzers in Turftopic that you can use for these tasks, all of which can be imported from the analyzers module:

Choose an analyzer

Local LLM (recommended) | OpenAI API | T5

LLMs from HF Hub are natively supported in Turftopic. Our default choice of LLM is SmolLM3-3B, as it runs effortlessly on consumer hardware, is permissively licensed, allowing commercial use, and generates high-quality output.

You can choose a different model by passing model_name="<your_model_here>".

SmolLM3 is also fine-tuned for reasoning. Thinking mode is disabled by default to reduce computational cost, but you can enable it by passing enable_thinking=True.

from turftopic.analyzers import LLMAnalyzer

# We enable document summaries for topic analysis
analyzer = LLMAnalyzer(use_summaries=True)

To use the OpenAI API, you will have to install the OpenAI package, as it is not installed by default:

pip install turftopic[openai]
export OPENAI_API_KEY="sk-<your key goes here>"

The default model is gpt-5-nano, which was the cheapest of OpenAI's newer models at the time of writing, and we found that it generates satisfactory results.

from turftopic.analyzers import OpenAIAnalyzer

analyzer = OpenAIAnalyzer('gpt-5-nano')

T5 is less resource-intensive than causal language models, but it also generates lower-quality results. You might have to experiment with it to get satisfactory results.

from turftopic.analyzers import T5Analyzer

analyzer = T5Analyzer("google/flan-t5-large")

Document summarization

You can use large language models to summarize documents as a pre-processing step. This might make it easier for certain topic models to find patterns. You can also instruct the language model to summarize documents from a certain aspect.

from turftopic import KeyNMF

# Your documents
corpus: list[str] = [...]

summarized_documents = [analyzer.summarize_document(doc) for doc in corpus]

# Then we fit the topic model on the document summaries, which might be easier to analyze
model = KeyNMF(10)
model.fit(summarized_documents)
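To summarize documents from a particular aspect, one option is to supply a custom summary_prompt, which is templated with {document}. The sketch below only builds and formats such a prompt with str.format(), so it runs without loading a model. The prompt wording, the helper build_summary_prompt and the sample document are illustrative assumptions, not Turftopic defaults.

```python
# Hypothetical aspect-focused summary prompt; the wording is an assumption.
# The analyzer fills in {document} via str.format().
ASPECT_SUMMARY_PROMPT = (
    "Summarize the following document in two sentences, "
    "focusing only on its economic implications: {document}"
)


def build_summary_prompt(document: str) -> str:
    """Mimics how a summary prompt is filled in before it is sent to the model."""
    return ASPECT_SUMMARY_PROMPT.format(document=document)


example = build_summary_prompt("The central bank raised interest rates by 0.5%.")
print(example)

# With Turftopic installed, the same template could be passed to an analyzer:
# analyzer = LLMAnalyzer(summary_prompt=ASPECT_SUMMARY_PROMPT)
```

Only the template is parsed by str.format(), so curly brackets inside the documents themselves do not need escaping.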

Topic analysis

You can also use LLMs to analyze the contents of topics after a topic model has been trained. Analysis in this case consists of:

Naming the topics in a model, and
Giving a short description of each topic's contents.

There are a number of options you should be aware of when doing this:

The LLMs will always use the top keywords extracted by a topic model.
When use_documents is set to True (default), the analyzer will also use the top 10 documents from the topic model.
When use_summaries is active, the analyzer first summarizes the top 10 documents before using them for analysis. This can be a massive help, since shorter inputs are easier for the model to process and are more likely to fit within its context length. It does require more computation, though.
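How these options shape the prompt can be sketched in plain Python. The snippet loosely follows the name_topic logic shown in the API reference further down: keywords are always templated in, and documents are appended only when they are provided. The templates and the helper build_naming_prompt are illustrative assumptions; Turftopic's real default prompts may be worded differently.

```python
from typing import Optional

# Illustrative templates; not the library's actual defaults.
NAMER_PROMPT = (
    "Please provide a human-readable name for a topic.\n"
    "The topic is described by the following set of keywords: {keywords}.\n"
)
DOCUMENTS_TEMPLATE = (
    "In addition the topic is characterized by the following documents:\n"
    "{documents}\n"
)


def build_naming_prompt(
    keywords: list[str], documents: Optional[list[str]] = None
) -> str:
    """Keywords always go in; documents are appended only when provided."""
    prompt = NAMER_PROMPT.format(keywords=", ".join(keywords))
    if documents:
        doc_list = "\n".join(f" - {doc}" for doc in documents)
        prompt += DOCUMENTS_TEMPLATE.format(documents=doc_list)
    return prompt


keywords = ["inflation", "interest", "bank"]
top_docs = ["Rates were raised again.", "Inflation cooled in May."]

prompt_keywords_only = build_naming_prompt(keywords)  # like use_documents=False
prompt_with_docs = build_naming_prompt(keywords, top_docs)  # like use_documents=True
print(prompt_with_docs)
```

With use_summaries enabled, the entries of top_docs would be summaries rather than the raw documents, which keeps the final prompt shorter.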

Let's see what this looks like in action:

Analyze topics

with model | with topic_data

from turftopic import KeyNMF
from turftopic.analyzers import LLMAnalyzer

analyzer = LLMAnalyzer(use_summaries=False)

model = KeyNMF(10).fit(corpus)
analysis_result = model.analyze_topics(analyzer, use_documents=True)

from turftopic import KeyNMF
from turftopic.analyzers import LLMAnalyzer

analyzer = LLMAnalyzer(use_summaries=False)

model = KeyNMF(10)
topic_data = model.prepare_topic_data(corpus)
analysis_result = topic_data.analyze_topics(analyzer, use_documents=True)

Topic Naming

If you only wish to assign topic names, but not generate a full analysis, you can still use rename_topics:

model.rename_topics(analyzer, use_documents=False)

Calling analyze_topics() will do multiple things:

Return an AnalysisResults object, which contains topic_names, topic_descriptions and, when applicable, document_summaries (the summaries of the top documents).
Set these properties on the object it gets called on (model or topic_data).

AnalysisResults can also be turned into a DataFrame or dictionary, by calling to_df() and to_dict() respectively.

analysis_result.to_df()

                                         topic_names                                 topic_descriptions
0                         Dialogue and Communication  This topic examines how conversation functions...
1  AI Assistant: Requesting Detailed User Informa...  It describes an assistant that asks the user f...
2          Ethical Generative AI and Language Models  It covers the design and deployment of generat...
3   French–English Translation in Law and Literature  It examines translation between French and Eng...
4  France: Social, Economic, Legal Information an...  It covers how social conversations in France e...
5                   Email-based Python code requests  It depicts a user making requests that involve...
6           Lesson Planning and Classroom Activities  It covers the school-based process of teaching...
7         French cultural conversations for children  It explores how people talk about culture in F...
8            Data Analytics Training and Development  It focuses on structured training programs tha...
9                 Sustainable Energy and Environment  It explores how energy production and use infl...
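The dictionary form is convenient for serialization. The sketch below uses a stand-in dataclass (AnalysisResultsSketch, a hypothetical name) that mirrors the fields and to_dict() logic of AnalysisResults as shown in the API reference, so that it runs without Turftopic installed. The topic names and descriptions are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisResultsSketch:
    """Stand-in mirroring the fields of AnalysisResults."""

    topic_names: list[str]
    topic_descriptions: list[str]
    document_summaries: Optional[list[list[str]]] = None

    def to_dict(self) -> dict:
        res = dict(
            topic_names=self.topic_names,
            topic_descriptions=self.topic_descriptions,
        )
        # Summaries are only included when they were actually generated
        if self.document_summaries is not None:
            res["document_summaries"] = self.document_summaries
        return res


result = AnalysisResultsSketch(
    topic_names=["Monetary Policy", "Energy"],
    topic_descriptions=["Covers interest rate decisions.", "Covers energy use."],
)
as_dict = result.to_dict()

# The dictionary only holds lists of strings, so it can be dumped to JSON
as_json = json.dumps(as_dict, indent=2)
print(as_json)
```

Note that the document_summaries key is absent when no summaries were produced, so downstream code should check for it rather than assume it exists.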

`turftopic.analyzers.base.AnalysisResults` `dataclass`

Container class for results of topic analysis.

Attributes:

Name	Type	Description
`topic_names`	`list[str]`	Generated topic names.
`topic_descriptions`	`list[str]`	Generated topic descriptions.
`document_summaries`	`list[list[str]], default None`	Summaries of top 10 documents for each topic, when use_summaries is enabled.

Source code in turftopic/analyzers/base.py

@dataclass
class AnalysisResults:
    """Container class for results of topic analysis.

    Attributes
    ----------
    topic_names: list[str]
        Generated topic names.
    topic_descriptions: list[str]
        Generated topic descriptions.
    document_summaries: list[list[str]], default None
        Summaries of top 10 documents for each topic, when use_summaries is enabled.
    """

    topic_names: list[str]
    topic_descriptions: list[str]
    document_summaries: Optional[list[list[str]]] = None

    def to_dict(self) -> dict:
        """Returns the analysis result as a dictionary"""
        res = dict(
            topic_names=self.topic_names,
            topic_descriptions=self.topic_descriptions,
        )
        if self.document_summaries is not None:
            res["document_summaries"] = self.document_summaries
        return res

    def to_df(self):
        """Turns analysis result object into a dataframe"""
        try:
            import pandas as pd
        except ModuleNotFoundError:
            raise ModuleNotFoundError(
                "You need to pip install pandas to be able to use dataframes."
            )
        return pd.DataFrame(self.to_dict())

`to_df()`

Turns analysis result object into a dataframe

Source code in turftopic/analyzers/base.py

def to_df(self):
    """Turns analysis result object into a dataframe"""
    try:
        import pandas as pd
    except ModuleNotFoundError:
        raise ModuleNotFoundError(
            "You need to pip install pandas to be able to use dataframes."
        )
    return pd.DataFrame(self.to_dict())

`to_dict()`

Returns the analysis result as a dictionary

Source code in turftopic/analyzers/base.py

def to_dict(self) -> dict:
    """Returns the analysis result as a dictionary"""
    res = dict(
        topic_names=self.topic_names,
        topic_descriptions=self.topic_descriptions,
    )
    if self.document_summaries is not None:
        res["document_summaries"] = self.document_summaries
    return res

Prompting

You can instruct analyzers to specifically deal with the task you are trying to accomplish by using prompts. Here we will give an overview of how you can do this.

Providing Task Context

Sometimes you might have a specific task that might require additional information to analyze correctly. You can add information to the prompts by using the context attribute:

from turftopic.analyzers import LLMAnalyzer

analyzer = LLMAnalyzer(context="Analyze topical content in financial documents published by the central bank.")

Fully Custom Prompts

Since all analyzers are generative language models, you can prompt them however you wish. We provide default prompts, which we found to perform well, but you are free to modify them.

Prompts are formatted internally with str.format(), so all templated content should be enclosed in curly brackets. Analyzers have a number of prompts:

system_prompt = DEFAULT_SYSTEM_PROMPT
summary_prompt = SUMMARY_PROMPT
namer_prompt = NAMER_PROMPT
description_prompt = DESCRIPTION_PROMPT

system_prompt describes the general role of the language model and is not templated.
summary_prompt is responsible for producing document summaries and is templated with {document}.
namer_prompt describes how topics should be named and is templated with {keywords}.
description_prompt dictates how topic descriptions should be generated and is templated with {keywords}.

When use_documents=True, the top documents are appended to the end of the prompt.

Example:

from turftopic.analyzers import LLMAnalyzer

system_prompt = """
You are a topic analyzer.
Follow instructions closely and exactly.
"""

namer_prompt = """
Please provide a human-readable name for a topic.
The topic is described by the following set of keywords: {keywords}.
"""

description_prompt = """
Describe the following topic in a couple of sentences.
The topic is described by the following set of keywords: {keywords}.
"""

summary_prompt = """
Summarize the following document: {document}
"""

# Note: according to the API reference, LLMAnalyzer currently ignores system_prompt
analyzer = LLMAnalyzer(
    system_prompt=system_prompt,
    namer_prompt=namer_prompt,
    description_prompt=description_prompt,
    summary_prompt=summary_prompt
)
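One consequence of str.format() templating is that literal curly brackets in a custom prompt have to be doubled. The following sketch shows the difference using plain strings only. The prompt wording and the JSON answer format are illustrative assumptions, not something Turftopic requires.

```python
# Literal braces are doubled so str.format() does not treat them as fields.
namer_prompt = (
    "Name the topic described by these keywords: {keywords}.\n"
    'Answer in JSON, like {{"name": "<topic name>"}}.'
)

formatted = namer_prompt.format(keywords="inflation, interest, bank")
print(formatted)

# An unescaped brace pair is read as a template field and fails:
broken_prompt = 'Keywords: {keywords}. Answer like {"name": "<topic name>"}.'
try:
    broken_prompt.format(keywords="inflation, interest, bank")
    failed = False
except (KeyError, ValueError, IndexError):
    failed = True
print("Unescaped braces failed:", failed)
```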

API Reference

`turftopic.analyzers.base.Analyzer`

Bases: ABC

Source code in turftopic/analyzers/base.py

class Analyzer(ABC):
    system_prompt = DEFAULT_SYSTEM_PROMPT
    summary_prompt = SUMMARY_PROMPT
    namer_prompt = NAMER_PROMPT
    description_prompt = DESCRIPTION_PROMPT
    context = None
    use_summaries = False

    @abstractmethod
    def generate_text(self, prompt: str) -> str:
        """Generates response to a given prompt."""
        pass

    def summarize_document(self, document: str) -> str:
        """Summarizes document so that analysis becomes easier."""
        prompt = self.summary_prompt.format(document=document)
        return self.generate_text(prompt)

    def describe_topic(
        self,
        keywords: list[str],
        documents: Optional[list] = None,
    ):
        """Gives abstract summarization of topic content."""
        _keys = ", ".join(keywords)
        prompt = self.description_prompt.format(keywords=_keys)
        if documents:
            prompt += self.template_documents(documents)
        if self.context:
            prompt += CONTEXT_TEMPLATE.format(context=self.context)
        return self.generate_text(prompt)

    def name_topic(
        self,
        keywords: list[str],
        documents: Optional[list] = None,
    ) -> str:
        """Names one topic based on top descriptive aspects."""
        _keys = ", ".join(keywords)
        prompt = self.namer_prompt.format(keywords=_keys)
        if documents:
            prompt += self.template_documents(documents)
        if self.context:
            prompt += CONTEXT_TEMPLATE.format(context=self.context)
        return self.generate_text(prompt)

    def name_topics(
        self,
        keywords: list[list[str]],
        documents: list[list[str]] = None,
    ) -> list[str]:
        """Names all topics based on top descriptive terms.

        Parameters
        ----------
        keywords: list[list[str]]
            Top K highest ranking terms on the topics.
        documents: list[list[str]], optional
            Top K relevant documents to each topic.

        Returns
        -------
        list[str]
            Topic names returned by the namer.
        """
        names = []
        if documents is not None:
            key_doc = list(zip(keywords, documents))
            for keys, docs in track(key_doc, description="Naming topics..."):
                names.append(self.name_topic(keys, documents=docs))
        else:
            for keys in track(keywords, description="Naming topics..."):
                names.append(self.name_topic(keys))
        return names

    def template_documents(self, documents: list[str]) -> str:
        doc_list = "\n".join([f" - {doc}" for doc in documents])
        return """
        In addition the topic is characterized by the following documents:
        {documents}
        """.format(
            documents=doc_list
        )

    def analyze_topics(
        self,
        keywords: list[list[str]],
        documents: Optional[list[list[str]]] = None,
        use_summaries: Optional[bool] = None,
    ) -> AnalysisResults:
        """Analyzes topic model with a language model.
        Generates topic names, descriptions and document summaries (optional).

        Parameters
        ----------
        keywords: list[list[str]]
            Keywords for each topic.
        documents: list[list[str]], optional
            Top documents for each topic.
        use_summaries: bool, optional
            Indicates whether the analyzer should summarize documents
            prior to analyzing the topic.

        Returns
        -------
        AnalysisResults
            Object containing `topic_names`, `topic_descriptions` and `document_summaries` if relevant.
        """
        console = Console()
        output = {"topic_names": [], "topic_descriptions": []}
        use_summaries = (
            use_summaries if use_summaries is not None else self.use_summaries
        )
        if documents is not None:
            if use_summaries:
                output["document_summaries"] = []
                for docs in track(
                    documents, description="Summarizing documents"
                ):
                    _sums = []
                    for doc in docs:
                        _sums.append(self.summarize_document(doc))
                    output["document_summaries"].append(_sums)
                console.log("Documents summarized.")
                # Updating parameter so summaries are used down-stream
                documents = output["document_summaries"]
            # Organizing into a list so we can iterate and know the length at the same time.
            key_doc_pairs = list(zip(keywords, documents))
            for keys, docs in track(
                key_doc_pairs, description="Generating topic names"
            ):
                output["topic_names"].append(
                    self.name_topic(keys, documents=docs)
                )
            console.log("Topic names generated.")
            for keys, docs in track(
                key_doc_pairs, description="Generating topic descriptions."
            ):
                output["topic_descriptions"].append(
                    self.describe_topic(keys, documents=docs)
                )
            console.log("Topic descriptions generated.")
        else:
            for keys in track(keywords, description="Naming"):
                output["topic_names"].append(
                    self.name_topic(keys, documents=None)
                )
            console.log("Topic names generated.")
            for keys in track(keywords, description="Describing topics."):
                output["topic_descriptions"].append(
                    self.describe_topic(keys, documents=None)
                )
            console.log("Topic descriptions generated.")
        return AnalysisResults(**output)

`analyze_topics(keywords, documents=None, use_summaries=None)`

Analyzes topic model with a language model. Generates topic names, descriptions and document summaries (optional).

Parameters:

Name	Type	Description	Default
`keywords`	`list[list[str]]`	Keywords for each topic.	required
`documents`	`Optional[list[list[str]]]`	Top documents for each topic.	`None`
`use_summaries`	`Optional[bool]`	Indicates whether the analyzer should summarize documents prior to analyzing the topic.	`None`

Returns:

Type	Description
`AnalysisResults`	Object containing `topic_names`, `topic_descriptions` and `document_summaries` if relevant.

Source code in turftopic/analyzers/base.py

def analyze_topics(
    self,
    keywords: list[list[str]],
    documents: Optional[list[list[str]]] = None,
    use_summaries: Optional[bool] = None,
) -> AnalysisResults:
    """Analyzes topic model with a language model.
    Generates topic names, descriptions and document summaries (optional).

    Parameters
    ----------
    keywords: list[list[str]]
        Keywords for each topic.
    documents: list[list[str]], optional
        Top documents for each topic.
    use_summaries: bool, optional
        Indicates whether the analyzer should summarize documents
        prior to analyzing the topic.

    Returns
    -------
    AnalysisResults
        Object containing `topic_names`, `topic_descriptions` and `document_summaries` if relevant.
    """
    console = Console()
    output = {"topic_names": [], "topic_descriptions": []}
    use_summaries = (
        use_summaries if use_summaries is not None else self.use_summaries
    )
    if documents is not None:
        if use_summaries:
            output["document_summaries"] = []
            for docs in track(
                documents, description="Summarizing documents"
            ):
                _sums = []
                for doc in docs:
                    _sums.append(self.summarize_document(doc))
                output["document_summaries"].append(_sums)
            console.log("Documents summarized.")
            # Updating parameter so summaries are used down-stream
            documents = output["document_summaries"]
        # Organizing into a list so we can iterate and know the length at the same time.
        key_doc_pairs = list(zip(keywords, documents))
        for keys, docs in track(
            key_doc_pairs, description="Generating topic names"
        ):
            output["topic_names"].append(
                self.name_topic(keys, documents=docs)
            )
        console.log("Topic names generated.")
        for keys, docs in track(
            key_doc_pairs, description="Generating topic descriptions."
        ):
            output["topic_descriptions"].append(
                self.describe_topic(keys, documents=docs)
            )
        console.log("Topic descriptions generated.")
    else:
        for keys in track(keywords, description="Naming"):
            output["topic_names"].append(
                self.name_topic(keys, documents=None)
            )
        console.log("Topic names generated.")
        for keys in track(keywords, description="Describing topics."):
            output["topic_descriptions"].append(
                self.describe_topic(keys, documents=None)
            )
        console.log("Topic descriptions generated.")
    return AnalysisResults(**output)

`describe_topic(keywords, documents=None)`

Gives abstract summarization of topic content.

Source code in turftopic/analyzers/base.py

def describe_topic(
    self,
    keywords: list[str],
    documents: Optional[list] = None,
):
    """Gives abstract summarization of topic content."""
    _keys = ", ".join(keywords)
    prompt = self.description_prompt.format(keywords=_keys)
    if documents:
        prompt += self.template_documents(documents)
    if self.context:
        prompt += CONTEXT_TEMPLATE.format(context=self.context)
    return self.generate_text(prompt)

`generate_text(prompt)` `abstractmethod`

Generates response to a given prompt.

Source code in turftopic/analyzers/base.py

@abstractmethod
def generate_text(self, prompt: str) -> str:
    """Generates response to a given prompt."""
    pass
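Since generate_text is the only abstract method, a new backend only needs to implement it. The sketch below uses a small stand-in base class (AnalyzerSketch) so that it runs without Turftopic installed; with the library, you would presumably subclass turftopic.analyzers.base.Analyzer instead. FirstKeywordAnalyzer and the prompt wording are hypothetical, and such a model-free backend is mainly useful for testing a pipeline cheaply.

```python
from abc import ABC, abstractmethod


class AnalyzerSketch(ABC):
    """Minimal stand-in for the Analyzer base class (topic naming only)."""

    namer_prompt = "Name a topic described by these keywords: {keywords}."

    @abstractmethod
    def generate_text(self, prompt: str) -> str:
        """Generates response to a given prompt."""

    def name_topic(self, keywords: list[str]) -> str:
        prompt = self.namer_prompt.format(keywords=", ".join(keywords))
        return self.generate_text(prompt)


class FirstKeywordAnalyzer(AnalyzerSketch):
    """Hypothetical backend: no model, names a topic after its first keyword."""

    def generate_text(self, prompt: str) -> str:
        # Take everything after the keyword marker, then keep the first keyword
        keyword_part = prompt.split("keywords: ", 1)[1]
        first = keyword_part.split(",")[0].rstrip(".")
        return first.strip().title()


analyzer = FirstKeywordAnalyzer()
name = analyzer.name_topic(["energy", "solar", "grid"])
print(name)
```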

`name_topic(keywords, documents=None)`

Names one topic based on top descriptive aspects.

Source code in turftopic/analyzers/base.py

def name_topic(
    self,
    keywords: list[str],
    documents: Optional[list] = None,
) -> str:
    """Names one topic based on top descriptive aspects."""
    _keys = ", ".join(keywords)
    prompt = self.namer_prompt.format(keywords=_keys)
    if documents:
        prompt += self.template_documents(documents)
    if self.context:
        prompt += CONTEXT_TEMPLATE.format(context=self.context)
    return self.generate_text(prompt)

`name_topics(keywords, documents=None)`

Names all topics based on top descriptive terms.

Parameters:

Name	Type	Description	Default
`keywords`	`list[list[str]]`	Top K highest ranking terms on the topics.	required
`documents`	`list[list[str]]`	Top K relevant documents to each topic.	`None`

Returns:

Type	Description
`list[str]`	Topic names returned by the namer.

Source code in turftopic/analyzers/base.py

def name_topics(
    self,
    keywords: list[list[str]],
    documents: list[list[str]] = None,
) -> list[str]:
    """Names all topics based on top descriptive terms.

    Parameters
    ----------
    keywords: list[list[str]]
        Top K highest ranking terms on the topics.
    documents: list[list[str]], optional
        Top K relevant documents to each topic.

    Returns
    -------
    list[str]
        Topic names returned by the namer.
    """
    names = []
    if documents is not None:
        key_doc = list(zip(keywords, documents))
        for keys, docs in track(key_doc, description="Naming topics..."):
            names.append(self.name_topic(keys, documents=docs))
    else:
        for keys in track(keywords, description="Naming topics..."):
            names.append(self.name_topic(keys))
    return names

`summarize_document(document)`

Summarizes document so that analysis becomes easier.

Source code in turftopic/analyzers/base.py

def summarize_document(self, document: str) -> str:
    """Summarizes document so that analysis becomes easier."""
    prompt = self.summary_prompt.format(document=document)
    return self.generate_text(prompt)

`turftopic.analyzers.hf_llm.LLMAnalyzer`

Bases: Analyzer

Analyze topic model with an open LLM from HF Hub.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Open LLM to use from HF Hub.	`'HuggingFaceTB/SmolLM3-3B'`
`use_summaries`	`bool`	Indicates whether the language model should summarize documents before analyzing the topics.	`False`
`context`	`Optional[str]`	Additional context provided to the analyzer for analysis. e.g. "Analyze topics from blog posts related to morality and religion"	`None`
`system_prompt`	`Optional[str]`	Ignored, exists for compatibility	`None`
`summary_prompt`	`Optional[str]`	Prompt to use for abstractive summarization.	`None`
`namer_prompt`	`Optional[str]`	Prompt template for naming topics.	`None`
`description_prompt`	`Optional[str]`	Prompt template for generating topic descriptions.	`None`
`device`	`str`	ID of the device to run the language model on.	`'cpu'`
`max_new_tokens`	`int`	Max new tokens to generate when analyzing.	`32768`
`enable_thinking`	`bool`	Indicates whether thinking mode should be enabled.	`False`

Source code in turftopic/analyzers/hf_llm.py

class LLMAnalyzer(Analyzer):
    """Analyze topic model with an open LLM from HF Hub.

    Parameters
    ----------
    model_name: str, default 'HuggingFaceTB/SmolLM3-3B'
        Open LLM to use from HF Hub.
    use_summaries: bool, default False
        Indicates whether the language model should summarize documents before
        analyzing the topics.
    context: str, default None
        Additional context provided to the analyzer for analysis.
        e.g. "Analyze topics from blog posts related to morality and religion"
    system_prompt: str, default None
        Ignored, exists for compatibility
    summary_prompt: str, default None
        Prompt to use for abstractive summarization.
    namer_prompt: str, default None
        Prompt template for naming topics.
    description_prompt: str, default None
        Prompt template for generating topic descriptions.
    device: str, default "cpu"
        ID of the device to run the language model on.
    max_new_tokens: int, default 32768
        Max new tokens to generate when analyzing.
    enable_thinking: bool, default False
        Indicates whether thinking mode should be enabled.
    """

    def __init__(
        self,
        model_name: str = "HuggingFaceTB/SmolLM3-3B",
        context: Optional[str] = None,
        use_summaries: bool = False,
        system_prompt: Optional[str] = None,
        summary_prompt: Optional[str] = None,
        namer_prompt: Optional[str] = None,
        description_prompt: Optional[str] = None,
        max_new_tokens: int = 32768,
        device: str = "cpu",
        enable_thinking: bool = False,
    ):
        self.device = device
        self.model_name = model_name
        # load the tokenizer and the model
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
        ).to(self.device)
        self.summary_prompt = summary_prompt or self.summary_prompt
        self.namer_prompt = namer_prompt or self.namer_prompt
        self.description_prompt = description_prompt or self.description_prompt
        self.use_summaries = use_summaries
        # Store the context so it overrides the class-level default of None
        self.context = context
        self.max_new_tokens = max_new_tokens
        self.enable_thinking = enable_thinking

    def generate_text(self, prompt: str) -> str:
        thinking = "/think" if self.enable_thinking else "/no_think"
        system_prompt = self.system_prompt + thinking
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
        model_inputs = self.tokenizer([text], return_tensors="pt").to(
            self.model.device
        )
        # Generate the output
        generated_ids = self.model.generate(
            **model_inputs, max_new_tokens=self.max_new_tokens
        )
        # Get and decode the output
        output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
        result = self.tokenizer.decode(output_ids, skip_special_tokens=True)
        result = remove_thinking_trace(result)
        return result
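The source above calls remove_thinking_trace, whose implementation is not shown on this page. The following is not that function, only a minimal sketch of what such post-processing could look like, assuming the reasoning trace is wrapped in <think>...</think> tags. The name strip_thinking_trace and the sample output are hypothetical.

```python
import re


def strip_thinking_trace(text: str) -> str:
    """Drops <think>...</think> blocks (assumed delimiters) and trims whitespace."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


raw = "<think>\nThe keywords point to finance.\n</think>\nMonetary Policy"
clean = strip_thinking_trace(raw)
print(clean)
```

Output without a trace passes through unchanged, apart from surrounding whitespace being trimmed.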

`turftopic.analyzers.openai.OpenAIAnalyzer`

Bases: Analyzer

Analyze topic model with an OpenAI LLM.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	OpenAI model to use.	`'gpt-5-nano'`
`use_summaries`	`bool`	Indicates whether the language model should summarize documents before analyzing the topics.	`False`
`context`	`Optional[str]`	Additional context provided to the analyzer for analysis. e.g. "Analyze topics from blog posts related to morality and religion"	`None`
`system_prompt`	`Optional[str]`	System prompt to use for the language model.	`None`
`summary_prompt`	`Optional[str]`	Prompt to use for abstractive summarization.	`None`
`namer_prompt`	`Optional[str]`	Prompt template for naming topics.	`None`
`description_prompt`	`Optional[str]`	Prompt template for generating topic descriptions.	`None`

Source code in turftopic/analyzers/openai.py

class OpenAIAnalyzer(Analyzer):
    """Analyze topic model with an OpenAI LLM.

    Parameters
    ----------
    model_name: str, default 'gpt-5-nano'
        OpenAI model to use.
    use_summaries: bool, default False
        Indicates whether the language model should summarize documents before
        analyzing the topics.
    context: str, default None
        Additional context provided to the analyzer for analysis.
        e.g. "Analyze topics from blog posts related to morality and religion"
    system_prompt: str, default None
        System prompt to use for the language model.
    summary_prompt: str, default None
        Prompt to use for abstractive summarization.
    namer_prompt: str, default None
        Prompt template for naming topics.
    description_prompt: str, default None
        Prompt template for generating topic descriptions.
    """

    def __init__(
        self,
        model_name: str = "gpt-5-nano",
        context: Optional[str] = None,
        use_summaries: bool = False,
        system_prompt: Optional[str] = None,
        summary_prompt: Optional[str] = None,
        namer_prompt: Optional[str] = None,
        description_prompt: Optional[str] = None,
    ):
        self.client = openai.OpenAI()
        self.model_name = model_name
        self.system_prompt = system_prompt or self.system_prompt
        self.summary_prompt = summary_prompt or self.summary_prompt
        self.namer_prompt = namer_prompt or self.namer_prompt
        self.description_prompt = description_prompt or self.description_prompt
        self.use_summaries = use_summaries
        # Store the context so it overrides the class-level default of None
        self.context = context

    def generate_text(self, prompt: str) -> str:
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": prompt},
        ]
        response = self.client.chat.completions.create(
            messages=messages,
            model=self.model_name,
        )
        return response.choices[0].message.content

`turftopic.analyzers.t5.T5Analyzer`

Bases: Analyzer

Analyze topic model with a text-to-text model.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Text-to-text model to use for analyses.	`'google/flan-t5-small'`
`use_summaries`	`bool`	Indicates whether the language model should summarize documents before analyzing the topics.	`False`
`context`	`Optional[str]`	Additional context provided to the analyzer for analysis. e.g. "Analyze topics from blog posts related to morality and religion"	`None`
`system_prompt`	`Optional[str]`	Ignored, exists for compatibility	`None`
`summary_prompt`	`Optional[str]`	Prompt to use for abstractive summarization.	`None`
`namer_prompt`	`Optional[str]`	Prompt template for naming topics.	`T5_NAME_PROMPT`
`description_prompt`	`Optional[str]`	Prompt template for generating topic descriptions.	`T5_DESC_PROMPT`
`device`	`str`	ID of the device to run the language model on.	`'cpu'`

Source code in turftopic/analyzers/t5.py

class T5Analyzer(Analyzer):
    """Analyze topic model with a text-to-text model.

    Parameters
    ----------
    model_name: str, default 'google/flan-t5-small'
        Text-to-text model to use for analyses.
    use_summaries: bool, default False
        Indicates whether the language model should summarize documents before
        analyzing the topics.
    context: str, default None
        Additional context provided to the analyzer for analysis.
        e.g. "Analyze topics from blog posts related to morality and religion"
    system_prompt: str, default None
        Ignored, exists for compatibility
    summary_prompt: str, default None
        Prompt to use for abstractive summarization.
    namer_prompt: str, default None
        Prompt template for naming topics.
    description_prompt: str, default None
        Prompt template for generating topic descriptions.
    device: str, default "cpu"
        ID of the device to run the language model on.
    """

    def __init__(
        self,
        model_name: str = "google/flan-t5-small",
        context: Optional[str] = None,
        use_summaries: bool = False,
        system_prompt: Optional[str] = None,
        summary_prompt: Optional[str] = None,
        namer_prompt: Optional[str] = T5_NAME_PROMPT,
        description_prompt: Optional[str] = T5_DESC_PROMPT,
        device: str = "cpu",
    ):
        self.device = device
        self.model_name = model_name
        self.pipeline = pipeline(
            task="text2text-generation",
            model=self.model_name,
            device=self.device,
        )
        self.summary_prompt = summary_prompt or self.summary_prompt
        self.namer_prompt = namer_prompt or self.namer_prompt
        self.description_prompt = description_prompt or self.description_prompt
        self.use_summaries = use_summaries
        # Store the context so it overrides the class-level default of None
        self.context = context

    def generate_text(self, prompt: str) -> str:
        # The pipeline returns a list of dicts; extract the generated string
        return self.pipeline(prompt)[0]["generated_text"]
