
Topic Analysis with LLMs

Topic analyzers are large language models that can interpret a topic's contents and give human-readable descriptions of topics. This can be incredibly useful when labelling and understanding topics manually would require excessive effort.

Figure: The role of analyzers in topic modelling.

Analyzers can do the following tasks:

  • Summarize documents to make them easier for your topic model to consume.
  • Name topics in a sensible and human-readable way based on top documents and keywords.
  • Describe topics in a couple of sentences.

While smaller language models were previously unable to accomplish this task meaningfully, advances in the field now allow you to generate highly accurate topic descriptions on your own laptop using the power of small LLMs.

Warning

The namers API is now deprecated and will be removed in Turftopic 1.1.0. Analyzers have full feature parity and can accomplish substantially more.

Getting Started

There are multiple types of analyzers in Turftopic that you can use for these tasks, all of which can be imported from the analyzers module:

Choose an analyzer

LLMs from the HF Hub are natively supported in Turftopic. Our default choice of LLM is SmolLM3-3B, as it runs effortlessly on consumer hardware, is permissively licensed (allowing commercial use), and generates high-quality output.

You can select a different model by passing model_name="<your_model_here>".

SmolLM3 is also fine-tuned for reasoning. This is disabled by default to reduce computational burden, but you can enable it by specifying enable_thinking=True.

from turftopic.analyzers import LLMAnalyzer

# We enable document summaries for topic analysis
analyzer = LLMAnalyzer(use_summaries=True)
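
For instance, here is a minimal sketch of choosing a different Hub model, running it on a GPU, and enabling reasoning; the model name below is purely illustrative:

from turftopic.analyzers import LLMAnalyzer

# Hypothetical model choice; any causal LLM from the HF Hub should work
analyzer = LLMAnalyzer(
    model_name="Qwen/Qwen3-4B",
    device="cuda",          # defaults to "cpu"
    enable_thinking=True,   # reasoning is disabled by default
)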

You will have to install the openai package, as it is not installed by default:

pip install turftopic[openai]
export OPENAI_API_KEY="sk-<your key goes here>"

The default model is gpt-5-nano, which is the cheapest of OpenAI's new models, and we found that it generates satisfactory results.

from turftopic.analyzers import OpenAIAnalyzer

analyzer = OpenAIAnalyzer('gpt-5-nano')

T5 is less resource-intensive than causal language models, but it also generates lower-quality results. You might have to fiddle around with it to get satisfactory results.

from turftopic.analyzers import T5Analyzer

analyzer = T5Analyzer("google/flan-t5-large")

Document summarization

You can use large language models to summarize documents as a pre-processing step. This might make it easier for certain topic models to find patterns. You can also instruct the language model to summarize documents from a certain aspect.

from turftopic import KeyNMF

# Your documents
corpus: list[str] = [...]

summarized_documents = [analyzer.summarize_document(doc) for doc in corpus]

# Then we fit the topic model on the document summaries, which might be easier to analyze
model = KeyNMF(10)
model.fit(summarized_documents)
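
If you want the summaries to focus on a particular aspect, one way to do this (a sketch, not the only option) is to pass a custom summary_prompt, which is templated with {document}:

from turftopic.analyzers import LLMAnalyzer

# Illustrative prompt wording; the template only has to contain {document}
aspect_prompt = "Summarize the financial aspects of the following document: {document}"

analyzer = LLMAnalyzer(summary_prompt=aspect_prompt)
summarized_documents = [analyzer.summarize_document(doc) for doc in corpus]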

Topic analysis

You can also use LLMs after having trained a topic model to analyze topics' contents. Analysis in this case consists of:

  1. Naming the topics in a model and
  2. giving a short description of their contents.

There are a number of options you should be aware of when doing this:

  • The LLMs will always utilize the top keywords extracted by a topic model.
  • When use_documents is set to True (default), the analyzer will also use the top 10 documents from the topic model.
  • When use_summaries is active, the analyzer first summarizes the top 10 documents before analyzing the topic. This can be a massive help, since it makes the content easier to process and helps keep the input within the analyzer's context length. It does require more computation, though.

Let's see what this looks like in action:

Analyze topics

from turftopic import KeyNMF
from turftopic.analyzers import LLMAnalyzer

analyzer = LLMAnalyzer(use_summaries=False)

model = KeyNMF(10).fit(corpus)
analysis_result = model.analyze_topics(analyzer, use_documents=True)

You can also run the analysis on a TopicData object produced by prepare_topic_data():

from turftopic import KeyNMF
from turftopic.analyzers import LLMAnalyzer

analyzer = LLMAnalyzer(use_summaries=False)

model = KeyNMF(10)
topic_data = model.prepare_topic_data(corpus)
analysis_result = topic_data.analyze_topics(analyzer, use_documents=True)
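
The returned AnalysisResults object can then be inspected directly; a small sketch:

# Names and descriptions generated for each topic
print(analysis_result.topic_names)
print(analysis_result.topic_descriptions)

# Summaries of the top documents are also included (otherwise None)
# when the analyzer was created with use_summaries=True
print(analysis_result.document_summaries)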

Topic Naming

If you only wish to assign topic names, but not generate a full analysis, you can still use rename_topics:

model.rename_topics(analyzer, use_documents=False)
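
Once names have been assigned, they show up when you inspect the model's topics, for example (assuming the standard Turftopic model interface):

# The LLM-generated names are now attached to the model's topics
model.print_topics()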

Calling analyze_topics will do multiple things:

  1. Return an AnalysisResults object, which contains topic_names, topic_descriptions and, when applicable, document_summaries (summaries of the top documents).
  2. Set these properties on the object it gets called on (model or topic_data).

AnalysisResults can also be turned into a DataFrame or dictionary by calling to_df() and to_dict() respectively.

analysis_result.to_df()
                                         topic_names                                 topic_descriptions
0                         Dialogue and Communication  This topic examines how conversation functions...
1  AI Assistant: Requesting Detailed User Informa...  It describes an assistant that asks the user f...
2          Ethical Generative AI and Language Models  It covers the design and deployment of generat...
3   French–English Translation in Law and Literature  It examines translation between French and Eng...
4  France: Social, Economic, Legal Information an...  It covers how social conversations in France e...
5                   Email-based Python code requests  It depicts a user making requests that involve...
6           Lesson Planning and Classroom Activities  It covers the school-based process of teaching...
7         French cultural conversations for children  It explores how people talk about culture in F...
8            Data Analytics Training and Development  It focuses on structured training programs tha...
9                 Sustainable Energy and Environment  It explores how energy production and use infl...

turftopic.analyzers.base.AnalysisResults dataclass

Container class for results of topic analysis.

Attributes:

  • topic_names (list[str]): Generated topic names.
  • topic_descriptions (list[str]): Generated topic descriptions.
  • document_summaries (list[list[str]], default None): Summaries of top 10 documents for each topic, when use_summaries is enabled.

Source code in turftopic/analyzers/base.py
@dataclass
class AnalysisResults:
    """Container class for results of topic analysis.

    Attributes
    ----------
    topic_names: list[str]
        Generated topic names.
    topic_descriptions: list[str]
        Generated topic descriptions.
    document_summaries: list[list[str]], default None
        Summaries of top 10 documents for each topic, when use_summaries is enabled.
    """

    topic_names: list[str]
    topic_descriptions: list[str]
    document_summaries: Optional[list[list[str]]] = None

    def to_dict(self) -> dict:
        """Returns the analysis result as a dictionary"""
        res = dict(
            topic_names=self.topic_names,
            topic_descriptions=self.topic_descriptions,
        )
        if self.document_summaries is not None:
            res["document_summaries"] = self.document_summaries
        return res

    def to_df(self):
        """Turns analysis result object into a dataframe"""
        try:
            import pandas as pd
        except ModuleNotFoundError:
            raise ModuleNotFoundError(
                "You need to pip install pandas to be able to use dataframes."
            )
        return pd.DataFrame(self.to_dict())

to_df()

Turns analysis result object into a dataframe

Source code in turftopic/analyzers/base.py
def to_df(self):
    """Turns analysis result object into a dataframe"""
    try:
        import pandas as pd
    except ModuleNotFoundError:
        raise ModuleNotFoundError(
            "You need to pip install pandas to be able to use dataframes."
        )
    return pd.DataFrame(self.to_dict())

to_dict()

Returns the analysis result as a dictionary

Source code in turftopic/analyzers/base.py
def to_dict(self) -> dict:
    """Returns the analysis result as a dictionary"""
    res = dict(
        topic_names=self.topic_names,
        topic_descriptions=self.topic_descriptions,
    )
    if self.document_summaries is not None:
        res["document_summaries"] = self.document_summaries
    return res

Prompting

You can instruct analyzers to deal specifically with the task you are trying to accomplish by using prompts. Here we give an overview of how to do this.

Providing Task Context

Sometimes you might have a specific task that requires additional information to be analyzed correctly. You can add this information to the prompts by using the context attribute:

from turftopic.analyzers import LLMAnalyzer

analyzer = LLMAnalyzer(context="Analyze topical content in financial documents published by the central bank.")

Fully Custom Prompts

Since all analyzers are generative language models, you can prompt them however you wish. We provide default prompts, which we have found to work well, but you are more than free to modify these.

Prompts are formatted internally with str.format(), so all templated content should be between curly brackets. Analyzers have a number of prompts:

system_prompt = DEFAULT_SYSTEM_PROMPT
summary_prompt = SUMMARY_PROMPT
namer_prompt = NAMER_PROMPT
description_prompt = DESCRIPTION_PROMPT

  1. system_prompt describes the general role of the language model, and is not templated.
  2. summary_prompt, which is responsible for providing document summaries, and is templated with {document}
  3. namer_prompt, which describes how topics should be named, and is templated with {keywords}
  4. description_prompt, which dictates how topic descriptions should be generated and is templated with {keywords}

Documents are added at the end, when use_documents=True.

Example:
from turftopic.analyzers import LLMAnalyzer

system_prompt = """
You are a topic analyzer.
Follow instructions closely and exactly.
"""

namer_prompt = """
Please provide a human-readable name for a topic.
The topic is described by the following set of keywords: {keywords}.
"""

description_prompt = """
Describe the following topic in a couple of sentences.
The topic is described by the following set of keywords: {keywords}.
"""

summary_prompt = """
Summarize the following document: {document}
"""

namer = LLMAnalyzer(
    system_prompt=system_prompt,
    namer_prompt=namer_prompt,
    description_prompt=description_prompt,
    summary_prompt=summary_prompt
)
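
Analyzers are also extensible: generate_text() is the only abstract method on the base Analyzer class (documented below), so you can plug in a custom text-generation backend by subclassing it. A minimal toy sketch, where the stub response stands in for a real LLM call:

from turftopic.analyzers.base import Analyzer

class ToyAnalyzer(Analyzer):
    """Minimal custom analyzer: only generate_text needs to be implemented."""

    def generate_text(self, prompt: str) -> str:
        # Replace this stub with a call to your language model of choice.
        return "placeholder response for: " + prompt[:30]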

API Reference

turftopic.analyzers.base.Analyzer

Bases: ABC

Source code in turftopic/analyzers/base.py
class Analyzer(ABC):
    system_prompt = DEFAULT_SYSTEM_PROMPT
    summary_prompt = SUMMARY_PROMPT
    namer_prompt = NAMER_PROMPT
    description_prompt = DESCRIPTION_PROMPT
    context = None
    use_summaries = False

    @abstractmethod
    def generate_text(self, prompt: str) -> str:
        """Generates response to a given prompt."""
        pass

    def summarize_document(self, document: str) -> str:
        """Summarizes document so that analysis becomes easier."""
        prompt = self.summary_prompt.format(document=document)
        return self.generate_text(prompt)

    def describe_topic(
        self,
        keywords: list[str],
        documents: Optional[list] = None,
    ):
        """Gives abstract summarization of topic content."""
        _keys = ", ".join(keywords)
        prompt = self.description_prompt.format(keywords=_keys)
        if documents:
            prompt += self.template_documents(documents)
        if self.context:
            prompt += CONTEXT_TEMPLATE.format(context=self.context)
        return self.generate_text(prompt)

    def name_topic(
        self,
        keywords: list[str],
        documents: Optional[list] = None,
    ) -> str:
        """Names one topic based on top descriptive aspects."""
        _keys = ", ".join(keywords)
        prompt = self.namer_prompt.format(keywords=_keys)
        if documents:
            prompt += self.template_documents(documents)
        if self.context:
            prompt += CONTEXT_TEMPLATE.format(context=self.context)
        return self.generate_text(prompt)

    def name_topics(
        self,
        keywords: list[list[str]],
        documents: list[list[str]] = None,
    ) -> list[str]:
        """Names all topics based on top descriptive terms.

        Parameters
        ----------
        keywords: list[list[str]]
            Top K highest ranking terms on the topics.
        documents: list[list[str]], optional
            Top K relevant documents to each topic.

        Returns
        -------
        list[str]
            Topic names returned by the namer.
        """
        names = []
        if documents is not None:
            key_doc = list(zip(keywords, documents))
            for keys, docs in track(key_doc, description="Naming topics..."):
                names.append(self.name_topic(keys, documents=docs))
        else:
            for keys in track(keywords, description="Naming topics..."):
                names.append(self.name_topic(keys))
        return names

    def template_documents(self, documents: list[str]) -> str:
        doc_list = "\n".join([f" - {doc}" for doc in documents])
        return """
        In addition the topic is characterized by the following documents:
        {documents}
        """.format(
            documents=doc_list
        )

    def analyze_topics(
        self,
        keywords: list[list[str]],
        documents: Optional[list[list[str]]] = None,
        use_summaries: Optional[bool] = None,
    ) -> AnalysisResults:
        """Analyzes topic model with a language model.
        Generates topic names, descriptions and document summaries (optional).

        Parameters
        ----------
        keywords: list[list[str]]
            Keywords for each topic.
        documents: list[list[str]], optional
            Top documents for each topic.
        use_summaries: bool, optional
            Indicates whether the analyzer should summarize documents
            prior to analyzing the topic.

        Returns
        -------
        dict
            Dictionary containing `topic_names`, `topic_descriptions` and `document_summaries` if relevant.
        """
        console = Console()
        output = {"topic_names": [], "topic_descriptions": []}
        use_summaries = (
            use_summaries if use_summaries is not None else self.use_summaries
        )
        if documents is not None:
            if use_summaries:
                output["document_summaries"] = []
                for docs in track(
                    documents, description="Summarizing documents"
                ):
                    _sums = []
                    for doc in docs:
                        _sums.append(self.summarize_document(doc))
                    output["document_summaries"].append(_sums)
                console.log("Documents summarized.")
                # Updating parameter so summaries are used down-stream
                documents = output["document_summaries"]
            # Organizing into a list so we can iterate and know the length at the same time.
            key_doc_pairs = list(zip(keywords, documents))
            for keys, docs in track(
                key_doc_pairs, description="Generating topic names"
            ):
                output["topic_names"].append(
                    self.name_topic(keys, documents=docs)
                )
            console.log("Topic names generated.")
            for keys, docs in track(
                key_doc_pairs, description="Generating topic descriptions."
            ):
                output["topic_descriptions"].append(
                    self.describe_topic(keys, documents=docs)
                )
            console.log("Topic descriptions generated.")
        else:
            for keys in track(keywords, description="Naming"):
                output["topic_names"].append(
                    self.name_topic(keys, documents=None)
                )
            console.log("Topic names generated.")
            for keys in track(keywords, description="Describing topics."):
                output["topic_descriptions"].append(
                    self.describe_topic(keys, documents=None)
                )
            console.log("Topic descriptions generated.")
        return AnalysisResults(**output)

analyze_topics(keywords, documents=None, use_summaries=None)

Analyzes topic model with a language model. Generates topic names, descriptions and document summaries (optional).

Parameters:

  • keywords (list[list[str]], required): Keywords for each topic.
  • documents (Optional[list[list[str]]], default None): Top documents for each topic.
  • use_summaries (Optional[bool], default None): Indicates whether the analyzer should summarize documents prior to analyzing the topic.

Returns:

  • dict: Dictionary containing topic_names, topic_descriptions and document_summaries if relevant.

Source code in turftopic/analyzers/base.py
def analyze_topics(
    self,
    keywords: list[list[str]],
    documents: Optional[list[list[str]]] = None,
    use_summaries: Optional[bool] = None,
) -> AnalysisResults:
    """Analyzes topic model with a language model.
    Generates topic names, descriptions and document summaries (optional).

    Parameters
    ----------
    keywords: list[list[str]]
        Keywords for each topic.
    documents: list[list[str]], optional
        Top documents for each topic.
    use_summaries: bool, optional
        Indicates whether the analyzer should summarize documents
        prior to analyzing the topic.

    Returns
    -------
    dict
        Dictionary containing `topic_names`, `topic_descriptions` and `document_summaries` if relevant.
    """
    console = Console()
    output = {"topic_names": [], "topic_descriptions": []}
    use_summaries = (
        use_summaries if use_summaries is not None else self.use_summaries
    )
    if documents is not None:
        if use_summaries:
            output["document_summaries"] = []
            for docs in track(
                documents, description="Summarizing documents"
            ):
                _sums = []
                for doc in docs:
                    _sums.append(self.summarize_document(doc))
                output["document_summaries"].append(_sums)
            console.log("Documents summarized.")
            # Updating parameter so summaries are used down-stream
            documents = output["document_summaries"]
        # Organizing into a list so we can iterate and know the length at the same time.
        key_doc_pairs = list(zip(keywords, documents))
        for keys, docs in track(
            key_doc_pairs, description="Generating topic names"
        ):
            output["topic_names"].append(
                self.name_topic(keys, documents=docs)
            )
        console.log("Topic names generated.")
        for keys, docs in track(
            key_doc_pairs, description="Generating topic descriptions."
        ):
            output["topic_descriptions"].append(
                self.describe_topic(keys, documents=docs)
            )
        console.log("Topic descriptions generated.")
    else:
        for keys in track(keywords, description="Naming"):
            output["topic_names"].append(
                self.name_topic(keys, documents=None)
            )
        console.log("Topic names generated.")
        for keys in track(keywords, description="Describing topics."):
            output["topic_descriptions"].append(
                self.describe_topic(keys, documents=None)
            )
        console.log("Topic descriptions generated.")
    return AnalysisResults(**output)

describe_topic(keywords, documents=None)

Gives abstract summarization of topic content.

Source code in turftopic/analyzers/base.py
def describe_topic(
    self,
    keywords: list[str],
    documents: Optional[list] = None,
):
    """Gives abstract summarization of topic content."""
    _keys = ", ".join(keywords)
    prompt = self.description_prompt.format(keywords=_keys)
    if documents:
        prompt += self.template_documents(documents)
    if self.context:
        prompt += CONTEXT_TEMPLATE.format(context=self.context)
    return self.generate_text(prompt)

generate_text(prompt) abstractmethod

Generates response to a given prompt.

Source code in turftopic/analyzers/base.py
@abstractmethod
def generate_text(self, prompt: str) -> str:
    """Generates response to a given prompt."""
    pass

name_topic(keywords, documents=None)

Names one topic based on top descriptive aspects.

Source code in turftopic/analyzers/base.py
def name_topic(
    self,
    keywords: list[str],
    documents: Optional[list] = None,
) -> str:
    """Names one topic based on top descriptive aspects."""
    _keys = ", ".join(keywords)
    prompt = self.namer_prompt.format(keywords=_keys)
    if documents:
        prompt += self.template_documents(documents)
    if self.context:
        prompt += CONTEXT_TEMPLATE.format(context=self.context)
    return self.generate_text(prompt)

name_topics(keywords, documents=None)

Names all topics based on top descriptive terms.

Parameters:

  • keywords (list[list[str]], required): Top K highest ranking terms on the topics.
  • documents (list[list[str]], default None): Top K relevant documents to each topic.

Returns:

  • list[str]: Topic names returned by the namer.

Source code in turftopic/analyzers/base.py
def name_topics(
    self,
    keywords: list[list[str]],
    documents: list[list[str]] = None,
) -> list[str]:
    """Names all topics based on top descriptive terms.

    Parameters
    ----------
    keywords: list[list[str]]
        Top K highest ranking terms on the topics.
    documents: list[list[str]], optional
        Top K relevant documents to each topic.

    Returns
    -------
    list[str]
        Topic names returned by the namer.
    """
    names = []
    if documents is not None:
        key_doc = list(zip(keywords, documents))
        for keys, docs in track(key_doc, description="Naming topics..."):
            names.append(self.name_topic(keys, documents=docs))
    else:
        for keys in track(keywords, description="Naming topics..."):
            names.append(self.name_topic(keys))
    return names

summarize_document(document)

Summarizes document so that analysis becomes easier.

Source code in turftopic/analyzers/base.py
def summarize_document(self, document: str) -> str:
    """Summarizes document so that analysis becomes easier."""
    prompt = self.summary_prompt.format(document=document)
    return self.generate_text(prompt)

turftopic.analyzers.hf_llm.LLMAnalyzer

Bases: Analyzer

Analyze topic model with an open LLM from HF Hub.

Parameters:

  • model_name (str, default 'HuggingFaceTB/SmolLM3-3B'): Open LLM to use from HF Hub.
  • use_summaries (bool, default False): Indicates whether the language model should summarize documents before analyzing the topics.
  • context (Optional[str], default None): Additional context provided to the analyzer for analysis, e.g. "Analyze topics from blog posts related to morality and religion".
  • system_prompt (Optional[str], default None): Ignored, exists for compatibility.
  • summary_prompt (Optional[str], default None): Prompt to use for abstractive summarization.
  • namer_prompt (Optional[str], default None): Prompt template for naming topics.
  • description_prompt (Optional[str], default None): Prompt template for generating topic descriptions.
  • device (str, default 'cpu'): ID of the device to run the language model on.
  • max_new_tokens (int, default 32768): Max new tokens to generate when analyzing.
  • enable_thinking (bool, default False): Indicates whether thinking mode should be enabled.
Source code in turftopic/analyzers/hf_llm.py
class LLMAnalyzer(Analyzer):
    """Analyze topic model with an open LLM from HF Hub.

    Parameters
    ----------
    model_name: str, default 'HuggingFaceTB/SmolLM3-3B'
        Open LLM to use from HF Hub.
    use_summaries: bool, default False
        Indicates whether the language model should summarize documents before
        analyzing the topics.
    context: str, default None
        Additional context provided to the analyzer for analysis.
        e.g. "Analyze topics from blog posts related to morality and religion"
    system_prompt: str, default None
        Ignored, exists for compatibility
    summary_prompt: str, default None
        Prompt to use for abstractive summarization.
    namer_prompt: str, default None
        Prompt template for naming topics.
    description_prompt: str, default None
        Prompt template for generating topic descriptions.
    device: str, default "cpu"
        ID of the device to run the language model on.
    max_new_tokens: int, default 32768
        Max new tokens to generate when analyzing.
    enable_thinking: bool, default False
        Indicates whether thinking mode should be enabled.
    """

    def __init__(
        self,
        model_name: str = "HuggingFaceTB/SmolLM3-3B",
        context: Optional[str] = None,
        use_summaries: bool = False,
        system_prompt: Optional[str] = None,
        summary_prompt: Optional[str] = None,
        namer_prompt: Optional[str] = None,
        description_prompt: Optional[str] = None,
        max_new_tokens: int = 32768,
        device: str = "cpu",
        enable_thinking: bool = False,
    ):
        self.device = device
        self.model_name = model_name
        # load the tokenizer and the model
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
        ).to(self.device)
        self.summary_prompt = summary_prompt or self.summary_prompt
        self.namer_prompt = namer_prompt or self.namer_prompt
        self.description_prompt = description_prompt or self.description_prompt
        self.use_summaries = use_summaries
        self.max_new_tokens = max_new_tokens
        self.enable_thinking = enable_thinking

    def generate_text(self, prompt: str) -> str:
        thinking = "/think" if self.enable_thinking else "/no_think"
        system_prompt = self.system_prompt + thinking
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
        model_inputs = self.tokenizer([text], return_tensors="pt").to(
            self.model.device
        )
        # Generate the output
        generated_ids = self.model.generate(
            **model_inputs, max_new_tokens=self.max_new_tokens
        )
        # Get and decode the output
        output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
        result = self.tokenizer.decode(output_ids, skip_special_tokens=True)
        result = remove_thinking_trace(result)
        return result

turftopic.analyzers.openai.OpenAIAnalyzer

Bases: Analyzer

Analyze topic model with an OpenAI LLM.

Parameters:

  • model_name (str, default 'gpt-5-nano'): OpenAI model to use.
  • use_summaries (bool, default False): Indicates whether the language model should summarize documents before analyzing the topics.
  • context (Optional[str], default None): Additional context provided to the analyzer for analysis, e.g. "Analyze topics from blog posts related to morality and religion".
  • system_prompt (Optional[str], default None): System prompt to use for the language model.
  • summary_prompt (Optional[str], default None): Prompt to use for abstractive summarization.
  • namer_prompt (Optional[str], default None): Prompt template for naming topics.
  • description_prompt (Optional[str], default None): Prompt template for generating topic descriptions.
Source code in turftopic/analyzers/openai.py
class OpenAIAnalyzer(Analyzer):
    """Analyze topic model with an OpenAI LLM.

    Parameters
    ----------
    model_name: str, default 'gpt-5-nano'
        OpenAI model to use.
    use_summaries: bool, default False
        Indicates whether the language model should summarize documents before
        analyzing the topics.
    context: str, default None
        Additional context provided to the analyzer for analysis.
        e.g. "Analyze topics from blog posts related to morality and religion"
    system_prompt: str, default None
        System prompt to use for the language model.
    summary_prompt: str, default None
        Prompt to use for abstractive summarization.
    namer_prompt: str, default None
        Prompt template for naming topics.
    description_prompt: str, default None
        Prompt template for generating topic descriptions.
    """

    def __init__(
        self,
        model_name: str = "gpt-5-nano",
        context: Optional[str] = None,
        use_summaries: bool = False,
        system_prompt: Optional[str] = None,
        summary_prompt: Optional[str] = None,
        namer_prompt: Optional[str] = None,
        description_prompt: Optional[str] = None,
    ):
        self.client = openai.OpenAI()
        self.model_name = model_name
        self.system_prompt = system_prompt or self.system_prompt
        self.summary_prompt = summary_prompt or self.summary_prompt
        self.namer_prompt = namer_prompt or self.namer_prompt
        self.description_prompt = description_prompt or self.description_prompt
        self.use_summaries = use_summaries

    def generate_text(self, prompt: str) -> str:
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": prompt},
        ]
        response = self.client.chat.completions.create(
            messages=messages,
            model=self.model_name,
        )
        return response.choices[0].message.content

turftopic.analyzers.t5.T5Analyzer

Bases: Analyzer

Analyze topic model with a text-to-text model.

Parameters:

  • model_name (str, default 'google/flan-t5-small'): Text-to-text model to use for analyses.
  • use_summaries (bool, default False): Indicates whether the language model should summarize documents before analyzing the topics.
  • context (Optional[str], default None): Additional context provided to the analyzer for analysis, e.g. "Analyze topics from blog posts related to morality and religion".
  • system_prompt (Optional[str], default None): Ignored, exists for compatibility.
  • summary_prompt (Optional[str], default None): Prompt to use for abstractive summarization.
  • namer_prompt (Optional[str], default T5_NAME_PROMPT): Prompt template for naming topics.
  • description_prompt (Optional[str], default T5_DESC_PROMPT): Prompt template for generating topic descriptions.
  • device (str, default 'cpu'): ID of the device to run the language model on.
Source code in turftopic/analyzers/t5.py
class T5Analyzer(Analyzer):
    """Analyze topic model with a text-to-text model.

    Parameters
    ----------
    model_name: str, default 'google/flan-t5-small'
        Text-to-text model to use for analyses.
    use_summaries: bool, default False
        Indicates whether the language model should summarize documents before
        analyzing the topics.
    context: str, default None
        Additional context provided to the analyzer for analysis.
        e.g. "Analyze topics from blog posts related to morality and religion"
    system_prompt: str, default None
        Ignored, exists for compatibility
    summary_prompt: str, default None
        Prompt to use for abstractive summarization.
    namer_prompt: str, default None
        Prompt template for naming topics.
    description_prompt: str, default None
        Prompt template for generating topic descriptions.
    device: str, default "cpu"
        ID of the device to run the language model on.
    """

    def __init__(
        self,
        model_name: str = "google/flan-t5-small",
        context: Optional[str] = None,
        use_summaries: bool = False,
        system_prompt: Optional[str] = None,
        summary_prompt: Optional[str] = None,
        namer_prompt: Optional[str] = T5_NAME_PROMPT,
        description_prompt: Optional[str] = T5_DESC_PROMPT,
        device: str = "cpu",
    ):
        self.device = device
        self.model_name = model_name
        self.pipeline = pipeline(
            task="text2text-generation",
            model=self.model_name,
            device=self.device,
        )
        self.summary_prompt = summary_prompt or self.summary_prompt
        self.namer_prompt = namer_prompt or self.namer_prompt
        self.description_prompt = description_prompt or self.description_prompt
        self.use_summaries = use_summaries

    def generate_text(self, prompt: str) -> str:
        return self.pipeline(prompt)