Topic Analysis with LLMs
Topic analyzers are large language models that can interpret the contents of topics and give human-readable descriptions of them. This is especially useful when labelling and understanding topics by hand would require excessive manual labour.

Analyzers can do the following tasks:
- Summarize documents to make them easier for your topic model to consume.
- Name topics in a sensible and human-readable way based on top documents and keywords.
- Describe topics in a couple of sentences.

While smaller language models were previously unable to accomplish these tasks meaningfully, advances in the field now allow you to generate highly accurate topic descriptions on your own laptop using small LLMs.
Warning
The namers API is now deprecated and will be removed in Turftopic 1.1.0. Analyzers have full feature parity and can accomplish considerably more.
Getting Started
There are multiple types of analyzers in Turftopic that you can use for these tasks, all of which can be imported from the analyzers module:
Choose an analyzer
LLMs from HF Hub are natively supported in Turftopic. Our default choice of LLM is SmolLM3-3B, as it runs effortlessly on consumer hardware, is permissively licensed, allowing commercial use, and generates high-quality output.
You can choose a different model by specifying model_name="<your_model_here>".
SmolLM is also fine-tuned for reasoning. This is disabled by default to reduce computational burden, but you can enable it by specifying enable_thinking=True.
from turftopic.analyzers import LLMAnalyzer
# We enable document summaries for topic analysis
analyzer = LLMAnalyzer(use_summaries=True)
To use OpenAI models, you will have to install the OpenAI dependency, as it is not installed by default, and set your API key:
pip install turftopic[openai]
export OPENAI_API_KEY="sk-<your key goes here>"
The default model is gpt-5-nano, which is the cheapest new model in OpenAI's arsenal, and we found it generates satisfactory results.
from turftopic.analyzers import OpenAIAnalyzer
analyzer = OpenAIAnalyzer('gpt-5-nano')
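Since the key is supplied through the OPENAI_API_KEY environment variable, it can be worth checking that it is visible to Python before creating the analyzer. A minimal sketch; the helper name openai_key_is_set is hypothetical and not part of Turftopic:

```python
import os
from typing import Mapping, Optional


def openai_key_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_is_set():
    print("OPENAI_API_KEY is not set; export it before creating an OpenAIAnalyzer.")
```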
T5 is less resource-intensive than causal language models, but it also generates lower-quality results. You might have to experiment with it to get satisfactory results.
from turftopic.analyzers import T5Analyzer
analyzer = T5Analyzer("google/flan-t5-large")
Document summarization
You can use large language models to summarize documents as a pre-processing step. This might make it easier for certain topic models to find patterns. You can also instruct the language model to summarize documents from a certain aspect.
from turftopic import KeyNMF
# Your documents
corpus: list[str] = [...]
summarized_documents = [analyzer.summarize_document(doc) for doc in corpus]
# Then we fit the topic model on the document summaries, which might be easier to analyze
model = KeyNMF(10)
model.fit(summarized_documents)
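To summarize from a particular aspect, one option is to supply a custom summary prompt. The sketch below only builds the template string. It assumes the summary_prompt parameter and its {document} placeholder as described in the prompting section of this page; the helper name aspect_summary_prompt and the prompt wording are hypothetical.

```python
def aspect_summary_prompt(aspect: str) -> str:
    """Build a summary prompt template that focuses on one aspect.

    The returned string still contains the {document} placeholder,
    which is filled in later with str.format().
    """
    # The aspect is concatenated rather than formatted in,
    # so the {document} placeholder stays untouched.
    return (
        "Summarize the following document with a focus on "
        + aspect
        + ": {document}"
    )


summary_prompt = aspect_summary_prompt("monetary policy")
# The template could then be handed to an analyzer,
# e.g. LLMAnalyzer(summary_prompt=summary_prompt).
print(summary_prompt.format(document="<document text>"))
```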
Topic analysis
You can also use LLMs after having trained a topic model to analyze topics' contents. Analysis in this case consists of:
- naming the topics in a model, and
- giving a short description of each topic's contents.
There are a number of options you should be aware of when doing this:
- The LLMs will always utilize the top keywords extracted by a topic model.
- When use_documents is set to True (default), the analyzer will also use the top 10 documents from the topic model.
- When use_summaries is active, the analyzer first summarizes the top 10 documents before using them for analysis. This can be a massive help, since it makes the content easier for the analyzer to process and helps ensure that the documents fit within the analyzer's context length. It does require more computation, though.
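The effect of use_summaries can be pictured as an extra step applied to the top documents before they reach the analysis prompt. This is a rough sketch of that idea only, not Turftopic's implementation; prepare_documents and first_words are hypothetical names, and first_words is a toy stand-in for an LLM summarizer.

```python
from typing import Callable


def prepare_documents(
    top_documents: list[str],
    summarize: Callable[[str], str],
    use_summaries: bool,
) -> list[str]:
    """Return the documents that would be shown to the analyzer."""
    if not use_summaries:
        # Documents are passed on unchanged.
        return list(top_documents)
    # One extra summarization call per document: shorter input, more compute.
    return [summarize(doc) for doc in top_documents]


def first_words(document: str, n_words: int = 5) -> str:
    """Toy stand-in for an LLM summarizer: keeps the first few words."""
    return " ".join(document.split()[:n_words])


docs = ["Solar and wind capacity grew quickly across the region last year."]
shortened = prepare_documents(docs, first_words, use_summaries=True)
print(shortened)
```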
Let's see what this looks like in action:
Analyze topics
from turftopic import KeyNMF
from turftopic.analyzers import LLMAnalyzer
analyzer = LLMAnalyzer(use_summaries=False)
model = KeyNMF(10).fit(corpus)
analysis_result = model.analyze_topics(analyzer, use_documents=True)
Or, when working with a prepared topic_data object:
from turftopic import KeyNMF
from turftopic.analyzers import LLMAnalyzer
analyzer = LLMAnalyzer(use_summaries=False)
model = KeyNMF(10)
topic_data = model.prepare_topic_data(corpus)
analysis_result = topic_data.analyze_topics(analyzer, use_documents=True)
Topic Naming
If you only wish to assign topic names, but not generate a full analysis, you can still use rename_topics:
model.rename_topics(analyzer, use_documents=False)
Calling analyze_topics will do multiple things:
- Return an AnalysisResults object which contains topic_names, topic_descriptions and document_summaries (the top documents' summaries, when applicable).
- Set these properties on the object it gets called on (model or topic_data).
AnalysisResults can also be turned into a DataFrame or dictionary by calling to_df() and to_dict() respectively.
analysis_result.to_df()
topic_names topic_descriptions
0 Dialogue and Communication This topic examines how conversation functions...
1 AI Assistant: Requesting Detailed User Informa... It describes an assistant that asks the user f...
2 Ethical Generative AI and Language Models It covers the design and deployment of generat...
3 French–English Translation in Law and Literature It examines translation between French and Eng...
4 France: Social, Economic, Legal Information an... It covers how social conversations in France e...
5 Email-based Python code requests It depicts a user making requests that involve...
6 Lesson Planning and Classroom Activities It covers the school-based process of teaching...
7 French cultural conversations for children It explores how people talk about culture in F...
8 Data Analytics Training and Development It focuses on structured training programs tha...
9 Sustainable Energy and Environment It explores how energy production and use infl...
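The dictionary form is convenient for simple reporting. The sketch below uses a hand-written stand-in for the output of to_dict(), with two rows copied from the table above (descriptions truncated as shown there). It assumes the dictionary keys match the attribute names topic_names and topic_descriptions; format_topics is a hypothetical helper.

```python
# Stand-in for analysis_result.to_dict(); the key names are an assumption
# based on the attribute names of AnalysisResults.
result = {
    "topic_names": [
        "Dialogue and Communication",
        "Sustainable Energy and Environment",
    ],
    "topic_descriptions": [
        "This topic examines how conversation functions...",
        "It explores how energy production and use infl...",
    ],
}


def format_topics(result: dict) -> list[str]:
    """Pair each topic name with its description, one line per topic."""
    pairs = zip(result["topic_names"], result["topic_descriptions"])
    return [f"{name}: {description}" for name, description in pairs]


for line in format_topics(result):
    print(line)
```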
turftopic.analyzers.base.AnalysisResults
dataclass
Container class for results of topic analysis.
Attributes:
Name | Type | Description |
---|---|---|
topic_names | list[str] | Generated topic names. |
topic_descriptions | list[str] | Generated topic descriptions. |
document_summaries | list[list[str]], default None | Summaries of top 10 documents for each topic, when use_summaries is enabled. |
Source code in turftopic/analyzers/base.py
to_df()
Turns analysis result object into a dataframe
Source code in turftopic/analyzers/base.py
to_dict()
Returns the analysis result as a dictionary
Source code in turftopic/analyzers/base.py
Prompting
You can instruct analyzers to specifically deal with the task you are trying to accomplish by using prompts. Here we will give an overview of how you can do this.
Providing Task Context
Sometimes the analyzer needs additional information about your task to analyze topics correctly.
You can add information to the prompts by using the context attribute:
from turftopic.analyzers import LLMAnalyzer
analyzer = LLMAnalyzer(context="Analyze topical content in financial documents published by the central bank.")
Fully Custom Prompts
Since all analyzers are generative language models, you can prompt them however you wish. We provide default prompts, which we found to perform well, but you are more than free to modify these.
Prompts internally get formatted with str.format(), so all templated content should be in-between curly brackets.
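One consequence of str.format() is that any literal curly bracket in a prompt has to be doubled, otherwise it is treated as a placeholder. A small sketch of this behaviour in plain Python; the JSON instruction in the prompt is invented for illustration, and joining keywords with commas is an assumption about how they are inserted:

```python
namer_prompt = (
    "Please provide a human-readable name for a topic.\n"
    "The topic is described by the following set of keywords: {keywords}.\n"
    # Literal braces are written as {{ and }} so that format() leaves them alone.
    'Answer in JSON, like {{"name": "..."}}.'
)

keywords = ["energy", "solar", "wind"]
prompt = namer_prompt.format(keywords=", ".join(keywords))
print(prompt)
```

Writing the JSON example with single braces would raise an error or swallow the text as a placeholder when the prompt is formatted.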
Analyzers have a number of prompts:
system_prompt = DEFAULT_SYSTEM_PROMPT
summary_prompt = SUMMARY_PROMPT
namer_prompt = NAMER_PROMPT
description_prompt = DESCRIPTION_PROMPT
- system_prompt describes the general role of the language model, and is not templated.
- summary_prompt is responsible for providing document summaries, and is templated with {document}.
- namer_prompt describes how topics should be named, and is templated with {keywords}.
- description_prompt dictates how topic descriptions should be generated, and is templated with {keywords}.

Documents are added at the end when use_documents=True.
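To make the templating concrete, here is a rough sketch of how a prompt could be assembled from a template, the keywords and, optionally, the top documents. It is an illustration under assumptions, not Turftopic's internal code: build_topic_prompt is a hypothetical helper, and the comma-joined keywords and the wording that introduces the documents are invented.

```python
from typing import Optional


def build_topic_prompt(
    template: str,
    keywords: list[str],
    documents: Optional[list[str]] = None,
) -> str:
    """Fill the {keywords} placeholder, then append documents at the end."""
    prompt = template.format(keywords=", ".join(keywords))
    if documents:
        # Documents come after the templated part of the prompt.
        doc_block = "\n".join(f"- {doc}" for doc in documents)
        prompt += "\nHere are some documents from the topic:\n" + doc_block
    return prompt


print(
    build_topic_prompt(
        "Name the topic: {keywords}.",
        ["solar", "wind"],
        documents=["<top document text>"],
    )
)
```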
Example:
from turftopic.analyzers import LLMAnalyzer
system_prompt = """
You are a topic analyzer.
Follow instructions closely and exactly.
"""
namer_prompt = """
Please provide a human-readable name for a topic.
The topic is described by the following set of keywords: {keywords}.
"""
description_prompt = """
Describe the following topic in a couple of sentences.
The topic is described by the following set of keywords: {keywords}.
"""
summary_prompt = """
Summarize the following document: {document}
"""
namer = LLMAnalyzer(
system_prompt=system_prompt,
namer_prompt=namer_prompt,
description_prompt=description_prompt,
summary_prompt=summary_prompt
)
API Reference
turftopic.analyzers.base.Analyzer
Bases: ABC
Source code in turftopic/analyzers/base.py
analyze_topics(keywords, documents=None, use_summaries=None)
Analyzes topic model with a language model. Generates topic names, descriptions and document summaries (optional).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keywords | list[list[str]] | Keywords for each topic. | required |
documents | Optional[list[list[str]]] | Top documents for each topic. | None |
use_summaries | Optional[bool] | Indicates whether the analyzer should summarize documents prior to analyzing the topic. | None |
Returns:
Type | Description |
---|---|
dict | Dictionary containing |
Source code in turftopic/analyzers/base.py
describe_topic(keywords, documents=None)
Gives abstract summarization of topic content.
Source code in turftopic/analyzers/base.py
generate_text(prompt)
abstractmethod
Generates response to a given prompt.
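Because generate_text is the only abstract method listed here, a new backend presumably only needs to say how a prompt becomes text, while naming and describing are built on top of it. The sketch below illustrates that pattern with a stand-in base class; SketchAnalyzer and CannedAnalyzer are hypothetical and do not reproduce Turftopic's Analyzer.

```python
from abc import ABC, abstractmethod


class SketchAnalyzer(ABC):
    """Stand-in base class for illustration only; not Turftopic's Analyzer."""

    namer_prompt = "Name a topic described by these keywords: {keywords}."

    @abstractmethod
    def generate_text(self, prompt: str) -> str:
        """Backend-specific: turn a prompt into a response."""

    def name_topic(self, keywords: list[str]) -> str:
        # Shared logic: fill the template, then delegate to the backend.
        prompt = self.namer_prompt.format(keywords=", ".join(keywords))
        return self.generate_text(prompt).strip()


class CannedAnalyzer(SketchAnalyzer):
    """Toy backend that returns a fixed answer instead of calling a model."""

    def __init__(self, answer: str):
        self.answer = answer
        self.seen_prompts: list[str] = []

    def generate_text(self, prompt: str) -> str:
        self.seen_prompts.append(prompt)
        return self.answer


toy = CannedAnalyzer(" Renewable Energy \n")
print(toy.name_topic(["solar", "wind"]))
```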
Source code in turftopic/analyzers/base.py
name_topic(keywords, documents=None)
Names one topic based on top descriptive aspects.
Source code in turftopic/analyzers/base.py
name_topics(keywords, documents=None)
Names all topics based on top descriptive terms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keywords | list[list[str]] | Top K highest ranking terms on the topics. | required |
documents | list[list[str]] | Top K relevant documents to each topic. | None |
Returns:
Type | Description |
---|---|
list[str] | Topic names returned by the namer. |
Source code in turftopic/analyzers/base.py
summarize_document(document)
Summarizes document so that analysis becomes easier.
Source code in turftopic/analyzers/base.py
turftopic.analyzers.hf_llm.LLMAnalyzer
Bases: Analyzer
Analyze topic model with an open LLM from HF Hub.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | Open LLM to use from HF Hub. | 'HuggingFaceTB/SmolLM3-3B' |
use_summaries | bool | Indicates whether the language model should summarize documents before analyzing the topics. | False |
context | Optional[str] | Additional context provided to the analyzer for analysis. e.g. "Analyze topics from blog posts related to morality and religion" | None |
system_prompt | Optional[str] | Ignored, exists for compatibility | None |
summary_prompt | Optional[str] | Prompt to use for abstractive summarization. | None |
namer_prompt | Optional[str] | Prompt template for naming topics. | None |
description_prompt | Optional[str] | Prompt template for generating topic descriptions. | None |
device | str | ID of the device to run the language model on. | 'cpu' |
max_new_tokens | int | Max new tokens to generate when analyzing. | 32768 |
enable_thinking | bool | Indicates whether thinking mode should be enabled. | False |
Source code in turftopic/analyzers/hf_llm.py
turftopic.analyzers.openai.OpenAIAnalyzer
Bases: Analyzer
Analyze topic model with an OpenAI LLM.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | OpenAI model to use. | 'gpt-5-nano' |
use_summaries | bool | Indicates whether the language model should summarize documents before analyzing the topics. | False |
context | Optional[str] | Additional context provided to the analyzer for analysis. e.g. "Analyze topics from blog posts related to morality and religion" | None |
system_prompt | Optional[str] | System prompt to use for the language model. | None |
summary_prompt | Optional[str] | Prompt to use for abstractive summarization. | None |
namer_prompt | Optional[str] | Prompt template for naming topics. | None |
description_prompt | Optional[str] | Prompt template for generating topic descriptions. | None |
Source code in turftopic/analyzers/openai.py
turftopic.analyzers.t5.T5Analyzer
Bases: Analyzer
Analyze topic model with a text-to-text model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | Text-to-text model to use for analyses. | 'google/flan-t5-small' |
use_summaries | bool | Indicates whether the language model should summarize documents before analyzing the topics. | False |
context | Optional[str] | Additional context provided to the analyzer for analysis. e.g. "Analyze topics from blog posts related to morality and religion" | None |
system_prompt | Optional[str] | Ignored, exists for compatibility | None |
summary_prompt | Optional[str] | Prompt to use for abstractive summarization. | None |
namer_prompt | Optional[str] | Prompt template for naming topics. | T5_NAME_PROMPT |
description_prompt | Optional[str] | Prompt template for generating topic descriptions. | T5_DESC_PROMPT |
device | str | ID of the device to run the language model on. | 'cpu' |
Source code in turftopic/analyzers/t5.py