Skip to content

Seeded Topic Modeling

When investigating a set of documents, you might already have an idea about what aspects you would like to explore. Some models are able to account for this by taking seed phrases or words. This is currently only possible with KeyNMF in Turftopic, but will likely be extended in the future.

In KeyNMF, you can describe the aspect, from which you want to investigate your corpus, using a free-text seed-phrase, which will then be used to only extract topics, which are relevant to your research question.

In this example we investigate the 20Newsgroups corpus from three different aspects:

from sklearn.datasets import fetch_20newsgroups

from turftopic import KeyNMF

corpus = fetch_20newsgroups(
    subset="all",
    remove=("headers", "footers", "quotes"),
).data

model = KeyNMF(5, seed_phrase="<your seed phrase>")
model.fit(corpus)

model.print_topics()
Topic ID Highest Ranking
0 morality, moral, immoral, morals, objective, morally, animals, society, species, behavior
1 armenian, armenians, genocide, armenia, turkish, turks, soviet, massacre, azerbaijan, kurdish
2 murder, punishment, death, innocent, penalty, kill, crime, moral, criminals, executed
3 gun, guns, firearms, crime, handgun, firearm, weapons, handguns, law, criminals
4 jews, israeli, israel, god, jewish, christians, sin, christian, palestinians, christianity
Topic ID Highest Ranking
0 atheist, atheists, religion, religious, theists, beliefs, christianity, christian, religions, agnostic
1 bible, christians, christian, christianity, church, scripture, religion, jesus, faith, biblical
2 god, existence, exist, exists, universe, creation, argument, creator, believe, life
3 believe, faith, belief, evidence, blindly, believing, gods, believed, beliefs, convince
4 atheism, atheists, agnosticism, belief, arguments, believe, existence, alt, believing, argument
Topic ID Highest Ranking
0 windows, dos, os, microsoft, ms, apps, pc, nt, file, shareware
1 ram, motherboard, card, monitor, memory, cpu, vga, mhz, bios, intel
2 unix, os, linux, intel, systems, programming, applications, compiler, software, platform
3 disk, scsi, disks, drive, floppy, drives, dos, controller, cd, boot
4 software, mac, hardware, ibm, graphics, apple, computer, pc, modem, program