TopSBM: Topic Modeling with Stochastic Block Models

A basic tutorial.

In [1]:
%load_ext autoreload
%autoreload 2

import os
import matplotlib.pyplot as plt  # pyplot is the recommended interface; pylab is discouraged
%matplotlib inline

from sbmtm import sbmtm
import graph_tool.all as gt

Setup: Load a corpus

  1. We have a list of documents; each document is a list of words.

  2. We have a list of document titles (optional).

The example corpus consists of 63 articles from Wikipedia taken from 3 different categories (Experimental Physics, Chemical Physics, and Computational Biology).

In [2]:
path_data = 'data/'

## texts
fname_data = 'corpus.txt'
filename = os.path.join(path_data,fname_data)

with open(filename,'r', encoding = 'utf8') as f:
    x = f.readlines()
texts = [h.split() for h in x]

## titles
fname_data = 'titles.txt'
filename = os.path.join(path_data,fname_data)

with open(filename,'r', encoding = 'utf8') as f:
    x = f.readlines()
titles = [h.split()[0] for h in x]
In [3]:
i_doc = 0
print(titles[i_doc])
print(texts[i_doc][:10])
Nuclear_Overhauser_effect
['the', 'nuclear', 'overhauser', 'effect', 'noe', 'is', 'the', 'transfer', 'of', 'nuclear']
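
The input format described above can be tried out without the data files. The following is a minimal sketch that applies the same parsing to two in-memory lines; the line contents and the names `corpus_lines` and `title_lines` are made up for illustration and are not part of the example corpus.

```python
# Hypothetical stand-ins for the lines of corpus.txt and titles.txt
# (one document per line, one title per line).
corpus_lines = [
    "the nuclear overhauser effect is the transfer of nuclear spin polarization\n",
    "protein folding is studied with computational models\n",
]
title_lines = ["Toy_document_A\n", "Toy_document_B\n"]

# Same parsing as in the cells above: whitespace tokenization for the texts,
# and the first whitespace-separated token of each line for the titles.
texts = [line.split() for line in corpus_lines]
titles = [line.split()[0] for line in title_lines]

# Titles and texts have to line up one-to-one.
assert len(texts) == len(titles)

print(titles[0])
print(texts[0][:4])
```

Checking that `texts` and `titles` have the same length before building the graph catches a common mismatch between the two files early.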

Fitting the model

In [4]:
## we create an instance of the sbmtm-class
model = sbmtm()

## we have to create the word-document network from the corpus
model.make_graph(texts,documents=titles)

## we can also skip the previous step by saving/loading a graph
# model.save_graph(filename = 'graph.xml.gz')
# model.load_graph(filename = 'graph.xml.gz')

## fit the model
gt.seed_rng(32) ## seed for graph-tool's random number generator --> same results
model.fit()
<NestedBlockState object, with base <BlockState object with 3203 blocks (123 nonempty), degree-corrected, for graph <Graph object, undirected, with 3203 vertices and 13050 edges, 2 internal vertex properties, 1 internal edge property, at 0x7392097ecb60>, at 0x7391fc568170>, and 5 levels of sizes [(3203, 123), (123, 27), (27, 5), (5, 2), (2, 1)] at 0x7391fc56a4e0>
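
To make the idea of a word-document network concrete, here is a rough sketch in plain Python that does not use graph-tool and is not sbmtm's implementation. It treats every document and every distinct word as a node and records how often each word occurs in each document. The toy `texts` and `titles` are hypothetical, and how sbmtm represents repeated tokens internally is an assumption left open here.

```python
from collections import Counter

# Hypothetical toy corpus (not the Wikipedia example).
texts = [["spin", "field", "spin"], ["protein", "field"]]
titles = ["Doc_A", "Doc_B"]

# Count how often each word occurs in each document:
# one (document, word) pair per token.
edge_counts = Counter(
    (title, word) for title, tokens in zip(titles, texts) for word in tokens
)

# The network is bipartite: document-nodes on one side, word-nodes on the other.
doc_nodes = set(titles)
word_nodes = {word for tokens in texts for word in tokens}

n_vertices = len(doc_nodes) + len(word_nodes)
n_pairs = len(edge_counts)            # distinct (document, word) pairs
n_tokens = sum(edge_counts.values())  # total number of word tokens

print(n_vertices, n_pairs, n_tokens)
```

Edges only ever connect a document to a word, never two documents or two words, which is why the fitted hierarchy separates the two node types near the top.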

Plotting the result

The output shows the (hierarchical) community structure in the word-document network as inferred by the stochastic block model:

  • document-nodes are on the left
  • word-nodes are on the right
  • different colors correspond to the different groups

The nodes are grouped on several levels of the hierarchy:

  • on the uppermost level, all nodes belong to the same group (square in the middle)
  • on the next-lower level, the network is split into two groups: the document-nodes and the word-nodes (blue squares to the left and right, respectively). This is a trivial structure due to the bipartite character of the network.
  • only the levels below that carry non-trivial structure: the document-nodes are divided into document-groups (on the left) and the word-nodes into word-groups (on the right)
In [5]:
model.plot(nedges=10000)
[Figure: hierarchical community structure of the word-document network, with document-nodes on the left and word-nodes on the right]

The basics

Topics

For each word-group on a given level of the hierarchy, we retrieve its $n$ most common words -- these are the topics!
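
As an illustration of what "most common words in a group" means, the sketch below ranks words inside each word-group by their share of the group's tokens. It assumes a word-to-group assignment is already known. The function name `top_words_per_group`, the toy corpus, and the assignment `word_group` are hypothetical, and the normalization within each group is an assumption about how the probabilities are defined, not a statement about sbmtm's internals.

```python
from collections import Counter, defaultdict

def top_words_per_group(texts, word_group, n=3):
    """Return {group: [(word, share of the group's tokens), ...]} with at most n words each."""
    # Corpus-wide token count of every word.
    counts = Counter(word for tokens in texts for word in tokens)

    # Collect the counts separately for each word-group.
    by_group = defaultdict(Counter)
    for word, count in counts.items():
        by_group[word_group[word]][word] = count

    topics = {}
    for group, word_counts in sorted(by_group.items()):
        total = sum(word_counts.values())
        topics[group] = [(w, c / total) for w, c in word_counts.most_common(n)]
    return topics

# Hypothetical toy data.
texts = [["the", "spin", "the", "field"], ["the", "protein", "spin"]]
word_group = {"the": 0, "spin": 1, "field": 1, "protein": 2}

topics = top_words_per_group(texts, word_group, n=2)
print(topics)
```

In this toy case the stop word "the" ends up alone in its own group, which mirrors how function words collect in a group of their own in the real output.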

In [6]:
model.topics(l=1,n=20)
Out[6]:
{0: [('the', 0.20760347735824575),
  ('of', 0.11716621253405994),
  ('a', 0.06799013883482548),
  ('to', 0.06591410406124303),
  ('in', 0.0635136888542883),
  ('is', 0.05261450629298041),
  ('as', 0.025885558583106268),
  ('that', 0.02095497599584793),
  ('are', 0.020046710782405604),
  ('by', 0.018554560788893212),
  ('be', 0.01797067600882315),
  ('with', 0.01589464123524069),
  ('an', 0.015700012975217333),
  ('this', 0.014467367328402751),
  ('can', 0.013429349941611522),
  ('or', 0.012521084728169197),
  ('from', 0.012456208641494744),
  ('it', 0.012196704294796938),
  ('at', 0.010380173867912288),
  ('used', 0.007850006487608667)],
 1: [('formula', 0.13903394255874674),
  ('electron', 0.03981723237597911),
  ('x', 0.033942558746736295),
  ('spin', 0.030678851174934726),
  ('surface', 0.030026109660574413),
  ('magnetic', 0.026762402088772844),
  ('electrons', 0.022845953002610966),
  ('effect', 0.022193211488250653),
  ('ray', 0.02154046997389034),
  ('nuclear', 0.018929503916449087),
  ('polarization', 0.018276762402088774),
  ('observed', 0.016318537859007835),
  ('intensity', 0.015665796344647518),
  ('sample', 0.015665796344647518),
  ('direction', 0.015013054830287207),
  ('cross', 0.010443864229765013),
  ('external', 0.010443864229765013),
  ('dnp', 0.0097911227154047),
  ('defined', 0.0097911227154047),
  ('left', 0.0097911227154047)],
 2: [('analysis', 0.09090909090909091),
  ('chemical', 0.039160839160839164),
  ('angle', 0.026573426573426574),
  ('proton', 0.025174825174825177),
  ('shown', 0.025174825174825177),
  ('loss', 0.016783216783216783),
  ('fermi', 0.016783216783216783),
  ('elements', 0.016783216783216783),
  ('empirical', 0.015384615384615385),
  ('oscillations', 0.015384615384615385),
  ('ratio', 0.013986013986013986),
  ('noe', 0.013986013986013986),
  ('whereas', 0.012587412587412588),
  ('fuel', 0.012587412587412588),
  ('landau', 0.011188811188811189),
  ('knot', 0.011188811188811189),
  ('shows', 0.011188811188811189),
  ('shell', 0.009790209790209791),
  ('compounds', 0.009790209790209791),
  ('micropixe', 0.009790209790209791)],
 3: [('and', 0.12306679828891083),
  ('for', 0.04694526708347044),
  ('on', 0.027201930459581),
  ('which', 0.022485466710540747),
  ('s', 0.0202917626412197),
  ('folding', 0.017439947351102335),
  ('was', 0.017001206537238127),
  ('home', 0.013930020840188658),
  ('such', 0.01173631677086761),
  ('these', 0.011407261160469452),
  ('also', 0.00943292749808051),
  ('its', 0.008884501480750246),
  ('first', 0.008774816277284195),
  ('molecular', 0.008445760666886038),
  ('more', 0.008336075463419985),
  ('been', 0.007568279039157618),
  ('if', 0.007239223428759461),
  ('but', 0.007239223428759461),
  ('research', 0.007239223428759461),
  ('time', 0.007019853021827355)],
 4: [('structure', 0.0982367758186398),
  ('computer', 0.05667506297229219),
  ('model', 0.05667506297229219),
  ('high', 0.05163727959697733),
  ('function', 0.042821158690176324),
  ('models', 0.03904282115869018),
  ('real', 0.031486146095717885),
  ('potential', 0.028967254408060455),
  ('fields', 0.027707808564231738),
  ('structural', 0.02644836272040302),
  ('de', 0.02392947103274559),
  ('usually', 0.020151133501259445),
  ('types', 0.020151133501259445),
  ('generally', 0.020151133501259445),
  ('k', 0.020151133501259445),
  ('group', 0.018891687657430732),
  ('modeling', 0.018891687657430732),
  ('short', 0.018891687657430732),
  ('levels', 0.018891687657430732),
  ('presence', 0.017632241813602016)],
 5: [('energy', 0.05807002561912895),
  ('law', 0.04867634500426986),
  ('momentum', 0.04782237403928266),
  ('molecule', 0.04269854824935952),
  ('motion', 0.035866780529461996),
  ('body', 0.035012809564474806),
  ('frequency', 0.032450896669513236),
  ('laws', 0.030742954739538857),
  ('normal', 0.029888983774551667),
  ('second', 0.026473099914602904),
  ('vibration', 0.01964133219470538),
  ('angular', 0.01964133219470538),
  ('equal', 0.018787361229718188),
  ('rotational', 0.018787361229718188),
  ('frame', 0.017079419299743808),
  ('above', 0.017079419299743808),
  ('vibrational', 0.016225448334756618),
  ('o', 0.015371477369769428),
  ('spectroscopy', 0.015371477369769428),
  ('coordinates', 0.014517506404782237)],
 6: [('physics', 0.028583264291632146),
  ('force', 0.01988400994200497),
  ('uncertainty', 0.01946975973487987),
  ('newton', 0.017812758906379452),
  ('experiments', 0.016570008285004142),
  ('mechanics', 0.013670256835128418),
  ('velocity', 0.013256006628003313),
  ('detector', 0.01284175642087821),
  ('classical', 0.012013256006628004),
  ('dimensional', 0.011184755592377795),
  ('imaging', 0.010770505385252692),
  ('distribution', 0.010356255178127589),
  ('product', 0.009527754763877383),
  ('physical', 0.009113504556752278),
  ('reference', 0.008699254349627174),
  ('position', 0.008699254349627174),
  ('ion', 0.008285004142502071),
  ('laser', 0.008285004142502071),
  ('early', 0.007870753935376968),
  ('measurement', 0.007870753935376968)],
 7: [('when', 0.04302832244008715),
  ('mass', 0.032679738562091505),
  ('quantum', 0.03104575163398693),
  ('atoms', 0.02505446623093682),
  ('applied', 0.020697167755991286),
  ('often', 0.020697167755991286),
  ('atomic', 0.0196078431372549),
  ('number', 0.01906318082788671),
  ('technique', 0.018518518518518517),
  ('atom', 0.016339869281045753),
  ('scattering', 0.015250544662309368),
  ('i', 0.014161220043572984),
  ('polarizability', 0.013616557734204794),
  ('approximation', 0.013071895424836602),
  ('particles', 0.013071895424836602),
  ('therefore', 0.01252723311546841),
  ('c', 0.011982570806100218),
  ('constant', 0.011982570806100218),
  ('nucleon', 0.011437908496732025),
  ('h', 0.010893246187363835)],
 8: [('protein', 0.09494535519125682),
  ('proteins', 0.031420765027322405),
  ('software', 0.028688524590163935),
  ('project', 0.028005464480874317),
  ('structures', 0.018442622950819672),
  ('determine', 0.01366120218579235),
  ('cell', 0.01366120218579235),
  ('specific', 0.01366120218579235),
  ('assembly', 0.01366120218579235),
  ('native', 0.012295081967213115),
  ('approaches', 0.012295081967213115),
  ('prediction', 0.012295081967213115),
  ('core', 0.012295081967213115),
  ('functions', 0.011612021857923498),
  ('mutations', 0.01092896174863388),
  ('cancer', 0.01092896174863388),
  ('transcriptome', 0.01092896174863388),
  ('company', 0.010245901639344262),
  ('cells', 0.009562841530054645),
  ('functional', 0.009562841530054645)],
 9: [('source', 0.044657097288676235),
  ('point', 0.04226475279106858),
  ('theory', 0.037480063795853266),
  ('linear', 0.03508771929824561),
  ('light', 0.03110047846889952),
  ('interactions', 0.030303030303030304),
  ('wave', 0.024720893141945772),
  ('sources', 0.023923444976076555),
  ('air', 0.0215311004784689),
  ('waves', 0.02073365231259968),
  ('water', 0.019936204146730464),
  ('noise', 0.019138755980861243),
  ('tank', 0.017543859649122806),
  ('properties', 0.01594896331738437),
  ('radiation', 0.013556618819776715),
  ('line', 0.013556618819776715),
  ('similar', 0.012759170653907496),
  ('effective', 0.012759170653907496),
  ('fluid', 0.011961722488038277),
  ('diffraction', 0.011164274322169059)],
 10: [('data', 0.048013245033112585),
  ('sequence', 0.025938189845474614),
  ('genome', 0.02152317880794702),
  ('available', 0.018763796909492272),
  ('biological', 0.018763796909492272),
  ('iscb', 0.018211920529801324),
  ('conference', 0.017660044150110375),
  ('gene', 0.01545253863134658),
  ('sequencing', 0.014900662251655629),
  ('society', 0.014900662251655629),
  ('sequences', 0.01434878587196468),
  ('genes', 0.01434878587196468),
  ('algorithms', 0.01379690949227373),
  ('dna', 0.012693156732891833),
  ('ismb', 0.011589403973509934),
  ('complex', 0.011589403973509934),
  ('tools', 0.009933774834437087),
  ('current', 0.009933774834437087),
  ('year', 0.009381898454746136),
  ('application', 0.008278145695364239)],
 11: [('experimental', 0.06167400881057269),
  ('he', 0.041116005873715125),
  ('his', 0.0381791483113069),
  ('experiment', 0.03671071953010279),
  ('pauli', 0.030837004405286344),
  ('accelerator', 0.030837004405286344),
  ('stanford', 0.027900146842878122),
  ('main', 0.024963289280469897),
  ('laboratory', 0.022026431718061675),
  ('material', 0.020558002936857563),
  ('slac', 0.020558002936857563),
  ('target', 0.01908957415565345),
  ('during', 0.01908957415565345),
  ('strong', 0.01908957415565345),
  ('physicist', 0.014684287812041116),
  ('factor', 0.014684287812041116),
  ('haas', 0.014684287812041116),
  ('history', 0.014684287812041116),
  ('synchrotron', 0.013215859030837005),
  ('national', 0.013215859030837005)],
 12: [('field', 0.16181229773462782),
  ('electric', 0.08737864077669903),
  ('beam', 0.061488673139158574),
  ('case', 0.038834951456310676),
  ('relative', 0.02912621359223301),
  ('charge', 0.02912621359223301),
  ('measure', 0.02912621359223301),
  ('dipole', 0.025889967637540454),
  ('beams', 0.022653721682847898),
  ('moment', 0.022653721682847898),
  ('shift', 0.021035598705501618),
  ('liquid', 0.021035598705501618),
  ('metal', 0.019417475728155338),
  ('induced', 0.019417475728155338),
  ('detect', 0.01779935275080906),
  ('faraday', 0.014563106796116505),
  ('rule', 0.014563106796116505),
  ('deflection', 0.012944983818770227),
  ('charged', 0.012944983818770227),
  ('fig', 0.012944983818770227)],
 13: [('computational', 0.0873015873015873),
  ('bioinformatics', 0.08276643990929705),
  ('biology', 0.05782312925170068),
  ('large', 0.045351473922902494),
  ('university', 0.04195011337868481),
  ('new', 0.03741496598639456),
  ('program', 0.027210884353741496),
  ('researchers', 0.026077097505668934),
  ('sciences', 0.02040816326530612),
  ('biouml', 0.02040816326530612),
  ('now', 0.017006802721088437),
  ('institute', 0.015873015873015872),
  ('include', 0.015873015873015872),
  ('networks', 0.013605442176870748),
  ('contributions', 0.013605442176870748),
  ('network', 0.012471655328798186),
  ('before', 0.012471655328798186),
  ('low', 0.012471655328798186),
  ('platform', 0.011337868480725623),
  ('microarray', 0.011337868480725623)]}

Topic-distribution in each document

Which topics contribute to each document?

In [7]:
## select a document (by its index)
i_doc = 0
print(model.documents[i_doc])
## get a list of tuples (topic-index, probability)
model.topicdist(i_doc,l=1)
Nuclear_Overhauser_effect
Out[7]:
[(0, 0.3881118881118881),
 (1, 0.17832167832167833),
 (2, 0.0944055944055944),
 (3, 0.14335664335664336),
 (4, 0.013986013986013986),
 (5, 0.02097902097902098),
 (6, 0.038461538461538464),
 (7, 0.03496503496503497),
 (8, 0.02097902097902098),
 (9, 0.02097902097902098),
 (10, 0.02097902097902098),
 (11, 0.01048951048951049),
 (12, 0.006993006993006993),
 (13, 0.006993006993006993)]
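
The list of (topic-index, probability) tuples above can be read as the fraction of a document's tokens that fall into each word-group. The following sketch computes such a distribution for a toy document. The helper `topic_distribution` and the assignment `word_group` are hypothetical, and counting tokens this way is an assumption rather than a description of what `model.topicdist` does internally.

```python
from collections import Counter

def topic_distribution(tokens, word_group, n_groups):
    """Fraction of the document's tokens in each word-group, as (group, share) tuples."""
    if not tokens:
        raise ValueError("document has no tokens")
    group_counts = Counter(word_group[word] for word in tokens)
    total = len(tokens)
    # Groups that do not occur in the document get a share of 0.0.
    return [(g, group_counts[g] / total) for g in range(n_groups)]

# Hypothetical toy data.
word_group = {"the": 0, "spin": 1, "field": 1, "protein": 2}
doc = ["the", "spin", "the", "field"]

dist = topic_distribution(doc, word_group, n_groups=3)
print(dist)  # [(0, 0.5), (1, 0.5), (2, 0.0)]
```

By construction the shares of one document sum to 1.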

Extra: Clustering of documents - for free.

The stochastic block model also clusters the documents into groups, so we do not need to run an additional clustering step to obtain this grouping.

In [8]:
model.clusters(l=1,n=5)
Out[8]:
{0: [('Nuclear_Overhauser_effect', 1.0),
  ('Empirical_formula', 1.0),
  ('Magic_angle_(EELS)', 1.0),
  ('Fuel_mass_fraction', 1.0),
  ('Dynamic_mode_decomposition', 1.0)],
 1: [('Reactive_empirical_bond_order', 1.0),
  ('Rotating_wave_approximation', 1.0),
  ('Rovibrational_coupling', 1.0),
  ('Complementary_experiments', 1.0),
  ('Rotational_transition', 1.0)],
 2: [('Wave_tank', 1.0),
  ('Ripple_tank', 1.0),
  ('Effective_field_theory', 1.0),
  ('Line_source', 1.0),
  ('Point_source', 1.0)],
 3: [('Philosophical_interpretation_of_classical_physics', 1.0),
  ('Experimental_physics', 1.0),
  ('Chemical_physics', 1.0),
  ('Uncertainty', 1.0)],
 4: [('Elevator_paradox_(physics)', 1.0),
  ('X-ray_crystal_truncation_rod', 1.0),
  ('Dynamic_nuclear_polarisation', 1.0),
  ('X-ray_standing_waves', 1.0)],
 5: [('Knight_shift', 1.0),
  ('Anisotropic_liquid', 1.0),
  ('Electrostatic_deflection_(structural_element)', 1.0),
  ('Molecular_beam', 1.0),
  ('Faraday_cup_electrometer', 1.0)],
 6: [('Polarizability', 1.0)],
 7: [('Einstein–de_Haas_effect', 1.0),
  ('Holometer', 1.0),
  ('RRKM_theory', 1.0),
  ('Fragment_separator', 1.0),
  ('Pauli_effect', 1.0)],
 8: [("Newton's_laws_of_motion", 1.0), ('Photofragment-ion_imaging', 1.0)],
 9: [('SLAC_National_Accelerator_Laboratory', 1.0)],
 10: [('Bioinformatics', 1.0), ('Folding@home', 1.0)],
 11: [('IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics',
   1.0),
  ('BioUML', 1.0),
  ('Sepp_Hochreiter', 1.0),
  ('Computational_biology', 1.0),
  ('Journal_of_Computational_Biology', 1.0)],
 12: [('De_novo_transcriptome_assembly', 1.0),
  ('Enzyme_Function_Initiative', 1.0),
  ('Foldit', 1.0),
  ('Premier_Biosoft', 1.0),
  ('Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology',
   1.0)]}

Application -- Finding similar articles:

For a query-article, we return all other articles from the same group.

In [9]:
## select a document (index)
i_doc = 2
print(i_doc,model.documents[i_doc])
## find all articles from the same group
## print: (doc-index, doc-title)
model.clusters_query(i_doc,l=1)
2 Rovibrational_coupling
Out[9]:
[(5, 'Rotational_transition'),
 (10, 'Rotating_wave_approximation'),
 (12, 'Molecular_vibration'),
 (16, 'Reactive_empirical_bond_order'),
 (20, 'Ziff-Gulari-Barshad_model'),
 (29, 'Complementary_experiments'),
 (42, "Euler's_laws_of_motion"),
 (54, 'Law_of_Maximum')]
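
The lookup behind such a query is simple once every document has a group label. The sketch below shows the idea on a hypothetical assignment; `same_group_documents`, the group labels in `doc_group`, and the shortened title list are illustrative only and do not reproduce the fitted model.

```python
def same_group_documents(i_doc, doc_group, titles):
    """Return (index, title) for every other document in the same group as i_doc."""
    group = doc_group[i_doc]
    return [
        (j, titles[j])
        for j, g in enumerate(doc_group)
        if g == group and j != i_doc
    ]

# Hypothetical group labels for five documents.
titles = [
    "Nuclear_Overhauser_effect",
    "Quantum_solvent",
    "Rovibrational_coupling",
    "Bioinformatics",
    "Rotational_transition",
]
doc_group = [0, 2, 1, 3, 1]

neighbours = same_group_documents(2, doc_group, titles)
print(neighbours)  # [(4, 'Rotational_transition')]
```

The query document itself is excluded from the result, as in the output above.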

More technical: Group membership

In the stochastic block model, word-nodes and document-nodes are clustered into separate sets of groups.

The group membership can be represented by the conditional probability $P(\text{group}\, |\, \text{node})$. Since words and documents belong to different groups (the word-document network is bipartite), we can show the two separately:

  • $P(bd | d)$, the probability that document $d$ belongs to document group $bd$
  • $P(bw | w)$, the probability that word $w$ belongs to word group $bw$.
In [10]:
p_td_d,p_tw_w = model.group_membership(l=1)

plt.figure(figsize=(15,4))
plt.subplot(121)
plt.imshow(p_td_d,origin='lower',aspect='auto',interpolation='none')
plt.title(r'Document group membership $P(bd | d)$')
plt.xlabel('Document d (index)')
plt.ylabel('Document group, bd')
plt.colorbar()

plt.subplot(122)
plt.imshow(p_tw_w,origin='lower',aspect='auto',interpolation='none')
plt.title(r'Word group membership $P(bw | w)$')
plt.xlabel('Word w (index)')
plt.ylabel('Word group, bw')
plt.colorbar()
Out[10]:
<matplotlib.colorbar.Colorbar at 0x7391fc40e330>
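[Figure: two heatmaps, "Document group membership $P(bd | d)$" (document index vs. document group) and "Word group membership $P(bw | w)$" (word index vs. word group)]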
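
When every node is assigned to exactly one group, a membership matrix of this kind reduces to a one-hot encoding of the assignment. The sketch below builds such a matrix with groups as rows and nodes as columns, matching the axes used in the plots. The helper `membership_matrix` and the labels are hypothetical, and the assumption of hard assignments is mine (it is consistent with the 1.0 entries in the cluster output earlier, but the matrices returned by `model.group_membership` need not be one-hot in general).

```python
def membership_matrix(node_group, n_groups):
    """Rows are groups, columns are nodes; entry is 1.0 if the node is in the group."""
    return [
        [1.0 if g == b else 0.0 for g in node_group]
        for b in range(n_groups)
    ]

# Hypothetical hard assignment of four documents to three groups.
doc_group = [0, 2, 1, 2]
p_bd_d = membership_matrix(doc_group, n_groups=3)

for row in p_bd_d:
    print(row)

# Each column is a probability distribution over groups and sums to 1.
column_sums = [sum(row[d] for row in p_bd_d) for d in range(len(doc_group))]
print(column_sums)
```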

Relative topical distribution

Compare the frequency $f^i_d$ of words from topic $i$ in document $d$ with the expected value across all documents:

$$ \tau_d^i = (f^i_d -\langle f^i \rangle ) / \langle f^i \rangle $$

as in Eq. (10) of Hyland et al.
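
The formula can be worked through on a small table of topic frequencies. In the sketch below, `f[d][i]` is the frequency of topic $i$ in document $d$. The helper `relative_topic_distribution` is hypothetical, and it takes $\langle f^i \rangle$ to be the unweighted mean over documents, which is an assumption: the library may weight documents differently (for example by length).

```python
def relative_topic_distribution(f):
    """tau[d][i] = (f[d][i] - mean_i) / mean_i, with mean_i the unweighted mean over documents."""
    n_docs = len(f)
    n_topics = len(f[0])
    # Assumes every topic occurs in at least one document, so no mean is zero.
    mean = [sum(f[d][i] for d in range(n_docs)) / n_docs for i in range(n_topics)]
    return [
        [(f[d][i] - mean[i]) / mean[i] for i in range(n_topics)]
        for d in range(n_docs)
    ]

# Hypothetical topic frequencies for two documents and three topics.
f = [
    [0.5, 0.50, 0.00],
    [0.5, 0.25, 0.25],
]
tau = relative_topic_distribution(f)
for row in tau:
    print(row)
```

A value of 0 means the topic is exactly as frequent as on average, positive values mean it is over-represented, and -1 means the topic does not occur in the document at all.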

In [11]:
model.print_overview()
Level 0 has 39 document groups and 84 topics (word groups)
Level 1 has 13 document groups and 14 topics (word groups)
Level 2 has 2 document groups and 3 topics (word groups)
Level 3 has 1 document groups and 1 topics (word groups)

In [12]:
model.topics(l=2)
Out[12]:
{0: [('the', 0.1304471892707187),
  ('of', 0.07362113244466185),
  ('and', 0.04573804573804574),
  ('a', 0.04272145448616037),
  ('to', 0.04141698259345318),
  ('in', 0.039908686967510494),
  ('is', 0.03306020953079777),
  ('for', 0.017447311564958625),
  ('as', 0.016265133912192736),
  ('that', 0.013167013167013167)],
 1: [('formula', 0.02083944819489287),
  ('field', 0.009783778495254868),
  ('when', 0.007729185011251345),
  ('physics', 0.0067508071617258586),
  ('energy', 0.0066529693767733095),
  ('analysis', 0.006359456021915664),
  ('electron', 0.0059681048821054695),
  ('mass', 0.0058702670971529205),
  ('quantum', 0.005576753742295274),
  ('law', 0.005576753742295274)],
 2: [('protein', 0.02806946688206785),
  ('data', 0.017568659127625202),
  ('structure', 0.015751211631663976),
  ('computational', 0.015549273021001616),
  ('bioinformatics', 0.01474151857835218),
  ('biology', 0.01029886914378029),
  ('sequence', 0.009491114701130857),
  ('proteins', 0.009289176090468497),
  ('computer', 0.009087237479806139),
  ('model', 0.009087237479806139)]}

Relative contribution of topics in each document

In [13]:
print("Document title [relative contribution of each topic]\n")
tau_d=model.topicdist_relative(l=2)

for i in range(len(model.documents)):
    print(model.documents[i],tau_d[i])
Document title [relative contribution of each topic]

Nuclear_Overhauser_effect [-0.13980569  0.57555232 -0.49538507]
Quantum_solvent [ 0.06809446  0.22670045 -0.80523634]
Rovibrational_coupling [-0.10582642  0.61371087 -0.7424699 ]
Effective_field_theory [-0.02818675  0.41790322 -0.72292801]
Chemical_physics [-0.09703426  0.67649061 -0.91560241]
Rotational_transition [-0.05899796  0.59597849 -0.93784674]
Dynamic_nuclear_polarisation [-0.17109143  0.78734273 -0.77754163]
Knight_shift [-0.15027516  0.74804814 -0.79955574]
Polarizability [-0.11599365  0.61549243 -0.69578105]
Anisotropic_liquid [ 0.07901567  0.07904206 -0.5545683 ]
Rotating_wave_approximation [-0.00589727  0.46136675 -0.92305402]
RRKM_theory [ 0.0982838   0.04050484 -0.57047658]
Molecular_vibration [-0.07950154  0.55947082 -0.76092466]
Fuel_mass_fraction [-0.2643075   1.11884623 -1.        ]
Electrostatic_deflection_(structural_element) [-0.11414146  0.58259502 -0.63705565]
Magic_angle_(EELS) [-0.07988048  0.54317796 -0.72541882]
Reactive_empirical_bond_order [-0.09975616  0.29868139 -0.12231505]
Photofragment-ion_imaging [ 0.0198688   0.35720495 -0.83570142]
Molecular_beam [ 0.08143498  0.2019464  -0.82022936]
McConnell_equation [-0.29189597  0.89371881 -0.39866721]
Ziff-Gulari-Barshad_model [-0.04359975  0.48823723 -0.79174622]
Empirical_formula [-0.14218254  0.65093435 -0.63920032]
Pauli_effect [-0.01879878  0.38655376 -0.70472799]
SLAC_National_Accelerator_Laboratory [-0.1263539   0.49296101 -0.39155229]
Newton's_laws_of_motion [-0.0521876   0.54361795 -0.86351071]
Uncertainty [ 0.03969931  0.31319855 -0.84310705]
Ripple_tank [-0.05327744  0.50817833 -0.78496401]
Particle-induced_X-ray_emission [-0.10179632  0.54433319 -0.61923749]
Experimental_physics [-0.01724507  0.37007    -0.67840201]
Complementary_experiments [ 0.04937238  0.34465241 -0.95594632]
Elevator_paradox_(physics) [-0.0448058   0.54108433 -0.89484891]
Wave_tank [ 3.37441514e-04  1.86946265e-01 -3.87531413e-01]
Philosophical_interpretation_of_classical_physics [ 0.04185242  0.28716134 -0.80003185]
X-ray_crystal_truncation_rod [-0.1673191   0.69038357 -0.59610373]
Faraday_cup_electrometer [-0.05472787  0.46894801 -0.69680699]
Line_source [-0.09171466  0.3276315  -0.22190412]
X-ray_standing_waves [-0.20085402  0.76018736 -0.57405594]
Point_source [-0.07701307  0.47154086 -0.59176321]
Einstein–de_Haas_effect [-0.20167422  0.75854692 -0.56660699]
List_of_Directors_General_of_CERN [-0.29189597 -0.02886215  1.50555331]
Fragment_separator [-0.0288859   0.23010795 -0.33185245]
Dynamic_mode_decomposition [-0.03165261  0.30813099 -0.47918755]
Euler's_laws_of_motion [-0.14191624  0.81089691 -0.97068457]
Holometer [ 0.04529643  0.24629358 -0.73274098]
Quantum_oscillations_(experimental_technique) [-0.12624695  0.60774493 -0.62899776]
Bioinformatics [-0.06787589 -0.68556497  1.75125668]
Computational_biology [-0.12109851 -0.61961366  1.8787841 ]
Folding@home [ 0.38049065 -0.81236658 -0.20812142]
K-mer [-0.15555296 -0.23997907  1.26589169]
Journal_of_Computational_Biology [-0.31524006 -0.70118835  3.0088853 ]
Foldit [ 0.05021632 -0.7926396   1.38726028]
Premier_Biosoft [-0.13444178 -0.66221292  2.03280888]
International_Society_for_Computational_Biology [-0.13505951 -0.84563171  2.41444798]
Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology [-0.05067372 -0.73853981  1.77538213]
Law_of_Maximum [ 0.08640619  0.27711279 -1.        ]
Enzyme_Function_Initiative [-0.19765502 -0.75099029  2.52918962]
SnoRNA_prediction_software [-0.09787215  0.0825799   0.31438862]
Sepp_Hochreiter [-0.21831309 -0.53960872  2.19523006]
Aureus_Sciences [-0.04792735 -0.54299395  1.35816782]
IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics [-0.15555296 -0.91555323  2.66028658]
Knotted_protein [-4.65133831e-02 -1.64677460e-05  2.30449943e-01]
BioUML [-0.04992331 -0.84526944  1.99195632]
De_novo_transcriptome_assembly [-0.11545809 -0.80195042  2.22718852]

Documents associated to each of the topics

In [14]:
model.docs_of_topic(l=2, n=10)
Topic 0
[0.38049064650371833, 'Folding@home']
[0.0982838041661571, 'RRKM_theory']
[0.08640618551980356, 'Law_of_Maximum']
[0.08143498489053753, 'Molecular_beam']
[0.07901566725096126, 'Anisotropic_liquid']
[0.0680944560439678, 'Quantum_solvent']
[0.050216316701558475, 'Foldit']
[0.04937237968912177, 'Complementary_experiments']
[0.04529642764936884, 'Holometer']
[0.04185241980407554, 'Philosophical_interpretation_of_classical_physics']

Topic 1
[1.1188462256850868, 'Fuel_mass_fraction']
[0.8937188142060465, 'McConnell_equation']
[0.8108969113487717, "Euler's_laws_of_motion"]
[0.7873427314160522, 'Dynamic_nuclear_polarisation']
[0.7601873593581843, 'X-ray_standing_waves']
[0.7585469237949426, 'Einstein–de_Haas_effect']
[0.7480481361901968, 'Knight_shift']
[0.6903835734029293, 'X-ray_crystal_truncation_rod']
[0.6764906101473233, 'Chemical_physics']
[0.6509343508462969, 'Empirical_formula']

Topic 2
[3.008885298869144, 'Journal_of_Computational_Biology']
[2.6602865772283484, 'IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics']
[2.529189622081383, 'Enzyme_Function_Initiative']
[2.4144479769727702, 'International_Society_for_Computational_Biology']
[2.2271885232399193, 'De_novo_transcriptome_assembly']
[2.195230060431999, 'Sepp_Hochreiter']
[2.0328088782749174, 'Premier_Biosoft']
[1.9919563236843087, 'BioUML']
[1.8787841018585545, 'Computational_biology']
[1.7753821299863304, 'Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology']
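
The ranking shown above amounts to sorting documents by their relative contribution for one topic. The sketch below redoes this for four documents, using values copied from the level-2 output earlier in this notebook; the helper name `top_docs_for_topic` is hypothetical and not part of sbmtm.

```python
def top_docs_for_topic(tau_d, topic, n=2):
    """Return the n documents with the largest relative contribution for one topic."""
    ranked = sorted(tau_d.items(), key=lambda item: item[1][topic], reverse=True)
    return [(values[topic], title) for title, values in ranked[:n]]

# Relative contributions copied from the level-2 output above (subset of documents).
tau_d = {
    "Folding@home": [0.38049065, -0.81236658, -0.20812142],
    "Fuel_mass_fraction": [-0.2643075, 1.11884623, -1.0],
    "Journal_of_Computational_Biology": [-0.31524006, -0.70118835, 3.0088853],
    "RRKM_theory": [0.0982838, 0.04050484, -0.57047658],
}

for topic in range(3):
    print("Topic", topic, top_docs_for_topic(tau_d, topic, n=2))
```

With only four documents the second-ranked entry can have a negative value, since the ranking always returns `n` documents regardless of sign.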