TopSBM: Topic Modeling with Stochastic Block Models
A basic tutorial.
%load_ext autoreload
%autoreload 2
import os
import matplotlib.pyplot as plt
%matplotlib inline
from sbmtm import sbmtm
import graph_tool.all as gt
Setup: Load a corpus
We need a list of documents, where each document is a list of words.
Optionally, we also provide a list of document titles.
The example corpus consists of 63 articles from Wikipedia taken from 3 different categories (Experimental Physics, Chemical Physics, and Computational Biology).
path_data = 'data/'
## texts
fname_data = 'corpus.txt'
filename = os.path.join(path_data,fname_data)
with open(filename,'r', encoding = 'utf8') as f:
    x = f.readlines()
texts = [h.split() for h in x]
## titles
fname_data = 'titles.txt'
filename = os.path.join(path_data,fname_data)
with open(filename,'r', encoding = 'utf8') as f:
    x = f.readlines()
titles = [h.split()[0] for h in x]
## show the title and the first 10 words of one document
i_doc = 0
print(titles[i_doc])
print(texts[i_doc][:10])
Nuclear_Overhauser_effect
['the', 'nuclear', 'overhauser', 'effect', 'noe', 'is', 'the', 'transfer', 'of', 'nuclear']
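The loading code above assumes one document per line in `corpus.txt` and one title per line in `titles.txt`, in the same order. A minimal sketch of that layout, with a check that the two files line up, could look as follows. The helper `load_corpus` and the two-document toy corpus are hypothetical and not part of `sbmtm`; skipping blank lines is a choice made for this sketch.

```python
import os
import tempfile


def load_corpus(path_texts, path_titles):
    """Read whitespace-tokenized documents and their titles (one per line)."""
    with open(path_texts, "r", encoding="utf8") as f:
        texts = [line.split() for line in f if line.strip()]
    with open(path_titles, "r", encoding="utf8") as f:
        titles = [line.split()[0] for line in f if line.strip()]
    if len(texts) != len(titles):
        raise ValueError(
            f"{len(texts)} documents but {len(titles)} titles; "
            "the two files must have one line per document"
        )
    return texts, titles


# Toy corpus written to a temporary directory (illustrative data only).
with tempfile.TemporaryDirectory() as tmp:
    path_texts = os.path.join(tmp, "corpus.txt")
    path_titles = os.path.join(tmp, "titles.txt")
    with open(path_texts, "w", encoding="utf8") as f:
        f.write("the nuclear overhauser effect\nprotein folding software\n")
    with open(path_titles, "w", encoding="utf8") as f:
        f.write("Nuclear_Overhauser_effect\nFolding_software\n")
    toy_texts, toy_titles = load_corpus(path_texts, path_titles)

print(toy_titles[0], toy_texts[0][:3])
```

The length check catches the most common mistake, a titles file that is out of sync with the corpus file.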
Fitting the model
## we create an instance of the sbmtm-class
model = sbmtm()
## we have to create the word-document network from the corpus
model.make_graph(texts,documents=titles)
## we can also skip the previous step by saving/loading a graph
# model.save_graph(filename = 'graph.xml.gz')
# model.load_graph(filename = 'graph.xml.gz')
## fit the model
gt.seed_rng(32) ## seed for graph-tool's random number generator --> same results
model.fit()
<NestedBlockState object, with base <BlockState object with 3203 blocks (123 nonempty), degree-corrected, for graph <Graph object, undirected, with 3203 vertices and 13050 edges, 2 internal vertex properties, 1 internal edge property, at 0x7392097ecb60>, at 0x7391fc568170>, and 5 levels of sizes [(3203, 123), (123, 27), (27, 5), (5, 2), (2, 1)] at 0x7391fc56a4e0>
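Conceptually, the word-document network is bipartite: every occurrence of a word in a document links a document-node to a word-node. The sketch below builds such an edge list with plain Python to make the idea concrete. It is not the implementation behind `make_graph`, whose internal representation may differ; the function name and toy data are made up for illustration.

```python
from collections import Counter


def word_document_edges(texts, titles):
    """Count how often each word occurs in each document.

    Each (title, word) pair is an edge of the bipartite network and the
    count is its multiplicity.
    """
    edges = Counter()
    for title, tokens in zip(titles, texts):
        for word in tokens:
            edges[(title, word)] += 1
    return edges


toy_titles = ["doc_a", "doc_b"]
toy_texts = [["spin", "field", "spin"], ["protein", "field"]]
edges = word_document_edges(toy_texts, toy_titles)

doc_nodes = {d for d, _ in edges}
word_nodes = {w for _, w in edges}
n_tokens = sum(edges.values())
print(len(doc_nodes), len(word_nodes), n_tokens)
```

In this picture the number of vertices is the number of documents plus the number of distinct words, which is how a corpus of 63 articles can yield a graph with a few thousand vertices.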
Plotting the result
The output shows the (hierarchical) community structure in the word-document network as inferred by the stochastic block model:
- document-nodes are on the left
- word-nodes are on the right
- different colors correspond to the different groups
The result is a grouping of nodes into groups on multiple levels of the hierarchy:
- on the uppermost level, all nodes belong to the same group (square in the middle)
- on the next-lower level, the network splits into two groups: the document-nodes and the word-nodes (blue squares to the left and right, respectively). This structure is trivial because the network is bipartite.
- only the levels below that carry non-trivial structure: nodes are further divided into smaller groups (document-nodes into document-groups on the left and word-nodes into word-groups on the right)
model.plot(nedges=10000)
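In a nested partition like the one described above, a node's group at a higher level follows from its group one level below. The sketch shows that bookkeeping on made-up assignments. The function name and data layout are assumptions for illustration; graph-tool keeps the hierarchy in its own state objects.

```python
def project_to_level(base_groups, parents, level):
    """Return each node's group at `level` of a nested partition.

    base_groups[v] is the level-0 group of node v; parents[k][g] is the
    level-(k+1) group that contains level-k group g.
    """
    groups = list(base_groups)
    for k in range(level):
        groups = [parents[k][g] for g in groups]
    return groups


# Six nodes: three documents (0-2) and three words (3-5).
base_groups = [0, 0, 1, 2, 3, 3]
parents = [
    {0: 0, 1: 0, 2: 1, 3: 1},  # level 0 -> level 1: documents vs. words
    {0: 0, 1: 0},              # level 1 -> level 2: a single root group
]

for level in range(3):
    print(level, project_to_level(base_groups, parents, level))
```

In this toy hierarchy, level 1 reproduces the trivial document/word split and level 2 is the single root group.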
The basics
Topics
For each word-group on a given level in the hierarchy, we retrieve the $n$ most common words in each group -- these are the topics!
model.topics(l=1,n=20)
{0: [('the', 0.20760347735824575),
('of', 0.11716621253405994),
('a', 0.06799013883482548),
('to', 0.06591410406124303),
('in', 0.0635136888542883),
('is', 0.05261450629298041),
('as', 0.025885558583106268),
('that', 0.02095497599584793),
('are', 0.020046710782405604),
('by', 0.018554560788893212),
('be', 0.01797067600882315),
('with', 0.01589464123524069),
('an', 0.015700012975217333),
('this', 0.014467367328402751),
('can', 0.013429349941611522),
('or', 0.012521084728169197),
('from', 0.012456208641494744),
('it', 0.012196704294796938),
('at', 0.010380173867912288),
('used', 0.007850006487608667)],
1: [('formula', 0.13903394255874674),
('electron', 0.03981723237597911),
('x', 0.033942558746736295),
('spin', 0.030678851174934726),
('surface', 0.030026109660574413),
('magnetic', 0.026762402088772844),
('electrons', 0.022845953002610966),
('effect', 0.022193211488250653),
('ray', 0.02154046997389034),
('nuclear', 0.018929503916449087),
('polarization', 0.018276762402088774),
('observed', 0.016318537859007835),
('intensity', 0.015665796344647518),
('sample', 0.015665796344647518),
('direction', 0.015013054830287207),
('cross', 0.010443864229765013),
('external', 0.010443864229765013),
('dnp', 0.0097911227154047),
('defined', 0.0097911227154047),
('left', 0.0097911227154047)],
2: [('analysis', 0.09090909090909091),
('chemical', 0.039160839160839164),
('angle', 0.026573426573426574),
('proton', 0.025174825174825177),
('shown', 0.025174825174825177),
('loss', 0.016783216783216783),
('fermi', 0.016783216783216783),
('elements', 0.016783216783216783),
('empirical', 0.015384615384615385),
('oscillations', 0.015384615384615385),
('ratio', 0.013986013986013986),
('noe', 0.013986013986013986),
('whereas', 0.012587412587412588),
('fuel', 0.012587412587412588),
('landau', 0.011188811188811189),
('knot', 0.011188811188811189),
('shows', 0.011188811188811189),
('shell', 0.009790209790209791),
('compounds', 0.009790209790209791),
('micropixe', 0.009790209790209791)],
3: [('and', 0.12306679828891083),
('for', 0.04694526708347044),
('on', 0.027201930459581),
('which', 0.022485466710540747),
('s', 0.0202917626412197),
('folding', 0.017439947351102335),
('was', 0.017001206537238127),
('home', 0.013930020840188658),
('such', 0.01173631677086761),
('these', 0.011407261160469452),
('also', 0.00943292749808051),
('its', 0.008884501480750246),
('first', 0.008774816277284195),
('molecular', 0.008445760666886038),
('more', 0.008336075463419985),
('been', 0.007568279039157618),
('if', 0.007239223428759461),
('but', 0.007239223428759461),
('research', 0.007239223428759461),
('time', 0.007019853021827355)],
4: [('structure', 0.0982367758186398),
('computer', 0.05667506297229219),
('model', 0.05667506297229219),
('high', 0.05163727959697733),
('function', 0.042821158690176324),
('models', 0.03904282115869018),
('real', 0.031486146095717885),
('potential', 0.028967254408060455),
('fields', 0.027707808564231738),
('structural', 0.02644836272040302),
('de', 0.02392947103274559),
('usually', 0.020151133501259445),
('types', 0.020151133501259445),
('generally', 0.020151133501259445),
('k', 0.020151133501259445),
('group', 0.018891687657430732),
('modeling', 0.018891687657430732),
('short', 0.018891687657430732),
('levels', 0.018891687657430732),
('presence', 0.017632241813602016)],
5: [('energy', 0.05807002561912895),
('law', 0.04867634500426986),
('momentum', 0.04782237403928266),
('molecule', 0.04269854824935952),
('motion', 0.035866780529461996),
('body', 0.035012809564474806),
('frequency', 0.032450896669513236),
('laws', 0.030742954739538857),
('normal', 0.029888983774551667),
('second', 0.026473099914602904),
('vibration', 0.01964133219470538),
('angular', 0.01964133219470538),
('equal', 0.018787361229718188),
('rotational', 0.018787361229718188),
('frame', 0.017079419299743808),
('above', 0.017079419299743808),
('vibrational', 0.016225448334756618),
('o', 0.015371477369769428),
('spectroscopy', 0.015371477369769428),
('coordinates', 0.014517506404782237)],
6: [('physics', 0.028583264291632146),
('force', 0.01988400994200497),
('uncertainty', 0.01946975973487987),
('newton', 0.017812758906379452),
('experiments', 0.016570008285004142),
('mechanics', 0.013670256835128418),
('velocity', 0.013256006628003313),
('detector', 0.01284175642087821),
('classical', 0.012013256006628004),
('dimensional', 0.011184755592377795),
('imaging', 0.010770505385252692),
('distribution', 0.010356255178127589),
('product', 0.009527754763877383),
('physical', 0.009113504556752278),
('reference', 0.008699254349627174),
('position', 0.008699254349627174),
('ion', 0.008285004142502071),
('laser', 0.008285004142502071),
('early', 0.007870753935376968),
('measurement', 0.007870753935376968)],
7: [('when', 0.04302832244008715),
('mass', 0.032679738562091505),
('quantum', 0.03104575163398693),
('atoms', 0.02505446623093682),
('applied', 0.020697167755991286),
('often', 0.020697167755991286),
('atomic', 0.0196078431372549),
('number', 0.01906318082788671),
('technique', 0.018518518518518517),
('atom', 0.016339869281045753),
('scattering', 0.015250544662309368),
('i', 0.014161220043572984),
('polarizability', 0.013616557734204794),
('approximation', 0.013071895424836602),
('particles', 0.013071895424836602),
('therefore', 0.01252723311546841),
('c', 0.011982570806100218),
('constant', 0.011982570806100218),
('nucleon', 0.011437908496732025),
('h', 0.010893246187363835)],
8: [('protein', 0.09494535519125682),
('proteins', 0.031420765027322405),
('software', 0.028688524590163935),
('project', 0.028005464480874317),
('structures', 0.018442622950819672),
('determine', 0.01366120218579235),
('cell', 0.01366120218579235),
('specific', 0.01366120218579235),
('assembly', 0.01366120218579235),
('native', 0.012295081967213115),
('approaches', 0.012295081967213115),
('prediction', 0.012295081967213115),
('core', 0.012295081967213115),
('functions', 0.011612021857923498),
('mutations', 0.01092896174863388),
('cancer', 0.01092896174863388),
('transcriptome', 0.01092896174863388),
('company', 0.010245901639344262),
('cells', 0.009562841530054645),
('functional', 0.009562841530054645)],
9: [('source', 0.044657097288676235),
('point', 0.04226475279106858),
('theory', 0.037480063795853266),
('linear', 0.03508771929824561),
('light', 0.03110047846889952),
('interactions', 0.030303030303030304),
('wave', 0.024720893141945772),
('sources', 0.023923444976076555),
('air', 0.0215311004784689),
('waves', 0.02073365231259968),
('water', 0.019936204146730464),
('noise', 0.019138755980861243),
('tank', 0.017543859649122806),
('properties', 0.01594896331738437),
('radiation', 0.013556618819776715),
('line', 0.013556618819776715),
('similar', 0.012759170653907496),
('effective', 0.012759170653907496),
('fluid', 0.011961722488038277),
('diffraction', 0.011164274322169059)],
10: [('data', 0.048013245033112585),
('sequence', 0.025938189845474614),
('genome', 0.02152317880794702),
('available', 0.018763796909492272),
('biological', 0.018763796909492272),
('iscb', 0.018211920529801324),
('conference', 0.017660044150110375),
('gene', 0.01545253863134658),
('sequencing', 0.014900662251655629),
('society', 0.014900662251655629),
('sequences', 0.01434878587196468),
('genes', 0.01434878587196468),
('algorithms', 0.01379690949227373),
('dna', 0.012693156732891833),
('ismb', 0.011589403973509934),
('complex', 0.011589403973509934),
('tools', 0.009933774834437087),
('current', 0.009933774834437087),
('year', 0.009381898454746136),
('application', 0.008278145695364239)],
11: [('experimental', 0.06167400881057269),
('he', 0.041116005873715125),
('his', 0.0381791483113069),
('experiment', 0.03671071953010279),
('pauli', 0.030837004405286344),
('accelerator', 0.030837004405286344),
('stanford', 0.027900146842878122),
('main', 0.024963289280469897),
('laboratory', 0.022026431718061675),
('material', 0.020558002936857563),
('slac', 0.020558002936857563),
('target', 0.01908957415565345),
('during', 0.01908957415565345),
('strong', 0.01908957415565345),
('physicist', 0.014684287812041116),
('factor', 0.014684287812041116),
('haas', 0.014684287812041116),
('history', 0.014684287812041116),
('synchrotron', 0.013215859030837005),
('national', 0.013215859030837005)],
12: [('field', 0.16181229773462782),
('electric', 0.08737864077669903),
('beam', 0.061488673139158574),
('case', 0.038834951456310676),
('relative', 0.02912621359223301),
('charge', 0.02912621359223301),
('measure', 0.02912621359223301),
('dipole', 0.025889967637540454),
('beams', 0.022653721682847898),
('moment', 0.022653721682847898),
('shift', 0.021035598705501618),
('liquid', 0.021035598705501618),
('metal', 0.019417475728155338),
('induced', 0.019417475728155338),
('detect', 0.01779935275080906),
('faraday', 0.014563106796116505),
('rule', 0.014563106796116505),
('deflection', 0.012944983818770227),
('charged', 0.012944983818770227),
('fig', 0.012944983818770227)],
13: [('computational', 0.0873015873015873),
('bioinformatics', 0.08276643990929705),
('biology', 0.05782312925170068),
('large', 0.045351473922902494),
('university', 0.04195011337868481),
('new', 0.03741496598639456),
('program', 0.027210884353741496),
('researchers', 0.026077097505668934),
('sciences', 0.02040816326530612),
('biouml', 0.02040816326530612),
('now', 0.017006802721088437),
('institute', 0.015873015873015872),
('include', 0.015873015873015872),
('networks', 0.013605442176870748),
('contributions', 0.013605442176870748),
('network', 0.012471655328798186),
('before', 0.012471655328798186),
('low', 0.012471655328798186),
('platform', 0.011337868480725623),
('microarray', 0.011337868480725623)]}
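The topic listing can be reproduced in spirit with a few lines of plain Python: group the words by their word-group, then rank them by frequency within the group. This is a sketch under the assumption that every word belongs to exactly one group and that the reported number is the word's share within its group; `top_words` and the toy counts are hypothetical.

```python
from collections import Counter, defaultdict


def top_words(word_counts, word_group, n=3):
    """For each word group, list the n most frequent words with P(word | group)."""
    by_group = defaultdict(Counter)
    for word, count in word_counts.items():
        by_group[word_group[word]][word] = count
    topics = {}
    for group, counts in sorted(by_group.items()):
        total = sum(counts.values())
        topics[group] = [(w, c / total) for w, c in counts.most_common(n)]
    return topics


word_counts = {"the": 6, "of": 4, "spin": 3, "field": 1}
word_group = {"the": 0, "of": 0, "spin": 1, "field": 1}
print(top_words(word_counts, word_group, n=2))
```

Because the corpus is not stop-word filtered, function words such as "the" and "of" end up together in a topic of their own, as topic 0 in the output above shows.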
Topic-distribution in each document
Which topics contribute to each document?
## select a document (by its index)
i_doc = 0
print(model.documents[i_doc])
## get a list of tuples (topic-index, probability)
model.topicdist(i_doc,l=1)
Nuclear_Overhauser_effect
[(0, 0.3881118881118881), (1, 0.17832167832167833), (2, 0.0944055944055944), (3, 0.14335664335664336), (4, 0.013986013986013986), (5, 0.02097902097902098), (6, 0.038461538461538464), (7, 0.03496503496503497), (8, 0.02097902097902098), (9, 0.02097902097902098), (10, 0.02097902097902098), (11, 0.01048951048951049), (12, 0.006993006993006993), (13, 0.006993006993006993)]
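One way to read these numbers: the probability of topic $i$ in a document is the share of the document's word tokens that belong to word-group $i$. The sketch below computes that share for a toy document. It assumes a hard assignment of each word to a single group, and the names are illustrative rather than taken from `sbmtm`.

```python
from collections import Counter


def topic_distribution(tokens, word_group):
    """Share of a document's tokens that falls into each word group."""
    counts = Counter(word_group[w] for w in tokens)
    total = sum(counts.values())
    return [(g, counts[g] / total) for g in sorted(counts)]


word_group = {"the": 0, "of": 0, "spin": 1, "field": 1, "protein": 2}
doc = ["the", "spin", "of", "the", "field", "spin", "the", "protein"]
dist = topic_distribution(doc, word_group)
print(dist)
```

The shares sum to one by construction, which is also true of the list returned for Nuclear_Overhauser_effect above.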
Extra: Clustering of documents - for free
The stochastic block model also clusters the documents into groups, so no additional clustering step is needed to obtain this grouping.
model.clusters(l=1,n=5)
{0: [('Nuclear_Overhauser_effect', 1.0),
('Empirical_formula', 1.0),
('Magic_angle_(EELS)', 1.0),
('Fuel_mass_fraction', 1.0),
('Dynamic_mode_decomposition', 1.0)],
1: [('Reactive_empirical_bond_order', 1.0),
('Rotating_wave_approximation', 1.0),
('Rovibrational_coupling', 1.0),
('Complementary_experiments', 1.0),
('Rotational_transition', 1.0)],
2: [('Wave_tank', 1.0),
('Ripple_tank', 1.0),
('Effective_field_theory', 1.0),
('Line_source', 1.0),
('Point_source', 1.0)],
3: [('Philosophical_interpretation_of_classical_physics', 1.0),
('Experimental_physics', 1.0),
('Chemical_physics', 1.0),
('Uncertainty', 1.0)],
4: [('Elevator_paradox_(physics)', 1.0),
('X-ray_crystal_truncation_rod', 1.0),
('Dynamic_nuclear_polarisation', 1.0),
('X-ray_standing_waves', 1.0)],
5: [('Knight_shift', 1.0),
('Anisotropic_liquid', 1.0),
('Electrostatic_deflection_(structural_element)', 1.0),
('Molecular_beam', 1.0),
('Faraday_cup_electrometer', 1.0)],
6: [('Polarizability', 1.0)],
7: [('Einstein–de_Haas_effect', 1.0),
('Holometer', 1.0),
('RRKM_theory', 1.0),
('Fragment_separator', 1.0),
('Pauli_effect', 1.0)],
8: [("Newton's_laws_of_motion", 1.0), ('Photofragment-ion_imaging', 1.0)],
9: [('SLAC_National_Accelerator_Laboratory', 1.0)],
10: [('Bioinformatics', 1.0), ('Folding@home', 1.0)],
11: [('IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics',
1.0),
('BioUML', 1.0),
('Sepp_Hochreiter', 1.0),
('Computational_biology', 1.0),
('Journal_of_Computational_Biology', 1.0)],
12: [('De_novo_transcriptome_assembly', 1.0),
('Enzyme_Function_Initiative', 1.0),
('Foldit', 1.0),
('Premier_Biosoft', 1.0),
('Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology',
1.0)]}
Application -- Finding similar articles:
For a query-article, we return all articles from the same group
## select a document (index)
i_doc = 2
print(i_doc,model.documents[i_doc])
## find all articles from the same group
## print: (doc-index, doc-title)
model.clusters_query(i_doc,l=1,)
2 Rovibrational_coupling
[(5, 'Rotational_transition'), (10, 'Rotating_wave_approximation'), (12, 'Molecular_vibration'), (16, 'Reactive_empirical_bond_order'), (20, 'Ziff-Gulari-Barshad_model'), (29, 'Complementary_experiments'), (42, "Euler's_laws_of_motion"), (54, 'Law_of_Maximum')]
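The query amounts to a lookup in the document-group assignment: find the group of the query document and return every other document with the same label. A minimal sketch, assuming the assignment is available as a plain list (the helper name and toy labels are hypothetical):

```python
def same_group_documents(doc_group, titles, i_doc):
    """Return (index, title) for every other document in i_doc's group."""
    target = doc_group[i_doc]
    return [
        (i, titles[i])
        for i, g in enumerate(doc_group)
        if g == target and i != i_doc
    ]


titles = ["A", "B", "C", "D", "E"]
doc_group = [0, 1, 0, 1, 0]
print(same_group_documents(doc_group, titles, 2))
```

As in the output above, the query document itself is left out of the result.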
More technical: Group membership
In the stochastic block model, word-nodes and document-nodes are clustered into different groups.
The group membership can be represented by the conditional probability $P(\text{group}\, |\, \text{node})$. Since words and documents belong to different groups (the word-document network is bipartite), we can show the two memberships separately:
- $P(bd \,|\, d)$, the probability that document $d$ belongs to document group $bd$
- $P(bw \,|\, w)$, the probability that word $w$ belongs to word group $bw$.
p_td_d,p_tw_w = model.group_membership(l=1)
plt.figure(figsize=(15,4))
plt.subplot(121)
plt.imshow(p_td_d,origin='lower',aspect='auto',interpolation='none')
plt.title(r'Document group membership $P(bd | d)$')
plt.xlabel('Document d (index)')
plt.ylabel('Document group, bd')
plt.colorbar()
plt.subplot(122)
plt.imshow(p_tw_w,origin='lower',aspect='auto',interpolation='none')
plt.title(r'Word group membership $P(bw | w)$')
plt.xlabel('Word w (index)')
plt.ylabel('Word group, bw')
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7391fc40e330>
Relative topical distribution
Compare the frequency $f^i_d$ of words from topic $i$ in document $d$ with the expected value across all documents:
$$ \tau_d^i = (f^i_d -\langle f^i \rangle ) / \langle f^i \rangle $$
as in Eq. (10) of Hyland et al.
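The formula can be checked by hand on a tiny example. The sketch below takes $\langle f^i \rangle$ to be the plain average of $f^i_d$ over documents, which is an assumption of this sketch: a length-weighted average is another plausible reading, and the library may compute it differently. It also assumes every topic occurs in at least one document, so that the mean is non-zero.

```python
def relative_topic_distribution(freqs):
    """Compute tau[d][i] = (f[d][i] - mean_i) / mean_i.

    freqs[d][i] is the share of topic i in document d; mean_i is the plain
    average of that share over documents (an assumption of this sketch).
    """
    n_docs = len(freqs)
    n_topics = len(freqs[0])
    means = [sum(row[i] for row in freqs) / n_docs for i in range(n_topics)]
    return [
        [(row[i] - means[i]) / means[i] for i in range(n_topics)]
        for row in freqs
    ]


freqs = [
    [0.5, 0.5, 0.0],
    [0.5, 0.25, 0.25],
]
tau = relative_topic_distribution(freqs)
print(tau)
```

Two properties follow directly from the definition: $\tau_d^i = 0$ when a document uses a topic exactly as often as expected, and $\tau_d^i = -1$ when the topic is absent from the document.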
model.print_overview()
Level 0 has 39 document groups and 84 topics (word groups)
Level 1 has 13 document groups and 14 topics (word groups)
Level 2 has 2 document groups and 3 topics (word groups)
Level 3 has 1 document groups and 1 topics (word groups)
model.topics(l=2)
{0: [('the', 0.1304471892707187),
('of', 0.07362113244466185),
('and', 0.04573804573804574),
('a', 0.04272145448616037),
('to', 0.04141698259345318),
('in', 0.039908686967510494),
('is', 0.03306020953079777),
('for', 0.017447311564958625),
('as', 0.016265133912192736),
('that', 0.013167013167013167)],
1: [('formula', 0.02083944819489287),
('field', 0.009783778495254868),
('when', 0.007729185011251345),
('physics', 0.0067508071617258586),
('energy', 0.0066529693767733095),
('analysis', 0.006359456021915664),
('electron', 0.0059681048821054695),
('mass', 0.0058702670971529205),
('quantum', 0.005576753742295274),
('law', 0.005576753742295274)],
2: [('protein', 0.02806946688206785),
('data', 0.017568659127625202),
('structure', 0.015751211631663976),
('computational', 0.015549273021001616),
('bioinformatics', 0.01474151857835218),
('biology', 0.01029886914378029),
('sequence', 0.009491114701130857),
('proteins', 0.009289176090468497),
('computer', 0.009087237479806139),
('model', 0.009087237479806139)]}
Relative contribution of topics in each document
print("Document title [relative contribution of each topic]\n")
tau_d=model.topicdist_relative(l=2)
for i in range(len(model.documents)):
    print(model.documents[i],tau_d[i])
Document title [relative contribution of each topic]

Nuclear_Overhauser_effect [-0.13980569 0.57555232 -0.49538507]
Quantum_solvent [ 0.06809446 0.22670045 -0.80523634]
Rovibrational_coupling [-0.10582642 0.61371087 -0.7424699 ]
Effective_field_theory [-0.02818675 0.41790322 -0.72292801]
Chemical_physics [-0.09703426 0.67649061 -0.91560241]
Rotational_transition [-0.05899796 0.59597849 -0.93784674]
Dynamic_nuclear_polarisation [-0.17109143 0.78734273 -0.77754163]
Knight_shift [-0.15027516 0.74804814 -0.79955574]
Polarizability [-0.11599365 0.61549243 -0.69578105]
Anisotropic_liquid [ 0.07901567 0.07904206 -0.5545683 ]
Rotating_wave_approximation [-0.00589727 0.46136675 -0.92305402]
RRKM_theory [ 0.0982838 0.04050484 -0.57047658]
Molecular_vibration [-0.07950154 0.55947082 -0.76092466]
Fuel_mass_fraction [-0.2643075 1.11884623 -1. ]
Electrostatic_deflection_(structural_element) [-0.11414146 0.58259502 -0.63705565]
Magic_angle_(EELS) [-0.07988048 0.54317796 -0.72541882]
Reactive_empirical_bond_order [-0.09975616 0.29868139 -0.12231505]
Photofragment-ion_imaging [ 0.0198688 0.35720495 -0.83570142]
Molecular_beam [ 0.08143498 0.2019464 -0.82022936]
McConnell_equation [-0.29189597 0.89371881 -0.39866721]
Ziff-Gulari-Barshad_model [-0.04359975 0.48823723 -0.79174622]
Empirical_formula [-0.14218254 0.65093435 -0.63920032]
Pauli_effect [-0.01879878 0.38655376 -0.70472799]
SLAC_National_Accelerator_Laboratory [-0.1263539 0.49296101 -0.39155229]
Newton's_laws_of_motion [-0.0521876 0.54361795 -0.86351071]
Uncertainty [ 0.03969931 0.31319855 -0.84310705]
Ripple_tank [-0.05327744 0.50817833 -0.78496401]
Particle-induced_X-ray_emission [-0.10179632 0.54433319 -0.61923749]
Experimental_physics [-0.01724507 0.37007 -0.67840201]
Complementary_experiments [ 0.04937238 0.34465241 -0.95594632]
Elevator_paradox_(physics) [-0.0448058 0.54108433 -0.89484891]
Wave_tank [ 3.37441514e-04 1.86946265e-01 -3.87531413e-01]
Philosophical_interpretation_of_classical_physics [ 0.04185242 0.28716134 -0.80003185]
X-ray_crystal_truncation_rod [-0.1673191 0.69038357 -0.59610373]
Faraday_cup_electrometer [-0.05472787 0.46894801 -0.69680699]
Line_source [-0.09171466 0.3276315 -0.22190412]
X-ray_standing_waves [-0.20085402 0.76018736 -0.57405594]
Point_source [-0.07701307 0.47154086 -0.59176321]
Einstein–de_Haas_effect [-0.20167422 0.75854692 -0.56660699]
List_of_Directors_General_of_CERN [-0.29189597 -0.02886215 1.50555331]
Fragment_separator [-0.0288859 0.23010795 -0.33185245]
Dynamic_mode_decomposition [-0.03165261 0.30813099 -0.47918755]
Euler's_laws_of_motion [-0.14191624 0.81089691 -0.97068457]
Holometer [ 0.04529643 0.24629358 -0.73274098]
Quantum_oscillations_(experimental_technique) [-0.12624695 0.60774493 -0.62899776]
Bioinformatics [-0.06787589 -0.68556497 1.75125668]
Computational_biology [-0.12109851 -0.61961366 1.8787841 ]
Folding@home [ 0.38049065 -0.81236658 -0.20812142]
K-mer [-0.15555296 -0.23997907 1.26589169]
Journal_of_Computational_Biology [-0.31524006 -0.70118835 3.0088853 ]
Foldit [ 0.05021632 -0.7926396 1.38726028]
Premier_Biosoft [-0.13444178 -0.66221292 2.03280888]
International_Society_for_Computational_Biology [-0.13505951 -0.84563171 2.41444798]
Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology [-0.05067372 -0.73853981 1.77538213]
Law_of_Maximum [ 0.08640619 0.27711279 -1. ]
Enzyme_Function_Initiative [-0.19765502 -0.75099029 2.52918962]
SnoRNA_prediction_software [-0.09787215 0.0825799 0.31438862]
Sepp_Hochreiter [-0.21831309 -0.53960872 2.19523006]
Aureus_Sciences [-0.04792735 -0.54299395 1.35816782]
IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics [-0.15555296 -0.91555323 2.66028658]
Knotted_protein [-4.65133831e-02 -1.64677460e-05 2.30449943e-01]
BioUML [-0.04992331 -0.84526944 1.99195632]
De_novo_transcriptome_assembly [-0.11545809 -0.80195042 2.22718852]
Documents associated with each of the topics
model.docs_of_topic(l=2, n=10)
Topic 0
[0.38049064650371833, 'Folding@home']
[0.0982838041661571, 'RRKM_theory']
[0.08640618551980356, 'Law_of_Maximum']
[0.08143498489053753, 'Molecular_beam']
[0.07901566725096126, 'Anisotropic_liquid']
[0.0680944560439678, 'Quantum_solvent']
[0.050216316701558475, 'Foldit']
[0.04937237968912177, 'Complementary_experiments']
[0.04529642764936884, 'Holometer']
[0.04185241980407554, 'Philosophical_interpretation_of_classical_physics']

Topic 1
[1.1188462256850868, 'Fuel_mass_fraction']
[0.8937188142060465, 'McConnell_equation']
[0.8108969113487717, "Euler's_laws_of_motion"]
[0.7873427314160522, 'Dynamic_nuclear_polarisation']
[0.7601873593581843, 'X-ray_standing_waves']
[0.7585469237949426, 'Einstein–de_Haas_effect']
[0.7480481361901968, 'Knight_shift']
[0.6903835734029293, 'X-ray_crystal_truncation_rod']
[0.6764906101473233, 'Chemical_physics']
[0.6509343508462969, 'Empirical_formula']

Topic 2
[3.008885298869144, 'Journal_of_Computational_Biology']
[2.6602865772283484, 'IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics']
[2.529189622081383, 'Enzyme_Function_Initiative']
[2.4144479769727702, 'International_Society_for_Computational_Biology']
[2.2271885232399193, 'De_novo_transcriptome_assembly']
[2.195230060431999, 'Sepp_Hochreiter']
[2.0328088782749174, 'Premier_Biosoft']
[1.9919563236843087, 'BioUML']
[1.8787841018585545, 'Computational_biology']
[1.7753821299863304, 'Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology']
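The ranking above can be understood as a per-topic sort of the relative contributions $\tau_d^i$: for each topic, order the documents by their value and keep the first $n$. A minimal sketch with made-up values follows; `documents_of_topic` is a hypothetical helper, not the library function.

```python
def documents_of_topic(tau, titles, n=2):
    """For each topic, list the n documents with the largest tau value."""
    n_topics = len(tau[0])
    ranking = {}
    for i in range(n_topics):
        scored = sorted(
            ((row[i], title) for row, title in zip(tau, titles)),
            reverse=True,
        )
        ranking[i] = scored[:n]
    return ranking


titles = ["A", "B", "C"]
tau = [
    [0.1, 0.6, -0.5],
    [-0.2, -0.7, 1.8],
    [0.4, -0.8, -0.2],
]
print(documents_of_topic(tau, titles, n=2))
```

Read this way, the listing for topic 2 picks out the computational biology articles, while topic 1 is led by the physics articles.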