TopSBM: Topic Modeling with Stochastic Block Models
A basic tutorial.
%load_ext autoreload
%autoreload 2
import os
import pylab as plt
%matplotlib inline
from sbmtm import sbmtm
import graph_tool.all as gt
Setup: Load a corpus
We need a list of documents, where each document is a list of words.
Optionally, we can also provide a list of document titles.
The example corpus consists of 63 articles from Wikipedia taken from 3 different categories (Experimental Physics, Chemical Physics, and Computational Biology).
path_data = 'data/'
## texts
fname_data = 'corpus.txt'
filename = os.path.join(path_data,fname_data)
with open(filename,'r', encoding = 'utf8') as f:
    x = f.readlines()
texts = [h.split() for h in x]
## titles
fname_data = 'titles.txt'
filename = os.path.join(path_data,fname_data)
with open(filename,'r', encoding = 'utf8') as f:
    x = f.readlines()
titles = [h.split()[0] for h in x]
i_doc = 0
print(titles[i_doc])
print(texts[i_doc][:10])
Nuclear_Overhauser_effect ['the', 'nuclear', 'overhauser', 'effect', 'noe', 'is', 'the', 'transfer', 'of', 'nuclear']
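The loading code above expects each line of `corpus.txt` to hold one document that is already tokenized and separated by whitespace. If you start from raw strings instead, a minimal preprocessing sketch could look like the following. The `tokenize` helper and its regular expression are illustrative assumptions and are not part of `sbmtm`.

```python
import re

def tokenize(raw_text):
    """Hypothetical helper: lowercase the text and keep runs of letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", raw_text.lower())

# Two made-up raw documents standing in for the Wikipedia articles
raw_docs = [
    "The nuclear Overhauser effect (NOE) is the transfer of nuclear spin polarization.",
    "Bioinformatics develops methods for understanding biological data.",
]

# Same structure as `texts` above: a list of documents, each a list of words
toy_texts = [tokenize(doc) for doc in raw_docs]
print(toy_texts[0][:5])
```

The resulting `toy_texts` has the same shape as `texts`, so it could be passed to `make_graph` in the same way.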
Fitting the model
## we create an instance of the sbmtm-class
model = sbmtm()
## we have to create the word-document network from the corpus
model.make_graph(texts,documents=titles)
## we can also skip the previous step by saving/loading a graph
# model.save_graph(filename = 'graph.xml.gz')
# model.load_graph(filename = 'graph.xml.gz')
## fit the model
gt.seed_rng(32) ## seed for graph-tool's random number generator --> same results
model.fit()
<NestedBlockState object, with base <BlockState object with 3203 blocks (123 nonempty), degree-corrected, for graph <Graph object, undirected, with 3203 vertices and 13050 edges, 2 internal vertex properties, 1 internal edge property, at 0x7392097ecb60>, at 0x7391fc568170>, and 5 levels of sizes [(3203, 123), (123, 27), (27, 5), (5, 2), (2, 1)] at 0x7391fc56a4e0>
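To make the idea of a word-document network concrete, here is a plain-Python sketch that does not use graph-tool. It assumes that every distinct (document, word) pair becomes one edge and that the number of occurrences is kept as a weight on that edge. This is an illustration of the data structure only, not the `make_graph` implementation, and `bipartite_edge_counts` is a hypothetical name.

```python
from collections import Counter

def bipartite_edge_counts(texts, titles):
    """Map each (document, word) pair to the number of times the word occurs in the document."""
    edges = Counter()
    for title, words in zip(titles, texts):
        for word in words:
            edges[(title, word)] += 1
    return edges

# Toy corpus (made up for illustration)
toy_titles = ["doc_a", "doc_b"]
toy_texts = [["spin", "field", "spin"], ["protein", "field"]]
edges = bipartite_edge_counts(toy_texts, toy_titles)

n_word_nodes = len({word for _, word in edges})
n_nodes = len(toy_titles) + n_word_nodes
print(n_nodes, "nodes,", len(edges), "distinct edges,", sum(edges.values()), "word tokens")
```

Document-nodes only ever connect to word-nodes, which is why the network is bipartite.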
Plotting the result
The output shows the (hierarchical) community structure in the word-document network as inferred by the stochastic block model:
- document-nodes are on the left
- word-nodes are on the right
- different colors correspond to the different groups
The result is a grouping of nodes on multiple levels of the hierarchy:
- on the uppermost level, all nodes belong to the same group (square in the middle)
- on the next-lower level, the network splits into two groups: the document-nodes and the word-nodes (blue squares to the left and right, respectively). This structure is trivial because the network is bipartite.
- only the levels below that show non-trivial structure: nodes are divided further into smaller groups (document-nodes into document-groups on the left, word-nodes into word-groups on the right)
model.plot(nedges=10000)
The basics
Topics
For each word-group on a given level of the hierarchy, we retrieve the $n$ most common words. These word-groups are the topics!
model.topics(l=1,n=20)
{0: [('the', 0.20760347735824575), ('of', 0.11716621253405994), ('a', 0.06799013883482548), ('to', 0.06591410406124303), ('in', 0.0635136888542883), ('is', 0.05261450629298041), ('as', 0.025885558583106268), ('that', 0.02095497599584793), ('are', 0.020046710782405604), ('by', 0.018554560788893212), ('be', 0.01797067600882315), ('with', 0.01589464123524069), ('an', 0.015700012975217333), ('this', 0.014467367328402751), ('can', 0.013429349941611522), ('or', 0.012521084728169197), ('from', 0.012456208641494744), ('it', 0.012196704294796938), ('at', 0.010380173867912288), ('used', 0.007850006487608667)], 1: [('formula', 0.13903394255874674), ('electron', 0.03981723237597911), ('x', 0.033942558746736295), ('spin', 0.030678851174934726), ('surface', 0.030026109660574413), ('magnetic', 0.026762402088772844), ('electrons', 0.022845953002610966), ('effect', 0.022193211488250653), ('ray', 0.02154046997389034), ('nuclear', 0.018929503916449087), ('polarization', 0.018276762402088774), ('observed', 0.016318537859007835), ('intensity', 0.015665796344647518), ('sample', 0.015665796344647518), ('direction', 0.015013054830287207), ('cross', 0.010443864229765013), ('external', 0.010443864229765013), ('dnp', 0.0097911227154047), ('defined', 0.0097911227154047), ('left', 0.0097911227154047)], 2: [('analysis', 0.09090909090909091), ('chemical', 0.039160839160839164), ('angle', 0.026573426573426574), ('proton', 0.025174825174825177), ('shown', 0.025174825174825177), ('loss', 0.016783216783216783), ('fermi', 0.016783216783216783), ('elements', 0.016783216783216783), ('empirical', 0.015384615384615385), ('oscillations', 0.015384615384615385), ('ratio', 0.013986013986013986), ('noe', 0.013986013986013986), ('whereas', 0.012587412587412588), ('fuel', 0.012587412587412588), ('landau', 0.011188811188811189), ('knot', 0.011188811188811189), ('shows', 0.011188811188811189), ('shell', 0.009790209790209791), ('compounds', 0.009790209790209791), ('micropixe', 0.009790209790209791)], 3: [('and', 
0.12306679828891083), ('for', 0.04694526708347044), ('on', 0.027201930459581), ('which', 0.022485466710540747), ('s', 0.0202917626412197), ('folding', 0.017439947351102335), ('was', 0.017001206537238127), ('home', 0.013930020840188658), ('such', 0.01173631677086761), ('these', 0.011407261160469452), ('also', 0.00943292749808051), ('its', 0.008884501480750246), ('first', 0.008774816277284195), ('molecular', 0.008445760666886038), ('more', 0.008336075463419985), ('been', 0.007568279039157618), ('if', 0.007239223428759461), ('but', 0.007239223428759461), ('research', 0.007239223428759461), ('time', 0.007019853021827355)], 4: [('structure', 0.0982367758186398), ('computer', 0.05667506297229219), ('model', 0.05667506297229219), ('high', 0.05163727959697733), ('function', 0.042821158690176324), ('models', 0.03904282115869018), ('real', 0.031486146095717885), ('potential', 0.028967254408060455), ('fields', 0.027707808564231738), ('structural', 0.02644836272040302), ('de', 0.02392947103274559), ('usually', 0.020151133501259445), ('types', 0.020151133501259445), ('generally', 0.020151133501259445), ('k', 0.020151133501259445), ('group', 0.018891687657430732), ('modeling', 0.018891687657430732), ('short', 0.018891687657430732), ('levels', 0.018891687657430732), ('presence', 0.017632241813602016)], 5: [('energy', 0.05807002561912895), ('law', 0.04867634500426986), ('momentum', 0.04782237403928266), ('molecule', 0.04269854824935952), ('motion', 0.035866780529461996), ('body', 0.035012809564474806), ('frequency', 0.032450896669513236), ('laws', 0.030742954739538857), ('normal', 0.029888983774551667), ('second', 0.026473099914602904), ('vibration', 0.01964133219470538), ('angular', 0.01964133219470538), ('equal', 0.018787361229718188), ('rotational', 0.018787361229718188), ('frame', 0.017079419299743808), ('above', 0.017079419299743808), ('vibrational', 0.016225448334756618), ('o', 0.015371477369769428), ('spectroscopy', 0.015371477369769428), ('coordinates', 
0.014517506404782237)], 6: [('physics', 0.028583264291632146), ('force', 0.01988400994200497), ('uncertainty', 0.01946975973487987), ('newton', 0.017812758906379452), ('experiments', 0.016570008285004142), ('mechanics', 0.013670256835128418), ('velocity', 0.013256006628003313), ('detector', 0.01284175642087821), ('classical', 0.012013256006628004), ('dimensional', 0.011184755592377795), ('imaging', 0.010770505385252692), ('distribution', 0.010356255178127589), ('product', 0.009527754763877383), ('physical', 0.009113504556752278), ('reference', 0.008699254349627174), ('position', 0.008699254349627174), ('ion', 0.008285004142502071), ('laser', 0.008285004142502071), ('early', 0.007870753935376968), ('measurement', 0.007870753935376968)], 7: [('when', 0.04302832244008715), ('mass', 0.032679738562091505), ('quantum', 0.03104575163398693), ('atoms', 0.02505446623093682), ('applied', 0.020697167755991286), ('often', 0.020697167755991286), ('atomic', 0.0196078431372549), ('number', 0.01906318082788671), ('technique', 0.018518518518518517), ('atom', 0.016339869281045753), ('scattering', 0.015250544662309368), ('i', 0.014161220043572984), ('polarizability', 0.013616557734204794), ('approximation', 0.013071895424836602), ('particles', 0.013071895424836602), ('therefore', 0.01252723311546841), ('c', 0.011982570806100218), ('constant', 0.011982570806100218), ('nucleon', 0.011437908496732025), ('h', 0.010893246187363835)], 8: [('protein', 0.09494535519125682), ('proteins', 0.031420765027322405), ('software', 0.028688524590163935), ('project', 0.028005464480874317), ('structures', 0.018442622950819672), ('determine', 0.01366120218579235), ('cell', 0.01366120218579235), ('specific', 0.01366120218579235), ('assembly', 0.01366120218579235), ('native', 0.012295081967213115), ('approaches', 0.012295081967213115), ('prediction', 0.012295081967213115), ('core', 0.012295081967213115), ('functions', 0.011612021857923498), ('mutations', 0.01092896174863388), ('cancer', 
0.01092896174863388), ('transcriptome', 0.01092896174863388), ('company', 0.010245901639344262), ('cells', 0.009562841530054645), ('functional', 0.009562841530054645)], 9: [('source', 0.044657097288676235), ('point', 0.04226475279106858), ('theory', 0.037480063795853266), ('linear', 0.03508771929824561), ('light', 0.03110047846889952), ('interactions', 0.030303030303030304), ('wave', 0.024720893141945772), ('sources', 0.023923444976076555), ('air', 0.0215311004784689), ('waves', 0.02073365231259968), ('water', 0.019936204146730464), ('noise', 0.019138755980861243), ('tank', 0.017543859649122806), ('properties', 0.01594896331738437), ('radiation', 0.013556618819776715), ('line', 0.013556618819776715), ('similar', 0.012759170653907496), ('effective', 0.012759170653907496), ('fluid', 0.011961722488038277), ('diffraction', 0.011164274322169059)], 10: [('data', 0.048013245033112585), ('sequence', 0.025938189845474614), ('genome', 0.02152317880794702), ('available', 0.018763796909492272), ('biological', 0.018763796909492272), ('iscb', 0.018211920529801324), ('conference', 0.017660044150110375), ('gene', 0.01545253863134658), ('sequencing', 0.014900662251655629), ('society', 0.014900662251655629), ('sequences', 0.01434878587196468), ('genes', 0.01434878587196468), ('algorithms', 0.01379690949227373), ('dna', 0.012693156732891833), ('ismb', 0.011589403973509934), ('complex', 0.011589403973509934), ('tools', 0.009933774834437087), ('current', 0.009933774834437087), ('year', 0.009381898454746136), ('application', 0.008278145695364239)], 11: [('experimental', 0.06167400881057269), ('he', 0.041116005873715125), ('his', 0.0381791483113069), ('experiment', 0.03671071953010279), ('pauli', 0.030837004405286344), ('accelerator', 0.030837004405286344), ('stanford', 0.027900146842878122), ('main', 0.024963289280469897), ('laboratory', 0.022026431718061675), ('material', 0.020558002936857563), ('slac', 0.020558002936857563), ('target', 0.01908957415565345), ('during', 
0.01908957415565345), ('strong', 0.01908957415565345), ('physicist', 0.014684287812041116), ('factor', 0.014684287812041116), ('haas', 0.014684287812041116), ('history', 0.014684287812041116), ('synchrotron', 0.013215859030837005), ('national', 0.013215859030837005)], 12: [('field', 0.16181229773462782), ('electric', 0.08737864077669903), ('beam', 0.061488673139158574), ('case', 0.038834951456310676), ('relative', 0.02912621359223301), ('charge', 0.02912621359223301), ('measure', 0.02912621359223301), ('dipole', 0.025889967637540454), ('beams', 0.022653721682847898), ('moment', 0.022653721682847898), ('shift', 0.021035598705501618), ('liquid', 0.021035598705501618), ('metal', 0.019417475728155338), ('induced', 0.019417475728155338), ('detect', 0.01779935275080906), ('faraday', 0.014563106796116505), ('rule', 0.014563106796116505), ('deflection', 0.012944983818770227), ('charged', 0.012944983818770227), ('fig', 0.012944983818770227)], 13: [('computational', 0.0873015873015873), ('bioinformatics', 0.08276643990929705), ('biology', 0.05782312925170068), ('large', 0.045351473922902494), ('university', 0.04195011337868481), ('new', 0.03741496598639456), ('program', 0.027210884353741496), ('researchers', 0.026077097505668934), ('sciences', 0.02040816326530612), ('biouml', 0.02040816326530612), ('now', 0.017006802721088437), ('institute', 0.015873015873015872), ('include', 0.015873015873015872), ('networks', 0.013605442176870748), ('contributions', 0.013605442176870748), ('network', 0.012471655328798186), ('before', 0.012471655328798186), ('low', 0.012471655328798186), ('platform', 0.011337868480725623), ('microarray', 0.011337868480725623)]}
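The numbers in the output above read as the share of each word within its word-group. A minimal sketch of that computation, assuming we already have corpus-wide word counts and a hard assignment of words to groups (both made up here; `top_words_per_group` is a hypothetical helper, not the `sbmtm` code):

```python
from collections import Counter, defaultdict

def top_words_per_group(word_counts, word_group, n=2):
    """Return {group: [(word, share), ...]} with the n most frequent words of each group."""
    totals = Counter()
    members = defaultdict(list)
    for word, count in word_counts.items():
        group = word_group[word]
        totals[group] += count
        members[group].append((word, count))
    topics = {}
    for group, items in members.items():
        # Sort by count, most frequent first
        items.sort(key=lambda wc: wc[1], reverse=True)
        # Normalize by the total count of the group
        topics[group] = [(w, c / totals[group]) for w, c in items[:n]]
    return topics

# Toy counts and group labels (assumed, for illustration only)
word_counts = Counter({"the": 6, "of": 4, "spin": 3, "field": 1})
word_group = {"the": 0, "of": 0, "spin": 1, "field": 1}
toy_topics = top_words_per_group(word_counts, word_group)
print(toy_topics)
```

In this toy example group 0 collects function words, much like topic 0 in the real output.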
Topic-distribution in each document
Which topics contribute to each document?
## select a document (by its index)
i_doc = 0
print(model.documents[i_doc])
## get a list of tuples (topic-index, probability)
model.topicdist(i_doc,l=1)
Nuclear_Overhauser_effect
[(0, 0.3881118881118881), (1, 0.17832167832167833), (2, 0.0944055944055944), (3, 0.14335664335664336), (4, 0.013986013986013986), (5, 0.02097902097902098), (6, 0.038461538461538464), (7, 0.03496503496503497), (8, 0.02097902097902098), (9, 0.02097902097902098), (10, 0.02097902097902098), (11, 0.01048951048951049), (12, 0.006993006993006993), (13, 0.006993006993006993)]
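The probabilities above sum to one, which is consistent with reading them as the fraction of a document's word tokens that fall into each topic. A small sketch under that assumption, with a made-up word-to-topic assignment (`topic_shares` is a hypothetical helper):

```python
from collections import Counter

def topic_shares(doc_words, word_group, n_groups):
    """Return [(topic, share)] where share is the fraction of tokens assigned to that topic."""
    counts = Counter(word_group[w] for w in doc_words)
    total = sum(counts.values())
    return [(g, counts[g] / total) for g in range(n_groups)]

# Toy assignment of words to three topics (assumed, for illustration only)
word_group = {"the": 0, "of": 0, "spin": 1, "protein": 2}
doc = ["the", "spin", "of", "the", "spin"]
shares = topic_shares(doc, word_group, n_groups=3)
print(shares)
```

Topics that do not occur in the document still appear in the list, with a share of zero.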
Extra: Clustering of documents - for free.
The stochastic block model also clusters the documents into groups. We do not need to run an additional clustering step to obtain this grouping.
model.clusters(l=1,n=5)
{0: [('Nuclear_Overhauser_effect', 1.0), ('Empirical_formula', 1.0), ('Magic_angle_(EELS)', 1.0), ('Fuel_mass_fraction', 1.0), ('Dynamic_mode_decomposition', 1.0)], 1: [('Reactive_empirical_bond_order', 1.0), ('Rotating_wave_approximation', 1.0), ('Rovibrational_coupling', 1.0), ('Complementary_experiments', 1.0), ('Rotational_transition', 1.0)], 2: [('Wave_tank', 1.0), ('Ripple_tank', 1.0), ('Effective_field_theory', 1.0), ('Line_source', 1.0), ('Point_source', 1.0)], 3: [('Philosophical_interpretation_of_classical_physics', 1.0), ('Experimental_physics', 1.0), ('Chemical_physics', 1.0), ('Uncertainty', 1.0)], 4: [('Elevator_paradox_(physics)', 1.0), ('X-ray_crystal_truncation_rod', 1.0), ('Dynamic_nuclear_polarisation', 1.0), ('X-ray_standing_waves', 1.0)], 5: [('Knight_shift', 1.0), ('Anisotropic_liquid', 1.0), ('Electrostatic_deflection_(structural_element)', 1.0), ('Molecular_beam', 1.0), ('Faraday_cup_electrometer', 1.0)], 6: [('Polarizability', 1.0)], 7: [('Einstein–de_Haas_effect', 1.0), ('Holometer', 1.0), ('RRKM_theory', 1.0), ('Fragment_separator', 1.0), ('Pauli_effect', 1.0)], 8: [("Newton's_laws_of_motion", 1.0), ('Photofragment-ion_imaging', 1.0)], 9: [('SLAC_National_Accelerator_Laboratory', 1.0)], 10: [('Bioinformatics', 1.0), ('Folding@home', 1.0)], 11: [('IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics', 1.0), ('BioUML', 1.0), ('Sepp_Hochreiter', 1.0), ('Computational_biology', 1.0), ('Journal_of_Computational_Biology', 1.0)], 12: [('De_novo_transcriptome_assembly', 1.0), ('Enzyme_Function_Initiative', 1.0), ('Foldit', 1.0), ('Premier_Biosoft', 1.0), ('Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology', 1.0)]}
Application -- Finding similar articles:
For a query-article, we return all other articles from the same group.
## select a document (index)
i_doc = 2
print(i_doc,model.documents[i_doc])
## find all articles from the same group
## print: (doc-index, doc-title)
model.clusters_query(i_doc,l=1)
2 Rovibrational_coupling
[(5, 'Rotational_transition'), (10, 'Rotating_wave_approximation'), (12, 'Molecular_vibration'), (16, 'Reactive_empirical_bond_order'), (20, 'Ziff-Gulari-Barshad_model'), (29, 'Complementary_experiments'), (42, "Euler's_laws_of_motion"), (54, 'Law_of_Maximum')]
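Conceptually, the query only needs the group label of every document. The sketch below shows this lookup with made-up labels; `same_group_documents` is a hypothetical helper, not the `clusters_query` implementation.

```python
def same_group_documents(doc_group, titles, i_doc):
    """Return (index, title) for every other document in the same group as document i_doc."""
    target = doc_group[i_doc]
    return [
        (i, titles[i])
        for i, group in enumerate(doc_group)
        if group == target and i != i_doc
    ]

# Toy titles and group labels (assumed, for illustration only)
toy_titles = ["A", "B", "C", "D"]
toy_doc_group = [0, 1, 0, 0]
neighbours = same_group_documents(toy_doc_group, toy_titles, 0)
print(neighbours)  # [(2, 'C'), (3, 'D')]
```

As in the output above, the query document itself is left out of the result.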
More technical: Group membership
In the stochastic block model, word-nodes and document-nodes are clustered into separate groups.
The group membership can be represented by the conditional probability $P(\text{group}\, |\, \text{node})$. Because the word-document network is bipartite, words and documents never share a group, so we can show the two cases separately:
- $P(bd | d)$, the probability that document $d$ belongs to document-group $bd$
- $P(bw | w)$, the probability that word $w$ belongs to word-group $bw$
p_td_d,p_tw_w = model.group_membership(l=1)
plt.figure(figsize=(15,4))
plt.subplot(121)
plt.imshow(p_td_d,origin='lower',aspect='auto',interpolation='none')
plt.title(r'Document group membership $P(bd | d)$')
plt.xlabel('Document d (index)')
plt.ylabel('Document group, bd')
plt.colorbar()
plt.subplot(122)
plt.imshow(p_tw_w,origin='lower',aspect='auto',interpolation='none')
plt.title(r'Word group membership $P(bw | w)$')
plt.xlabel('Word w (index)')
plt.ylabel('Word group, bw')
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7391fc40e330>
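To see what such a membership matrix contains, consider the simplest case in which every node belongs to exactly one group. Each column then has a single entry equal to one. The sketch below builds this matrix with groups as rows and nodes as columns, matching the axes of the plots above; the labels are made up and `membership_matrix` is a hypothetical helper.

```python
def membership_matrix(node_group, n_groups):
    """Rows are groups, columns are nodes; entry [g][i] is P(group g | node i)."""
    matrix = [[0.0] * len(node_group) for _ in range(n_groups)]
    for node, group in enumerate(node_group):
        matrix[group][node] = 1.0
    return matrix

# Toy hard assignment of four documents to three groups (assumed)
toy_doc_group = [0, 1, 0, 2]
p_group_given_doc = membership_matrix(toy_doc_group, n_groups=3)
for row in p_group_given_doc:
    print(row)
```

Every column sums to one because it is a probability distribution over groups for a fixed node.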
Relative topical distribution
Compare the frequency $f^i_d$ of words from topic $i$ in document $d$ with the expected value across all documents:
$$ \tau_d^i = (f^i_d -\langle f^i \rangle ) / \langle f^i \rangle $$
as in Eq. (10) of Hyland et al.
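The formula can be evaluated by hand on a small example. The sketch below assumes that $\langle f^i \rangle$ is the plain average of $f^i_d$ over documents; the library may weight documents differently, so treat it as an illustration of the formula rather than of `topicdist_relative`. The frequencies are made up.

```python
def relative_topic_distribution(f):
    """f[d][i] is the frequency of topic i in document d; returns tau[d][i]."""
    n_docs = len(f)
    n_topics = len(f[0])
    # Average frequency of each topic across documents (assumed unweighted)
    mean_f = [sum(row[i] for row in f) / n_docs for i in range(n_topics)]
    return [
        [(row[i] - mean_f[i]) / mean_f[i] for i in range(n_topics)]
        for row in f
    ]

# Three toy documents, two topics
f = [[0.5, 0.5], [0.75, 0.25], [1.0, 0.0]]
tau = relative_topic_distribution(f)
for row in tau:
    print(row)
```

A value of 0 means the topic appears exactly as often as on average, a positive value means it is over-represented, and -1 means the topic is absent from the document.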
model.print_overview()
Level 0 has 39 document groups and 84 topics (word groups)
Level 1 has 13 document groups and 14 topics (word groups)
Level 2 has 2 document groups and 3 topics (word groups)
Level 3 has 1 document groups and 1 topics (word groups)
model.topics(l=2)
{0: [('the', 0.1304471892707187), ('of', 0.07362113244466185), ('and', 0.04573804573804574), ('a', 0.04272145448616037), ('to', 0.04141698259345318), ('in', 0.039908686967510494), ('is', 0.03306020953079777), ('for', 0.017447311564958625), ('as', 0.016265133912192736), ('that', 0.013167013167013167)], 1: [('formula', 0.02083944819489287), ('field', 0.009783778495254868), ('when', 0.007729185011251345), ('physics', 0.0067508071617258586), ('energy', 0.0066529693767733095), ('analysis', 0.006359456021915664), ('electron', 0.0059681048821054695), ('mass', 0.0058702670971529205), ('quantum', 0.005576753742295274), ('law', 0.005576753742295274)], 2: [('protein', 0.02806946688206785), ('data', 0.017568659127625202), ('structure', 0.015751211631663976), ('computational', 0.015549273021001616), ('bioinformatics', 0.01474151857835218), ('biology', 0.01029886914378029), ('sequence', 0.009491114701130857), ('proteins', 0.009289176090468497), ('computer', 0.009087237479806139), ('model', 0.009087237479806139)]}
Relative contribution of topics in each document
print("Document title [relative contribution of each topic]\n")
tau_d=model.topicdist_relative(l=2)
for i in range(len(model.documents)):
    print(model.documents[i],tau_d[i])
Document title [relative contribution of each topic]

Nuclear_Overhauser_effect [-0.13980569 0.57555232 -0.49538507]
Quantum_solvent [ 0.06809446 0.22670045 -0.80523634]
Rovibrational_coupling [-0.10582642 0.61371087 -0.7424699 ]
Effective_field_theory [-0.02818675 0.41790322 -0.72292801]
Chemical_physics [-0.09703426 0.67649061 -0.91560241]
Rotational_transition [-0.05899796 0.59597849 -0.93784674]
Dynamic_nuclear_polarisation [-0.17109143 0.78734273 -0.77754163]
Knight_shift [-0.15027516 0.74804814 -0.79955574]
Polarizability [-0.11599365 0.61549243 -0.69578105]
Anisotropic_liquid [ 0.07901567 0.07904206 -0.5545683 ]
Rotating_wave_approximation [-0.00589727 0.46136675 -0.92305402]
RRKM_theory [ 0.0982838 0.04050484 -0.57047658]
Molecular_vibration [-0.07950154 0.55947082 -0.76092466]
Fuel_mass_fraction [-0.2643075 1.11884623 -1. ]
Electrostatic_deflection_(structural_element) [-0.11414146 0.58259502 -0.63705565]
Magic_angle_(EELS) [-0.07988048 0.54317796 -0.72541882]
Reactive_empirical_bond_order [-0.09975616 0.29868139 -0.12231505]
Photofragment-ion_imaging [ 0.0198688 0.35720495 -0.83570142]
Molecular_beam [ 0.08143498 0.2019464 -0.82022936]
McConnell_equation [-0.29189597 0.89371881 -0.39866721]
Ziff-Gulari-Barshad_model [-0.04359975 0.48823723 -0.79174622]
Empirical_formula [-0.14218254 0.65093435 -0.63920032]
Pauli_effect [-0.01879878 0.38655376 -0.70472799]
SLAC_National_Accelerator_Laboratory [-0.1263539 0.49296101 -0.39155229]
Newton's_laws_of_motion [-0.0521876 0.54361795 -0.86351071]
Uncertainty [ 0.03969931 0.31319855 -0.84310705]
Ripple_tank [-0.05327744 0.50817833 -0.78496401]
Particle-induced_X-ray_emission [-0.10179632 0.54433319 -0.61923749]
Experimental_physics [-0.01724507 0.37007 -0.67840201]
Complementary_experiments [ 0.04937238 0.34465241 -0.95594632]
Elevator_paradox_(physics) [-0.0448058 0.54108433 -0.89484891]
Wave_tank [ 3.37441514e-04 1.86946265e-01 -3.87531413e-01]
Philosophical_interpretation_of_classical_physics [ 0.04185242 0.28716134 -0.80003185]
X-ray_crystal_truncation_rod [-0.1673191 0.69038357 -0.59610373]
Faraday_cup_electrometer [-0.05472787 0.46894801 -0.69680699]
Line_source [-0.09171466 0.3276315 -0.22190412]
X-ray_standing_waves [-0.20085402 0.76018736 -0.57405594]
Point_source [-0.07701307 0.47154086 -0.59176321]
Einstein–de_Haas_effect [-0.20167422 0.75854692 -0.56660699]
List_of_Directors_General_of_CERN [-0.29189597 -0.02886215 1.50555331]
Fragment_separator [-0.0288859 0.23010795 -0.33185245]
Dynamic_mode_decomposition [-0.03165261 0.30813099 -0.47918755]
Euler's_laws_of_motion [-0.14191624 0.81089691 -0.97068457]
Holometer [ 0.04529643 0.24629358 -0.73274098]
Quantum_oscillations_(experimental_technique) [-0.12624695 0.60774493 -0.62899776]
Bioinformatics [-0.06787589 -0.68556497 1.75125668]
Computational_biology [-0.12109851 -0.61961366 1.8787841 ]
Folding@home [ 0.38049065 -0.81236658 -0.20812142]
K-mer [-0.15555296 -0.23997907 1.26589169]
Journal_of_Computational_Biology [-0.31524006 -0.70118835 3.0088853 ]
Foldit [ 0.05021632 -0.7926396 1.38726028]
Premier_Biosoft [-0.13444178 -0.66221292 2.03280888]
International_Society_for_Computational_Biology [-0.13505951 -0.84563171 2.41444798]
Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology [-0.05067372 -0.73853981 1.77538213]
Law_of_Maximum [ 0.08640619 0.27711279 -1. ]
Enzyme_Function_Initiative [-0.19765502 -0.75099029 2.52918962]
SnoRNA_prediction_software [-0.09787215 0.0825799 0.31438862]
Sepp_Hochreiter [-0.21831309 -0.53960872 2.19523006]
Aureus_Sciences [-0.04792735 -0.54299395 1.35816782]
IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics [-0.15555296 -0.91555323 2.66028658]
Knotted_protein [-4.65133831e-02 -1.64677460e-05 2.30449943e-01]
BioUML [-0.04992331 -0.84526944 1.99195632]
De_novo_transcriptome_assembly [-0.11545809 -0.80195042 2.22718852]
Documents associated to each of the topics
model.docs_of_topic(l=2, n=10)
Topic 0
[0.38049064650371833, 'Folding@home']
[0.0982838041661571, 'RRKM_theory']
[0.08640618551980356, 'Law_of_Maximum']
[0.08143498489053753, 'Molecular_beam']
[0.07901566725096126, 'Anisotropic_liquid']
[0.0680944560439678, 'Quantum_solvent']
[0.050216316701558475, 'Foldit']
[0.04937237968912177, 'Complementary_experiments']
[0.04529642764936884, 'Holometer']
[0.04185241980407554, 'Philosophical_interpretation_of_classical_physics']
Topic 1
[1.1188462256850868, 'Fuel_mass_fraction']
[0.8937188142060465, 'McConnell_equation']
[0.8108969113487717, "Euler's_laws_of_motion"]
[0.7873427314160522, 'Dynamic_nuclear_polarisation']
[0.7601873593581843, 'X-ray_standing_waves']
[0.7585469237949426, 'Einstein–de_Haas_effect']
[0.7480481361901968, 'Knight_shift']
[0.6903835734029293, 'X-ray_crystal_truncation_rod']
[0.6764906101473233, 'Chemical_physics']
[0.6509343508462969, 'Empirical_formula']
Topic 2
[3.008885298869144, 'Journal_of_Computational_Biology']
[2.6602865772283484, 'IEEE/ACM_Transactions_on_Computational_Biology_and_Bioinformatics']
[2.529189622081383, 'Enzyme_Function_Initiative']
[2.4144479769727702, 'International_Society_for_Computational_Biology']
[2.2271885232399193, 'De_novo_transcriptome_assembly']
[2.195230060431999, 'Sepp_Hochreiter']
[2.0328088782749174, 'Premier_Biosoft']
[1.9919563236843087, 'BioUML']
[1.8787841018585545, 'Computational_biology']
[1.7753821299863304, 'Louis_and_Beatrice_Laufer_Center_for_Physical_and_Quantitative_Biology']
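The listing above ranks documents by their relative contribution $\tau_d^i$ for each topic. Given a matrix of such values, the ranking can be sketched in a few lines. The values and titles below are made up and `top_documents_per_topic` is a hypothetical helper, not the `docs_of_topic` implementation.

```python
def top_documents_per_topic(tau, titles, n=2):
    """For each topic i, return the n documents with the largest tau[d][i] as (value, title)."""
    n_topics = len(tau[0])
    ranking = {}
    for i in range(n_topics):
        scored = sorted(
            ((row[i], title) for row, title in zip(tau, titles)),
            reverse=True,
        )
        ranking[i] = scored[:n]
    return ranking

# Toy relative contributions: rows are documents, columns are topics (assumed)
toy_titles = ["doc_a", "doc_b", "doc_c"]
toy_tau = [[-0.2, 1.0], [0.0, 0.0], [0.3, -1.0]]
ranking = top_documents_per_topic(toy_tau, toy_titles)
for topic, docs in ranking.items():
    print("Topic", topic, docs)
```

Because $\tau_d^i$ is measured relative to the corpus average, a document can rank highly for a topic even when that topic is not its largest one in absolute terms.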