Topic Browsing has been enhanced. When you are viewing a grant and navigate to a topic, the words that occur in both the grant and the topic are highlighted in the topic. The same works in the reverse direction: when viewing a topic, click a grant to see the shared words highlighted in that grant.
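Conceptually, this highlighting intersects the words of a grant with the top words of a topic. The following is a minimal Python sketch of that idea, not the site's actual implementation; the simple lower-case tokenizer, the `cooccurring_words` and `highlight` helpers, and the `*` marker are all hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into runs of ASCII letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurring_words(abstract, topic_words):
    """Return the topic words that also occur in the grant abstract,
    keeping the topic's own word order (highest-weighted first)."""
    abstract_tokens = set(tokenize(abstract))
    return [word for word in topic_words if word in abstract_tokens]


def highlight(text, words, marker="*"):
    """Wrap each whole-word, case-insensitive occurrence of `words` in `marker`."""
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)


if __name__ == "__main__":
    # Hypothetical abstract and topic, for illustration only.
    abstract = "We will develop scalable network protocols for wireless sensor networks."
    topic = ["network", "wireless", "protocol", "sensor", "routing"]
    shared = cooccurring_words(abstract, topic)
    print(shared)
    print(highlight(abstract, shared))
```

The same two functions cover the reverse direction: pass the abstract of the clicked grant together with the word list of the topic being viewed. Note that whole-word matching means "protocol" does not match "protocols"; a real implementation would need to tokenize the same way the topic model did.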
Monthly Archives: January 2012
Topic Modelling
We have produced a topic model of a subset of the Perspectives corpus using Mallet. The model analyses the derived field (see grant EP/H021000/2 for an example), for consistency with other RP applications.
Topic Modelling Resources
We have begun topic modelling the EPSRC ICT portfolio using Mallet. Our initial data can be downloaded below. The *-data files use the grant number as the file name (‘/’ replaced with ‘_’), and each file contains the grant abstract with no processing applied.
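For anyone preparing data in the same layout, here is a minimal Python sketch of the naming convention: one file per grant, named after the grant number with ‘/’ replaced by ‘_’, containing the raw abstract. The `abstracts` dictionary, the output directory and the helper names are hypothetical, and the reverse mapping assumes grant numbers never contain ‘_’ themselves.

```python
from pathlib import Path


def grant_filename(grant_number):
    """Map a grant number such as 'EP/H021000/2' to a safe file name."""
    return grant_number.replace("/", "_")


def grant_number_from_filename(filename):
    """Invert grant_filename (assumes grant numbers contain no '_')."""
    return filename.replace("_", "/")


def write_grant_files(abstracts, out_dir):
    """Write each abstract, unprocessed, to a file named after its grant number.

    `abstracts` maps grant numbers to abstract text. Returns the paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for grant_number, abstract in abstracts.items():
        path = out / grant_filename(grant_number)
        path.write_text(abstract, encoding="utf-8")  # no cleaning or stemming
        paths.append(path)
    return paths


if __name__ == "__main__":
    print(grant_filename("EP/H021000/2"))
```

Keeping the abstracts untouched leaves all preprocessing (tokenization, stop-word removal) to the Mallet import step, so the same files can be re-imported with different settings.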
Our output is based on importing the data into Mallet with the ‘--stoplist-file stops.txt’ option (see the stops.txt file below, which contains the Cornell stop words). We have processed the data for 50, 100, 150, 200, 250 and 300 topics.
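The import and the six training runs can be scripted. The Python sketch below builds Mallet command lines using the `import-dir` and `train-topics` commands; the Mallet path, data directory, corpus file name and output file names are hypothetical, and `--keep-sequence` is added on the assumption (per Mallet's documentation) that topic training needs word sequences rather than feature vectors. By default it only prints the commands.

```python
import subprocess

# Topic counts from the runs described above.
TOPIC_COUNTS = [50, 100, 150, 200, 250, 300]


def import_command(mallet, data_dir, stoplist, corpus):
    """Import one-file-per-grant data, removing words listed in `stoplist`."""
    return [mallet, "import-dir", "--input", data_dir, "--output", corpus,
            "--keep-sequence", "--stoplist-file", stoplist]


def train_command(mallet, corpus, num_topics):
    """Train an LDA model with `num_topics` topics; output names are hypothetical."""
    prefix = f"topics-{num_topics}"
    return [mallet, "train-topics", "--input", corpus,
            "--num-topics", str(num_topics),
            "--output-topic-keys", f"{prefix}-keys.txt",
            "--output-doc-topics", f"{prefix}-doc-topics.txt"]


def run_all(mallet="bin/mallet", data_dir="data", stoplist="stops.txt",
            corpus="grants.mallet", dry_run=True):
    """Import once, then train one model per topic count.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [import_command(mallet, data_dir, stoplist, corpus)]
    commands += [train_command(mallet, corpus, k) for k in TOPIC_COUNTS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing step
    return commands


if __name__ == "__main__":
    run_all()
```

Importing once and reusing the same `.mallet` corpus for every topic count keeps the vocabulary and stop-word filtering identical across the 50 to 300 topic runs, so the resulting models differ only in the number of topics.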