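Topic browsing has been enhanced. When you are viewing a grant and navigate to one of its topics, the words that co-occur in the grant and the topic are highlighted in the topic. The same applies in the reverse direction: when you are viewing a topic and click a grant, the co-occurring words are highlighted in the grant.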
We have produced a topic model of a subset of the perspectives corpus using Mallet. Our model analyses the derived field (see grant EP/H021000/2 for example) in order to maintain consistency with other RP applications.
We have begun topic modelling of the EPSRC ICT portfolio using Mallet. Our initial data can be downloaded below. Each file in the *-data archives is named after its grant number (with ‘/’ replaced by ‘_’), and its contents are the grant abstract with no processing applied.
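As an illustration of the file naming convention, here is a minimal Python sketch that converts between a grant number and its data filename. The function names are hypothetical, and the sketch assumes that the filenames carry no extension and that grant numbers themselves never contain an underscore.

```python
def filename_from_grant_number(grant_number: str) -> str:
    """Return the data filename for a grant number ('/' becomes '_')."""
    return grant_number.replace("/", "_")


def grant_number_from_filename(filename: str) -> str:
    """Recover the grant number from a data filename ('_' becomes '/').

    Assumes the filename has no extension and that the original
    grant number contained no underscores.
    """
    return filename.replace("_", "/")


if __name__ == "__main__":
    print(filename_from_grant_number("EP/H021000/2"))
    print(grant_number_from_filename("EP_H021000_2"))
```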
Our output is based on importing the data into Mallet using the ‘--stoplist-file stops.txt’ option (see the stops.txt file below, which contains the Cornell Stop Words). We have processed the data for 50, 100, 150, 200, 250 and 300 topics.
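The import and training runs could be scripted roughly as follows. This is a sketch, not the exact commands we ran: only the stoplist option and the topic counts come from the description above. The executable name `mallet`, the directory and file names, the output options, and the use of `--keep-sequence` (which Mallet needs for topic training) are assumptions made for illustration.

```python
import subprocess

# Topic counts listed in the post.
TOPIC_COUNTS = [50, 100, 150, 200, 250, 300]


def import_command(data_dir, corpus_file, stoplist="stops.txt"):
    """Build the Mallet import-dir command for a directory of abstracts."""
    return [
        "mallet", "import-dir",        # assumed to be on the PATH
        "--input", data_dir,
        "--output", corpus_file,
        "--keep-sequence",             # assumption: needed by train-topics
        "--stoplist-file", stoplist,
    ]


def train_commands(corpus_file, topic_counts=TOPIC_COUNTS):
    """Build one Mallet train-topics command per topic count."""
    return [
        [
            "mallet", "train-topics",
            "--input", corpus_file,
            "--num-topics", str(n),
            # Output file names are hypothetical.
            "--output-topic-keys", f"topic-keys-{n}.txt",
            "--output-doc-topics", f"doc-topics-{n}.txt",
        ]
        for n in topic_counts
    ]


if __name__ == "__main__":
    # "ict-data" and "ict.mallet" are placeholder names.
    subprocess.run(import_command("ict-data", "ict.mallet"), check=True)
    for command in train_commands("ict.mallet"):
        subprocess.run(command, check=True)
```

Building the commands as argument lists keeps the stoplist path and topic counts in one place and avoids shell quoting problems.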