Wikipedia and Semantic Document Representation
This talk introduces “wikification”: automatically and judiciously augmenting a plain-text document with pertinent hyperlinks to Wikipedia articles, as though the document were itself a Wikipedia article. We first describe how Wikipedia can be used to determine semantic relatedness, and then introduce a new, high-performance method of wikification that uses simple machine learning and exploits Wikipedia’s 60 million internal hyperlinks, treating the link structure as relational information and the anchor texts as lexical information.
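
As a rough illustration of the two signals named above, the sketch below computes an anchor-text statistic and a link-overlap relatedness score on toy data. The abstract does not give formulas, so everything here is an assumption: the data is made up, the function names `commonness` and `relatedness` are hypothetical, and the relatedness formula is a normalized link-overlap distance in the style of the Normalized Google Distance, which may differ from the measure presented in the talk.

```python
import math
from collections import Counter, defaultdict

# Made-up (anchor text, target article) pairs standing in for
# Wikipedia's internal hyperlinks; not real data.
LINKS = [
    ("jaguar", "Jaguar (animal)"),
    ("jaguar", "Jaguar Cars"),
    ("jaguar", "Jaguar Cars"),
    ("big cat", "Jaguar (animal)"),
]


def commonness(links):
    """Lexical signal: for each anchor text, the fraction of its
    links that point at each target article."""
    counts = defaultdict(Counter)
    for anchor, target in links:
        counts[anchor][target] += 1
    return {
        anchor: {t: n / sum(c.values()) for t, n in c.items()}
        for anchor, c in counts.items()
    }


def relatedness(inlinks_a, inlinks_b, total_articles):
    """Relational signal: score in [0, 1] from the overlap between the
    sets of articles linking to A and to B (assumed formula)."""
    common = inlinks_a & inlinks_b
    if not common:
        return 0.0
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    if total_articles <= small:
        raise ValueError("total_articles must exceed the smaller in-link set")
    distance = (math.log(big) - math.log(len(common))) / (
        math.log(total_articles) - math.log(small)
    )
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    print(commonness(LINKS)["jaguar"])
    print(relatedness({1, 2, 3, 4}, {3, 4, 5}, total_articles=100))
```

In a wikifier along these lines, scores like these would be features for a learned classifier that decides which sense an anchor refers to and whether a phrase is worth linking at all; how the talk's method combines them is not stated in the abstract.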