Abstract:
Disambiguating name mentions in text is a crucial task in Natural Language Processing, especially in entity linking. The credibility and efficiency of entity linking systems largely depend on this task. For a given named entity mention in the text, the knowledge base may contain many candidate entities that the mention could refer to. It is therefore very difficult to select the correct entity for the mention from the whole set of candidate entities. Collective entity disambiguation is a prominent approach to this problem. In this thesis, we present a new algorithm for collective entity disambiguation, called CPSR, which is based on a graph approach and semantic relatedness. A clique partitioning algorithm is used to find the best clique, which contains a set of candidate entities. These candidate entities provide the answers for the corresponding mentions in the disambiguation process. To evaluate our algorithm, we carried out a series of experiments on seven well-known datasets, namely AIDA/CoNLL2003-TestB, IITB, MSNBC, AQUAINT, ACE2004, Cweb and Wiki. The Kensho Derived Wikimedia Dataset (KDWD) is used as the knowledge base for our system. The experimental results show that our CPSR algorithm outperforms both the baselines and other well-known state-of-the-art approaches.
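To make the idea of collective disambiguation more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the CPSR algorithm itself. If candidates of different mentions are joined by edges weighted by semantic relatedness, then choosing one candidate per mention yields a clique, and the sketch exhaustively searches for the clique with the highest total relatedness. The toy candidate lists, the link sets, the Jaccard-based `relatedness` function and the `disambiguate` helper are all hypothetical. CPSR's clique partitioning procedure and its relatedness measure over KDWD are not reproduced here.

```python
from itertools import combinations, product

# Hypothetical toy data: each mention maps to its candidate entities, and
# each entity has a set of linked entities (an assumed stand-in for a KB).
CANDIDATES = {
    "Jordan": ["Michael_Jordan", "Jordan_(country)"],
    "Bulls": ["Chicago_Bulls", "Bull_(animal)"],
}
LINKS = {
    "Michael_Jordan": {"NBA", "Chicago_Bulls", "Basketball"},
    "Jordan_(country)": {"Amman", "Middle_East"},
    "Chicago_Bulls": {"NBA", "Michael_Jordan", "Basketball"},
    "Bull_(animal)": {"Cattle"},
}


def relatedness(a, b):
    """Jaccard overlap of link sets: an assumed stand-in for semantic relatedness."""
    la, lb = LINKS[a], LINKS[b]
    union = la | lb
    return len(la & lb) / len(union) if union else 0.0


def disambiguate(candidates):
    """Pick one candidate per mention so that the chosen entities (a clique
    across mentions) have maximum total pairwise relatedness.
    Exhaustive search, so only practical for small inputs."""
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[m] for m in mentions)):
        score = sum(relatedness(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best, best_score = choice, score
    return dict(zip(mentions, best))


print(disambiguate(CANDIDATES))
# → {'Jordan': 'Michael_Jordan', 'Bulls': 'Chicago_Bulls'}
```

The exhaustive search grows with the product of the candidate-list sizes. This is why a practical system needs a more efficient clique-finding strategy than the brute force shown here.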