Coimbatore, Tamil Nadu, India. [email protected]
Abstract - Anaphora is an attractive phenomenon in computational linguistics: resolving an anaphor requires linking it to a preceding or succeeding referent. The ability to perform anaphora resolution is important in NLP applications. Traditional research focuses on resolving particular types of anaphora only; no existing work integrates methods or procedures to resolve all types. In this paper, we identify all types of anaphora with a layered, step-by-step approach so that anyone can utilize the anaphora paradigm in their application. We propose a new, enhanced framework in which all the required resolution rules are applied; the new system extracts the most accurate antecedents of anaphors.
Keywords – Anaphora; Anaphora Resolution; Anaphora Resolution Process; antecedent
I. INTRODUCTION
Anaphora is a linguistic phenomenon in which a word in one sentence refers to a word in a previous sentence. The first word is the anaphor and the second is the antecedent. The mechanism of interpreting the antecedent of an anaphor is known as Anaphora Resolution (AR). Normally, anaphora is a backward relation to a referent previously mentioned in the text, but forward relations also exist in natural language.
John is a clever boy; he secured 100% marks in the exam.
The word ‘he’ is the anaphor and the word ‘John’ is the antecedent. AR is the problem of resolving an earlier reference of a word or phrase. Traditionally, there are three types of anaphoric links: pronominal anaphora, definite noun phrase anaphora and one-anaphora. The most common type is pronominal anaphora (1). Definite noun phrase anaphora occurs when the antecedent is referred to by a definite noun phrase representing the same concept (2). One-anaphora is the case where the anaphoric link is expressed by a ‘one’ noun phrase (3). More broadly, we can classify anaphora as follows: quantifier anaphora, where a sentence has a word such as one, two, many or all, in either subject or object position (4); ordinal anaphora, where a sentence has first, last or latter (5); question anaphora, where a sentence has a ‘wh’ phrase (6); event anaphora, where the anaphor refers to a particular event (7); and pleonastic anaphora, where the pronoun does not refer to anything (8). Another type of anaphora, involving forward reference, is known as cataphora (9). Both anaphora and cataphora are known as endophora. When the anaphor and more than one of the preceding entities have the same referent, this is known as coreference anaphora (10). A noun phrase referring to an animal or human being is known as animacy anaphora (11).
Dr. V. Karthikeyani, Department of Computer Science, Govt. Arts College,
Rasipuram, Tamilnadu, India. [email protected]
1) Peter owns a car and he is wealthy
2) Students attend the seminar from different city, they enjoy the full session
3) If you can’t attend the seminar in the morning, you can go for the afternoon one.
4) He chose one blue car
5) Radha’s car came in first
6) Rani asked where is the car key
7) In 1980, he was dead.
8) It is 3 o’clock
9) When he enters into the road, Suresh saw the accident.
10) Raja goes to the shop; he purchases all the vegetables and he misses his wrist watch.
11) He owns a horse. Its color is white.
There are inter-sentential and intra-sentential anaphora. Inter-sentential anaphora means that the antecedent of the anaphor is in a different sentence, whereas an anaphor whose antecedent is in the same sentence is known as an intra-sentential anaphor. Identifying all the coreference chains is known as coreference resolution.
Understanding a text is very useful in Machine Translation, Knowledge Extraction and Question Answering systems. Understanding a text means identifying the subject, object and verb in a sentence. Traditionally, processing text is a difficult and complex task in natural language processing. Many AR systems have been introduced that interpret noun phrases only, and some systems interpret only up to two sentences, because of the complexity of the tasks involved in identifying text segments. In this paper, we lay down a framework in which all the anaphora types mentioned here are resolved, described as a step-by-step process in which each task performs a particular function related to the resolution rules.
The rest of the paper is structured as follows: Section II gives a short summary of existing systems, Section III develops the proposed system framework and Section IV concludes with future work.
II. RELATED WORK
In [Thayan & Dhang], the authors identify only pronominal anaphora, animacy and AR factors, but they do not handle coreference chain resolution. They also lay down a step-by-step process that forms the candidate set, extracts AR factors and eliminates ineligible noun phrases. However, this system limits its search scope to two or three sentences only, not a longer scope.
In [Helene & David 2012], the authors design a first version of a framework in which all types of anaphora are identified except animacy. The names of animals and human beings receive much attention in the AR community for categorizing the candidate set, since they make it easy to accept or reject a discourse entity in the candidate set; we therefore concentrate much effort on recognizing animacy agreement. However, that system is a purely rule-based AR system.
III. ANAPHORA RESOLUTION FRAMEWORK
We propose a global workflow framework to process text using anaphora resolution. We present a new, enhanced framework that guides the identification of possible anaphoric links in English sentences. We divide the framework into step-by-step tasks, and each step performs various subtasks to identify and resolve anaphoric links. In this framework, we classify the tasks as follows: A. Preprocess, B. Anaphora Resolution Process, C. Computational Strategy.
A. Preprocess
Natural language text is usually analyzed with a preprocess library to lay out valuable information segments, or discourse units, that provide possible anchors for recognizing the text. The preprocess library has the following components:
1) Sentence analyzer: It detects sentence boundaries based on delimiters such as dots, question marks or exclamation marks, then splits the sentence into tokens. A token may be a word, a digit or a punctuation mark.
2) Part-of-speech tagger: It has a fixed set of grammatical tags containing the conventional POS categories such as noun, verb, adjective, adverb, conjunction and pronoun. The POS tag provides the grammatical meaning of each word.
3) Parser: It groups the words of a sentence into appropriate phrases such as noun phrases, prepositional phrases and verb phrases, which is useful for identifying the structure of the sentence. The output of parsing is a parse tree, which provides valuable linkage between verbs and their components.
4) Dependency analyzer: It identifies the dependencies between words in a sentence, particularly those formed between the subject and the object. The output is a dependency graph in which the nodes are words and the directed edges are links between dependent words.
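As a minimal illustration of the first component, the sentence analyzer can be sketched in Python; the delimiter set and regular expressions below are simplifying assumptions, not the implementation described in this paper:

```python
import re

def split_sentences(text):
    # Naive sentence-boundary detection on '.', '?' and '!' delimiters;
    # a real analyzer must also handle abbreviations, decimals, etc.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    # Split a sentence into word, digit and punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "John owns a car. He drives it daily!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
```

The later components (tagger, parser, dependency analyzer) would then operate on these token lists.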
B. Anaphora resolution process
The process of anaphora resolution is the selection of a suitable antecedent from the candidate set already prepared in the preprocess stage. The following tasks are performed in this stage: 1. Search Scope, 2. Determination of anaphors, 3. Pointing out the candidates for antecedent, 4. Selection of the antecedent based on anaphora resolution factors.
1) Search Scope: The search scope is used to locate the antecedent of an anaphor within a sentence boundary; usually the scope is set from the current sentence to the N preceding sentences. The limit of the boundary depends on the type of anaphor to be identified. For example, a pronominal anaphor takes a two- or three-sentence scope, but a definite noun phrase takes a longer scope.
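The search-scope step can be sketched as a simple window over the sentence list; the function name and scope handling are illustrative assumptions:

```python
def candidate_sentences(sentences, anaphor_sentence_idx, scope):
    # Keep the current sentence plus up to `scope` preceding sentences;
    # the scope width depends on the anaphor type (e.g. 2-3 for pronouns,
    # wider for definite noun phrases).
    start = max(0, anaphor_sentence_idx - scope)
    return sentences[start:anaphor_sentence_idx + 1]

sents = ["S0", "S1", "S2", "S3", "S4"]
window = candidate_sentences(sents, 4, 2)  # -> ["S2", "S3", "S4"]
```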
2) Determination of anaphors: First, non-anaphoric phrases are found and eliminated from consideration; the remaining noun phrases are passed on for further processing.
a) Eliminate pleonastic pronouns: occurrences of ‘it’ that do not refer to anything in the sentence.
b) Eliminate non-anaphoric definite noun phrases: not every definite noun phrase is anaphoric; it may be a unique entity, a generic description or a specific description.
Finally, the suitable remaining phrases are recognized as potentially anaphoric to a preceding referring phrase.
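A toy filter for pleonastic ‘it’, in the spirit of step (a); the surface patterns below are illustrative assumptions, not an exhaustive rule set:

```python
import re

# A few surface patterns for non-referential "it" (time, weather,
# extraposition); real systems use much richer pattern sets.
PLEONASTIC_PATTERNS = [
    r"\bit is \d+ o'clock\b",
    r"\bit is (raining|snowing|sunny|cold|hot)\b",
    r"\bit (seems|appears) that\b",
]

def is_pleonastic_it(clause):
    clause = clause.lower()
    return any(re.search(p, clause) for p in PLEONASTIC_PATTERNS)
```

A clause matching one of these patterns is dropped from the anaphor list rather than resolved.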
3) Pointing out the candidates for antecedent: Once anaphors are detected, the suitable candidates for their antecedents are identified. In a typical system, all noun phrases preceding an anaphor within the search scope are initially considered as candidates. There are two approaches to locating candidates for the antecedent: the linear model and the hierarchical model. The linear model searches the current sentence and linearly preceding candidates, whereas the hierarchical model searches the current sentence and hierarchically preceding candidates.
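The linear model's candidate collection can be sketched as follows; positions are token indices, and the names are our assumptions:

```python
def linear_candidates(noun_phrases, anaphor_pos):
    # Collect noun phrases that precede the anaphor, ordered
    # nearest-first, which is the recency ordering of the linear model.
    preceding = [(pos, np) for pos, np in noun_phrases if pos < anaphor_pos]
    preceding.sort(key=lambda item: -item[0])
    return [np for pos, np in preceding]

nps = [(0, "Peter"), (3, "a car"), (7, "the garage")]
ordered = linear_candidates(nps, 9)  # -> ["the garage", "a car", "Peter"]
```

A hierarchical model would instead walk a parse tree or discourse structure rather than this flat list.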
4) Resolve anaphors based on anaphora resolution factors: After detection of an anaphor, we must resolve it by selecting its antecedent from the candidate set. The resolution rules used in this process are based on different sources of knowledge known as anaphora resolution factors. There are two kinds of factors: constraints and preferences. Constraints are essential conditions imposed on the relation between an anaphor and its antecedent, whereas preferences favour the most likely candidates for the antecedent. Constraints in anaphora resolution include gender and number agreement, c-command constraints and selectional restrictions, while preferences include recency, center preference and syntactic parallelism.
The division of factors into constraints and preferences has led to a distinction between constraint-based and preference-based architectures in anaphora resolution.
1) Constraints: The main constraints are laid out below with examples.
a) Gender and number agreement: This constraint requires that anaphors and their antecedents agree in number and gender.
John told Mukesh and his friends that she was in love.
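For the example above, a gender and number filter leaves no valid antecedent for ‘she’. A minimal sketch; the feature lexicon is a hand-made assumption, since real systems derive features from morphology, POS tags or name lists:

```python
# Hypothetical gender/number lexicon covering the example sentence.
FEATURES = {
    "john": ("masc", "sg"),
    "mukesh": ("masc", "sg"),
    "his friends": ("any", "pl"),
    "she": ("fem", "sg"),
}

def agrees(anaphor, candidate):
    g1, n1 = FEATURES[anaphor.lower()]
    g2, n2 = FEATURES[candidate.lower()]
    gender_ok = "any" in (g1, g2) or g1 == g2
    return gender_ok and n1 == n2

survivors = [c for c in ["John", "Mukesh", "his friends"]
             if agrees("she", c)]   # every candidate is filtered out
```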
b) Syntactic binding theory constraints: Work in Government and Binding Theory (GB) and Lexical Functional Grammar has provided useful constraints on anaphors and their antecedents, which have been successfully used in anaphor resolution.
c) Semantic consistency: This constraint stipulates that the semantic consistency constraints satisfied by the anaphor must also be satisfied by its antecedent.
Vincent removed the diskette from the computer and then disconnected it.
Vinoth removed the book from the shelf and then read it.
2) Preferences: Preferences, as opposed to constraints, are not mandatory conditions and therefore do not always hold. We shall illustrate three preferences: syntactic parallelism, semantic parallelism and centering.
a) Syntactic parallelism: Syntactic parallelism is beneficial when other constraints or preferences are not in a position to propose an unambiguous antecedent. Preference is given to NPs with the same syntactic function as the anaphor.
b) Semantic parallelism: This is a useful preference, but only systems that can automatically identify semantic roles can employ it. It says that NPs which have the same semantic role as the anaphor are favoured.
Akash gave the gift to Mari. Jose also gave him a letter.
Akash gave the gift to Mari. He also gave Jose a letter.
c) Centering: Although the syntactic and semantic criteria for the selection of an antecedent are very strong, they are not always sufficient to distinguish between a set of possible candidates. Moreover, they serve more as filters to eliminate unsuitable candidates than as proposers of the most likely candidate. In the case of antecedent ambiguity, it is the most salient element among the candidates for antecedent which is usually the frontrunner.
Thus constraints eliminate candidates from the candidate set, while preferences favour certain candidates over others.
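This division suggests a simple filter-then-rank architecture. The sketch below is our own illustration of that idea, not an algorithm from the literature; the candidate attributes and the particular constraint and preference functions are assumptions:

```python
def resolve(anaphor, candidates, constraints, preferences):
    # Constraints eliminate candidates outright; preferences then
    # score the survivors, and the best-scoring candidate is chosen.
    survivors = [c for c in candidates
                 if all(check(anaphor, c) for check in constraints)]
    if not survivors:
        return None
    return max(survivors,
               key=lambda c: sum(pref(anaphor, c) for pref in preferences))

anaphor = {"gender": "masc"}
cands = [
    {"text": "Peter", "gender": "masc", "distance": 2},
    {"text": "Mary",  "gender": "fem",  "distance": 1},
    {"text": "John",  "gender": "masc", "distance": 1},
]
same_gender = lambda a, c: c["gender"] == a["gender"]   # constraint
recency = lambda a, c: -c["distance"]                   # preference
best = resolve(anaphor, cands, [same_gender], [recency])  # -> "John" entry
```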
C. Computational Strategy: The computational strategy is the algorithm that tracks down, or computes, the antecedents of anaphors in a given sentence. There are three approaches for tracking down the antecedents of anaphora, based on the period in which they were developed: algorithms from the 1980s fall under the traditional approach, those from the 1990s under the alternative approach, and those after 2000 under the modern, or knowledge-poor, approach.
The traditional and alternative approaches are also known as knowledge-rich approaches because they are based purely on domain knowledge.
1) Traditional approaches: Traditional approaches are the classic approaches that integrate knowledge sources to eliminate unwanted candidates until a set of most suitable candidates is obtained. The following are traditional approaches, developed mainly in the 1980s and early 1990s:
a) Shallow processing approach: Developed in the 1980s by Carter, this approach resolves anaphors by generating paraphrases that show the various interpretations of a simple English story.
b) Distributed architecture: Developed in 1988 by Rich and LuperFoy, this approach handles pronominal anaphora only and uses a scoring procedure to track down the antecedents.
c) Multi-strategy approach: Developed in 1988 by J. Carbonell and R. Brown, this approach uses a framework of complementary strategies with integrated knowledge sources such as sentential syntax and case-frame semantics.
d) Scalar product coordinating approach: Developed in 1994 by Celia Rico Perez, this approach measures the distance between the anaphor and the candidate antecedents.
e) Combination of linguistic and statistical methods: Developed by R. Mitkov in 1996, this approach incorporates traditional methods together with new statistical methods to track down the antecedents.
f) Syntax-based approach: Developed by Lappin and Leass in 1994, this approach identifies the noun phrase antecedents of third-person pronouns and lexical anaphors.
The approaches mentioned above are mostly rule-based, algorithmic and knowledge-rich. Evaluation was usually carried out by hand on a small set of sentences.
2) Alternative approaches
a) Knowledge-independent approach: This approach is based on preferences according to equal patterns already existing in the text, repetition in preceding sentences and syntactic position; the antecedent with the highest preference value is selected. The algorithm was evaluated on up to 1904 consecutive sentences from eight chapters of two different computer manuals.
b) Statistical/corpus processing approach: This is a corpus-based approach for disambiguating pronouns, offered as an alternative to the expensive implementation of full-scale selectional-constraint knowledge. The authors perform an experiment to resolve references of the pronoun "it" in sentences randomly selected from the corpus.
c) Connolly's machine learning approach: This approach casts anaphoric reference as a classification problem for which a classifier can be discovered empirically using traditional learning methods. The authors claim that the problem best suited to learning algorithms is classification, so they decompose candidate selection into separate two-class classification problems. Each classification problem is defined on a pair of candidates and an anaphor, where the classes correspond to choosing one of the candidates as a "better" antecedent. By applying this classifier to successive pairs of candidates, each time retaining the best candidate, they effectively sort the candidates and choose the best overall.
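The pairwise selection idea can be sketched as a tournament; here `better` is a toy stand-in for the learned two-class classifier, an assumption since the trained model itself is not available:

```python
def pairwise_select(candidates, better):
    # Compare candidates pairwise, always retaining the winner, so the
    # final survivor is the best overall antecedent candidate.
    best = candidates[0]
    for cand in candidates[1:]:
        if better(cand, best):
            best = cand
    return best

# Toy stand-in classifier: prefer the candidate closer to the anaphor.
closer = lambda a, b: a["distance"] < b["distance"]
cands = [{"text": "the car", "distance": 3},
         {"text": "Peter", "distance": 1},
         {"text": "the road", "distance": 2}]
winner = pairwise_select(cands, closer)  # -> the "Peter" entry
```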
d) Aone and Bennett's machine learning approach: This approach builds an automatically trainable anaphora resolution system. Anaphoric links are tagged in corpora of Japanese newspaper articles and used as training examples for a machine learning algorithm. The authors employ different training methods using three parameters: anaphoric chains, anaphoric type identification and confidence factors.
e) Uncertainty-reasoning approach: This approach presents an Artificial Intelligence method based on uncertainty reasoning. The main idea is that the search for an antecedent can be regarded as testing the hypothesis that a certain noun phrase is the correct antecedent.
f) Two-engine approach: The two-engine approach is based on the interaction of two engines which, separately, have been successful in anaphora resolution. The first engine incorporates the constraints and preferences of the integrated traditional approach, while the second follows the principles of the uncertainty-reasoning approach described above. The combination of a traditional and an alternative approach aims at maximum efficiency in the search for the antecedent: each candidate is evaluated from the point of view of both engines, and if their opinions coincide, the evaluation process stops earlier than it would with a single engine.
g) Situation semantics approach: A situation semantics approach to anaphor resolution was proposed by Tin and Akman in 1994, in which pronominal anaphors are resolved in a situation-theoretic computational environment by means of inference rules which operate on, and unify with, utterance situations.
h) Using punctuation: This approach uses punctuation within a DRT framework as an additional constraint for anaphor resolution. The authors illustrate it with the example: Raja and Rani write books on India. If her books are best-sellers then they are jealous.
3) Knowledge-poor anaphora resolution: Most of the approaches outlined above rely heavily on linguistic knowledge. One disadvantage of developing a knowledge-based system, however, is that it is a very labour-intensive and time-consuming task. Consequently, the need for inexpensive and robust systems, possibly suitable for unrestricted texts, fuelled renewed research in the field, and a clear trend towards corpus-based and knowledge-poor approaches was established.
a) Kennedy and Boguraev's parse-free approach: This is a modified and extended version of the approach developed by Lappin and Leass. Kennedy and Boguraev's system does not require "in-depth, full" syntactic parsing but works from the output of a part-of-speech tagger, enriched only with annotations of the grammatical function of lexical items in the input text stream.
b) Robust, knowledge-poor approach: Mitkov's robust approach works as follows: it takes as input the output of a text processed by a part-of-speech tagger, identifies the noun phrases which precede the anaphor within a distance of two sentences, checks them for gender and number agreement with the anaphor, and then applies the so-called antecedent indicators to the remaining candidates, assigning a positive or negative score to each.
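A sketch of indicator-based scoring in the spirit of this approach; the specific indicators and weights below are our assumptions for illustration, not Mitkov's published values:

```python
def score_candidate(cand):
    # Reward candidates that open their sentence and are lexically
    # repeated; penalize referential distance (in sentences).
    score = 0
    if cand.get("first_np_in_sentence"):
        score += 1
    score += 2 * cand.get("repetitions", 0)
    score -= cand.get("sentence_distance", 0)
    return score

def pick_antecedent(candidates):
    # Candidates are assumed to have already passed the gender and
    # number agreement check.
    return max(candidates, key=score_candidate)

cands = [
    {"text": "the printer", "first_np_in_sentence": True,
     "repetitions": 1, "sentence_distance": 1},
    {"text": "the paper", "repetitions": 0, "sentence_distance": 0},
]
```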
c) CogNIAC: CogNIAC is a system developed at the University of Pennsylvania to resolve pronouns with limited knowledge and linguistic resources. Its main assumption is that there is a subclass of anaphora that does not require general-purpose reasoning. For preprocessing its input, the system requires sentence detection, part-of-speech tagging, simple noun phrase recognition and basic semantic category information.
CogNIAC is built on the following core rules:
1) Unique in discourse: if there is a single possible antecedent i in the read-in portion of the entire discourse, then pick i as the antecedent
2) Reflexive: pick the nearest possible antecedent in the read-in portion of current sentence if the anaphora is a reflexive pronoun
3) Unique in current and prior: if there is a single possible antecedent i in the prior sentence and the read-in portion of the current sentence, then pick i as the antecedent
4) Possessive pronoun: if the anaphor is a possessive pronoun and there is a single exact string match i of the possessive in the prior sentence, then pick i as the antecedent
5) Unique current sentence: if there is a single possible antecedent i in the read-in portion of the current sentence, then pick i as the antecedent
6) If the subject of the prior sentence contains a single possible antecedent i, and the anaphor is the subject of the current sentence, then pick i as the antecedent
(The identification of clauses in complex sentences is done heuristically.)
CogNIAC operates as follows: pronouns are resolved from left to right in the text. For each pronoun, the rules are applied in the order presented. For a given rule, if an antecedent is found, the appropriate annotations are made to the text and no more rules are tried for that pronoun; otherwise the next rule is tried. If no rule resolves the pronoun, it is left unresolved in that sentence.
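This rule cascade can be sketched directly; the two rules shown are simplified stand-ins for rules 1 and 5 above, and the data layout is an assumption:

```python
def cogniac_resolve(pronoun, candidates, rules):
    # Apply the ordered rules; the first rule that singles out an
    # antecedent wins, otherwise the pronoun stays unresolved (None).
    for rule in rules:
        antecedent = rule(pronoun, candidates)
        if antecedent is not None:
            return antecedent
    return None

def unique_in_discourse(pronoun, candidates):         # cf. rule 1
    return candidates[0] if len(candidates) == 1 else None

def unique_in_current_sentence(pronoun, candidates):  # cf. rule 5
    same = [c for c in candidates if c["sentence"] == pronoun["sentence"]]
    return same[0] if len(same) == 1 else None

rules = [unique_in_discourse, unique_in_current_sentence]
pronoun = {"text": "he", "sentence": 2}
cands = [{"text": "Raja", "sentence": 1}, {"text": "Suresh", "sentence": 2}]
found = cogniac_resolve(pronoun, cands, rules)  # second rule fires
```

Because each rule either answers or abstains, the cascade trades coverage for precision exactly as the text describes.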
IV. CONCLUSION AND FUTURE WORK
Most anaphora resolution systems are not fully automated; they need some human intervention in at least one stage. The approaches described above each handle only certain kinds of anaphora; they do not cover the maximum range of anaphoric links. In addressing this challenge, we designed this framework, specifying what is done in the preprocess and main process stages. The framework covers the existing methodologies, highlights where the best-performing task is available, and gives awareness of how anaphora resolution applies to NLP applications.
Understanding text means understanding the meaning of the context or concept, not merely the meaning of individual words. Anaphora systems have produced the most likely antecedents since the development of machine learning approaches, and the knowledge-poor strategy provides better results compared with the earlier knowledge-rich strategies. The computational strategy contributes the largest share in producing the most accurate antecedent; not least, the preprocess task is the base on which the computational strategy performs well.
In this paper we presented a novel, enhanced framework for the anaphora resolution process. The proposed system is able to recognize pronoun phrases, intra- and inter-sentential anaphora, animacy agreement and coreference chains. Our system recognizes the maximum range of anaphora, so it produces more accurate antecedents compared with prior anaphora resolution process models.
Our future enhancement is to design an AR system that is multilingual and mainly resolves large-scale coreference anaphora.