The purpose of morpholexical analysis is to process the individual words of a sentence in order to recognize their standard forms, their grammatical categories and their semantic relationships with other words in a lexicon. Morpholexical analysis also handles collocations and idioms.
Two semantic relations between terms are currently considered: synonymy and hyponymy/hypernymy. The predicate synonym(x,y) means that the term `y' is a synonym of the term `x' in a particular lexical category. The predicate hyponym(x,y,d) means that the term `y' is a hyponym (a specialization) of the term `x' at distance d in a thesaurus, within a particular lexical category. The predicate hypernym(x,y,d) means that the term `y' is a hypernym (a generalization) of the term `x' at distance d in a thesaurus, within a particular lexical category.
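The three predicates can be illustrated with a small Python sketch over a toy thesaurus. The `Thesaurus` class, its methods, the explicit category argument and the sample terms are all hypothetical; the sketch assumes that the thesaurus records direct synonym pairs and direct broader-term links per lexical category, and it takes the distance d to be the shortest number of broader-term links between the two terms.

```python
from collections import deque


class Thesaurus:
    """Hypothetical toy thesaurus keyed by (lexical category, term)."""

    def __init__(self):
        self._synonyms = {}  # (category, term) -> set of synonyms
        self._broader = {}   # (category, term) -> set of direct hypernyms

    def add_synonyms(self, category, x, y):
        # Synonymy is symmetric, so both directions are recorded.
        self._synonyms.setdefault((category, x), set()).add(y)
        self._synonyms.setdefault((category, y), set()).add(x)

    def add_broader(self, category, term, broader):
        # `broader` is a direct generalization (distance 1) of `term`.
        self._broader.setdefault((category, term), set()).add(broader)

    def synonym(self, x, y, category):
        """synonym(x, y): y is a synonym of x in the given category."""
        return y in self._synonyms.get((category, x), set())

    def hypernym(self, x, y, d, category):
        """hypernym(x, y, d): y generalizes x at distance d."""
        return self._distance_up(x, y, category) == d

    def hyponym(self, x, y, d, category):
        """hyponym(x, y, d): y specializes x at distance d."""
        return self._distance_up(y, x, category) == d

    def _distance_up(self, start, target, category):
        # Breadth-first search along broader-term links; returns the
        # shortest number of links from start up to target, or None.
        queue, seen = deque([(start, 0)]), {start}
        while queue:
            term, dist = queue.popleft()
            if term == target and dist > 0:
                return dist
            for parent in self._broader.get((category, term), set()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, dist + 1))
        return None


t = Thesaurus()
t.add_synonyms("verb", "search", "look_for")
t.add_broader("noun", "text_file", "file")
t.add_broader("noun", "file", "data_object")

print(t.synonym("search", "look_for", "verb"))            # True
print(t.hypernym("text_file", "data_object", 2, "noun"))  # True
print(t.hyponym("data_object", "text_file", 2, "noun"))   # True
print(t.synonym("search", "look_for", "noun"))            # False
```

Under these assumptions, `hyponym` is simply `hypernym` with its arguments swapped, and terms registered under a different lexical category are never related.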
Immediately after morpholexical analysis, both the syntactic and the semantic analysis of software descriptions are performed interactively using a definite clause grammar. The grammar implements a subset of the rules for imperative sentences in English [12], which we consider broad enough for our initial experiments. It supports the case system and encodes domain-independent knowledge of the English language through a set of syntactic and semantic rules. The classification mechanism uses this grammar to parse software descriptions.
The parsing process generates a set of semantic structures that represent the internal structure of software descriptions. A language for modelling these semantic structures is shown in Figure 2.
Case_frame --> FRAME Frame_name Hierarchical_link CASES Case_list.
Hierarchical_link --> IS_A Frame_name | IS_A_KIND_OF Frame_name
Case_list --> Case (Case_list)
Case --> Case_name Facet
Case_name --> Semantic_case | Other_case
Semantic_case --> Action | Agent | Comparison | Condition |
                  Destination | Duration | Goal | Instrument |
                  Location | Manner | Purpose | Source | Time
Other_case --> Modifier | Head | Adjective_modifier |
               Participle_modifier | Noun_modifier
Facet --> VALUE Value | DOMAIN Frame_name |
          CATEGORY Lexical_category
Value --> string | Frame_name
Lexical_category --> verb | adj | noun | adv | component_id | string
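For illustration only, the grammar can be mirrored by a small Python data model. The names `CaseFrame`, `Case`, `Link`, `FacetKind`, `render` and `case_kind` are hypothetical; the sketch encodes the two hierarchical links, the three facet kinds and the case names enumerated by the `Semantic_case` and `Other_case` rules, and prints a frame back in the textual notation.

```python
from dataclasses import dataclass, field
from enum import Enum

# Case names as enumerated by the Semantic_case and Other_case rules.
SEMANTIC_CASES = {
    "Action", "Agent", "Comparison", "Condition", "Destination", "Duration",
    "Goal", "Instrument", "Location", "Manner", "Purpose", "Source", "Time",
}
OTHER_CASES = {
    "Modifier", "Head", "Adjective_modifier", "Participle_modifier",
    "Noun_modifier",
}


class Link(Enum):
    IS_A = "IS_A"                  # instance frame -> generic frame
    IS_A_KIND_OF = "IS_A_KIND_OF"  # generic frame -> more general frame


class FacetKind(Enum):
    VALUE = "VALUE"        # a string or the name of an instance frame
    DOMAIN = "DOMAIN"      # frame type describing the case's structure
    CATEGORY = "CATEGORY"  # lexical category of the case


@dataclass
class Case:
    name: str
    facet: FacetKind
    content: str


@dataclass
class CaseFrame:
    name: str
    link: Link
    parent: str
    cases: list = field(default_factory=list)

    def render(self) -> str:
        """Print the frame in the textual notation of the grammar."""
        lines = [f"FRAME {self.name} {self.link.value} {self.parent}", "CASES"]
        lines += [f"    {c.name} {c.facet.value} {c.content}" for c in self.cases]
        return "\n".join(lines) + "."  # the case list ends with a period


def case_kind(name: str):
    """Classify a case name per the Case_name rule (None if unlisted)."""
    if name in SEMANTIC_CASES:
        return "semantic"
    if name in OTHER_CASES:
        return "other"
    return None


np_frame = CaseFrame("noun_phrase", Link.IS_A_KIND_OF, "root_frame", [
    Case("Adjective_modifier", FacetKind.CATEGORY, "adj"),
    Case("Head", FacetKind.CATEGORY, "noun"),
])
print(np_frame.render())
print(case_kind("Goal"), case_kind("Head"))  # semantic other
```

Because `Case --> Case_name Facet` allows exactly one facet per case, each `Case` carries a single facet kind and a single content string.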
The language defines a frame-like classification scheme for software components based on the semantic cases defined above. The scheme consists of a hierarchy of generic frames linked by the `IS-A-KIND-OF' relationship. Frames that are instances of these generic frames (`IS-A' relationship) serve as the indexing units of software descriptions.
The major generic frames of the Knowledge Base are shown in Figure 3.
FRAME verb_phrase IS_A_KIND_OF root_frame
CASES
    Action CATEGORY verb
    Agent DOMAIN component
    Comparison DOMAIN noun_phrase
    Condition DOMAIN noun_phrase
    Destination DOMAIN noun_phrase
    Duration DOMAIN noun_phrase
    Goal DOMAIN noun_phrase
    Instrument DOMAIN noun_phrase
    Location DOMAIN noun_phrase
    Manner DOMAIN noun_phrase
    Purpose DOMAIN verb_phrase
    Source DOMAIN noun_phrase
    Time DOMAIN noun_phrase.
FRAME noun_phrase IS_A_KIND_OF root_frame
CASES
    Adjective_modifier CATEGORY adj
    Participle_modifier CATEGORY verb
    Noun_modifier CATEGORY noun
    Head CATEGORY noun.
FRAME component IS_A_KIND_OF root_frame
CASES
    Name CATEGORY component_id
    Description CATEGORY string
    .
    .  {Other information associated with the component, e.g. source code,
    .   executable examples, reuse attributes, etc.}
The generic frames model the semantic structures associated with verb phrases and noun phrases, as well as the information associated with software components, such as their name, description, source code and executable examples.
Semantic cases are represented as slots in the frames. Each slot carries a `facet' that describes one of the following: the value of the case, or the name of the frame in which the value is instantiated (`value' facet); the type of frame that describes the internal structure of the case (`domain' facet); or the lexical category of the case (`category' facet). For instance, the `Location' slot in the verb phrase frame has a `domain' facet indicating that its constituents are described in a frame of type `noun phrase'.
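One plausible reading of the facets is as constraints that instance frames must satisfy. The sketch below, whose dictionaries, toy lexicon and `check_instance` function are hypothetical, checks each `VALUE` facet of an instance against the facet declared for the same slot in its generic frame: a `DOMAIN` slot must name an instance of the declared frame type (direct `IS_A` link only), and a `CATEGORY` slot must hold a word of the declared lexical category.

```python
# Subset of the generic frames: slot -> (facet kind, declared type/category).
GENERIC = {
    "verb_phrase": {
        "Action": ("CATEGORY", "verb"),
        "Agent": ("DOMAIN", "component"),
        "Location": ("DOMAIN", "noun_phrase"),
        "Goal": ("DOMAIN", "noun_phrase"),
    },
    "noun_phrase": {"Head": ("CATEGORY", "noun")},
}

# Instance frame name -> generic frame it IS_A (direct link only).
INSTANCE_OF = {
    "grep_component": "component",
    "grep_noun_phrase_1": "noun_phrase",
    "grep_noun_phrase_2": "noun_phrase",
}

# Hypothetical toy lexicon: word -> lexical categories.
LEXICON = {"search": {"verb", "noun"}, "file": {"noun", "verb"}, "string": {"noun"}}


def check_instance(generic: str, values: dict) -> list:
    """Return the problems found in an instance's VALUE facets."""
    problems = []
    for slot, value in values.items():
        declared = GENERIC[generic].get(slot)
        if declared is None:
            problems.append(f"{slot}: no such slot in {generic}")
            continue
        kind, expected = declared
        if kind == "DOMAIN" and INSTANCE_OF.get(value) != expected:
            problems.append(f"{slot}: {value} is not an instance of {expected}")
        elif kind == "CATEGORY" and expected not in LEXICON.get(value, set()):
            problems.append(f"{slot}: '{value}' is not a {expected}")
    return problems


verb_phrase_1 = {
    "Agent": "grep_component",
    "Action": "search",
    "Location": "grep_noun_phrase_1",
    "Goal": "grep_noun_phrase_2",
}
print(check_instance("verb_phrase", verb_phrase_1))  # []
print(check_instance("verb_phrase", {"Goal": "string"}))
# ['Goal: string is not an instance of noun_phrase']
```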
During parsing, the interpretation mechanism maps the verb, the direct object and each prepositional phrase of a sentence onto a semantic case, using both syntactic features and the case generators (such as prepositions) that it identifies.
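The mapping can be approximated with a deliberately simplified Python sketch that splits an already tokenized imperative sentence instead of running a definite clause grammar. It assumes that the case generators are prepositions; the preposition-to-case table, the per-verb case of the direct object, the default case and the functions `noun_phrase` and `map_cases` are hypothetical, chosen so that the grep description `search a file for a string' yields the `Action', `Location' and `Goal' cases of the example that follows.

```python
DETERMINERS = {"a", "an", "the"}

# Hypothetical case generators: preposition -> semantic case.
PREPOSITION_CASES = {
    "for": "Goal",
    "in": "Location",
    "from": "Source",
    "to": "Destination",
    "with": "Instrument",
}

# Hypothetical verb-specific case of the direct object: for "search",
# the object names the place being searched (Location).
DIRECT_OBJECT_CASE = {"search": "Location"}
DEFAULT_OBJECT_CASE = "Goal"  # assumption for verbs not listed above


def noun_phrase(words):
    """Drop determiners; the last word is the Head, earlier words modify it."""
    content = [w for w in words if w not in DETERMINERS]
    if not content:
        return None
    return {"Head": content[-1], "Noun_modifier": content[:-1]}


def map_cases(sentence):
    """Map the verb, direct object and prepositional phrases to cases."""
    words = sentence.lower().split()
    verb, rest = words[0], words[1:]  # imperative: the verb comes first
    cases = {"Action": verb}

    # Split the rest into chunks, each introduced by a case generator;
    # the first chunk (marker None) is the direct object.
    chunks, current = [], [None]
    for w in rest:
        if w in PREPOSITION_CASES:
            chunks.append(current)
            current = [w]
        else:
            current.append(w)
    chunks.append(current)

    for marker, *np_words in chunks:
        phrase = noun_phrase(np_words)
        if phrase is None:
            continue  # e.g. no direct object before the first preposition
        if marker is None:
            case = DIRECT_OBJECT_CASE.get(verb, DEFAULT_OBJECT_CASE)
        else:
            case = PREPOSITION_CASES[marker]
        cases[case] = phrase
    return cases


print(map_cases("search a file for a string"))
```

For the grep description this returns `search' as the Action, a noun phrase headed by `file' as the Location and one headed by `string' as the Goal. The direct-object rule is the least certain part of the sketch: since the object `a file' fills the `Location' case here, the case cannot be read off the syntax alone, so it is looked up per verb.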
Figure 4 shows the indexing structure for the `grep' family of Unix commands, built from the description `search a file for a string'. An instance of the verb_phrase frame (verb_phrase_1) is generated by instantiating the slots corresponding to the semantic cases identified in the description (`Action', `Location' and `Goal'). Each of these slots has a `value' facet holding either the value of the slot itself (e.g. `search' for the `Action' case) or the name of the instance frame that contains the value (grep_component, grep_noun_phrase_1 and grep_noun_phrase_2 for the semantic cases `Agent', `Location' and `Goal', respectively).
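The construction of these instance frames can be sketched as follows, assuming the cases have already been identified (they are passed in by hand here). The functions `frame`, `quoted` and `index_description` and the naming scheme `<component>_noun_phrase_<n>` are hypothetical, inferred only from the frame names in Figure 4; the sketch also assumes that the `Agent' slot always points to the component frame.

```python
def frame(name, link, parent, slots):
    """Render one frame; slots is a list of (case, facet, text) triples."""
    lines = [f"FRAME {name} {link} {parent}", "CASES"]
    lines += [f"    {case} {facet} {text}" for case, facet, text in slots]
    return "\n".join(lines) + "."


def quoted(s):
    # String values are written between ` and ' as in the figure.
    return f"`{s}'"


def index_description(component, description, action, np_cases,
                      vp_name="verb_phrase_1"):
    """Build the instance frames indexing one component description.

    np_cases: ordered (case, head noun) pairs, e.g. [("Location", "file")].
    """
    comp_frame = f"{component}_component"
    vp_slots = [("Agent", "VALUE", comp_frame),
                ("Action", "VALUE", quoted(action))]
    frames = []
    for n, (case, head) in enumerate(np_cases, start=1):
        np_name = f"{component}_noun_phrase_{n}"
        vp_slots.append((case, "VALUE", np_name))
        frames.append(frame(np_name, "IS_A", "noun_phrase",
                            [("Head", "VALUE", quoted(head))]))
    # The verb phrase frame comes first, the component frame last.
    frames.insert(0, frame(vp_name, "IS_A", "verb_phrase", vp_slots))
    frames.append(frame(comp_frame, "IS_A", "component",
                        [("Name", "VALUE", quoted(component)),
                         ("Description", "VALUE", quoted(description))]))
    return "\n".join(frames)


print(index_description("grep", "search a file for a string", "search",
                        [("Location", "file"), ("Goal", "string")]))
```

Called with the grep description, it prints the four frames verb_phrase_1, grep_noun_phrase_1, grep_noun_phrase_2 and grep_component, with the same slots and values as in Figure 4.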
FRAME verb_phrase_1 IS_A verb_phrase
CASES
    Agent VALUE grep_component
    Action VALUE `search'
    Location VALUE grep_noun_phrase_1
    Goal VALUE grep_noun_phrase_2.
FRAME grep_noun_phrase_1 IS_A noun_phrase
CASES
    Head VALUE `file'.
FRAME grep_noun_phrase_2 IS_A noun_phrase
CASES
    Head VALUE `string'.
FRAME grep_component IS_A component
CASES
    Name VALUE `grep'
    Description VALUE `search a file for a string'.
Figure 4 - An indexing structure for the "grep" command: "search a file for a string"