SENAN and GEOBASE: comments from the Prolog site

SENAN

Sentence Analyzer

The file SEN_AN is a very good example of an English sentence analyzer. Like GEOBASE, the program can easily be modified to parse more types of sentences. The input to the sentence analyzer is a sentence. The output is a Prolog term showing that every part of the sentence has been recognized as a grammatical component: verbp (for verb phrase), nounp (for noun phrase), etc.

The program SEN_AN.PRO demonstrates the basics of how a programmer can put together an English sentence analyzer in Prolog. When run, the sentence analyzer prompts the user to enter an English sentence. The program then attempts to parse (break apart) the sentence into a form that the analyzer can understand. The resulting data object is known as a parse tree.

After the parser creates a parse tree successfully, it can pass the tree on to a routine that performs some task with it. SEN_AN passes the parse tree to a routine that draws a graphic representation of the user's sentence. If SEN_AN cannot parse the sentence, it displays an error message indicating the failure. If you enter a word that is not in the dictionary, SEN_AN displays an error message naming the unrecognized word.

The syntax states that an English sentence (in the SEN_AN microworld) is made up of a noun phrase and a verb phrase:

- A noun phrase is made up of an optional determiner, followed by a noun, followed by a relative clause.
- A determiner can be empty (no determiner), or it can be one of the determiners found in the dictionary.
- A noun must be listed in the dictionary.
- The relative clause can be empty, or it can be a relative followed by a verb phrase.
- A verb phrase can be either a verb or a verb followed by a noun phrase.

For example, if you enter the sentence a mother loves her children, the parser breaks it down into the following Prolog data object:

    sent(nounp(determ("a"), "mother", none),
         verbp("loves", nounp(determ("her"), "children", none)))

This data object shows that the sentence is made up of the noun phrase a mother and the verb phrase loves her children.

To parse sentences, SEN_AN uses a context-free grammar. A more complex grammar would let the parser break down more complex sentences. Take a look at the parser code if you want to start building a parser that accepts them.

SEN_AN.PRO uses only a limited set of English grammar rules. More complex sentences require more rules of the English language to be coded into the parser. These rules, known as productions, are the heart of the analyzing procedure: the more detailed the productions, the more thorough the parser (or analyzer). Intricate productions can be written for the more complicated parts of English. Even so, the language is complex enough that even the most specific productions have exceptions. For this reason, natural language processing (NLP, not to be confused with Neuro-Linguistic Programming) is a heavily studied branch of artificial intelligence.
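The grammar described above can be sketched outside Prolog to show the mechanics. The following Python code is a minimal sketch, not SEN_AN's code. It assumes a tiny hypothetical dictionary (DETERMINERS, NOUNS, VERBS, RELATIVES), and all function names are illustrative. Each function takes the remaining words and yields (tree, rest) pairs. This is the difference-list idea of "input list in, leftover list out", and using generators lets the loops backtrack the way Prolog does. Trees are nested tuples that mirror the sent/nounp/verbp terms. The relcl and verb tags for relative clauses and bare verbs are hypothetical names.

```python
# Hypothetical mini-dictionary; SEN_AN's real dictionary is different and larger.
DETERMINERS = {"a", "the", "her", "every"}
NOUNS = {"mother", "children", "man", "dog"}
VERBS = {"loves", "feeds", "sleeps"}
RELATIVES = {"that", "who"}
DICTIONARY = DETERMINERS | NOUNS | VERBS | RELATIVES


def determ(words):
    """Determiner: empty (none) or a dictionary determiner."""
    yield "none", words
    if words and words[0] in DETERMINERS:
        yield ("determ", words[0]), words[1:]


def noun(words):
    """Noun: must be listed in the dictionary."""
    if words and words[0] in NOUNS:
        yield words[0], words[1:]


def relclause(words):
    """Relative clause: a relative followed by a verb phrase, or empty (none)."""
    if words and words[0] in RELATIVES:
        for vp, rest in verbphrase(words[1:]):
            yield ("relcl", words[0], vp), rest
    yield "none", words


def nounphrase(words):
    """Noun phrase: determiner, noun, relative clause."""
    for det, r1 in determ(words):
        for n, r2 in noun(r1):
            for rel, r3 in relclause(r2):
                yield ("nounp", det, n, rel), r3


def verbphrase(words):
    """Verb phrase: a verb followed by a noun phrase, or a verb alone."""
    if words and words[0] in VERBS:
        verb, rest = words[0], words[1:]
        for np, r in nounphrase(rest):
            yield ("verbp", verb, np), r
        yield ("verb", verb), rest


def sentence(words):
    """Sentence: noun phrase followed by verb phrase."""
    for np, r1 in nounphrase(words):
        for vp, r2 in verbphrase(r1):
            yield ("sent", np, vp), r2


def parse(text):
    words = tuple(text.lower().split())
    for w in words:
        if w not in DICTIONARY:
            raise ValueError(f"word not recognized: {w}")
    # Accept the first parse that consumes the whole input (empty rest).
    for tree, rest in sentence(words):
        if not rest:
            return tree
    raise ValueError("sentence could not be parsed")
```

For example, parse("a mother loves her children") returns a tuple with the same shape as the sent(...) term above. In contrast, parse("a cat sleeps") raises ValueError naming cat, because cat is not in the sketch's dictionary. The empty determiner and the verb-only reading are offered as separate alternatives, so a failed reading simply moves the loops on to the next one.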

GEOBASE

Examining Geobase

The database contains the following information.

Information about states:
1. Area of the state in square kilometers
2. Population of the state in citizens
3. Capital of the state
4. Which states border a given state
5. Rivers in the state
6. Cities in the state
7. Highest and lowest point in the state in meters

Information about rivers:
1. Length of the river in kilometers

Information about cities:
1. Population of the city in citizens

Try asking a few questions of your own. If Geobase doesn't understand a question, it will tell you which word it can't parse.

Take a look at the following sample queries:

What are the states?
What are the cities of New York?
What is the highest mountain in California?
What are the names of the states which border New Mexico?
Which rivers run through the states that border the state with the capital Olympia?
The language is defined in the file GEOBASE.LAN, and the database is defined in GEOBASE.DBA.

Be imaginative! Geobase will understand many English sentences, but occasionally you will find a sentence that Geobase simply does not recognize. This is the dilemma of a natural language interface. If you find a question that you feel Geobase should be able to answer but can't, you will need to improve Geobase so that it understands the query!

The Idea Behind Geobase

Geobase illustrates one way of implementing a natural language interface to a database. However, developing a complete natural language interface to a database is a very complicated task, because natural languages are far more complex than programming languages. They have far more words, and they contain difficult ambiguities. Visual Prolog is nevertheless well suited to natural language processing, because its backtracking mechanism can try alternative interpretations of an ambiguous phrase.

In Geobase the stored data is a USA geographical database. However, you could use the same approach for other types of data.

The key idea behind Geobase is simple: the user views the database as a network of entities connected by associations. This is known as an entity association network. The entities are the items stored in the database; in Geobase they are states, cities, capitals of states, rivers, lakes, etc. The associations are words that connect the entities in queries. For example:

Cities in the state of California.

Here the two entities, cities and state, are connected by the association in. The word "the" is simply ignored, and California is regarded as an actual constant for the state entity.

Geobase is designed to accept simple English. This means that Geobase does not check whether a sentence is grammatically correct. Instead, it tries to extract the meaning by matching the user's query against the entity association network.

Queries can be combined to form rather complex queries.
For example:

which rivers run through states that border the state with the capital Austin?

To make the query match the entity association network, Geobase must simplify the various forms of the query. This happens while Geobase parses the query.

The first step is to ignore certain words, such as:

which, is, are, the, tell, me, what, give, as, that, please to, how, many, live, lives, living, there, do, does

After this step the query looks like this:

rivers run through states border state with capital Austin?

The next step is to find the internal names for entities and associations. Entities can have synonyms, and the query can use plural forms of the entity names. Associations can consist of several words, and they can also have synonyms. After these conversions, the query looks like this:

river in state border state with capital Austin?

Geobase can now classify the words as entities, associations or constants and group the query into subqueries (E = entity, A = association, C = constant):

river in state border state with capital Austin?
E A (E A (E A E C))

Geobase then evaluates the query in three steps. It first finds the name of the state with the capital Austin. It then finds all the states that border this state. Finally, it looks up which rivers run through these states.

Adapting the Geobase Idea

Geobase is a natural language query interface to an existing database. You can adapt the Geobase mechanisms to build your own natural language query interface; this section explains how.

Create Your Database

The first thing you need to do is create your database. Geobase does not care how the database is stored or how it was created. You can use internal database sections or Visual Prolog's external database system, or you could even access other database files by means of the Visual Prolog Toolbox.
Geobase accesses the actual database through the predicates db and ent.

For simplicity, the geographical database is stored in an internal database section, which you can load from disk by calling the consult predicate. Here are some sample declarations from the geographical database:

    /* state(Name,Abbreviation,Capital,Population,Area,Admit,City,City,City,City) */
    state(string,string,string,real,real,integer,string,string,string,string)
    /* city(State,Abbreviation,Name,Population) */
    city(string,string,string,real)
    /* river(Name,Length,StateList) */
    river(string,integer,list)
    /* border(State,Abbreviation,StateList) */
    border(string,string,list)
    /* etc. */

Porting Geobase

The first step in porting Geobase to your own database is to draw the entity association network. The next step is to model this network with the database predicate schema:

    schema(Entity,Assoc,Entity)

Here are some examples of schema clauses from Geobase:

    schema("capital","of","state")
    schema("state","with","capital")
    schema("population","of","state")
    schema("state","with","population")
    schema("area","of","state")
    schema("city","in","state")

After you have defined the entity association network, you should implement Geobase's interface to the database. This requires that you define clauses for the two predicates db and ent:

    PREDICATES
    db(ent,assoc,ent,string,string)
    ent(ent,string)

The ent Predicate

The ent predicate is responsible for delivering all instances of a given entity. Geobase passes the name of an entity in the first argument of ent and expects the second argument to return actual string values for this entity.

Here are some example clauses of ent from Geobase:

    ent(continent,usa).
    ent(city,Name) :- city(_,_,Name,_).
    ent(state,Name) :- state(Name,_,_,_,_,_,_,_,_,_).
    ent(capital,Name) :- state(_,_,Name,_,_,_,_,_,_,_).
    ent(river,Name) :- river(Name,_,_).

The db predicate is a bit more complicated than ent. It is responsible for modeling the relation between two entities (the association).
You can also regard the db predicate as a function from one entity value to another. Every arrow in the entity association network (modeled by the schema relation) should be implemented as a clause for the db predicate. Here are some examples from the geographical database:

    db(city,in,state,City,State) :- city(State,_,City,_).
    db(state,with,city,State,City) :- city(State,_,City,_).
    db(abbreviation,of,state,Ab,State) :- state(State,Ab,_,_,_,_,_,_,_,_).
    db(area,of,state,Area,State) :-
        state(State,_,_,_,Area1,_,_,_,_,_), str_real(Area,Area1).
    db(capital,of,state,Capital,State) :- state(State,_,Capital,_,_,_,_,_,_,_).
    db(state,border,state,State1,State2) :- border(State2,_,List), member(State1,List).
    db(length,of,river,Length,River) :- river(River,Length1,_), str_real(Length,Length1).
    db(state,with,river,State,River) :- river(River,_,List), member(State,List).

That's really all you need to do to provide a natural language interface for your existing database.

Translating Natural Language Queries

Most natural languages (and English in particular) are not simple, straightforward, and consistent. Nouns can be singular or plural, verbs conjugate, and synonyms exist. Translating sentences from natural language into something the program recognizes is not a simple task. The following sections discuss how Geobase deals with these translation issues.

Internal Entity Names

Geobase needs to obtain an internal entity name from the words the user has typed. This breaks down into three separate problems:

1. Plural forms of entities. The user might type states, which is the entity name state with an s appended, or cities, which comes from the entity name city. The predicate entn is responsible for converting plural entities to their singular forms.
2. Synonyms for entities. The user might type town instead of city, or place instead of point. Synonyms for entities are stored in the database predicate synonym.
3. Compound entity values. An entity value might consist of more than one word, like new york or salt lake city. Geobase handles this during parsing with the predicate get_cmpent.

Some of the clauses involved look like this:

    PREDICATES
    ent_name(ent,string)        /* Converts between an entity name and an internal entity name */
    entn(string,string)         /* Converts an entity to singular form */
    entity(string)              /* Gets all entities */
    ent_synonym(string,string)  /* Synonyms for entities */

    CLAUSES
    ent_name(Ent,Navn) :- entn(E,Navn), ent_synonym(E,Ent), entity(Ent).

    ent_synonym(E,Ent) :- synonym(E,Ent).
    ent_synonym(E,E).

    entn(E,N) :- concat(E,"s",N).
    entn(E,N) :- free(E), bound(N), concat(X,"ies",N), concat(X,"y",E).
    entn(E,E).

    entity("name") :- !.
    entity("continent") :- !.
    entity(X) :- schema(X,_,_).

Internal Names for Associations

Entities can have synonyms and consist of several words, and the same is true of the associations in queries. The alternative forms of each association name are stored in the assoc database predicate. Each assoc fact pairs an internal association name with a list of words that the user can type for it, for example:

    assoc("in",["in"])
    assoc("in",["running","through"])
    assoc("in",["runs","through"])
    assoc("in",["run","through"])
    assoc("with",["with"])
    assoc("with",["traversed"])
    assoc("with",["traversed","by"])

The predicate get_assoc is responsible for recognizing an association at the beginning of a list of words. It uses the nondeterministic version of append to split the list into two parts. If the first part matches an alternative for an association in the assoc predicate, the corresponding internal association name is returned:

    get_assoc(IL,OL,A) :- append(ASL,OL,IL), assoc(A,ASL).

The parser is responsible for recognizing the query sentence structure. There are many types of sentences, but the parser classifies them into nine different cases. Each of these nine cases has an alternative in the query domain.
The query domain is defined recursively, which means it can represent nested queries:

    Example query                                   Pattern                 Domain alternative
    Give me cities                                  ENT                     q_e(ENT)
    state with the city new york                    ENT ASSOC ENT CONST     q_eaec(ENT,ASSOC,ENT,STR)
    rivers in (...)                                 ENT ASSOC SUBQUERY      q_eaq(ENT,ASSOC,ENT,QUERY)
    rivers longer than 1000 miles                   ENT REL UNIT VAL        q_sel(ENT,RELOP,UNIT,REAL)
    the smallest (...)                              MIN SUBQUERY            q_min(ENT,QUERY)
    the biggest (...)                               MAX SUBQUERY            q_max(ENT,QUERY)
    rivers that do not traverse                     ENT ASSOC NOT SUBQ      q_not(ENT,QUERY)
    rivers that are longer than 1 thousand miles
      or that run through texas                     SUBQUERY OR SUBQUERY    q_or(QUERY,QUERY)
    which state borders nevada and borders arizona  SUBQUERY AND SUBQUERY   q_and(QUERY,QUERY)

The words that users can type for minimum, maximum, units, etc., are stored in the language database section. The definition in Geobase looks like this:

    entitysize(entity,keyword)
    relop(keywords,relative_size)    /* relational operator */
    assoc(association_between_entities,keyword)
    synonym(keyword,entity)
    ignore(keyword)
    min(keyword)
    max(keyword)
    size(entity,keyword)
    unit(keyword,keyword)

Parsing by Difference Lists

The parser uses a method called "parsing by difference lists." The first two arguments of the parsing predicates are the input list and what remains of that list after part of a query has been stripped off. In the last argument the parser builds up a structure for the query.

The parser consists of several predicates and clauses, each of which handles a special case in recognizing the query. If you want to understand everything about the parser, study the comments and use trace mode to follow how Geobase parses various queries.

The following clause recognizes the query How large is the town new york. After filtering, the parser receives the list ["large", "town", "new", "york"].

    s_attr([BIG,ENAME|S1],S2,E1,q_eaec(E1,A,E2,X)) :-  /* First s_attr clause */
        ent_name(E2,ENAME),   /* Entity type town is a city; look up entity in the language scheme */
        size(E2,BIG),         /* Look up: city size is large */
        entitysize(E2,E1),    /* Look up: city scale is population */
        schema(E1,A,E2),      /* Look up scheme: population of city */
        get_ent(S1,S2,X), !.  /* Return an entity name and query */

The parser can also recognize the more ambiguous query How large is new york. Given this query, the first clause for s_attr fails because it expects an entity type (such as town or state). The program then calls the second clause for s_attr, shown here:

    s_attr([BIG|S1],S2,E1,q_eaec(E1,A,E2,X)) :-  /* Second s_attr clause */
        get_ent(S1,S2,X),
        size(E2,BIG),
        entitysize(E2,E1),
        schema(E1,A,E2),
        ent(E2,X), !.

Using this clause, the parser decides that new york refers to the city and that large refers to the number of citizens.

Once the parser returns a query, Geobase calls eval, which evaluates the query. The actual calls into the database are made through the db and ent predicates.
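The following small Python sketch makes the pipeline described in this section concrete by running the Austin example end to end. It performs each stage in turn:

1. Drop ignored words.
2. Map plural and synonym forms to internal entity names.
3. Recognize associations at the front of the word list.
4. Build nested q_e/q_eaec/q_eaq terms.
5. Evaluate those terms through functions that play the roles of ent and db.

It is an illustration under stated assumptions, not a port of GEOBASE.PRO:

- The data is a tiny hand-written sample, not GEOBASE.DBA.
- The ignore list is only a subset of the one above.
- All function and table names are hypothetical.
- get_assoc takes the longest matching association, whereas the Prolog version backtracks over append.

```python
# Illustrative sample data only (not GEOBASE.DBA, deliberately incomplete).
CAPITALS = {"texas": "austin", "new mexico": "santa fe", "oklahoma": "oklahoma city",
            "arkansas": "little rock", "louisiana": "baton rouge"}
BORDERS = {"texas": ["new mexico", "oklahoma", "arkansas", "louisiana"]}
RIVERS = {"rio grande": ["colorado", "new mexico", "texas"],
          "arkansas": ["colorado", "kansas", "oklahoma", "arkansas"]}

# Hypothetical language tables modelled on ignore/synonym/schema/assoc.
IGNORE = {"which", "is", "are", "the", "tell", "me", "what", "give", "that"}
SYNONYMS = {"town": "city"}
ENTITIES = {"state", "capital", "river", "city"}
ASSOC = {("in",): "in", ("run", "through"): "in", ("runs", "through"): "in",
         ("with",): "with", ("traversed", "by"): "with",
         ("border",): "border", ("borders",): "border"}

def ent_name(word):
    """Internal entity name for a word (plural -> singular, then synonym), or None."""
    candidates = [word]
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")
    if word.endswith("s"):
        candidates.append(word[:-1])
    for cand in candidates:
        cand = SYNONYMS.get(cand, cand)
        if cand in ENTITIES:
            return cand
    return None

def get_assoc(words):
    """(association, remaining words) for the longest association prefix, or None."""
    for n in range(len(words), 0, -1):
        name = ASSOC.get(tuple(words[:n]))
        if name is not None:
            return name, words[n:]
    return None

def parse_query(words):
    """Build q_e / q_eaec / q_eaq tuples from a filtered word list."""
    head = ent_name(words[0])
    if head is None:
        raise ValueError(f"cannot parse word: {words[0]}")
    if len(words) == 1:
        return ("q_e", head)
    found = get_assoc(words[1:])
    if found is None or not found[1]:
        raise ValueError(f"expected association and entity after: {words[0]}")
    assoc, rest = found
    sub_ent = ent_name(rest[0])
    if sub_ent is None:
        raise ValueError(f"cannot parse word: {rest[0]}")
    tail = rest[1:]
    # Words after the second entity that are neither entities nor associations
    # form a (possibly compound) constant such as "austin" or "new york".
    if tail and ent_name(tail[0]) is None and get_assoc(tail) is None:
        return ("q_eaec", head, assoc, sub_ent, " ".join(tail))
    return ("q_eaq", head, assoc, sub_ent, parse_query(rest))

def ent(entity):
    """All values of an entity, playing the role of ent/2."""
    table = {"state": set(CAPITALS), "capital": set(CAPITALS.values()),
             "river": set(RIVERS)}
    return table.get(entity, set())

def db_pairs(e1, assoc, e2):
    """(value1, value2) pairs for one schema arrow, playing the role of db/5."""
    if (e1, assoc, e2) == ("state", "with", "capital"):
        return set(CAPITALS.items())
    if (e1, assoc, e2) == ("state", "border", "state"):
        return {(s1, s2) for s2, lst in BORDERS.items() for s1 in lst}
    if (e1, assoc, e2) == ("river", "in", "state"):
        return {(r, s) for r, lst in RIVERS.items() for s in lst}
    return set()

def evaluate(q):
    """Evaluate a query term inside out and return the set of matching values."""
    if q[0] == "q_e":
        return ent(q[1])
    _, e1, assoc, e2, arg = q
    targets = {arg} if q[0] == "q_eaec" else evaluate(arg)
    return {v1 for v1, v2 in db_pairs(e1, assoc, e2) if v2 in targets}

def ask(text):
    words = [w for w in text.lower().rstrip("?").split() if w not in IGNORE]
    return evaluate(parse_query(words))
```

Take the call ask("Which rivers run through states that border the state with the capital Austin?"). The query parses to q_eaq(river, in, state, q_eaq(state, border, state, q_eaec(state, with, capital, "austin"))), which is the E A (E A (E A E C)) structure. For this sample data it evaluates to the set {"rio grande", "arkansas"}. Evaluation works inside out: Austin gives Texas, Texas gives its neighbors, and the neighbors select the rivers.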

~


Fundamental Prolog Part 1 and Part 2:
the basics of programming in Prolog.

Lists and Recursion: working with lists in Prolog.

LAB1 and LAB2: lab descriptions taken from the site.

senan.rar and geobase.rar: source code for the lab assignments, prepared for Visual Prolog 5.2.

To use them, simply unpack the archive and open the *.prj file in Visual Prolog 5.2.

The first lab contains comments in Russian, written by an unknown hero.

If they are not displayed, change the character set to Cyrillic in Options->Global->Environment->Font->Change Font.

Best regards, Andrey Shalin

~


Shalin's file containing all of this material

E-mail: rykov2000@mail.ru


