
Learning structural shape descriptions from examples

L.P. Cordella *, P. Foggia, C. Sansone, M. Vento

Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio, 21 I-80125 Napoli, Italy

Abstract

A method for learning shapes structurally described by means of attributed relational graphs (ARG's) is discussed and tested. The method is based on an algorithm that, starting from a set of labeled shapes, finds the set of maximally general prototypes. These prototypes, given in terms of a suitably defined data structure which generalizes the ARG's, satisfy the properties of completeness and consistency with reference to the training set, and prove particularly effective for their interpretability.

After summarizing the algorithm, the paper addresses the problem of shape representation by ARG's, and then presents the experimental results of a learning task, with reference to a database of artificial images generated by a set of attributed plex grammars.

The main focus here is not on the learning algorithm, but on its applicability to the problem of learning shapes from examples. A discussion of the results, aimed at highlighting pros and cons, is finally reported. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Attributed relational graphs; Structural descriptions; Shape prototyping; Inductive learning

1. Introduction

Structural methods (Pavlidis, 1977; Frasconi et al., 1998) imply complex procedures both for recognition and learning. The learning problem, i.e. the task of building a set of prototypes adequately describing the members of each different class, is complicated by the fact that the prototypes, implicitly or explicitly, should include a model of the possible differences among members of the same class. In fact, in real applications, the information is affected by distortions, and consequently the descriptions of single members of a class may turn out quite different from each other. The difficulty of defining effective learning algorithms is so high that the problem is considered still open, and only a few methods, usable under rather peculiar hypotheses, are available by now.

Pattern Recognition Letters 23 (2002) 1427–1437

www.elsevier.com/locate/patrec

* Corresponding author.

E-mail addresses: [email protected] (L.P. Cordella), [email protected] (P. Foggia), [email protected] (C. Sansone), [email protected] (M. Vento).

0167-8655/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

PII: S0167-8655(02)00103-4

A first approach to the problem assumes that structured information can be encoded in terms of a vector, thus making possible the adoption of statistical/neural paradigms. In this way, it is possible to use the large variety of well-established and effective algorithms available both for learning and for classifying patterns. The main disadvantages deriving from the use of these techniques are the possible loss of complex structural information, due to the adopted data structure, and the impossibility of accessing the knowledge acquired by the system. In fact, after learning, the knowledge is implicitly encoded (e.g. in the weights associated to the connections of a neural net) and its use, outside the classification stage, is strongly limited. Examples of this approach are illustrated in (Frasconi et al., 1998; Hinton, 1990; Touretzky, 1990).

Attributed relational graphs (ARG's), i.e. graphs enriched with a set of attributes associated to nodes and edges, are the most effective data structure for representing structural information. In the case of shapes, the attributes of nodes and edges respectively represent properties of the primitive shape components and of the relations among them.

An approach, pioneered by (Winston, 1970), groups methods facing the learning problem directly in the representation space of the structured data (Lavrac and Dzerosky, 1994; Pearce et al., 1994; Dietterich and Michalski, 1983; Alquézar and Sanfeliu, 1997). So, if data are represented by graphs, the learning procedure generates graphs for representing the prototypes of the classes. According to this approach, in (Messmer and Bunke, 1996) a learning algorithm for symbols represented by graphs is defined. The method is based on a clustering procedure using a graph-edit distance. Some methods ascribable to the same approach are based on the assumption that the prototypical descriptions are built by interacting with an expert of the domain (Rocha and Pavlidis, 1994; Nishida, 1996). The inadequacy of humans at formalizing the criteria that lead to a set of prototypes really representative of a given class significantly increases the risk of errors, especially in domains containing either many data or many different classes.

More automatic methods are those facing the problem as a symbolic machine learning problem (Michalski, 1980), formulated as follows: ''given a suitably chosen set of input data, whose classes are known, and possibly some background domain knowledge, find out a set of optimal prototypical descriptions for each class''. A formal enunciation of the problem and a more detailed discussion of related issues will be given in the next Section. Dietterich and Michalski (1983) provide an extensive review of this field, populated by methods which mainly differ in the adopted formalism (Michalski, 1980; Quinlan, 1993), sometimes more general than that implied by the graphs.

This approach is really effective, since the obtained prototype descriptions, besides being explicit, hold the property of being maximally general. This makes them very compact, i.e. containing only the minimum information for covering all the samples of the same class and for preserving the distinction between objects of different classes. Due to these properties, the user can easily acquire knowledge about the domain by looking at the prototypes generated by the system, which appear simple, understandable and effective. Consequently the user can validate or improve the prototypes, or understand what has gone wrong in case of classification errors.

In this paper we discuss a method for learning shapes structurally described by ARG's. The method is based on an algorithm which, starting from a set of labeled shapes, finds the set of maximally general prototypes. The focus of the present paper is not only on the learning algorithm (for details see also Foggia et al., 2001), but mainly on its effectiveness for learning shapes from examples.

The learning algorithm faces the problem directly in the space of the graphs and obtains general and consistent prototypes with a low computational cost with respect to other symbolic learning systems based on first order descriptions. The obtained shape prototypes, given in terms of a suitably defined data structure which generalizes the ARG's, satisfy the properties of completeness and consistency with reference to the training set, and prove particularly effective for their interpretability.

After a discussion on the way structural descriptions and prototypes are represented in terms of graphs (Section 2), the learning algorithm is summarized in Section 3.

Section 4 reports the experimental results of a shape learning task with reference to a database of artificial images generated by a set of attributed plex grammars, together with a discussion of the obtained results aimed at highlighting pros and cons.


2. Representational issues: structures and prototypes

Our approach to prototyping is inspired by basic machine learning methodologies and particularizes the inference operations to the case of descriptions given in terms of ARG's. We will first introduce a new kind of graph, devoted to representing the prototypes of a set of ARG's. Such a graph will be called generalized attributed relational graph (GARG), as it contains generalized nodes, edges, and attributes. Then, we will formulate a learning algorithm that builds the prototypes by means of a set of operations directly defined on the graphs. The algorithm preserves the generality of the prototypes generated by classical machine learning algorithms. As with most machine learning systems (Dietterich and Michalski, 1983; Michalski, 1980), the prototypes obtained by our system are consistent, i.e. each sample is covered by only one prototype.

We assume that the shapes of the objects of interest are described in terms of ARG's. An ARG can be defined as a six-tuple (N, E, A_N, A_E, α_N, α_E), where N and E ⊆ N × N are respectively the sets of nodes and edges, A_N and A_E are the sets of node and edge attributes, while α_N and α_E are the functions which associate to each node or edge the corresponding attributes.

We will suppose that the attributes of both nodes and edges are expressed in the form t(p_1, ..., p_kt), where t is a type chosen over a finite alphabet T of possible types and (p_1, ..., p_kt) is a tuple of parameters, also from finite sets P_1^t, ..., P_kt^t. Both the number of parameters (k_t, the arity associated to type t) and the sets they belong to depend on the type of the attribute; for some types, k_t may be equal to zero, meaning that the corresponding attribute has no parameters. It is worth noting that the introduction of the type permits differentiating the description of different kinds of nodes (or edges); in this way, each parameter associated to a node (or an edge) assumes a meaning depending on the type of the node itself. For example, we could use the nodes to represent different parts of an object, by associating a node type to each kind of part (see Fig. 1).
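The ARG definition above can be rendered as a small data structure. The following is a minimal sketch (the names Attribute and ARG are ours, not the paper's), using Python dictionaries to play the role of the attribute functions α_N and α_E:

```python
from dataclasses import dataclass, field

# A typed attribute t(p_1, ..., p_kt): a type name plus a tuple of
# parameter values (possibly empty, when k_t = 0).
@dataclass(frozen=True)
class Attribute:
    type: str
    params: tuple = ()

# An ARG (N, E, A_N, A_E, alpha_N, alpha_E): node ids stand in for
# the set N; edges are pairs (n1, n2) in N x N; each node and edge
# carries one typed attribute.
@dataclass
class ARG:
    node_attrs: dict = field(default_factory=dict)   # node id -> Attribute
    edge_attrs: dict = field(default_factory=dict)   # (n1, n2) -> Attribute

# Example in the spirit of Fig. 1: a circle on top of a small,
# medium-height rectangle (parameter values are illustrative).
g = ARG(
    node_attrs={1: Attribute("circle", ("m",)),
                2: Attribute("rectangle", ("s", "m"))},
    edge_attrs={(1, 2): Attribute("on_top")},
)
```

Here the on_top attribute has arity zero, illustrating the k_t = 0 case mentioned above.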

Fig. 1. (a) Objects made of three types of component parts: circles, triangles and rectangles. (b) The description scheme introducing three types of nodes, each associated to a part. Each type contains a set of parameters suitable to describe each part. Similarly, edges of the graph, describing topological relations between parts, may be of two different types (on_top and left). (c) The ARG's corresponding to the objects given in (a). (d) A GARG representing the two ARG's shown in (c). The GARG represents ''any object made of a part on the top of a rectangle of any width and height''.


In order to allow a prototype to match a set of ARG's we extend the definition of attribute. First of all, the set of attribute types of nodes and edges is augmented with the special type φ, without parameters and allowed to match any attribute type. For the other attribute types, if the sample has a parameter whose value is within the set P_i^t, the corresponding parameter of the prototype belongs to the set P_i^{*t} = ℘(P_i^t), where ℘(P_i^t) is the power set of P_i^t, i.e. the set of all the subsets of P_i^t. Referring to the shapes shown in Fig. 1, a node of the prototype could have the attribute rectangle(s or m, m), meaning a rectangle whose width is small or medium and whose height is medium.

We say that a GARG G* = (N*, E*, A*_N, A*_E, α*_N, α*_E) covers a sample G, and use the notation G* ⊨ G (the symbol ⊨ denotes the covering relation), iff there is a mapping μ: N* → N such that:

μ is a monomorphism; (1)

the attributes of nodes and edges of G* are compatible with the corresponding ones of G. (2)

The compatibility relation, denoted with the symbol ≈, is defined as follows:

∀t: φ ≈ t(p_1, ..., p_kt) and

∀t: t(p*_1, ..., p*_kt) ≈ t(p_1, ..., p_kt) ⟺ p_1 ∈ p*_1 ∧ ... ∧ p_kt ∈ p*_kt (3)

Condition (1) requires that each primitive and each relation in the prototype must be present also in the sample. This allows the prototype to specify only the features that are strictly required for discriminating among the various classes, neglecting the irrelevant ones. Condition (2) constrains the monomorphism required by condition (1) to be consistent with the attributes of the prototype and of the sample, by means of the compatibility relation defined in (3). This relation simply states that the type of the prototype attribute must be either equal to φ or to the type of the corresponding attribute of the sample. In the latter case all the parameters of the attribute, which are actually sets of values, must contain the value of the corresponding parameter of the sample.

The covering relation between a prototype and a sample graph could be tested using any graph matching algorithm, suitably modified to take into account the attributes of the two graphs. In particular, we have used a modified version of the algorithm described in Cordella et al. (2001).
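The attribute-compatibility part of this test, relation (3), is mechanical enough to sketch in code. In this hypothetical encoding (ours, not the paper's), an attribute is a pair (type, params): sample parameters are plain values, prototype parameters are sets of admissible values, and the special type φ matches anything:

```python
PHI = "phi"  # stands for the special type, matching any attribute

# Compatibility (Eq. (3)): a prototype attribute of type phi matches
# anything; otherwise the types must coincide and each prototype
# parameter (a *set* of admissible values) must contain the sample's
# corresponding value.
def compatible(proto_attr, sample_attr):
    ptype, pparams = proto_attr
    stype, sparams = sample_attr
    if ptype == PHI:
        return True
    return (ptype == stype
            and len(pparams) == len(sparams)
            and all(s in p for p, s in zip(pparams, sparams)))

# A prototype node "rectangle(width in {s, m}, height in {m})",
# as in the rectangle(s or m, m) example above.
proto = ("rectangle", ({"s", "m"}, {"m"}))
```

A full covering test would embed this check inside a graph monomorphism search, as condition (2) requires.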

Another important relation that will be introduced is specialization (denoted by the symbol ◁): a prototype G*_1 is said to be a specialization of G*_2 iff

∀G: G*_1 ⊨ G ⇒ G*_2 ⊨ G. (4)

In other words, a prototype G*_1 is a specialization of G*_2 if every sample covered by G*_1 is also covered by G*_2. Hence, a more specialized prototype imposes stricter requirements on the samples to be covered. In Fig. 1d a prototype covering the shapes of Fig. 1a is shown.

3. The learning algorithm

The goal of the learning algorithm can be stated as follows: there is a (possibly infinite) set S* of all the shapes that may occur, partitioned into C different classes S*_1, ..., S*_C, with S*_i ∩ S*_j = ∅ for i ≠ j. The algorithm is given a finite subset S ⊆ S* (training set) of labeled patterns (S = S_1 ∪ ... ∪ S_C, with S_i = S ∩ S*_i), from which it tries to find a sequence of prototype graphs G*_1, G*_2, ..., G*_p, each labeled with a class identifier, such that the following properties hold:

∀G ∈ S* ∃i: G*_i ⊨ G (completeness) (5)

∀G ∈ S*: G*_i ⊨ G ⇒ class(G) = class(G*_i) (consistency) (6)

class(G) and class(G*_i) refer to the class of the sample G and of the prototype G*_i, respectively.

Of course, this is an ideal goal because of the finiteness of S. In practice, the algorithm can only verify that completeness and consistency hold for the samples in S. On the other hand, Eq. (5) dictates that, in order to get as close as possible to the ideal case, the generated prototypes should be able to model also samples not found in S, that is, they must be more general than the enumeration of the samples in the training set. However, they should not be too general, otherwise Eq. (6) would not be satisfied. The achievement of the optimal tradeoff between completeness and consistency makes prototyping a really hard problem.

A sketch of the learning algorithm is presented in Fig. 2; the algorithm starts with an empty list L of prototypes, and tries to cover the training set by successively adding consistent prototypes. When a new prototype is found, the samples covered by it are eliminated and the process continues on the remaining samples of the training set. An effect of this elimination is that each generated prototype contains only the features needed to discriminate among samples not yet filtered out by the previous prototypes. Hence, at classification time a sample has to be matched against the prototypes in the same order in which they have been generated, stopping as soon as a matching prototype is found. This strategy entails the advantage that the algorithm is able to deal automatically with classes whose samples are subgraphs of the samples belonging to other classes. In this case, the prototypes of the subgraphs are considered only after the prototypes of the larger graphs have already been ruled out.
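The outer loop just described can be sketched as follows; find_prototype, covers and class_of are assumed helpers standing in for the search procedure of Fig. 3, the graph-matching test of Section 2, and the training labels:

```python
# Sketch of the outer loop of Fig. 2 (helper names are hypothetical):
# repeatedly find a consistent prototype, label it with the majority
# class of the samples it covers, and drop those samples.
def learn(training_set, find_prototype, covers, class_of):
    prototypes = []          # the ordered list L; order matters later
    remaining = list(training_set)
    while remaining:
        proto = find_prototype(remaining)
        if proto is None:
            raise RuntimeError("no consistent prototype found")
        covered = [g for g in remaining if covers(proto, g)]
        # the class is assigned *after* building the prototype,
        # as the majority class among the covered samples
        classes = [class_of(g) for g in covered]
        label = max(set(classes), key=classes.count)
        prototypes.append((proto, label))
        remaining = [g for g in remaining if not covers(proto, g)]
    return prototypes

# Classification matches the prototypes in generation order,
# stopping at the first one that covers the sample.
def classify(prototypes, g, covers):
    for proto, label in prototypes:
        if covers(proto, g):
            return label
    return None
```

The ordered scan in classify is what lets subgraph classes be handled automatically, as noted above.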

The algorithm fails if no consistent prototype covering the remaining samples can be found. It is worth noting that the test of consistency in the algorithm actually checks if the prototype is almost consistent, i.e.:

Consistent(G*) ⟺ max_i |S_i(G*)| / |S(G*)| ≥ θ (7)

where S(G*) denotes the set of all the samples of the training set covered by a prototype G*, S_i(G*) the samples of the class i covered by G*, and θ is a suitably chosen threshold, close to 1. Relation (7) implies that almost all the samples covered by G* belong to the same class. Also note that the association of a prototype to a class is performed after building the prototype.

According to (7), the algorithm considers a prototype consistent if more than a fraction θ of the covered training samples belong to the same class, avoiding a further specialization of this prototype that could be detrimental to its generality.
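The almost-consistency test (7) amounts to a majority-fraction check over the class labels of the covered samples. A minimal sketch (the default θ = 0.95 is purely illustrative; the paper only says the threshold is close to 1):

```python
from collections import Counter

# Eq. (7): a prototype is accepted as (almost) consistent when at
# least a fraction theta of the training samples it covers belong
# to a single class.  `covered_classes` lists the class labels of
# the samples covered by G*.
def is_consistent(covered_classes, theta=0.95):
    if not covered_classes:
        return False
    top = Counter(covered_classes).most_common(1)[0][1]
    return top / len(covered_classes) >= theta
```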

The most important part of the algorithm is the FindPrototype procedure, illustrated in Fig. 3. The construction of a prototype starts from a trivial prototype with one node whose attribute is φ (i.e. a prototype that covers any non-empty graph). The prototype is then refined by successive specializations until either it becomes consistent or it covers no samples at all. An important step of the FindPrototype procedure is the construction of a set Q of specializations of the tentative prototype G*. The adopted definition of the heuristic function H, guiding the search of the current optimal prototype, will be examined later.

Fig. 2. A sketch of the learning procedure.

Fig. 3. The function FindPrototype.

At each step, the algorithm tries to refine the current prototype definition, in order to make it more consistent, by replacing the tentative prototype with one of its specializations. To accomplish this task we have defined a set of specialization operators which, given a prototype graph G*, produce a new prototype G*′ such that G*′ ◁ G*. The considered specialization operators are:

1. Node addition: G* is augmented with a new node n whose attribute is φ.

2. Edge addition: a new edge (n*_1, n*_2) is added to the edges of G*, where n*_1 and n*_2 are nodes of G* and G* does not already contain an edge between them. The edge attribute is φ.

3. Attribute specialization: the attribute of a node or an edge is specialized according to the following rule:
• If the attribute is φ, then a type t is chosen and the attribute is replaced with t(P_1^t, ..., P_kt^t). This means that only the type is fixed, while the type parameters can match any value of the corresponding type.
• Else, the attribute takes the form t(p*_1, ..., p*_kt), where each p*_i is a (not necessarily proper) subset of P_i^t. One of the p*_i such that |p*_i| > 1 is replaced by p*_i − {p_i}, with p_i ∈ p*_i. In other words, one of the possible values of a parameter is excluded from the prototype.
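The attribute-specialization operator can be sketched directly. Here an attribute is assumed to be a pair (type, list of parameter sets), and both helper names are ours:

```python
# Sketch of the two attribute-specialization rules above
# (hypothetical representation: attribute = (type, [set_1, ..., set_kt])).
PHI = ("phi", [])

def specialize_type(attr, t, param_domains):
    # phi -> t(P_1^t, ..., P_kt^t): fix the type, leave every
    # parameter unconstrained (the full domain of that type).
    assert attr[0] == "phi"
    return (t, [set(d) for d in param_domains[t]])

def specialize_param(attr, i, value):
    # drop one admissible value from the i-th parameter set
    t, params = attr
    assert len(params[i]) > 1 and value in params[i]
    new_params = [set(p) for p in params]
    new_params[i].discard(value)
    return (t, new_params)

# Illustrative domains for the rectangle type of Fig. 1.
domains = {"rectangle": [{"s", "m", "l"}, {"s", "m", "l"}]}
a = specialize_type(PHI, "rectangle", domains)
a = specialize_param(a, 0, "l")   # width now restricted to {s, m}
```

Each call yields a strictly more specialized attribute, matching the ◁ relation of Eq. (4).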

The heuristic function H is introduced for evaluating how promising the provisional prototype is. It is based on the estimation of the consistency and completeness of the prototype (see Eqs. (5) and (6)). To evaluate the consistency degree of a provisional prototype G*, an entropy-based measure is used:

H_cons(S, G*) = −Σ_i (|S_i| / |S|) log₂(|S_i| / |S|) + Σ_i (|S_i(G*)| / |S(G*)|) log₂(|S_i(G*)| / |S(G*)|) (8)

i.e. the entropy of the class distribution over the whole training set minus the entropy of the class distribution over the covered samples S(G*). H is defined so that the larger the value of H_cons(S, G*), the more consistent G* is: hence the use of H_cons will drive the algorithm towards consistent prototypes.

The completeness of a provisional prototype is taken into account by a second term of the heuristic function, which simply counts the number of samples covered by the prototype:

H_compl(S, G*) = |S(G*)| (9)

This term is introduced in order to favor general prototypes over prototypes which, albeit consistent, cover only a small number of samples.

The heuristic function used in our algorithm is the one described in (Quinlan, 1993):

H(S, G*) = H_compl(S, G*) · H_cons(S, G*) (10)
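Eqs. (8)-(10) combine into a small scoring function. The sketch below takes the class labels of the whole training set and of the covered subset, computes H_cons as the entropy difference, and weights it by the coverage count H_compl:

```python
from math import log2
from collections import Counter

def entropy(labels):
    # Shannon entropy of a class-label distribution
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# Eqs. (8)-(10): H_cons is the entropy of the whole training set
# minus the entropy of the covered subset (larger = purer coverage);
# H_compl is simply the number of covered samples.
def heuristic(all_labels, covered_labels):
    h_cons = entropy(all_labels) - entropy(covered_labels)
    h_compl = len(covered_labels)
    return h_compl * h_cons
```

A prototype that purely covers many samples scores highest, which is exactly the completeness/consistency tradeoff the text describes.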

4. Experiments and discussion

The method has been tested on a database of artificial images generated by a set of attributed plex grammars (Felder, 1971). The technique used for generating the database has also been employed in (de Mauro et al., 2001) in the context of a similarity learning problem by means of a recursive neural network.

The images are obtained by combining a set of predefined blocks (the terminal symbols of the grammar) which can be connected together at given attachment points. The productions of the grammar define how the blocks are joined together. The result of the composition is a new object (a non-terminal symbol), for which the corresponding production also defines the attachment points; this object, in turn, can be connected to other terminal or non-terminal objects by other productions. Further, attributed plex grammars associate to each terminal symbol a quantitative information represented by a vector of attribute values (e.g. length, color, texture parameters, shape parameters, orientation, etc.).

A more detailed description of the image synthesis methodology can be found in (Bunke et al., 2001), while the image generation tool, together with its documentation, is available for download from the URL (root portal): http://www.artificial-neural.net.

In the experiments, we used grammars to generate three different classes of images: houses, ships and policemen. By randomly choosing the productions in the grammar, we generated a database composed of 1800 different images (600 images per class).

The whole database was split into a training set, made up of 600 images (200 per class), and a test set made up of the remaining 1200 images (400 per class). Fig. 4 shows some images of the database.

Fig. 5 illustrates the adopted description scheme in terms of ARG's. For each image, a simple segmentation algorithm is used to find regions with the same color. Each region is represented in the resulting graph by a node. Node attributes encode the diameter of the region, normalized with respect to the image size, and its ellipticity, measured as the ratio between the minor and the major axis of inertia (the value is 1 for circular regions and tends to 0 for straight lines). Both attributes are quantized in five levels. The edges of the graph are used to represent the adjacency of two regions. Two edge attributes encode the relative x and y positions of the centers of the two regions. These attributes are also quantized in five levels. The resulting graphs have a number of nodes ranging from 3 to 13 (8 nodes on average), and the maximum out-degree is six.
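The node-attribute encoding just described can be sketched as follows. The paper specifies five quantization levels but not the bin boundaries, so the equal-width binning and all names below are assumptions of this sketch:

```python
# Quantize a value in [0, 1] into five levels (0..4).  Equal-width
# bins are an assumption: the paper only states "five levels".
def quantize5(x):
    return min(int(x * 5), 4)

# Node attribute for a segmented region: normalized diameter and
# ellipticity (minor/major axis ratio: 1 for circular regions,
# tending to 0 for straight lines), both quantized.
def node_attribute(diameter, image_size, minor_axis, major_axis):
    d = quantize5(diameter / image_size)
    e = quantize5(minor_axis / major_axis)
    return ("region", (d, e))
```

The relative x/y edge attributes would be quantized the same way after shifting them into [0, 1].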

Fig. 4. Some samples of the training set (a) and of the test set (b).


Fig. 6 shows the representation obtained for a sample of the class policeman. Tests have been made with different compositions of the training set; in particular, five subsets have been randomly extracted by considering a number of samples per class ranging from 10 to 200. In every test the generated prototypes were consistent, giving rise to a 100% recognition rate on the training set.

With reference to the above described tests, the learning time for the biggest training set (200 samples per class) was about 15 s on a 500 MHz Pentium III PC equipped with 128 MB of RAM. Even though the training set is not very large (600 samples), this time should be considered quite short if compared to the times required by full first-order learning algorithms.

Table 1 shows the number of generated prototypes and the recognition rate on the test set, for the different training sets. It can be noted that, even with a training set made of 20 samples, the performance of the system is quite satisfactory; by increasing the number of samples to 50 per class, the recognition rate already reaches about 99%. Further enlargements of the training set (i.e. passing from 100 to 200 samples per class) produce negligible improvements in the recognition rate. Moreover, in all the tests, the number of obtained prototypes changes only slightly.

Fig. 5. The alphabet of the types defined for the considered images.

Fig. 6. (a) The image of a policeman and (b) its representation as an ARG.

Table 1
Recognition rate on the test set and number of generated prototypes as a function of the training set size

No. of training samples per class | Recognition rate on test set (%) | No. of generated prototypes
10  | 85.83 | 6
20  | 93.50 | 4
50  | 99.50 | 5
100 | 99.50 | 5
200 | 99.75 | 6

Fig. 7 shows the prototypes generated using 200 samples per class. With respect to the case of 50 or 100 training set samples there are two differences: (a) an additional prototype (H2) is generated and (b) the simplest prototype, previously assigned to the house class, is now attributed (as S2) to the ship class. As the ordering of the prototypes affects the classification results (see Section 3), these changes make the system able to correctly classify three more ship samples of the test set.

Fig. 7. The GARG's obtained as prototypes of the classes of houses (H1 and H2), ships (S1 and S2) and policemen (P1 and P2). For each prototype, the number of covered samples in the training set is also shown. The sequence number indicates the order in which the prototypes have been generated, which is also the order used during the classification.

The figure also presents an informal description of the obtained prototypes. It is worth noting that the prototypes are very easily interpretable in terms of regions in the corresponding images, and of adjacency relations between these regions. It is a relatively simple and direct task to understand which parts of the image are considered distinctive of each class.

This is one of the main advantages deriving from the adoption of a first-order symbolic learning method. For comparison, consider the result that would have been obtained with a neural or statistical learning system: a possibly complex classification function depending on a possibly high number of numerical parameters, none of which could have been easily related to the contents of the image. We will discuss later how these advantages can significantly help to understand the causes of recognition errors. Let us first illustrate the classification results on the test set.

Table 2 reports the classification table on the test set obtained with a training set made up of 200 samples per class. As is evident, three ship samples are confused with policemen.

Fig. 8 shows the images and the ARG's of these misclassified samples. It can be noted that these ARG's do not match the prototype S1 of the ship class, as they refer to ships without portholes and having only two masts; this structural property leads to graphs having no node with three outgoing edges. Moreover, since all three considered ARG's have at least five nodes, they match the prototype P2 of the policeman class. In other words, these errors are due to representativeness problems of the training set, which does not contain any sample of ships without portholes and with only two masts with sails.

Table 2
Classification matrix (%) on the test set, obtained with the training set made up of 200 samples per class

          | House | Ship  | Policeman
House     | 100   | 0     | 0
Ship      | 0     | 99.25 | 0.75
Policeman | 0     | 0     | 100

Fig. 8. Errors made on the test set: (a) the three ships confused with a policeman, (b) their ARG's and (c) the prototype they are attributed to.

In fact, the only ship sample of the training set without portholes and with two masts also has no sails, giving rise to an ARG made up of only three nodes. This ARG does not match the P2 prototype, and so the consistency of P2 is preserved. Therefore, the system has no need to generate an additional prototype, and uses the simplest prototype (S2, see the last row of Fig. 7) to represent such a ship sample.

We have been able to reach this conclusion, finding the weak points in the representativeness of the chosen training set, with relatively little investigation effort. This is possible thanks to the high understandability of the prototypes and to the ease of understanding how the misclassified samples are matched against the prototypes. The latter is a direct consequence of the adoption of a structural recognition algorithm (based on graph matching, in this case), which in turn has been made effective by the availability of a structural learning method.

5. Conclusions

In this paper we have discussed the applicability of a recently proposed structural learning method to the problem of prototyping shape descriptions, given in terms of ARG's.

The experimental analysis highlighted the advantages of a symbolic approach to the problem of shape learning: the understandability of the obtained prototypes allows the user to interpret the result of the learning phase and to easily find a way to overcome errors due to representativeness faults of the training set.

The use of a training set made of artificially generated images allowed us to point out how general and easy to understand the prototypes are.

References

Pavlidis, T., 1977. Structural Pattern Recognition. Springer, New York.

Frasconi, P., Gori, M., Sperduti, A., 1998. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks 9 (5), 768–785.

Hinton, G.E., 1990. Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence 46, 47–75.

Touretzky, D.S., 1990. Dynamic symbol structures in a connectionist network. Artificial Intelligence 42 (3), 5–46.

Winston, P.H., 1970. Learning Structural Descriptions from Examples. Tech. Report MAC-TR-76, Department of Electrical Engineering and Computer Science, MIT.

Lavrac, N., Dzerosky, S., 1994. Inductive Logic Programming: Techniques and Applications. Ellis Horwood.

Pearce, A., Caelly, T., Bischof, W.F., 1994. Rulegraphs for graph matching in pattern recognition. Pattern Recognition 27 (9), 1231–1247.

Dietterich, T.G., Michalski, R.S., 1983. A comparative review of selected methods for learning from examples. In: Michalski, R.S. et al. (Eds.), Machine Learning: An Artificial Intelligence Approach, Vol. 1. Morgan Kaufmann, pp. 41–82.

Alquézar, R., Sanfeliu, A., 1997. Recognition and learning of a class of context-sensitive languages described by augmented regular expressions. Pattern Recognition 30 (1), 163–182.

Messmer, B., Bunke, H., 1996. Automatic learning and recognition of graphical symbols in engineering drawings. In: Kasturi, R., Tombre, K. (Eds.), Graphics Recognition, Lecture Notes in Computer Science, Vol. 1072. Springer, Berlin, pp. 123–134.

Rocha, J., Pavlidis, T., 1994. A shape analysis model with applications to a character recognition system. IEEE Transactions on PAMI 16 (4), 393–404.

Nishida, H., 1996. Shape recognition by integrating structural descriptions and geometrical/statistical transforms. Computer Vision and Image Understanding 64, 248–262.

Michalski, R.S., 1980. Pattern recognition as rule-guided inductive inference. IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (4), 349–361.

Quinlan, J.R., 1993. Learning logical definitions from relations. Machine Learning 5 (3), 239–266.

Foggia, P., Genna, R., Vento, M., 2001. Symbolic vs. connectionist learning: an experimental comparison in a structured domain. IEEE Transactions on Knowledge and Data Engineering 13 (2), 176–195.

Cordella, L.P., Foggia, P., Sansone, C., Vento, M., 2001. An improved algorithm for matching large graphs. In: Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Ischia, Italy, pp. 149–159.

Felder, T., 1971. Plex languages. Information Sciences 3, 225–241.

de Mauro, C., Diligenti, M., Gori, M., Maggini, M., 2001. Similarity learning for graph-based image representations. In: Proceedings of the 3rd IAPR-TC15 Workshop on Graph-Based Representations in Pattern Recognition, Ischia, Italy, pp. 250–259.

Bunke, H., Irniger, C., Gori, M., Hagenbuchner, M., Tsoi, A.C., 2001. Generation of image databases using attributed plex grammars. In: Proceedings of the 3rd IAPR-TC15 Workshop on Graph-Based Representations in Pattern Recognition, Ischia, Italy, pp. 200–209.

