+ All documents
Home > Documents > Location similarity of regions

Location similarity of regions

Date post: 02-Dec-2023
Category:
Upload: independent
View: 0 times
Download: 0 times
Share this document with a friend
12
Ž . ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200 www.elsevier.nlrlocaterisprsjprs Location similarity of regions q Stephan Winter Department of Geoinformation, Technical UniÕersity Vienna, Gusshausstr. 27-29, A-1040 Vienna, Austria Received 25 February 1999; accepted 4 March 2000 Abstract This article gives a systematic investigation of location-based similarity between regular regions. Starting from reasonable conditions for such measures, it is shown that there is only a finite number of location properties to be compared. The complete set of combinations is presented, and their behaviour and interpretability are discussed. Similarity measures are needed for all kinds of matching problems, including merging spatial data sets, change detection, and generalization. However, the measures are empirical measures. Therefore, measures found in literature seem to be chosen at random. With this synopsis, I show the differences of behaviour for all available choices. q 2000 Elsevier Science B.V. All rights reserved. Keywords: similarity; topological relations; quality assessment; distance 1. Introduction 1.1. MotiÕation Similarity is a concept widely used, referring to Ž . space and geographic information systems GIS ; it provides the basis for handling positional uncertainty and imprecision, for matching spatial entities, for merging spatial data sets, for change detection, or for generalization. Since similarity is the basis, there needs to be a measure to make it quantifiable. Addi- tionally, similarity is the central notion for any ab- straction and has been discussed in the categorization q This article is a significantly revised and extended version of: Location-based similarity measures of regions. In: Fritsch, D., Ž . Englich, M., Sester M. Eds. : GIS Between Visions and Applica- tions. International Archives of Photogrammetry and Remote Ž . Sensing, Vol. 32r4 1998 , pp. 669–676. Ž . E-mail address: [email protected] S. Winter . controversy as an undecidable problem for 2000 Ž . years Flasch, 1986 . The fundamental question of similarity is to find a common reference frame for measuring: there are so many aspects of physical, linguistic or semantic similarity, that a statement ‘A is similar to B’ contains no information as long as the referred aspects are not specified. For the present investigation, the reference frame is location and location-based similarity. Spatial entities in databases, in this article as- sumed as regions, are models of real world objects. The comparison of the location of two regions from different data sets is based on the hypothesis that both are modeling the same object. The grade of similarity allows an assessment of that hypothesis. It is most likely that two models are never identical, because of different concepts applied to the real world, of context-dependent levels of detail, of changes in a dynamic world, and of errors in data 0924-2716r00r$ - see front matter q 2000 Elsevier Science B.V. All rights reserved. Ž . PII: S0924-2716 00 00019-8
Transcript

Ž .ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200www.elsevier.nlrlocaterisprsjprs

Location similarity of regionsq

Stephan WinterDepartment of Geoinformation, Technical UniÕersity Vienna, Gusshausstr. 27-29, A-1040 Vienna, Austria

Received 25 February 1999; accepted 4 March 2000

Abstract

This article gives a systematic investigation of location-based similarity between regular regions. Starting fromreasonable conditions for such measures, it is shown that there is only a finite number of location properties to be compared.The complete set of combinations is presented, and their behaviour and interpretability are discussed.

Similarity measures are needed for all kinds of matching problems, including merging spatial data sets, change detection,and generalization. However, the measures are empirical measures. Therefore, measures found in literature seem to bechosen at random. With this synopsis, I show the differences of behaviour for all available choices. q 2000 Elsevier ScienceB.V. All rights reserved.

Keywords: similarity; topological relations; quality assessment; distance

1. Introduction

1.1. MotiÕation

Similarity is a concept widely used, referring toŽ .space and geographic information systems GIS ; it

provides the basis for handling positional uncertaintyand imprecision, for matching spatial entities, formerging spatial data sets, for change detection, or forgeneralization. Since similarity is the basis, thereneeds to be a measure to make it quantifiable. Addi-tionally, similarity is the central notion for any ab-straction and has been discussed in the categorization

q This article is a significantly revised and extended version of:Location-based similarity measures of regions. In: Fritsch, D.,

Ž .Englich, M., Sester M. Eds. : GIS Between Visions and Applica-tions. International Archives of Photogrammetry and Remote

Ž .Sensing, Vol. 32r4 1998 , pp. 669–676.Ž .E-mail address: [email protected] S. Winter .

controversy as an undecidable problem for 2000Ž .years Flasch, 1986 . The fundamental question of

similarity is to find a common reference frame formeasuring: there are so many aspects of physical,linguistic or semantic similarity, that a statement ‘Ais similar to B’ contains no information as long asthe referred aspects are not specified. For the presentinvestigation, the reference frame is location andlocation-based similarity.

Spatial entities in databases, in this article as-sumed as regions, are models of real world objects.The comparison of the location of two regions fromdifferent data sets is based on the hypothesis thatboth are modeling the same object. The grade ofsimilarity allows an assessment of that hypothesis. Itis most likely that two models are never identical,because of different concepts applied to the realworld, of context-dependent levels of detail, ofchanges in a dynamic world, and of errors in data

0924-2716r00r$ - see front matter q 2000 Elsevier Science B.V. All rights reserved.Ž .PII: S0924-2716 00 00019-8

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200190

capture. Introducing similarity at this point allows tocharacterize the grade of coincidence by commonlocation as well as the grade of distance by distinct

Ž .location Tversky, 1977 . As both aspects of similar-ity are not redundant, it is necessary to characterizethem in a more precise way. Thus, we talk of

Ž . Žsimilarity common location and dissimilarity dis-.tinct location . Obviously, the concept of similarity

is broader than a pure distance measure. Further-more, specific similarity measures could yield selec-tive information about different aspects contributing

Ž .to similarity, like grade of equality or overlap.Similarity measures are empirical measures.

Therefore, measures found in literature often seem tobe chosen at random. The choice is based on proper-ties of the specific measure without comparison toother possible location-based similarity measures. Iwill show a synopsis of all choices with significantdifference of behaviour, and I will characterize thedifferent measures for their specific behaviour.

1.2. Focus

This paper presents a systematic investigation ofsimilarity measures between two discrete regions

Ž .from different data sets Fig. 1 . The only aspectconsidered is the location of regions, which is afunction of coordinates in a given geometry. I ex-clude all thematic attributes of regions as well asrelations between objects, both relating to their ownproblems and literature. Finally, I will not treat thematching problem of two regions from different datasources.

It will be shown that the number of locationproperties to be compared is finite. A complete listof possible combinations will be presented and dis-cussed. Other similarity measures can only be given

Žwith higher orders of normalization e.g., L -normp

Fig. 1. Given two regions, A and B, from two independent datasets: to what extent are they similar?

.with p)1 . It can be expected that such measurescannot generate new information because the combi-natorial complexity of possible properties is ex-hausted with the measures given here.

Giving the preconditions that a measure should besymmetric, normalized, and free of dimension, arearatios will be set up. Only some of all possible ratiosfulfill these preconditions. These ratios are usefulsimilarity measures. Hence, their behaviour and se-mantical interpretations will be discussed. Also otherconditions will be investigated, especially reflexivityand the triangle equation. It will be shown that theymay not be postulated for similarity measures.

Different measures characterize different proper-ties or interrelations between position and size oftwo regions. None of the measures can be a measureof overall similarity. Consequently, at least two ofthe listed measures are necessary to describe similar-ity as well as dissimilarity. In literature, either one or

Ž .the other pair of these measures is used. Typically,it is a practical approach which leads to the choice ofmeasures without reference to alternatives. It will beshown at which point and to what extent alternativemeasures exist.

1.3. Structure

This article starts by investigating similarity as aconcept and introduces location as a reference frame

Ž .for the similarity of regions Section 2 . Then loca-tion-based measures based on intersection sets will

Ž .be introduced Section 3 . The sizes of the intersec-tion sets are normalized by setting them into ratios.These ratios will be investigated and discussed inSection 4. In simple test situations, the behaviour of

Ž .these measures will be demonstrated Section 5 .Ž .Finally, the conclusion Section 6 will discuss this

approach and its results in a wider context.

2. Similarity and location

Similarity is a concept that varies between disci-plines: in mathematics, it describes a type of trans-

Ž .formation Edgar, 1990 ; in statistics, it means thatŽ .two similar signals are correlated Jahne, 1995 ; in¨

cognition, it means that similar things belong to theŽ .same category Lakoff, 1987 ; in visual perception,

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200 191

similarity is based on the laws of Gestalt theoryŽ .Metzger, 1936 ; and, in computer vision, it is re-lated to topological and geometric properties, like

ŽEuler number, area, compactness Haralick and.Shapiro, 1992 . Not all of these concepts of similar-

Ž .ity have a ratio scale Stevens, 1946 , i.e., not allconcepts can be measured.

This variety of aspects of physical, linguistic orsemantic similarity requires a specification in whichway two objects shall be similar. In this paper, thelocation of two spatial extended objects shall be

Ž .investigated for similarity. Tversky 1977 postulatedthat similarity of objects increases with the numberof common features, and decreases with the numberof distinct features. He proposed a contrast model,which expresses similarity between objects A and Bas a weighted difference of the measures of theircommon and distinct features. Similarity Ais ex-pressed as a function h of three arguments: . . . thefeatures that are common in both A and B; . . . thefeatures that belong to A but not to B; . . . the

Ž .features that belong to B but not to A.B p. 330 :

Ž . Ž .s A , B s h Al B , Ay B , By A1Ž .Ž . Ž . Ž .s a h Al B y b h Ay B yg h By A

Consequently, similarity is more than an inverseof distance or difference. Difference of spatial ex-tended objects causes costs in mapping one onto theother; whereas common features of the objects repre-sent benefits, which have also to be considered in the

Ž .mapping Vosselman, 1992 . For that reason, Tver-sky’s model provides the basis of the argumentationin this paper: similarity is considered as a combinedmeasure of similar parts and dissimilar parts. Withregard to location: similarity is a combination of the

Ž .parts sharing a location similarity in a narrow senseŽ .and the parts that are different dissimilarity .

In the following, it is generally assumed that alltreated areal objects are existing and not empty.Location is represented simply by the location func-tion:

0 if x , y fAŽ .f x , y s 2Ž . Ž .½ 1 if x , y gAŽ .

Ž . 2 Ž . Ž . 2with x,y gR vector representation or x,y gZŽ .raster representation , respectively. In the following,there will be referred to R2 only, without loss of

generality. All the formulas can be applied also to Z2

replacing integrals by sums.At this point, it makes sense to distinguish be-

tween contexts of similarity of regions. As a binaryrelation, similarity may concern:

Ž .1 Two different objects in the real world: In thiscase, similarity concerns shape only with the basic

Ž .assumption that two physical objects cannot existat the same location at the same time.

Ž .2 Two different abstractions of the same object:In this case, similarity concerns the two contexts.Location-based measures can be used to specify onetype of context by describing indicators.

Ž .3 Two different representations of the sameobject: In this case, similarity of location concernsidentity, or at least part-of relations. Similarity canbe used to match regions, to detect differences indata sets, e.g., changes, and so on.

In special circumstances, the third case can betreated as an estimation problem of a shift between

Ž .two correlated spatial signals. That is commonŽ .practice in image matching Ackermann, 1984 . But

all differences between the two signals, which cannotbe described by the shift, violate the estimationmodel. Hence, they have to be small. In this article,no restriction shall be put on the shape or correlationbetween the two regions considered. For that reason,statistical matching techniques will not be consideredfurther.

This concept of location shows that matching ofdata sets is an ill-posed problem requiring empiricalapproaches and that similarity measures for locationare, thus, empirical measures.

3. Location-based measures

In this chapter, we will derive location-basedsimilarity measures with special attention to com-

Žpleteness. They will be based on the sizes of inter-.section sets with strong interrelation to weighted

topological relations.Location of a region was defined as the space

Ž Ž ..covered by this region Eq. 2 . Measures based onŽ .location count or integrate atomic elements of

space; these are points in R2, and raster cells in Z2.For similarity measures, one will count atoms thatare covered by both regions or atoms that are cov-ered by either one or the other region. In terms of

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200192

topology, the size of intersection sets between theinterior and the exterior of the considered regionscomes into focus. Thus, the strong mathematicalformulation of topological relations by emptinessr

Žnonemptiness of intersection sets Egenhofer and.Herring, 1990 is softened to graded, fuzzy or uncer-

Ž .tain topological relations Winter, 1996 . An exam-Ž .ple is the relation equal A,B : if two regions A and

B fulfill the strict relation, they share all coveredspace without distinct space which means they areperfectly similar. All other states of location will betreated as more or less equal corresponding to moreor less similar.

The number of possible combinations of suchintersection sets is finite. In the following, all loca-tion-based measures will be collected. Then the ra-tios of those measures will be investigated for theiruse of location-based similarity measures.

3.1. Intersection sets

The intersection sets between the interior andexterior will be investigated to characterize stricttopological relations. Then the qualitative relationswill be graded by the size of the sets.

Ž Ž ..The location function Eq. 2 distinguishes twoŽ Ž . .sets, the interior f x,y s 1 and the exterior

Ž Ž . .f x,y s0 of a region A. The function needs noconcept of neighborhood. Therefore, open and closedsets cannot be distinguished in the functional repre-sentation. The inverse of function f , fy1, yields thecomplement of A, i.e., ! A. Thus, for two regions,A and B, a set of four intersection sets in total can bederived.

Consider Fig. 2. Region A from a data set A andregion B from a data set B have an arbitrary posi-tion relative to each other; in the figure, the rectangle

Fig. 2. The intersection sets between two rectangular regions, Aand B, form a partition of the plane. The background is assumedto be unlimited.

A is top-left of the rectangle B, and both are over-lapping. Their intersection sets form a partition ofthe planar space with — generally, at most — foursets:

AlB , Al! B , ! AlB , ! Al! B. 3Ž .All other sets are unions of those intersection sets.

For example:

A s AlB j Al! BŽ . Ž .B s AlB j ! AlB 4Ž . Ž . Ž .AjBs AlB j ! AlB j Al! BŽ . Ž . Ž .

First of all, the size m of the sets is interesting.An elementary operation size-of is introduced here,

< <in short, written in mathematical notation by P .This operation can be defined on the location func-

Ž . Ž . 2tion for A, f x,y , and B, g x,y , in R as:

< < Ž . Ž .m s Al B s f x , y g x , y d x d yHH1x , y

y1< < Ž . Ž .m s Al! B s f x , y g x , y d x d yHH2x , y

y1< < Ž . Ž .m s ! Al B s f x , y g x , y d x d yHH3x , y

y1 y1< < Ž . Ž .m s ! Al! B s f x , y g x , y d x d yHH4x , y

5Ž .

With unlimited functions, f and g, in R2, the sizem is always `. Hence, no information is contributed4

by m . Therefore, m can be excluded from further4 4

consideration.Once the sizes m are known, they can be mappedi

Žœ.to binary measures m with values 0 0 and 1iŽ œ. � 4!0 , for ig 1 . . . 4 :

1 if m /0im™m s 6Ž .i i ½ 0 if m s0i

For the binary measures the following dependen-cies exist:

Ž .1 m is never 0 for finite A and B. It con-4

tributes no qualitative information. In consequence, asituation between any two regions A and B can bedescribed qualitatively by combinations of the triple� 4 3m , m , m . That yields 2 s8 theoretically possi-1 2 3

ble combinations.Ž . Ž . Ž .2 There is no pair of m , m and m , m that1 2 1 3

Žœ œ.can be 0 , 0 based on the presumption that neither

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200 193

< < < <A nor B is empty. With A sm qm and B sm1 2 1

qm at least one term in each sum must be )03Žwith the property of partitions to be pairwise dis-joint, the size of a union of intersection sets can be

.written as a sum . That dependency excludes three of� 4 � 4 � 4the eight triples of m : 0, 0, 0 , 0, 1, 0 , 0, 0, 1 arei

impossible.The remaining five triples correspond to the fol-

lowing separable topological relations:

� 4 Ž .1. 0, 1, 1 disjunctrtouching : A and B have nopart in common;� 4 Ž .2. 1, 1, 1 overlap : A and B have parts in com-mon and parts not in common;� 4 Ž .3. 1, 0, 0 equal : all parts of A are parts of B andvice versa;� 4 Ž .4. 1, 1, 0 containsrcovers : all parts of B are partof A, and A has additional parts;� 4 Ž .5. 1, 0, 1 contained byrcovered by : all parts ofA are part of B, and B has additional parts.

Using intersection sets seems to be similar to theŽ .work of Egenhofer and Franzosa 1991 who deter-

mined the topological relation between A and B byintersection sets first. They investigated the intersec-tion sets of interiors and boundaries with the restric-tion to simple regions. They could separate eightŽ .families of topological relations. Our classificationworks also for complex regions, which are multiplyconnected regions or regions with many components.Indeed, both approaches require regular closed re-gions. The proposed topological relations represent asubset of the Egenhofer relations; Fig. 3 is a general-ization of his conceptual neighborhood graphŽ .Egenhofer and Al-Taha, 1992 .

The topological relation can serve as similarityŽ .measure on an ordinal scale Egenhofer, 1997 : equal

is the highest degree of similarity, and each of the

Fig. 3. The topological relations representable by the two-dimen-sional intersection sets, related by conceptual neighborhood.

direct neighbored relations in the graph of Fig. 3guarantees higher similarity than the relation dis-junct that is farthest from equal, with a distance oftwo graph edges. The next step has to be the quanti-tative description of similarity.

3.2. Combinations of intersection sets

Ž .The size measures m of Eq. 5 will be investi-i

gated numerically for all possible combinations ofintersection sets.

Ž .With partition into at most four intersection sets,in principle 16 combinations of intersection sets arepossible. The number follows from the sequence of

nbinomial coefficients , with ns4, the numberž /k� 4of intersection sets, and kg 0, . . . , 4 , the number

of combined elementary sets. Concerning the size ofthese combinations, all those combinations contain-ing m as a term of the sum will be constantly `.4

With excluding m and the limitation to the triple4� 4m , m , m , the number of relevant unions of1 2 3

intersection sets decreases to the sequence of bino-� 4mial coefficients with ns3 and kg 0, . . . ,3 . Their

sizes are:

œks0: 0 excluded for similar reasons than mŽ .4

ks1: m , m , m the elementary setŽ .1 2 37Ž .

ks2: m qm , m qm , m qm all 2-tuplesŽ .1 2 1 3 2 3

ks3: m qm qm the single 3-tupleŽ .1 2 3

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200194

The domain of values for the size of an arbitrary2 Ž < <. w xset X in R is dom X s 0, ` . The case ks0 is

trivial:

<œ< w xdom 0 s 0 . 8Ž .Ž .Regions A and B shall be limited to finite sets

< < < <which may not be empty. Then holds: 0- A , B -

`. It follows for the sizes of three considered inter-Ž .section sets ks1 :

< < < < < <dom m :0- AlB Fmin A , B 9Ž . Ž .1

< < < <dom m :0- ! AlB F B2

< < < <dom m :0- Al! B F A3

Ž Ž ..For ks2, it follows cf. Eq. 4 :

< < < < < <m qm s AlB q Al! B s A 10Ž .1 2

< < < < < <m qm s AlB q ! AlB s B1 3

< < < <m qm s Al! B q ! AlB2 3

with the domains:

< <w xdom m qm s AŽ .1 2

< <w xdom m qm s BŽ . 11Ž .1 3

< < < <w xdom m qm s 0, A q BŽ .2 3

Ž Ž ..Finally, for ks3, it follows cf. Eq. 4 :

< < < < < <m qm qm s AlB q Al! B q ! AlB1 2 3

< <s AjB 12Ž .with the domain:

< < < < < < < <dom m qm qm s max A , B , A q B .Ž . Ž .1 2 3

13Ž .

In the following, the union sets will be used asshort forms for the combination of elementary inter-section sets: m sm qm qm .8 1 2 3

3.3. Other size measures

Obviously other set size measures exist. Some ofthem already occur in the domain limitations of the

Ž Ž . Ž . Ž ..location-based measures Eqs. 9 , 11 and 13 . Inthe linear form, they are:

< < < < < < < <m smin A , B , m smax A , B ,Ž . Ž .5 6

< < < <m s A q B . 14Ž .7

These three measures are dependent by:

< < < < < < < < < < < <min A , B qmax A , B s A q B . 15Ž . Ž . Ž .

All three of these measures are independent fromthe relative location of two regions, and for thatreason, they are not considered as candidates forlocation-based similarity measures. But they areneeded for normalization of the location-based mea-sures as the consideration of domains has shown.Besides, these measures are symmetric.

There can also be set up nonlinear measures.< < < <Partly, they are of higher dimensions — e.g., A P B

— which disqualifies for normalization. The otherpart consists of norms of higher dimension measures.A prototype is the L -norm, e.g., ps2 producesp'< < < < Ž .A P B Molenaar and Cheng, 1998 .

Such norms use necessarily the same combina-tions of sets as the already given measures; theycontribute no new information. Therefore, the givenset of size measures is sufficiently complete.

4. Location-based similarity measures

In this section, we will derive location-based sim-ilarity measures with special attention on complete-ness. The size measures of Section 3 are used andcoupled with three criteria for similarity measures:symmetry, normalization and freedom of dimension.It will be possible to set up lists of such measuresand to describe their properties.

4.1. Criteria for similarity measures

Three criteria will be established to specify simi-larity measures. With these criteria, it will be possi-ble to derive such measures from the size measures.As already discussed similarity is an empirical con-cept which is not identical to distance. Nevertheless,special conditions for distances will be investigatedtoo.

The considered criteria for similarity measuresare:

Ž .1 Symmetry: Without explicit reasons from priorknowledge, the situation between A and B is sym-metric; no region is preferred as a reference or a

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200 195

prototype. In such neutral situations, a measure mustbe independent from the order of the consideredregions A and B:

similar A , B ssimilar B , A . 16Ž . Ž . Ž .Ž .2 Domain limitation: It is useful to have normal-

ized measures. This property eases interpretation andcomparison of measures:

0F similar A , B F1. 17Ž . Ž .

For this reason, suited ratios of size measures areintroduced as similarity measures.

Ž .3 Freedom of dimension: Similarity measuresshall be free of dimension because similarity is nophysical concept or property. That can be reached bybuilding ratios of measures with the same dimension.

4.2. Symmetry

First, we consider symmetry in the size measures.The case ks0 is meaningless in the context ofsimilarity. From all other tuples only a few are

Žsymmetric taking advantage from abbreviations by.unions :

œ Ž .ks0:0 excludedabove

Ž .ks1:m basedon Al B1 18Ž .Ž Ž . Ž ..ks2:m qm basedon Al! B j ! Al B2 3

Ž .ks3:m basedon Aj B8

In the following, it is sufficient to investigate thisreduced set of size measures as the only symmetricones. They have to be normalized now.

4.3. Normalization to dimension-less ratios

Ž .The symmetric size measures of Eq. 18 will benormalized. For that purpose, the domains of size

Ž Ž . Ž . Ž ..measures are used Eqs. 9 , 11 and 13 . Normal-ization must not destroy the symmetry property; forthat reason, the norm factors must be symmetric too.Further, norm factors may never take the value 0.

< <This argument excludes the measures m s AlB1Ž . <Ž . Ž . <and m qm s ! AlB j Al! B from the2 3

list of possible norm factors. To keep the thirdcriterion, only the linear size measures are consid-ered as norm factors.

The remaining candidates for norm factors are themeasures:

m , m , m , m 19Ž .8 5 6 7

Table shows the matrix of all 4=3 ratios. Not all� 4of the ratios are normalized to 0 . . . 1 . The ratios

will be discussed individually in the next section.

4.4. Similarity measures

In this section, the ratios of Table 1 will beinvestigated. Normal ratios are considered as similar-ity measure, or as dissimilarity measure, if theirbehaviour meets, the following idea of location-basedsimilarity. The meaning of a location-based similar-ity measure shall be that of a fuzzy membership

Ž .value Zadeh, 1965 to a topological relation; a valueof 1 refers to total correspondence with the discreterelation and a value of 0 refers to total disagreementwith the discrete relation. Such a relation can beequal. There will be also similarity measures regis-tering correspondence to any containment. Themeaning of the individual measures will be discussedwith regard to the actual topological relation referredto.

The ratios in detail are as follows.w xs : Domain of values is 0, 1 . 0 stands for totally11

Ž œ.disjoint regions AlBs0 , and 1 stands for identi-Ž .cal regions AlBsAjB . This ratio is a proto-

typical example of a location-based similarity mea-sure increasing with the grade of similarity up toequality.

w xs : Domain of values is 0, 1 . 0 occurs only if12

AsB, and 1 occurs if A and B are totally disjoint.With this behaviour, the ratio complements s ,11

Table 1Combination of all possible ratios of size measures

Denominator Numerator< < < <m s AlB m qm m s AjB1 2 3 8

< <s !AlB< <q Al!B

< <m s AjB s s s8 11 12 13Ž < < < <.m smin A , B s s s5 21 22 23Ž < < < <.m smax A , B s s s6 31 32 33

< < < <m s A q B s s s7 41 42 43

Not all of the ratios are normal.

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200196

which is corresponding to the complementing sets inthe numerators with regard to the denominator. Thisratio is a typical dissimilarity measure decreasingwith the grade of similarity.

w xs : Domain of values is 1 , trivially.13w xs : Domain of values is 0, 1 . 0 stands for totally21

disjoint regions, and 1 stands for complete coverage,Ž .containment, or identity : . The ratio does not

recognize the proportion in size between A and Band, therefore, it is not suited as a similarity mea-sure. Nevertheless, this ratio could be used as a

Ž .measure for the grade of symmetric overlap.w .s : Domain of values is 0, ` . Again, 0 occurs22

only if AsB. But the denominator is not sufficientto normalize the numerator. That property excludesthis ratio from the list of similarity measures. Addi-tionally, values different from 0 are difficult to inter-pret, because numerator and denominator are notcorrelated.

w .s : Domain of values is 1, ` . 1 occurs if23

AsB, and the ratio increases in all other cases.Without being normalized, this ratio is excludedfrom the list of similarity measures.

w xs : Domain of values is 0, 1 . 0 occurs if both31

regions are disjoint, and 1 occurs only if AsB incontrast to s . With its sensitivity for proportions21

between A and B, this ratio is a suited similaritymeasure.

w xs : Domain of values is 0, 2 . 0 occurs if AsB,32< < < <and 2 occurs if A is disjoint from B and A s B .

As long as one region is coveredrcontained in theother region, the value of the ratio is limited by anupper bound of 1. As long as both regions aredisjoint, the value of the ratio is limited by a lowerbound of 1. In any case of overlap, no prediction canbe made. This ratio could be normalized by division

Žby 2; then it represents a dissimilarity measure de-.creasing with growing similarity .

w xs : Domain of values is 1, 2 . The value 1 stands33

for all cases of coveragercontainment or identity.< < < <The value 2 occurs for disjoint regions, if A s B .

Neither domain nor the behaviour recommends thisratio as similarity measure.

w xs : Domain of values is 0, 1r2 . 0 stands for41

disjoint regions, and 1r2 stands for AsB. If weŽ .would normalize the ratio by multiplication with 2 ,

the result would be a mean size of A and B asŽ Ž ..denominator cf. Eq. 15 . Then the behaviour of

Ž .the normalized ratio s would be in between of41

s and s . This yields no new information.31 21w xs : Domain of values is 0, 1 . 0 occurs if AsB,42

and 1 occurs if A and B are disjoint. Again, this is amean ratio of s and s , but it fulfills the condi-22 32

Ž .tions of a dis- similarity measure.w xs : Domain of values is 1r2, 1 . The lower43

bound occurs if AsB. 1 occurs in all cases ofdisjoint regions, but is reached also in all other

< < < <topologic relations, if A and B are different in theorder of magnitude. This ratio represents an extraor-dinary dissimilarity measure.

In summary, given all possible ratios of sizemeasures the following are similarity measures:

� 4similarity measures: s , s , s )2 . 20Ž .11 31 41

Another list contains dissimilarity measures:

dissimilarity measures:

s 132s , , s , s y )2 . 21Ž .12 42 43½ 5ž /2 2

Both lists are complete regarding the given crite-ria.

4.5. Combination of similarity measures

In this section, different combinations of similar-ity measures will be discussed. Evidence will begiven that both lists from above are needed, whichwill be supported by some examples of recent appli-cations.

Ž Ž ..With Tversky’s contrast model in mind Eq. 1 ,our lists of similarity and dissimilarity become moretransparent. All similarity measures are based on the

< <numerator AlB , which represents the commonfeatures between A and B. All dissimilarity mea-sures, with one exception, are based on the numera-

< < < <tor ! AlB q Al! B , which represents the dis-tinctive features of A and B. The exception, s ,43

treats topological relations combined with orders ofmagnitude mixing different kinds of features, metricand topologic ones. These considerations lead to theexpectation that in practical applications one mea-sure from each list is required to assess similaritycompletely.

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200 197

ŽConsider the following example Harvey et al.,.1998 : To evaluate a match of two regions, two

measures are introduced: an inclusion function, whichis in fact identical to s and yields the grade of21

Žoverlap instead of similarity nevertheless: the com-.mon features , and a surface distance, which is iden-

Žtical to s and which measures dissimilarity dis-12.tinctive features . Thus, the hypothesis is supported

that two measures are needed. The question remainsinteresting whether other pairs of measures wouldhave been also useful. The authors do not discusstheir choice. Another example is mentioned in Ragia

Ž .and Winter 1998 : The authors match two buildingsfrom two data sets with special requirements regard-ing the aggregation levels of the data sets. Part ofrelations are accepted as a match. Similarity is re-placed by weighted topological relations, e.g., by s21

and s . With this choice, only common features are31

considered, but not the distinctive.Similarity of regions has to be handled in a

different way to similarity of lower dimensionalŽ .entities. Recently, Walter 1997 matched lines and

points of street networks. He works only with dis-Ž .tance measures costs neglecting the weight of com-

mon features. That is justified for one-dimensionaldata sets because the probability is very small that

Žtwo lines coincide by chance the probability for two.points is even zero .

Similarity of spatial relations cannot be treated byŽsizes of sets the single exception are topological

. Ž .relations . For example, Bruns and Egenhofer 1996Ž .and Egenhofer 1997 are investigating spatial scenes.

Though they involve metric refinements of topologi-Ž Ž ..cal relations cf. Eq. 5 , they need an additional

concept of similarity for other spatial relations. Theyalso work with distance measures, which they derivefrom conceptual neighborhood graphs.

Metric properties would require additional condi-tions, especially reflexivity and the triangle equation.

Now it will be investigated how far the location-basedsimilarity measures follow such rules.

A symmetric, normalized similarity measure al-lows to introduce its inverse:

dissimilar A , B s1ysimilar A , B 22Ž . Ž . Ž .

The inverse topological relation is always dis-junct, which will be supported by the interpretationof the similarity and dissimilarity measures in Sec-tion 4.4. The found measures will not complementeach other; therefore, the formal introduction of aninverse is useful.

Reflexivity:

similar A ,! A s0, similar A , A s1. 23Ž . Ž . Ž .

If B is assigned to ! A, the first rule is fulfilledby all three similarity measures, with m s0 for1

disjunct regions. If B is assigned to A, the secondrule is fulfilled by all three similarity measures.

Reflexivity put on dissimilarity requires an ex-change of the rules applying the inverse propertyŽ Ž .. Ž .Eq. 22 on Eq. 23 :

dissimilar A ,! A s1, dissimilar A , A s0Ž . Ž .24Ž .

A triangle equation, e.g., in the form:

similar A , B )similar B ,C Fsimilar A ,CŽ . Ž . Ž .

does not hold. Multiplication is required to keep thenorm, and the relation sign has to be converted formultiplication factors -1. But neither disjunct re-gions A and C require that A and B or B and C are

Ž .disjunct i.e., their similarity is 0 , nor equal regionsA and C require that A and B or B and C are equaltoo. Location-based similarity is not metric.

Ž . Ž . Ž .Fig. 4. The three test scenarios for similarity measures: left a , center b , right c . Explanation: see text.

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200198

Ž .Fig. 5. The behavior of the measures in the three tests see text .

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200 199

5. Testing the behaviour of the found measures

Ž Ž . Ž ..Similarity measures Eqs. 20 and 21 can beimplemented directly. A small test program allows toinvestigate the behaviour of similarity measures sys-

1 Ž .tematically. Three tests were performed Fig. 4 ,each of them concentrating on a specific aspect of

Ž .location similarity. Scenario a starts from two poly-gons of the same size and shape, and one polygon

Ž .moves over the location of the other. Scenario b isbasically the same scenario, but the size and shape ofthe two polygons are significantly different. ScenarioŽ .c starts from two polygons, where one polygon isinside of the other, and the smaller polygon growsstepwise.

The results are shown in Fig. 5, where each lines to s shows the behaviour of a similarity mea-11 43

Žsure s to s during the transformation translation11 43.or resizing . It can be observed that in each scenario

Žthe two sets of measures are correlated some mea-.sures are even identical . Further, it can be observed

that in certain scenarios some measures act moreselectively than others. However, basically, the mea-sures behave in a similar way. It seems not to be

Ž .relevant which measure pair of measures is chosenin a specific situation.

6. Summary, discussion and conclusion

This article presents a systematic investigation oflocation-based similarity measures between discreteregions of different data sets. Considering only mea-sures that are symmetric, normal and free of dimen-sion, it is shown that only seven of such measuresexist. The set of similarity measures can be classifiedinto measures counting common features of regionsand measures counting distinct features. A completedescription of similarity requires one measure fromboth classes.

By measuring the sizes of intersection sets somesimilarity measures are related strongly to gradedtopological relations. s represents a grade of equals,11

1 This test program is implemented in Haskell; the code isavailable from ftp:rrgi27.geoinfo.tuwien.ac.atrwinterrwinter00location.hs.

s represents a grade of overlaps, s , as the com-21 12

plement of s , represents a grade of disjoint. Grada-11

tions of containment cannot be found; but a conceptof a graded containment may probably coincide withthe grade of overlap. Boundary-based topologicalrelations are not treated at this point.

Ž Ž ..Tversky’s contrast model Eq. 1 has the advan-tage of having only one measure for overall similar-ity. On the other hand, there may be proposed asmany measures s as different weights a , b , g exist,without any idea for such weights. Then the choicedepends on the context of a comparison, whichcannot be treated systematically. It is omitted todiscuss combinations of weights. However, a fewstatements about the weights are possible. A sym-metric measure requires bsg , which correspondsto the dissimilarity measures that do not distinguishAyB and ByA. The special case of a cost modelis included by setting as0, and also a benefitmodel can be represented by bs0 and gs0.

It may be criticized that our concept of locationŽ 2 . Ž 2 .based on sets of points R or atoms Z is too

specific in parametrization. Indeed, other frames ofŽ . Žlocational reference are possible Bittner and Stell,

.1998 . Moreover, with the Hausdorff distance, adistance measure exists which is even more general

Ž .in parametrization of space Edgar, 1990 . The Haus-Ždorff distance is symmetric and one-dimensional the.set sizes above are two-dimensional in planar space .

It is zero if and only if AsB. Any other valueŽ .-0 does not allow to conclude a topological con-figuration. This disadvantage cannot be adjusted be-cause an adequate measure of common features isnot yet known. For that reason, the Hausdorff dis-tance cannot be completed to a similarity measure.

Ž Ž ..With the binary location function Eq. 2 , onlydiscrete regions are tested for similarity. That fits todata sets in today’s spatial databases, where a needfor quality description is realized but usually notavailable. On the other hand, the presented model forsimilarity measures could be refined for uncertain or

Ž .imprecise regions. Molenaar and Cheng 1998 pre-sented a similarity measure for fuzzy regions fittinginto our framework. Another idea is to replace abinary function f by a spatial distribution functionwhich corresponds to a convolution of f with adistribution function, e.g., a Gaussian. The conse-quences have to be worked out elsewhere.

( )S. Winterr ISPRS Journal of Photogrammetry & Remote Sensing 55 2000 189–200200

The presented similarity measures increase lin-early with common location as a consequence ofsetting elementary set sizes into ratios. Such a modelis purely mathematical, and there is no reason toassume that human cognitive concepts are compara-ble, with the exception of simplicity.

Similarity is a general concept applied to manyspatial decision problems. The systematic investiga-tion succeeds by limiting itself to a strict frame ofreference. Concentrating on location of two spatial

Ž .objects regions , an elementary set of similaritymeasures can be presented. To what extent the modelcan be expanded leaves to be investigated.

Acknowledgements

The idea of this paper goes back to a discussionwith Andrew Frank. Besides, I had interesting dis-courses about philosophical aspects of similarity andlocation with Katrin Dyballa and Thomas Bittner,both from Vienna.

References

Ackermann, F., 1984. High precision digital image correlation.39th Photogrammetric Week, Institute for Photogrammetry,Stuttgart University. Schriftenr. Inst. Photogrammetrie vol.9, pp. 231–244.

Bittner, T., Stell, J., 1998. A boundary-sensitive approach toqualitative location. Ann. Math. Artif. Intell. 24, 93–114.

Bruns, H.T., Egenhofer, M.J., 1996. Similarity of spatial scenes.Ž .In: Kraak, M.-J., Molenaar, M. Eds. , Advances in GIS

Research. Taylor & Francis, Delft, pp. 173–184.Edgar, G.A., 1990. Measure, Topology, and Fractal Geometry.

2nd edn. Undergraduate Texts in Mathematics, Springer, NewYork.

Egenhofer, M.J., 1997. Query processing in spatial-query-by-Ž .sketch. J. Visual Lang. Comput. 8 4 , 403–424.

Egenhofer, M.J., Herring, J.R., 1990. A mathematical frameworkfor the definition of topological relationships. 4th InternationalSymposium on Spatial Data Handling. International Geograph-ical Union, Zurich, pp. 803–813.

Egenhofer, M.J., Franzosa, R.D., 1991. Point-set topological spa-Ž .tial relations. Int. J. Geogr. Inf. Syst. 5 2 , 161–174.

Egenhofer, M.J., Al-Taha, K.K., 1992. Reasoning about gradualchanges of topological relationships. Theories and Models ofSpatio-Temporal Reasoning in Geographic Space. In: Frank,

Ž .A.U., Campari, I., Formentini, U. Eds. , Lect. Notes Comput.Sci. vol. 639, Springer, Berlin, pp. 196–219.

Flasch, K., 1986. Das philosophische Denken im Mittelalter.Philipp Reclamjun, Stuttgart.

Haralick, R.M., Shapiro, L.G., 1992. Computer and Robot Visionvol. I, Addison-Wesley, Reading, MA.

Harvey, F., Vauglin, F., Ali, A.B.H., 1998. Geometric matchingof areas — comparison measures and association links. In:

Ž .Poiker, T.K., Chrisman, N. Eds. , 8th International Sympo-sium on Spatial Data Handling. International GeographicalUnion, Vancouver.

Jahne, B., 1995. Digital Image Processing. 3rd edn. Springer,¨Berlin.

Lakoff, G., 1987. Women, Fire, and Dangerous Things — WhatCategories Reveal About the Mind. Univ. Chicago Press,Chicago.

Metzger, W., 1936. Gesetze des Sehens. Senckenberg-Buch vol.VI, W. Kramer, Frankfurt am Main.

Molenaar, M., Cheng, T., 1998. Fuzzy spatial objects and theirdynamics. ISPRS Commission IV Symposium AGIS BetweenVisions and ApplicationsB. In: Fritsch, D., Englich, M., Sester,

Ž .M. Eds. , Int. Arch. Photogramm. Remote Sens. vol. 32r4,pp. 389–394, Stuttgart.

Ragia, L., Winter, S., 1998. Contributions to a quality descriptionof areal objects in spatial data sets. ISPRS Commission IVSymposium AGIS Between Visions and ApplicationsB. In:

Ž .Fritsch, D., Englich, M., Sester, M. Eds. , Int. Arch. Pho-togramm. Remote Sens. vol. 32r4, pp. 479–486, Stuttgart.

Stevens, S., 1946. On the theory of scales of measurement.Ž .Science 103 2684 , 677–680.

Ž .Tversky, A., 1977. Features of similarity. Psychol. Rev. 84 4 ,327–352.

Vosselman, G., 1992. Relational Matching. Lect. Notes Comput.Sci. vol. 628, Springer, Berlin.

Walter, V., 1997. Zuordnung von raumbezogenen Daten amBeispiel der Datenmodelle ATKIS und GDF. PhD thesis,Fakultat f ur Bauingenieur-und Vermessungswesen der Univer-¨ ¨sitat Stuttgart.¨

Winter, S., 1996. Unsichere topologische Beziehungen zwischenungenauen Flachen. PhD thesis, Landwirtschaftliche Fakultat¨ ¨der Rheinischen Friedrich-Wilhelms-Universitat Bonn.¨

Zadeh, L.A., 1965. Fuzzy sets. Inf. Control 8, 338–353.


Recommended