Is Traditional Conceptual Modeling Going to Become Obsolete?

Lukyanenko and Parsons Is Traditional Conceptual Modeling Going to Become Obsolete?

Proceedings of the 12th AIS SIGSAND Symposium, Provo, Utah, May 17-18, 2013

Roman Lukyanenko

Memorial University of Newfoundland

[email protected]

Jeffrey Parsons

Memorial University of Newfoundland

[email protected]

ABSTRACT

Traditionally, the research and practice of conceptual

modeling assumed relevant information about a domain is

determined in advance to be used as input to design. The

increasing ubiquity of systems – characterized by

heterogeneous and transient users, customizable features,

and open or extensible data standards – challenges a

number of long-held propositions about conceptual

modeling. We raise the question whether conceptual

modeling as commonly understood is an impediment to

systems development and should be phased out. We

discuss the motivation for rethinking approaches to

conceptual modeling, consider traditional approaches to

conceptual modeling and provide empirical evidence of

the limitations of traditional conceptual modeling. We

then propose three directions for future conceptual

modeling research.

Keywords

Conceptual modeling, Information Systems Analysis and

Design, Ontology, Cognition

INTRODUCTION

Traditionally, information systems (IS) were developed

and primarily used within organizational boundaries (e.g.,

Fry and Sibley 1976; Mason and Mitroff 1973; Zuboff

1988). IS development in this setting was user- and

consensus-driven: users (or stakeholders) define system

requirements, use and evaluate designed systems, while

close proximity with users makes it possible for analysts

and designers to gather requirements, verify their fidelity,

and resolve any conflicting perspectives before

implementation. As users were mostly corporate

employees or parties closely affiliated with the

organization (e.g., suppliers), any individual or divergent

perspectives were generally subsumed by goals and

perspectives of the organization.

The user/consensus-driven development underlies

prevailing approaches to conceptual modeling – a phase

of IS development aimed at “formally describing some

aspects of the physical and social world around us for the

purposes of understanding and communication”

(Mylopoulos 1992, p. 51). Conceptual modeling

traditionally results in specifications that capture relevant

knowledge about the application domain. This

specification then guides development by supporting

communication between developers and users, promoting

domain understanding, and guiding the design process (Wand

and Weber 2002).

The traditional modeling paradigm is increasingly

challenged as more organizations draw on knowledge

outside of organizational boundaries and become

interested in perspectives of individual users. Under these

premises, it may no longer be feasible to reach all

potential users and establish an agreed-upon specification

of a domain.

Here we pose a question of whether conceptual modeling

as commonly understood is becoming an impediment to

managing distributed heterogeneous information. We

present motivation for this research, discuss traditional

approaches to conceptual modeling and provide empirical

evidence of the limitations of traditional conceptual

modeling in distributed heterogeneous settings. We then

propose three directions for future conceptual modeling

research.

CHALLENGES TO TRADITIONAL MODELING PARADIGM

The interest in distributed heterogeneous information is

growing. Knowledge created outside of controlled

internal information production process is of increased

value to organizations. Such information, for example,

can better connect internal decision process with

information available to business partners, customers, and

general public (Doan et al. 2011; Hand 2010; March et al.

2000; Zwass 2010).

Within organizations, initiatives that support flexible

knowledge management, flexible routines, dynamic

sense-making, and grass-roots innovation are on the rise

(Leonardi 2011).

Organizations are increasingly looking to understand

individual users (e.g., customers) and cater to their unique

and changing needs. First, the proliferation of mobile,

miniaturized and ubiquitous computing exposes IS to

diverse and unpredictable situations (e.g., when scuba-

diving) and demands systems to be adaptive and flexible

(Lyytinen and Yoo 2002). Second, personalization of user



experience increases profitability by better matching

product and service offerings to individual users' needs

(Brynjolfsson et al. 2011). Third, the rise in social

networking and peer-to-peer computing (e.g., Facebook,

Twitter, YouTube, Flickr) fuels demand for more flexible

and natural information exchange between people. The

social capital created through high user interaction is of

significant economic and social value (Zwass 2010).

Of particular relevance is crowdsourcing that engages

users to work on sponsor-defined tasks. Many

crowdsourcing initiatives aim to capture unique

conceptualizations of diverse distributed audiences. This

is prevalent in a type of crowdsourcing known as citizen

science that harnesses crowds for scientific uses and

broadly encourages ingenuity, creativity, and divergent

thinking (Goodchild 2007; Hand 2010).

Heterogeneous and distributed information poses a

significant challenge to traditional conceptual modeling

that assumed that relevant aspects of reality were known

or knowable in advance. Below, we examine traditional

conceptual modeling in light of the emerging challenges.

Since the 1970s numerous conceptual modeling grammars

have been developed. The prevailing approaches to

conceptual modeling, such as Entity-Relationship (E-R) and

UML class diagrams, involve specification of conceptual

entities (classes, entity types), attributes (or properties)

and relationships between entities (Chen 1976; Evermann

and Wand 2001; Greenspan et al. 1982). Constructs in

other modeling approaches may include roles, actors,

agents, goals, activities, frames or patterns (see

Mylopoulos 1998). Modeling grammars specify rules by

which data (information, knowledge) in a domain is

organized to support IS functions (e.g., state-tracking).

Generally, this is done by embedding structures

developed during conceptual modeling (e.g., collection of

predefined classes, frames, roles) in IS objects, such as

database schema, user interface, or code logic. IS use,

including data creation, maintenance and retrieval is then

mediated by these objects. Since conceptual structures

are predominantly abstract - in that they do not

represent concrete individual objects or events, but rather

generalized or stylized (Kaldor 1961, p. 178)

representations - the fundamental approach to conceptual

modeling is representation by abstraction. Abstraction-

driven conceptual modeling deliberately ignores some

aspects of reality capturing only relevant information

(where users, stakeholders indicate what is relevant). For

example, Olivé (2007) contends that “a conceptual schema is

the definition of the general domain knowledge that the

information system needs to perform its functions;

therefore, the conceptual schema must include all the

required knowledge” (p. 29, emphasis added).

For example, a typical script made using the popular E-R

grammar may depict entity types, attributes of entity types

and relationship types with attributes. Entity types (e.g.,

student, customer, equipment) abstract from differences

among instances (e.g., a particular student, or a specific

customer), instead capturing perceived equivalence of

instances. Hence, many conceptual modeling grammars

consider instances (objects) to be members of their

classes: “[o]ne principle of conceptual modeling is that

domain objects are instances of entity types” (Olivé 2007,

p. 383). Abstraction-based modeling was deemed critical

to “organize the information base and guide its use,

making it easier to update or search it” (Mylopoulos and

Borgida 2006, p. 35).

Representation by abstraction presupposes that consensus

can be reached among users (stakeholders) on what is

relevant. This assumption was considered somewhat non-

problematic to the extent that development occurs in close

contact with system users and other key stakeholders.

Close contact with users provided an opportunity to

resolve conflicts in individual views and generated an

agreed-upon abstract conceptualization of a domain (Pohl

1994).

As we discussed earlier, the assumption that users can be

identified, reached and engaged in consensus building is

becoming inadequate in a growing number of cases.

Aside from the difficulty of identifying and reaching all

potential users in distributed and dynamic settings, many

potential users may lack domain expertise (e.g., consumer

products knowledge) and have unique views or

conceptualizations that are unstable and incongruent with

those of project sponsors and other users (Erickson et al.

2012; Lukyanenko et al. 2011). Since consensus is

no longer feasible in such settings, the resulting system may

be critically defective. For example, an IS representing a

domain as perceived by some users may marginalize, bias

or exclude possibly valuable conceptualizations of other

users (Lukyanenko and Parsons 2011b; Parsons et al.

2011b). A growing body of research is looking to address

the challenges of modeling information in heterogeneous

environments. Typically, solutions involve modification of

abstraction-based modeling grammars (Ma and Yan 2008)

and are therefore not entirely free of the negative

consequences of the abstraction-driven models. As an

alternative, we examine whether it is more advantageous

to develop IS without modeling domains a priori.

Consider an IS development without conceptual

modeling. In contrast with difficulties of modeling

distributed heterogeneous information, it is becoming

increasingly possible to store such data. Since the

beginning of database management in the 1950s, enhanced

computing capabilities coupled with conceptual

development led to liberation of modeling from physical

constraints (Fry and Sibley 1976; Parsons 2003).

Over time the focus shifted to capturing greater domain

semantics. Notable data models with advanced semantic

support include instance-based (Parsons and Wand 2000),

graph (Angles and Gutierrez 2008), semistructured

(Abiteboul 1997), and fuzzy (Ma and Yan 2008) data

models. Leveraging advanced data modeling, data can be



less structured, which requires very little or no modeling.

For example, using an instance-based data model,

information can be collected without having to classify

relevant instances; information about instances can be

stored in terms of attributes (Parsons and Wand 2000).

Different users can supply different attributes for the same

instance. Failure to agree on classes, relationship types or

attributes is no longer problematic as both convergence

and divergence of views is accommodated: any relevant

attribute can be seamlessly captured. The attributes can be

then queried (e.g., according to ad hoc needs) to select

instances stored based on classes of interest. Since classes

and other abstract constructs are not necessary before

implementing a system, conceptual modeling may not be

needed at the design phase (at least not for the purposes of

generating a database schema).
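The idea of storing instances by their attributes and deriving classes only at query time can be illustrated with a small sketch. This is not the authors' system: the `InstanceStore` class, its method names, and the sample sightings are hypothetical, and a production instance-based database would need persistence and attribute matching well beyond exact string comparison.

```python
from collections import defaultdict


class InstanceStore:
    """Toy instance-based store: instances carry attributes, not classes."""

    def __init__(self):
        # instance id -> attribute -> set of contributors who reported it
        self._attributes = defaultdict(lambda: defaultdict(set))

    def report(self, instance_id, contributor, attributes):
        """Record the attributes one contributor supplies for one instance."""
        for attribute in attributes:
            key = attribute.strip().lower()
            self._attributes[instance_id][key].add(contributor)

    def attributes_of(self, instance_id):
        """Return every attribute reported for an instance by any contributor."""
        return set(self._attributes.get(instance_id, {}))

    def select(self, required_attributes):
        """Return ids of instances holding all attributes of an ad hoc class."""
        required = {a.strip().lower() for a in required_attributes}
        return sorted(
            instance_id
            for instance_id, attrs in self._attributes.items()
            if required <= set(attrs)
        )


# Hypothetical sightings: no class is chosen when the data are collected.
store = InstanceStore()
store.report("sighting-1", "alice", ["red breast", "can fly"])
store.report("sighting-1", "bob", ["standing on rock", "can fly"])
store.report("sighting-2", "carol", ["has fins", "deformed fin"])

# A class of interest is defined only when querying.
flying_things = store.select(["can fly"])
```

In this sketch, two contributors describe the same sighting with different attributes and both descriptions are kept; the "class" is nothing more than the set of attributes named in the query.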

Indeed, the instance-based or other flexible solutions

appear to address the challenges of reaching consensus

and accommodating individual and unanticipated views

and uses. Critically, these solutions make it possible to bypass a

major part of IS development – the creation of a formal

representation of knowledge in a domain. This

significantly simplifies systems analysis and does so in

the environment considered extremely problematic for

traditional analysis. Furthermore, the instance-based IS

appears to improve data quality (e.g., accuracy per unit of

data) and information yield (e.g., greater number of

instances stored) compared to more traditional (i.e., class-

based) systems (Lukyanenko and Parsons 2011a;

Lukyanenko and Parsons 2011b; Parsons et al. 2011a). 1

EXPERIMENT

To empirically evaluate the instance-based IS with no a

priori conceptual modeling, we designed a laboratory

experiment in the context of online citizen science. 2

Many popular citizen science applications epitomize

modeling challenges discussed above. These systems are

established primarily to serve the needs of scientists, but

the actual users or contributors (i.e., citizen scientists) are

ordinary people, often lacking subject matter expertise

and possessing diverse domain views (Coleman et al.

2009; Snäll et al. 2011). Imposing a particular view upon

content creators may focus (or bias) contributors to one

particular goal (e.g., species identification, classification

of galaxies), but fail to capture additional information

citizen scientists may wish to communicate.

Current approaches to citizen science follow traditional

modeling principles. Popular citizen science projects (e.g.,

www.eBird.org, www.iSpot.org.uk) involve users in

1 Both authors are developing a real IS artifact powered by the

instance-based data model in the citizen science domain, where

conceptual modeling focused on organizing knowledge about a

domain was virtually non-existent.

2 This experiment was also used to provide support for the

impact of class-based models on data accuracy; this issue is

beyond the scope of the current study.

positive identification of species or genera (e.g.,

American Robin). Species and genus are classification

levels with widely accepted scientific utility. In contrast,

the generally preferred level of classification for non-

experts is the basic level (Rosch et al. 1976). Unlike the

species level, the basic level (e.g., bird, fish, tree) tends to

be an intermediate taxonomic level (e.g., “bird” is a level

higher than “American Robin”, and lower than “animal”).

Species/genus-level classes represent useful classes in a

natural history application, while basic-level classes

operationalize intuitive classes natural to non-expert

users; therefore both are reasonable for constructing

abstraction-driven conceptual models of the natural

history citizen science IS. To contrast traditional

conceptual modeling with a “no modeling” alternative, we

explore an instance-based solution to citizen science

where sightings of organisms are reported in terms of

attributes of instances (Parsons et al. 2011b). Users are

thus not required to comply with a priori created models

of abstraction (e.g., classes).

Consistent with philosophy and cognition that postulate

uniqueness of individual instances and mental models of

instances (Bunge 1977; Panaccio 2005; Smith 2005), we

argue non-expert participants, if given the opportunity,

will provide substantial numbers of unique attributes.

Since abstractions such as classes are based on

commonalities of instances, they will be unable to

accommodate some of the attributes participants are

inclined to provide. Furthermore, as it may be difficult to

a priori anticipate the kinds of attributes that are salient

for different users, it is infeasible to choose classes that

will account for all attributes. We thus hypothesize:

Hypothesis: Non-experts will describe instances in terms

of attributes that cannot be captured by definitions of

classes (both intuitive and useful) used to model

instances.

While we predict that many attributes provided by

different users will be unique, it is also desirable to have

some degree of attribute agreement. Indeed, complete

disagreement (i.e., no overlap in attributes provided by

different participants) would mean that using attributes to

represent reality is unreliable. To broadly ensure the value

of collecting and storing attributes of instances, ideally

agreement on a core set of attributes should hold for both

familiar (e.g., instance of American robin) and unfamiliar

(e.g., instance of obscure mushroom) instances; both

simple and complex. Thus, we wish to investigate the

degree to which non-experts converge on the kinds of

attributes used to describe familiar and unfamiliar, as well

as complex and simple, instances. In view of this, we seek

to answer the following exploratory question:

Question: Do non-experts demonstrate significant

agreement on a core set of attributes of familiar and

unfamiliar, complex and simple instances?



Method

We conducted a study among potential citizen scientists.

Participants were 247 undergraduate business students

(141 female, 106 male) at a Canadian university. The

experiment was conducted in 8 sessions and the order of

stimuli was randomized between sessions.

Business students were chosen to ensure a low level of

expertise in biology, reflecting the intended context where

users are members of the general public. Low domain

expertise was verified using self-reported expertise

measures and more objective measures: 83% of

participants either strongly or somewhat disagreed (on a

5-point scale) with the statement that they are “experts” in

local wildlife (mean=1.90; s.d.=0.886). More than three quarters of

participants (77%) had never taken any post-secondary

courses in biology. Finally, the low number of species-

level responses (presented below) is further evidence of

low expertise.

The stimuli were 24 full-color images of plants and

animals (all different biological species) native to the

geographic region in which the study was conducted. The

stimuli were selected by an ecology professor expert in

flora and fauna of the region. Species were chosen to

include both organisms believed to be familiar and

organisms believed to be unfamiliar.

Participants were randomly assigned into one of two

study conditions. Those in the “Categories and Attributes”

condition (122 participants) were given a printed form

with two columns - one asking participants to name the

object on the image (using one or more words) and the

second asking them to list features that best describe the

object. In the “Attributes only” condition (125

participants), there was only one column asking

participants to list features that best describe the object.

Images were presented to participants in a random

sequence on a large screen in a classroom setting. Each

image was shown for 50 seconds, a time deemed

sufficient through a pre-test.

Responses were converted from paper to digital form by

one of the authors to ensure consistency. We aimed to

record verbatim the categories and attributes provided by

participants, following best practices set in similar

studies. When faced with illegible handwriting we

attempted to decipher handwriting but avoided making

interpretations and skipped unreadable entries. Complex

attributes were broken down into individual components

(e.g., “long yellow beak” was coded as “long beak” and

“yellow beak”), following Rosenberg and Jones (1972).

Consistent with psychology research (e.g., Tanaka and

Taylor 1991), attributes for the same species with clearly

similar meanings were grouped together (e.g., “horns,”

“antlers,” and “rack”).
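The two coding steps described above (decomposing complex attributes and grouping synonyms) can be sketched as follows. This is only an illustration under stated assumptions: the coding in the study was done manually by one of the authors, the function names are hypothetical, the splitting rule handles only simple "modifier ... noun" phrases, and the choice of "horns" as the group label is ours.

```python
def split_complex_attribute(phrase):
    """Split 'long yellow beak' into 'long beak' and 'yellow beak'.

    Naive assumption: the last word is the head noun and every preceding
    word is an independent modifier. Phrases such as 'standing on rock'
    would need human judgment and are not handled correctly.
    """
    words = phrase.lower().split()
    if len(words) < 3:
        return [" ".join(words)]
    head = words[-1]
    return [f"{modifier} {head}" for modifier in words[:-1]]


# Hypothetical synonym table; the group label "horns" is an arbitrary choice.
SYNONYM_GROUPS = {"antlers": "horns", "rack": "horns"}


def canonical(attribute, groups=SYNONYM_GROUPS):
    """Map an attribute to its group label, or return it unchanged."""
    key = attribute.strip().lower()
    return groups.get(key, key)


components = split_complex_attribute("long yellow beak")
grouped = {canonical(a) for a in ["horns", "antlers", "rack"]}
```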

Once categories and attributes were entered, we coded

categories as either “basic level,” “species-genus level,” or

“other” and attributes as either “basic level,”

“superordinate to basic,” “subordinate to basic,” or

“other.” The species-genus level was determined based on

biological convention, while the basic level was adopted

from prior studies in cognitive psychology. A thorough

survey of cognitive literature failed to reveal an agreed-

upon basic-level for 6 out of the 24 species used as

stimuli (lung lichen, Old Man’s beard, coyote, chipmunk,

moose, and caribou).

The final data set contained 25,315 records, with 6,397

categories and 18,918 attributes. The total number of

unique attributes and categories was 1,673, with 264

categories and 1,409 attributes.

Results and Discussion

We first provide evidence that non-expert participants

generally do not prefer species/genus level to classify

instances and these responses are generally not as

accurate as more intuitive basic-level classes. To do this,

we analyze categories in the “Categories and Attributes”

condition. In this condition, 122 participants provided a

total of 3,737 categories (an average of 1.28 per image per

participant). We analyzed data for each image separately

across all participants.

As expected, participants prefer to classify using basic-

level categories and these classifications tend to be more

accurate than when attempting to classify at species/genus

levels (see Table 1). The exceptions (i.e., American robin,

Blue Jay, Killer Whale) appear to be common organisms

that participants are frequently exposed to in nature or

through media.

Common name      No. of BC  No. of SG  χ2 (No. of     Correct BC  Correct SG  Fisher’s exact p-val.
                                       BC vs. SG)                             (accuracy of BC vs. SG)
Blue W. Teal     144        5          129.67***      143         0           0.000
Mallard Duck     133        20         83.46***       133         15          0.000
Spt. Sandpiper   112        2          106.14***      112         0           0.000
Caspian Tern     111        2          105.14***      111         0           0.000
Red fox          110        14         74.32***       104         10          0.015
Labrador tea     108        4          96.57***       108         0           0.000
G. Yellowlegs    108        1          105.04***      107         0           0.018
Common Tern      107        3          98.33***       107         0           0.000
Red squirrel     105        18         61.54***       100         1           0.000
Sheep laurel     103        2          97.15***       103         0           0.000
Atl. Salmon      100        25         45.00***       100         0           0.000
Fireweed         94         26         38.53***       94          1           0.000
Calypso orchid   92         12         61.54***       91          0           0.000
Indian pipe      89         7          70.04***       88          0           0.000
Amer. Robin      86         78         0.39           86          74          0.049
Blue Jay         69         99         5.36**         69          98          1.000
Killer whale     54         88         8.14***        48          86          0.054
False morel      34         0          N/A            22          0           N/A

TABLE 1. Number and accuracy of basic categories (BC) and species-genus categories (SG) (*** sig. at 0.01 level; ** sig. at 0.05 level)
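The χ2 values in Table 1 appear consistent with a one-degree-of-freedom goodness-of-fit test against an equal split between BC and SG responses. The sketch below rests on that assumption (the exact procedure is not stated in the text); the function names are hypothetical, and the critical values are the standard χ2 thresholds for 1 d.f.

```python
def chi_square_equal_split(count_a, count_b):
    """Chi-square statistic for two counts against a 50/50 expectation."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def significance_stars(statistic):
    """Star notation as in Table 1, using chi-square critical values, 1 d.f."""
    if statistic >= 6.635:   # 0.01 level
        return "***"
    if statistic >= 3.841:   # 0.05 level
        return "**"
    return ""


# Counts of BC and SG responses taken from Table 1.
teal = chi_square_equal_split(144, 5)     # Blue W. Teal
robin = chi_square_equal_split(86, 78)    # Amer. Robin
print(round(teal, 2), significance_stars(teal))
print(round(robin, 2), significance_stars(robin))
```

Under this assumption the statistic for Blue W. Teal rounds to 129.67 and the one for Amer. Robin to 0.39, matching the tabulated values.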

These results confirm the operationalization of basic-level

as an intuitive class for the participants. This is critical in

testing the extent to which participants employ basic-level

attributes (e.g., can fly, has feathers for bird) versus

lower-level attributes (e.g., red breast). The greater the

number of sub-basic level attributes, the greater the extent

to which a conceptual model built on basic-level classes omits

information non-experts are able to provide. To

investigate these issues, the attributes (7,330) in the

Attributes-only condition for the 18 plants and animals

with an agreed-on basic level category were classified

into: sub-basic, basic (and superordinate), or other,

resulting in 6,429 sub-basic, 824 basic, and 77 other

attributes.

As expected, in contrast with the prevalence of basic level

categorization, there were significantly more sub-basic

attributes, with an average p-value approaching zero (see

Table 2). This suggests that including intuitive classes

(which tend to be general for non-experts) in conceptual

models prevents a considerable number of attributes from

being captured.

Species              Sub-basic  Basic  Diff: χ2 p-val  Other attr.
American Robin       362        35     0.000           3
Atlantic salmon      273        45     0.000           19
Blue Jay             397        51     0.000           5
Blue Winged Teal     350        76     0.000           13
Bog Labrador tea     266        3      0.000           5
Calypso orchid       358        3      0.000           3
Caspian Tern         460        47     0.000           4
Common Tern          435        41     0.000           3
False morel          238        9      0.000           1
Fireweed             302        3      0.000           7
Greater Yellowlegs   486        39     0.000           9
Indian pipe          342        6      0.000           3
Killer whale         325        54     0.000           9
Mallard Duck         421        74     0.000           2
Red fox              340        46     0.000           90
Red squirrel         362        105    0.000           36
Sheep laurel         319        4      0.000           3
Spotted Sandpiper    393        44     0.000           1

TABLE 2. Number of basic and subordinate attributes

We now evaluate the same hypothesis with respect to the

species-level classes. Although we demonstrate low

natural frequency of responses at that level, in principle it

may be possible to design a user interface that guides

users to species-level classes because they are valuable to

project sponsors. We argue, however, even these more

specific classes would fail to account for all attributes

non-experts report. Thus, the greater the number of

attributes not captured by species classes, the greater the

degree to which a conceptual model built on species-level

classes misses information non-experts are able to provide.

We compare the attributes provided in the Attributes-only

condition with attributes from the species identification

guides considered standard for identifying at the species-

level (McClane 1978; Newcomb 1977; Peterson 2010;

Phillips 2005; Stokes et al. 2010). One of the authors

matched each attribute provided by participants with

attributes of the organism in the field guide. The

comparison was based on approximate similarity (e.g.,

gray underbelly and whitish underbelly were considered

equivalent), erring on the side of similarity (to increase

conservativeness of the test).

As predicted, while many attributes provided can be

inferred from classifying organisms at the species-level,

participants provide a number, significantly greater than zero,

of attributes not accounted for by an applicable

species class (see Table 3). Among those, some are

instance attributes in that they describe a particular object

(e.g., standing on rock, looking sick, dorsal fin is

deformed); some describe features considered not salient

for identification at the species-level (e.g., blue eyes for

American Robin, black feet for Blue Jay); some attributes

are orthogonal to biological taxonomy (e.g., weed-like,

beautiful, scary). As in the case of basic-level categories,

modeling using more specific species-level classes fails to

account for a large number (49.0% of subordinate

attributes) of attributes freely provided by non-experts

when describing common and uncommon instances.

Common name       Sub-basic  Species  Non-species
American Robin    362        180      182
Atlantic Salmon   273        100      173
Blue jay          397        176      221
Blue W. Teal      350        156      194
Calypso Orchid    358        117      241
False morel       238        162      76
Fireweed          302        137      165
G. Yellowlegs     486        362      124
Indian Pipe       342        193      149
Mallard duck      421        238      183
Sheep Laurel      319        122      197
Sp. Sandpiper     393        221      172

TABLE 3. Number of subordinate, species-level and non-species-level attributes
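The share of subordinate attributes that species-level classes fail to capture can be recomputed directly from Table 3. The sketch below only aggregates the tabulated counts; the variable names are ours.

```python
# (species-level, non-species-level) attribute counts from Table 3.
TABLE_3 = {
    "American Robin": (180, 182),
    "Atlantic Salmon": (100, 173),
    "Blue jay": (176, 221),
    "Blue W. Teal": (156, 194),
    "Calypso Orchid": (117, 241),
    "False morel": (162, 76),
    "Fireweed": (137, 165),
    "G. Yellowlegs": (362, 124),
    "Indian Pipe": (193, 149),
    "Mallard duck": (238, 183),
    "Sheep Laurel": (122, 197),
    "Sp. Sandpiper": (221, 172),
}

species_total = sum(species for species, _ in TABLE_3.values())
non_species_total = sum(other for _, other in TABLE_3.values())
sub_basic_total = species_total + non_species_total

# Fraction of sub-basic attributes not covered by species-level definitions.
non_species_share = non_species_total / sub_basic_total
print(f"{non_species_share:.1%} of {sub_basic_total} sub-basic attributes")
```

The aggregate reproduces the 49.0% figure reported in the text.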

Finally, we examine the question: “to what extent do non-

experts agree on the attributes of familiar and unfamiliar

phenomena?” Answering this question is important in

determining whether data collection based on instances

and attributes can generate consistent data. To address the

issue, we assessed agreement on 9,556 attributes provided

by 125 participants for all 24 stimuli in the

Attributes-only condition.

To evaluate agreement we employed the theoretically-

driven approach of model testing and compared two

hypothetical models. The null model represents the

absence of statistically significant agreement on a core set

of attributes among participants. Under the null model,

some attributes may be used by more than one participant

(e.g., due to limited domain vocabulary, rudimentary

beliefs about a domain, or by simple chance); yet there is

no “core” set of attributes that many participants agree on.

The corresponding distribution of attribute frequencies is

assumed to be uniform.

The alternative model represents the hypothesized

agreement among observers on a core set of attributes for the

observed instance. The alternative model should

demonstrate, with statistical significance, a non-uniform

distribution of attribute frequencies (e.g., Pareto

distribution). Similar to the null, the alternative model

may contain many idiosyncratic attributes with low

frequencies, signifying individual perceptions of attributes

of instances. Unlike the null model, however, it will also

reflect a small number of highly frequent attributes

reported by a large number of participants –

demonstrating strong agreement on a small number of

“key” attributes.

To test the two models, we computed a maximum

likelihood-ratio G-test. Here, the expected values are

determined assuming the null model of a uniform

distribution and are obtained by dividing the sum of all

frequencies by the number of unique reported attributes.

For example, participants provided 400 total and 85

unique attributes describing American robin (see Figure

1). The expected value for each attribute is 4.71 (which is

less than 5, thereby justifying the G-test technique). The

resulting G-statistic was computed to be 772.11 (p <

0.001 with 84 d.f.). This procedure was repeated for the

other 23 stimuli with similar results: all attribute

frequencies were found to be non-uniformly distributed.

The results were highly significant, with an average

p-value approaching zero.
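The G-test against a uniform null can be sketched as below, using the standard statistic G = 2 Σ O ln(O / E), where each expected value E is the total frequency divided by the number of unique attributes. The frequency list is hypothetical: it is constructed only to match the reported totals for American robin (400 attributes, 85 unique) and is not the study's data, so the resulting G will differ from the reported 772.11.

```python
import math


def g_statistic(observed):
    """Likelihood-ratio G statistic against a uniform null distribution.

    Returns (G, degrees of freedom, expected frequency per attribute).
    """
    expected = sum(observed) / len(observed)
    g = 2 * sum(o * math.log(o / expected) for o in observed if o > 0)
    return g, len(observed) - 1, expected


# Hypothetical long-tailed frequencies: a few widely shared attributes,
# then many rarely mentioned ones (85 unique attributes, 400 mentions).
frequencies = [60, 50, 40, 30, 20] + [4] * 40 + [1] * 40

g, dof, expected = g_statistic(frequencies)
print(f"expected per attribute = {expected:.2f}, d.f. = {dof}, G = {g:.2f}")
```

A p-value would then be read from the χ2 distribution with the returned degrees of freedom; that step is omitted here to keep the sketch dependency-free.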

Figure 1. Top attributes for American robin in Attributes-

only condition

These results indicate the attribute frequencies are not

uniformly distributed, demonstrating statistical agreement

among non-expert observers of familiar and unfamiliar,

feature-rich and feature-poor (perceptually, based on the

image) natural history instances in the study.

We proceeded with Kolmogorov-Smirnov and Anderson-

Darling goodness-of-fit statistics to fit data to common

distributions. While different distributions exhibited better

fit for different species, the better-fitting distributions

generally belonged to either the power-law or the

lognormal family. Such distributions included Pareto,

log-gamma, Fréchet, log-Pearson, and lognormal. For all 24

species the distributions of attribute frequencies were

skewed and leptokurtic (e.g., Figure 1). This means that,

for each species, participants reported a large number of

non-repeating attributes creating a long tail with a

compact set of frequently agreed-upon attributes.

DISCUSSION: CONCEPTUAL MODELING IS OBSOLETE. LONG LIVE CONCEPTUAL MODELING

An emerging conceptual modeling challenge is modeling

unpredictable and often unique user input. Addressing this

challenge is difficult using traditional abstraction-driven

modeling premised on a priori availability of “complete”

specification of the kinds of data users would be

contributing.

In this paper, we explored the possibility of omitting

conceptual modeling and storing data using flexible

databases, such as an instance-based database. Based on

the empirical evidence presented above, the instance-

based approach with no conceptual modeling appears to

meet the objectives of projects that engage distributed

heterogeneous audiences better than the two class-based

approaches (one based on intuitive and one based on

useful classes). The diversity of attributes provided by

non-experts makes it extremely difficult to a priori

specify classes capable of capturing these attributes. For

example, among reported attributes, some appear to be

applicable to a particular instance (e.g., deformed fin),

while some pertain to emotional evaluation of instances

(e.g., scary). These kinds of attributes are likely to be

unique to each situation and each person.



At the same time, the overall distribution of attributes

resembles a long tail, with agreement on the core set of

attributes and a large number of idiosyncratic ones. This

suggests that attributes reflect some underlying

regularities or shared perceptions of domain phenomena.

Hence, using these overlapping attributes, it may be

possible, for example, to infer species – something that

non-experts are generally not capable of.

It is also notable that many attributes provided (here,

51.1%, see Table 3) by non-experts overlap with those

established for species identification. At the same time, as

seen from the categorical responses (see Table 1)

participants fail to accurately classify at the species-level.

This means that non-experts supply attributes that can be

potentially used to identify instances at the species-level –

a task shown to be mostly unattainable when

classification is elicited directly.
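One way such inference could work is to compare the attributes reported for an instance with attribute profiles of candidate species and rank the candidates by overlap. The sketch below uses Jaccard similarity for this purpose. It is an assumption made for illustration, not a method evaluated in this paper; the species labels, profiles, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of attributes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_species(reported, profiles):
    """Rank candidate species by overlap with the reported attributes."""
    scores = {name: jaccard(reported, attrs) for name, attrs in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical identification profiles (NOT taken from a field guide).
profiles = {
    "species_a": {"red breast", "grey back", "yellow beak", "small"},
    "species_b": {"blue feathers", "crest", "white belly", "small"},
}

# Attributes a non-expert might report, including an idiosyncratic one.
reported = {"red breast", "yellow beak", "small", "scary"}

ranking = rank_species(reported, profiles)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Idiosyncratic attributes such as "scary" lower every score equally and so do not change the ranking, while the shared core attributes drive it.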

Based on the evidence presented, there appears to be much

value in avoiding traditional class-based conceptual

modeling especially for IS aimed at managing distributed

heterogeneous data. Does this spell the end of conceptual

modeling in these settings and a decline in interest in

conceptual modeling? We argue that such a conclusion is

premature, but making conceptual modeling relevant

requires rethinking of its role in IS development. Below

we propose three promising approaches for future

research to enhance the value of conceptual modeling.

First, conceptual modeling can be used as a sensitizing

tool rather than a formal specification that directly shapes

physical IS objects. For example, analysts can randomly

sample potential users (e.g., potential citizen scientists)

and ask them to describe instances of interest (e.g., birds,

cosmic bodies, material assets) using attributes. These

attributes can then be analyzed to get an early glimpse

into what actual data may look like. This may reveal

potential data conflicts and suggest ways to handle them.

In this sense attributes become “thick descriptions” (as in

ethnography or case research) that permit communication

about issues in a domain with various stakeholders and

guide design choices. A similar idea, known as

contextualism, has been proposed by Potts and Hsi (1997)

and it also resonates with some aspects of the Soft

Systems Methodology (Checkland and Holwell 1998).

The contextualism in conceptual modeling opens a new

research stream aimed at developing, evaluating and

improving models as sensitizing tools.

The second major direction for research deals with the

issue of paradigmatic (e.g., ontological) assumptions that

underlie IS development. While flexible database

technology appears well-suited for IS implementations,

several questions arise including (1) how to design

flexible data models and (2) how to choose an appropriate

model for a given project. For instance, any flexible data

model that stores data in a more or less structured manner

(e.g., in terms of instances) adopts (implicitly or

explicitly) ontological, epistemological, axiological and

other paradigmatic assumptions about what reality is

made of, what is valuable to capture, and how to best

capture pertinent aspects of reality. For example, the

instance-based data model follows the philosophy of Mario

Bunge and cognitive theories and assumes that reality is

made of (unique) instances that possess properties

(Parsons and Wand 2000). Here we experimentally

demonstrate that embedding these assumptions in IS leads

to attainment of several desirable goals. The question

arises, however, whether different paradigmatic

assumptions are germane to different projects. For

example, if an IS resides in a context characterized by

continuities rather than discrete instances (e.g., see

Mylopoulos 1998), should analysts specify a flexible data

model founded on these assumptions? Conceptual

modeling research has been engaged in rich and on-going

discourse on these issues (Guarino and Guizzardi 2006;

Hirschheim et al. 1995; March and Allen 2012;

Mylopoulos 1998; Wand and Weber 2006). Increased

interest in distributed heterogeneous data motivates

continued attention to paradigmatic assumptions in IS

development.

Third, conceptual modeling research can begin addressing

the issue of modeling under hybrid abstraction-based/no-

modeling assumptions. In practice most IS are likely to

belong to different points on the development continuum,

as some aspects of a system could remain relatively fixed

and amenable to abstraction-driven modeling. For

example, legal, security and reporting considerations could

be embedded in software consistent with some fixed

convention rather than left open to judgment of individual

users. This raises questions about how to integrate the no-

modeling paradigm with traditional abstraction-driven

modeling. Currently little is known about these issues and

much scope exists for research on the appropriate balance

between different modeling approaches.
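As a rough illustration of what such a hybrid could look like at the data level, the sketch below combines a small fixed part, validated against a predefined convention, with an open set of free-form attributes that is not constrained by any class definition. This is a sketch under stated assumptions rather than a proposed design: the Observation class, its fields, and the ALLOWED_LICENSES convention are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical fixed convention (e.g., a legal or reporting requirement).
ALLOWED_LICENSES = {"CC0", "CC-BY"}


@dataclass
class Observation:
    # Fixed part: amenable to traditional abstraction-driven modeling.
    observer_id: str
    license: str
    # Open part: any attributes the contributor chooses to report.
    attributes: set = field(default_factory=set)

    def __post_init__(self):
        # Enforce the fixed convention instead of leaving it to user judgment.
        if self.license not in ALLOWED_LICENSES:
            raise ValueError(f"unsupported license: {self.license!r}")

    def add_attribute(self, text):
        """Store a free-form attribute after light normalization."""
        cleaned = text.strip().lower()
        if cleaned:
            self.attributes.add(cleaned)


obs = Observation(observer_id="u1", license="CC0")
obs.add_attribute("  Deformed fin ")
obs.add_attribute("scary")
print(sorted(obs.attributes))
```

The fixed fields reject values outside the convention, whereas the attribute set accepts both instance-specific and evaluative descriptions without schema changes.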

REFERENCES

1. Abiteboul, S. (1997) Querying semi-structured data,

Database Theory—ICDT'97, Delphi, Greece.

2. Angles, R. and Gutierrez, C. (2008) Survey of graph

database models, ACM Computing Surveys, 40, 1,

1:1-1:39.

3. Brynjolfsson, E., Hu, Y. J. and Simester, D. (2011)

Goodbye pareto principle, hello long tail: The effect

of search costs on the concentration of product sales,

Management Science, 57, 8, 1373-1386.

4. Bunge, M. (1977) Treatise on basic philosophy:

Ontology I: the furniture of the world, Reidel,

Boston, MA.

5. Checkland, P. and Holwell, S. (1998) Information,

systems, and information systems: making sense of

the field, John Wiley & Sons, Inc, Hoboken, NJ.




6. Chen, P. (1976) The entity-relationship model -

toward a unified view of data, ACM Transactions on

Database Systems, 1, 1, 9-36.

7. Coleman, D. J., Georgiadou, Y. and Labonte, J.

(2009) Volunteered Geographic Information: The

Nature and Motivation of Producers, International

Journal of Spatial Data Infrastructures Research, 4,

1, 332-358.

8. Doan, A., Ramakrishnan, R. and Halevy, A. Y.

(2011) Crowdsourcing systems on the World-Wide

Web, Communications of the ACM, 54, 4, 86-96.

9. Erickson, L., Petrick, I. and Trauth, E. (2012)

Hanging with the right crowd: Matching

crowdsourcing need to crowd characteristics, AMCIS

2012 Proceedings.

10. Evermann, J. and Wand, Y. (2001) Towards

ontologically based semantics for UML constructs,

Conceptual Modeling—ER 2001, 354-367.

11. Fry, J. P. and Sibley, E. H. (1976) Evolution of data-

base management systems, ACM Computing Surveys

(CSUR), 8, 1, 7-42.

12. Goodchild, M. (2007) Citizens as sensors: the world

of volunteered geography, GeoJournal, 69, 4, 211-

221.

13. Greenspan, S. J., Mylopoulos, J. and Borgida, A.

(1982) Capturing more world knowledge in the

requirements specification, Proceedings of the 6th

International Conference on Software Engineering,

Tokyo, Japan.

14. Guarino, N. and Guizzardi, G. (2006) In the defense

of ontological foundations for conceptual modeling,

Scandinavian Journal of Information Systems, 18, 1,

115-126.

15. Hand, E. (2010) People power, Nature, 466, 7307,

685-687.

16. Hirschheim, R., Klein, H. K. and Lyytinen, K. (1995)

Information Systems Development and Data

Modeling: Conceptual and Philosophical

Foundations, Cambridge University Press,

Cambridge.

17. Kaldor, N. (1961) Capital Accumulation and

Economic Growth, F. A. Lutz and D. C. Hague

(eds.), The Theory of Capital, Macmillan, London.

18. Kauffman, R., Li, T. and Heck, E. V. (2010) Business

Network-Based Value Creation in Electronic

Commerce, International Journal of Electronic

Commerce, 15, 1, 113-144.

19. Leonardi, P. (2011) When flexible routines meet

flexible technologies: Affordance, constraint, and the

imbrication of human and material agencies, MIS

Quarterly, 35, 1, 147-167.

20. Lukyanenko, R. and Komiak, S. X. (2011) Designing

recommendation agents as the extension of individual

users: similarity and identification in web

personalization, International Conference on

Information Systems, Shanghai, China.

21. Lukyanenko, R. and Parsons, J. (2011a) Information

Loss in the Era of User-Generated Data, Pre-ICIS

SIG IQ, Shanghai, China.

22. Lukyanenko, R. and Parsons, J. (2011b) Rethinking

data quality as an outcome of conceptual modeling

choices, 16th International Conference on

Information Quality, Adelaide, Australia.

23. Lukyanenko, R., Parsons, J. and Wiersma, Y. (2011)

Citizen Science 2.0: Data Management Principles to

Harness the Power of the Crowd, Hemant Jain, Atish

Sinha and Padmal Vitharana (eds.), Service-Oriented

Perspectives in Design Science Research, Springer

Berlin / Heidelberg.

24. Lyytinen, K. and Yoo, Y. (2002) Research

commentary: the next wave of nomadic computing,

Information Systems Research, 13, 4, 377-388.

25. Ma, Z. M. and Yan, L. (2008) A Literature Overview

of Fuzzy Database Modeling, Journal of Information

Science and Engineering, 24, 1, 189-202.

26. March, S. and Allen, G. (2012) Toward a social

ontology for conceptual modeling, 11th Symposium

on Research in Systems Analysis and Design,

Vancouver, Canada.

27. March, S., Hevner, A. and Ram, S. (2000) Research

commentary: An agenda for information technology

research in heterogeneous and distributed

environments, Information Systems Research, 11, 4,

327-341.

28. Mason, R. O. and Mitroff, I. I. (1973) A program for

research on management information systems,

Management Science, 19, 5, 475-487.

29. McClane, A. J. (1978) McClane's field guide to

freshwater fishes of North America, Holt Paperbacks,

New York, NY.

30. Mylopoulos, J. (1992) Conceptual Modeling and

Telos, P. Loucopoulos and R. Zicari (eds.),

Conceptual Modeling, Databases, and CASE: An

Integrated View of Information Systems

Development, John Wiley & Sons, Inc., New York,

NY.

31. Mylopoulos, J. (1998) Information modeling in the

time of the revolution, Information Systems, 23, 3–4,

127-155.

32. Mylopoulos, J. and Borgida, A. (2006) Properties of

Information Modeling Techniques for Information

Systems Engineering, Peter Bernus, Kai Mertins and

Günter Schmidt (eds.), Handbook on Architectures of

Information Systems, Springer Berlin Heidelberg.

33. Newcomb, L. (1977) Newcomb's Wildflower Guide:

An Ingenious New Key System for Quick, Positive

Field Identification of the Wildflowers, Flowering

Shrubs and Vines of Northeastern and North Central



North America, Little, Brown and Company, New

York, NY.

34. Olivé, A. (2007) Conceptual modeling of information

systems, Springer, Berlin Heidelberg New York.

35. Panaccio, C. (2005) Nominalism and the Theory of

Concepts, H. Cohen and C. Lefebvre (eds.),

Handbook of Categorization in Cognitive Science, Elsevier

Science, Amsterdam.

36. Parsons, J. (2003) Data Modeling, Handbook on

Data Management in Information Systems, 49.

37. Parsons, J., Lukyanenko, R. and Wiersma, Y. (2011a)

Easier citizen science is better, Nature, 471, 7336,

37-37.

38. Parsons, J., Lukyanenko, R. and Wiersma, Y.

(2011b) Easier citizen science is better, Nature, 471,

7336, 37-37.

39. Parsons, J. and Wand, Y. (2000) Emancipating

Instances from the Tyranny of Classes in Information

Modeling, ACM Transactions on Database Systems,

25, 2, 228–268.

40. Peterson, R. T. (2010) Peterson field guide to birds of

eastern and central North America, Houghton Mifflin

Harcourt, New York, NY.

41. Phillips, R. (2005) Mushrooms & Other Fungi of

North America, Firefly Books, Richmond Hill, ON.

42. Pohl, K. (1994) The three dimensions of

requirements engineering: a framework and its

applications, Information Systems, 19, 3, 243-258.

43. Potts, C. and Hsi, I. (1997) Abstraction and context in

requirements engineering: toward a synthesis, Annals

of Software Engineering, 3, 1, 23-61.

44. Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D.

M. and Boyes-Braem, P. (1976) Basic Objects in

Natural Categories, Cognitive Psychology, 8, 3, 382-

439.

45. Rosenberg, S. and Jones, R. (1972) A method for

investigating and representing a person's implicit

theory of personality: Theodore Dreiser's view of

people, Journal of Personality and Social

Psychology, 22, 3, 372-386.

46. Smith, L. B. (2005) Emerging Ideas about

Categories, L. Gershkoff-Stowe and D. H. Rakison

(eds.), Building Object Categories in Developmental

Time, L. Erlbaum Associates, Mahwah, NJ.

47. Snäll, T., Kindvall, O., Nilsson, J. and Pärt, T. (2011)

Evaluating citizen-based presence data for bird

monitoring, Biological Conservation, 144, 2, 804.

48. Stokes, D. W., Stokes, L. Q. and Lehman, P. E.

(2010) The Stokes Field Guide to the Birds of North

America, Little, Brown, New York, NY.

49. Tanaka, J. W. and Taylor, M. (1991) Object

categories and expertise: Is the basic level in the eye

of the beholder? Cognitive Psychology, 23, 3, 457-

482.

50. Wand, Y. and Weber, R. (2006) On ontological

foundations of conceptual modeling: A response to

Wyssusek, Scandinavian Journal of Information

Systems, 18, 1, 127-138.

51. Wand, Y. and Weber, R. (2002) Research

commentary: Information systems and conceptual

modeling - A research agenda, Information Systems

Research, 13, 4, 363-376.

52. Zuboff, S. (1988) In The Age Of The Smart Machine:

The Future Of Work And Power, Basic Books.

53. Zwass, V. (2010) Co-Creation: Toward a Taxonomy

and an Integrated Research Perspective, International

Journal of Electronic Commerce, 15, 1, 11-48.
