Lukyanenko and Parsons Is Traditional Conceptual Modeling Going to Become Obsolete?
Proceedings of the 12th AIS SIGSAND Symposium Provo, Utah, May 17-18, 2013 1
Is Traditional Conceptual Modeling Going to Become Obsolete?
Roman Lukyanenko
Memorial University of Newfoundland
Jeffrey Parsons
Memorial University of Newfoundland
ABSTRACT
Traditionally, the research and practice of conceptual
modeling assumed relevant information about a domain is
determined in advance to be used as input to design. The
increasing ubiquity of systems – characterized by
heterogeneous and transient users, customizable features,
and open or extensible data standards – challenges a
number of long-held propositions about conceptual
modeling. We raise the question whether conceptual
modeling as commonly understood is an impediment to
systems development and should be phased out. We
discuss the motivation for rethinking approaches to
conceptual modeling, consider traditional approaches to
conceptual modeling and provide empirical evidence of
the limitations of traditional conceptual modeling. We
then propose three directions for future conceptual
modeling research.
Keywords
Conceptual modeling, Information Systems Analysis and
Design, Ontology, Cognition
INTRODUCTION
Traditionally, information systems (IS) were developed
and primarily used within organizational boundaries (e.g.,
Fry and Sibley 1976; Mason and Mitroff 1973; Zuboff
1988). IS development in this setting was user- and
consensus-driven: users (or stakeholders) define system
requirements, use and evaluate designed systems, while
close proximity with users makes it possible for analysts
and designers to gather requirements, verify their fidelity,
and resolve any conflicting perspectives before
implementation. As users were mostly corporate
employees or parties closely affiliated with the
organization (e.g., suppliers), any individual or divergent
perspectives were generally subsumed by goals and
perspectives of the organization.
The user/consensus-driven development underlies
prevailing approaches to conceptual modeling – a phase
of IS development aimed at “formally describing some
aspects of the physical and social world around us for the
purposes of understanding and communication”
(Mylopoulos 1992, p. 51). Conceptual modeling
traditionally results in specifications that capture relevant
knowledge about the application domain. This
specification then guides development by supporting
communication between developers and users, promoting
domain understanding, and guiding the design process (Wand
and Weber 2002).
The traditional modeling paradigm is increasingly
challenged as more organizations draw on knowledge
outside of organizational boundaries and become
interested in perspectives of individual users. Under these
premises, it may no longer be feasible to reach all
potential users and establish an agreed-upon specification
of a domain.
Here we pose the question of whether conceptual modeling
as commonly understood is becoming an impediment to
managing distributed heterogeneous information. We
present motivation for this research, discuss traditional
approaches to conceptual modeling and provide empirical
evidence of the limitations of traditional conceptual
modeling in distributed heterogeneous settings. We then
propose three directions for future conceptual modeling
research.
CHALLENGES TO TRADITIONAL MODELING PARADIGM
The interest in distributed heterogeneous information is
growing. Knowledge created outside of controlled
internal information production processes is of increasing
value to organizations. Such information, for example,
can better connect internal decision processes with
information available to business partners, customers, and
the general public (Doan et al. 2011; Hand 2010; March et al.
2000; Zwass 2010).
Within organizations, initiatives that support flexible
knowledge management, flexible routines, dynamic
sense-making, and grass-roots innovation are on the rise
(Leonardi 2011).
Organizations are increasingly looking to understand
individual users (e.g., customers) and cater to their unique
and changing needs. First, the proliferation of mobile,
miniaturized and ubiquitous computing exposes IS to
diverse and unpredictable situations (e.g., when scuba-
diving) and demands systems to be adaptive and flexible
(Lyytinen and Yoo 2002). Second, personalization of user
experience increases profitability by better matching
product and service offerings to individual users' needs
(Brynjolfsson et al. 2011). Third, the rise in social
networking and peer-to-peer computing (e.g., Facebook,
Twitter, YouTube, Flickr) fuels demand for more flexible
and natural information exchange between people. The
social capital created through high user interaction is of
significant economic and social value (Zwass 2010).
Of particular relevance is crowdsourcing that engages
users to work on sponsor-defined tasks. Many
crowdsourcing initiatives aim to capture unique
conceptualizations of diverse distributed audiences. This
is prevalent in a type of crowdsourcing known as citizen
science that harnesses crowds for scientific uses and
broadly encourages ingenuity, creativity, and divergent
thinking (Goodchild 2007; Hand 2010).
Heterogeneous and distributed information poses a
significant challenge to traditional conceptual modeling
that assumed that relevant aspects of reality were known
or knowable in advance. Below, we examine traditional
conceptual modeling in light of the emerging challenges.
Since the 1970s numerous conceptual modeling grammars
have been developed. The prevailing approaches to
conceptual modeling, such as Entity-Relationship (E-R) and
UML class diagrams, involve specification of conceptual
entities (classes, entity types), attributes (or properties)
and relationships between entities (Chen 1976; Evermann
and Wand 2001; Greenspan et al. 1982). Constructs in
other modeling approaches may include roles, actors,
agents, goals, activities, frames or patterns (see
Mylopoulos 1998). Modeling grammars specify rules by
which data (information, knowledge) in a domain is
organized to support IS functions (e.g., state-tracking).
Generally, this is done by embedding structures
developed during conceptual modeling (e.g., collection of
predefined classes, frames, roles) in IS objects, such as
database schema, user interface, or code logic. IS use,
including data creation, maintenance and retrieval is then
mediated by these objects. Since conceptual structures are
predominantly abstract, in that they do not represent
concrete individual objects or events but rather
generalized or stylized (Kaldor 1961, p. 178)
representations, the fundamental approach to conceptual
modeling is representation by abstraction.
Abstraction-driven conceptual modeling deliberately ignores some
aspects of reality, capturing only relevant information
(where users or stakeholders indicate what is relevant). For
example, Olivé (2007) contends that “a conceptual schema is
the definition of the general domain knowledge that the
information system needs to perform its functions;
therefore, the conceptual schema must include all the
required knowledge” (p. 29, emphasis added).
For example, a typical script made using the popular E-R
grammar may depict entity types, attributes of entity types
and relationship types with attributes. Entity types (e.g.,
student, customer, equipment) abstract from differences
among instances (e.g., a particular student, or a specific
customer), instead capturing perceived equivalence of
instances. Hence, many conceptual modeling grammars
consider instances (objects) to be members of their
classes: “[o]ne principle of conceptual modeling is that
domain objects are instances of entity types” (Olivé 2007,
p. 383). Abstraction-based modeling was deemed critical
to “organize the information base and guide its use,
making it easier to update or search it” (Mylopoulos and
Borgida 2006, p. 35).
Representation by abstraction presupposes that consensus
can be reached among users (stakeholders) on what is
relevant. This assumption was considered somewhat non-
problematic to the extent that development occurs in close
contact with system users and other key stakeholders.
Close contact with users provided an opportunity to
resolve conflicts in individual views and generated an
agreed-upon abstract conceptualization of a domain (Pohl
1994).
As we discussed earlier, the assumption that users can be
identified, reached and engaged in consensus building is
becoming inadequate in a growing number of cases.
Aside from the difficulty of identifying and reaching all
potential users in distributed and dynamic settings, many
potential users may lack domain expertise (e.g., consumer
products knowledge) and have unique views or
conceptualizations that are unstable and incongruent with
those of project sponsors and other users (Erickson et al.
2012; Lukyanenko et al. 2011). Because consensus is no
longer feasible in such settings, the resulting system may
be critically defective. For example, an IS representing a
domain as perceived by some users may marginalize, bias
or exclude possibly valuable conceptualizations of other
users (Lukyanenko and Parsons 2011b; Parsons et al.
2011b). A growing body of research is looking to address
the challenges of modeling information in heterogeneous
environments. Typically, solutions involve modification of
abstraction-based modeling grammars (Ma and Yan 2008)
and are therefore not entirely free of the negative
consequences of the abstraction-driven models. As an
alternative, we examine whether it is more advantageous
to develop IS without modeling domains a priori.
Consider IS development without conceptual
modeling. In contrast with the difficulties of modeling
distributed heterogeneous information, it is becoming
increasingly possible to store such data. Since the
beginning of database management in the 1950s, enhanced
computing capabilities coupled with conceptual
development have led to the liberation of modeling from physical
constraints (Fry and Sibley 1976; Parsons 2003).
Over time the focus shifted to capturing greater domain
semantics. Notable data models with advanced semantic
support include instance-based (Parsons and Wand 2000),
graph (Angles and Gutierrez 2008), semistructured
(Abiteboul 1997), and fuzzy (Ma and Yan 2008) data
models. Leveraging advanced data modeling, data can be
less structured, which requires very little or no modeling.
For example, using the instance-based data model,
information can be collected without having to classify
relevant instances; information about instances can be
stored in terms of attributes (Parsons and Wand 2000).
Different users can supply different attributes for the same
instance. Failure to agree on classes, relationship types or
attributes is no longer problematic as both convergence
and divergence of views are accommodated: any relevant
attribute can be seamlessly captured. The attributes can
then be queried (e.g., as per ad hoc needs) to select
instances stored based on classes of interest. Since classes
and other abstract constructs are not necessary before
implementing a system, conceptual modeling may not be
needed at the design phase (at least not for the purposes of
generating a database schema).
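The idea of storing attributes first and deriving classes at query time can be illustrated with a minimal sketch. It is not the authors' artifact: the class name `InstanceStore`, its methods, and the sample sightings are all hypothetical, and a real instance-based database would need persistence and a richer attribute vocabulary.

```python
from collections import defaultdict


class InstanceStore:
    """Toy instance-based store: instances carry attributes, not classes."""

    def __init__(self):
        # instance id -> attribute -> set of contributors who reported it
        self._data = defaultdict(lambda: defaultdict(set))

    def add(self, instance_id, contributor, attribute):
        # Any contributor may attach any attribute; no schema is consulted.
        self._data[instance_id][attribute.strip().lower()].add(contributor)

    def attributes_of(self, instance_id):
        if instance_id not in self._data:
            return set()
        return set(self._data[instance_id])

    def select(self, required_attributes):
        # A "class" is just an ad hoc set of attributes chosen at query time.
        required = {a.strip().lower() for a in required_attributes}
        return sorted(
            instance_id
            for instance_id, attrs in self._data.items()
            if required <= set(attrs)
        )


store = InstanceStore()
store.add("sighting-1", "user-a", "red breast")
store.add("sighting-1", "user-b", "has feathers")
store.add("sighting-1", "user-b", "standing on rock")
store.add("sighting-2", "user-c", "has feathers")
store.add("sighting-2", "user-c", "blue crest")

# Hypothetical class of interest, defined only when the query is written.
birds = store.select({"has feathers"})
```

Here two contributors describe the same sighting with different attributes, and neither has to agree on a class before the data are stored.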
Indeed, the instance-based or other flexible solutions
appear to address the challenges of reaching consensus
and accommodating individual and unanticipated views
and uses. Critically, these solutions make it possible to bypass a
major part of IS development – the creation of a formal
representation of knowledge in a domain. This
significantly simplifies systems analysis and does so in
the environment considered extremely problematic for
traditional analysis. Furthermore, the instance-based IS
appears to improve data quality (e.g., accuracy per unit of
data) and information yield (e.g., greater number of
instances stored) compared to more traditional (i.e., class-
based) systems (Lukyanenko and Parsons 2011a;
Lukyanenko and Parsons 2011b; Parsons et al. 2011a). 1
EXPERIMENT
To empirically evaluate the instance-based IS with no a
priori conceptual modeling, we designed a laboratory
experiment in the context of online citizen science. 2
Many popular citizen science applications epitomize
modeling challenges discussed above. These systems are
established primarily to serve the needs of scientists, but
the actual users or contributors (i.e., citizen scientists) are
ordinary people, often lacking subject matter expertise
and possessing diverse domain views (Coleman et al.
2009; Snäll et al. 2011). Imposing a particular view upon
content creators may focus (or bias) contributors to one
particular goal (e.g., species identification, classification
of galaxies), but fail to capture additional information
citizen scientists may wish to communicate.
1 Both authors are developing a real IS artifact powered by the
instance-based data model in the citizen science domain, where
conceptual modeling focused on organizing knowledge about a
domain was virtually non-existent.
2 This experiment was also used to provide support for the
impact of class-based models on data accuracy; this issue is
beyond the scope of the current study.
Current approaches to citizen science follow traditional
modeling principles. Popular citizen science projects (e.g.,
www.eBird.org, www.iSpot.org.uk) involve users in
positive identification of species or genera (e.g.,
American Robin). Species and genus are classification
levels with widely accepted scientific utility. In contrast,
the generally preferred level of classification for non-
experts is the basic level (Rosch et al. 1976). Unlike the
species level, the basic level (e.g., bird, fish, tree) tends to
be an intermediate taxonomic level (e.g., “bird” is a level
higher than “American Robin”, and lower than “animal”).
Species/genus-level classes represent useful classes in a
natural history application, while basic-level classes
operationalize intuitive classes natural to non-expert
users; therefore both are reasonable for constructing
abstraction-driven conceptual models of the natural
history citizen science IS. To contrast traditional
conceptual modeling with a “no modeling” alternative, we
explore an instance-based solution to citizen science
where sightings of organisms are reported in terms of
attributes of instances (Parsons et al. 2011b). Users are
thus not required to comply with a priori created models
of abstraction (e.g., classes).
Consistent with philosophy and cognition that postulate
uniqueness of individual instances and mental models of
instances (Bunge 1977; Panaccio 2005; Smith 2005), we
argue non-expert participants, if given the opportunity,
will provide substantial numbers of unique attributes.
Since abstractions such as classes are based on
commonalities of instances, they will be unable to
accommodate some of the attributes participants are
inclined to provide. Furthermore, as it may be difficult to
a priori anticipate the kinds of attributes that are salient
for different users, it is infeasible to choose classes that
will account for all attributes. We thus hypothesize:
Hypothesis: Non-experts will describe instances in terms
of attributes that cannot be captured by definitions of
classes (both intuitive and useful) used to model
instances.
While we predict that many attributes provided by
different users will be unique, it is also desirable to have
some degree of attribute agreement. Indeed, complete
disagreement (i.e., no overlap in attributes provided by
different participants) would mean that using attributes to
represent reality is unreliable. To broadly ensure the value
of collecting and storing attributes of instances, ideally
agreement on a core set of attributes should hold for both
familiar (e.g., instance of American robin) and unfamiliar
(e.g., instance of obscure mushroom) instances; both
simple and complex. Thus, we wish to investigate the
degree to which non-experts converge on the kinds of
attributes used to describe familiar and unfamiliar, as well
as complex and simple, instances. In view of this, we seek
to answer the following exploratory question:
Question: Do non-experts demonstrate significant
agreement on a core set of attributes of familiar and
unfamiliar, complex and simple instances?
Method
We conducted a study among potential citizen scientists.
Participants were 247 undergraduate business students
(141 female, 106 male) at a Canadian university. The
experiment was conducted in 8 sessions and the order of
stimuli was randomized between sessions.
Business students were chosen to ensure a low level of
expertise in biology, reflecting the intended context where
users are members of the general public. Low domain
expertise was verified using self-reported expertise
measures and more objective measures: 83% of
participants either strongly or somewhat disagreed (on a
5-point scale) with the statement that they are “experts” in
local wildlife (mean=1.90; s.d.=0.886). Most
participants (77%) had never taken any post-secondary
courses in biology. Finally, the low number of species-
level responses (presented below) is further evidence of
low expertise.
The stimuli were 24 full-color images of plants and
animals (all different biological species) native to the
geographic region in which the study was conducted. The
stimuli were selected by an ecology professor expert in
flora and fauna of the region. Species were chosen to
include both organisms believed to be familiar and organisms
believed to be unfamiliar.
Participants were randomly assigned to one of two
study conditions. Those in the “Categories and Attributes”
condition (122 participants) were given a printed form
with two columns - one asking participants to name the
object on the image (using one or more words) and the
second asking them to list features that best describe the
object. In the “Attributes only” condition (125
participants), there was only one column asking
participants to list features that best describe the object.
Images were presented to participants in a random
sequence on a large screen in a classroom setting. Each
image was shown for 50 seconds, a time deemed
sufficient through a pre-test.
Responses were converted from paper to digital form by
one of the authors to ensure consistency. We aimed to
record verbatim the categories and attributes provided by
participants, following best practices set in similar
studies. When faced with illegible handwriting we
attempted to decipher handwriting but avoided making
interpretations and skipped unreadable entries. Complex
attributes were broken down into individual components
(e.g., “long yellow beak” was coded as “long beak” and
“yellow beak”), following Rosenberg and Jones (1972).
Consistent with psychology research (e.g., Tanaka and
Taylor 1991), attributes for the same species with clearly
similar meanings were grouped together (e.g., “horns,”
“antlers,” and “rack”).
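The two coding rules just described (decomposing compound attributes and grouping near-synonyms) were applied by hand in the study. The sketch below only illustrates the rules on simple modifier-plus-noun phrases; the function names and the synonym table are assumptions, and phrases such as "standing on rock" would still need human judgment.

```python
# Hypothetical synonym table: variant term -> canonical term.
SYNONYMS = {"antlers": "horns", "rack": "horns"}


def split_compound(attribute):
    """Split 'long yellow beak' into ['long beak', 'yellow beak']."""
    words = attribute.lower().split()
    if len(words) < 2:
        return [" ".join(words)] if words else []
    *modifiers, head = words
    # Pair each modifier with the head noun (last word of the phrase).
    return [f"{modifier} {head}" for modifier in modifiers]


def canonical(attribute):
    """Replace each word by its canonical synonym, if one is listed."""
    return " ".join(SYNONYMS.get(word, word) for word in attribute.lower().split())


def code_response(raw):
    """Apply decomposition first, then synonym grouping."""
    return [canonical(part) for part in split_compound(raw)]


coded = code_response("long yellow beak")
```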
Once categories and attributes were entered, we coded
categories as either “basic level,” “species-genus level,” or
“other” and attributes as either “basic level,”
“superordinate to basic,” “subordinate to basic,” or
“other.” The species-genus level was determined based on
biological convention, while the basic level was adopted
from prior studies in cognitive psychology. A thorough
survey of cognitive literature failed to reveal an
agreed-upon basic level for 6 out of the 24 species used as
stimuli (lung lichen, Old Man’s beard, coyote, chipmunk,
moose, and caribou).
The final data set contained 25,315 records, with 6,397
categories and 18,918 attributes. The total number of
unique attributes and categories was 1,673, with 264
categories and 1,409 attributes.
Results and Discussion
We first provide evidence that non-expert participants
generally do not prefer species/genus level to classify
instances and these responses are generally not as
accurate as more intuitive basic-level classes. To do this,
we analyze categories in the “Categories and Attributes”
condition. In this condition, 122 participants provided a
total of 3,737 categories (an average of 1.28 per image per
participant). We analyzed data for each image separately
across all participants.
As expected, participants prefer to classify using
basic-level categories and these classifications tend to be more
accurate than when attempting to classify at species/genus
levels (see Table 1). The exceptions (i.e., American robin,
Blue Jay, Killer Whale) appear to be common organisms
that participants are frequently exposed to in nature or
through media.
Common name      No. of BC  No. of SG  χ2 (no. of BC vs. SG)  Correct BC  Correct SG  Fisher's exact p-val. (accuracy of BC vs. SG)
Blue W. Teal     144        5          129.67***              143         0           0.000
Mallard Duck     133        20         83.46***               133         15          0.000
Spt. Sandpiper   112        2          106.14***              112         0           0.000
Caspian Tern     111        2          105.14***              111         0           0.000
Red fox          110        14         74.32***               104         10          0.015
Labrador tea     108        4          96.57***               108         0           0.000
G. Yellowlegs    108        1          105.04***              107         0           0.018
Common Tern      107        3          98.33***               107         0           0.000
Red squirrel     105        18         61.54***               100         1           0.000
Sheep laurel     103        2          97.15***               103         0           0.000
Atl. Salmon      100        25         45.00***               100         0           0.000
Fireweed         94         26         38.53***               94          1           0.000
Calypso orchid   92         12         61.54***               91          0           0.000
Indian pipe      89         7          70.04***               88          0           0.000
Amer. Robin      86         78         0.39                   86          74          0.049
Blue Jay         69         99         5.36**                 69          98          1.000
Killer whale     54         88         8.14***                48          86          0.054
False morel      34         0          N/A                    22          0           N/A
TABLE 1. Number and accuracy of basic categories (BC) and species-genus categories (SG) (*** -sig. at 0.01 level; ** -sig. at 0.05 level)
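The χ2 values in Table 1 are consistent with a one-degree-of-freedom goodness-of-fit test in which BC and SG responses are expected to be equally frequent. The paper does not spell out the computation, so the sketch below is an assumption that happens to reproduce the reported statistics; the function name is hypothetical.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Goodness-of-fit test of two counts against a 50/50 expectation."""
    total = count_a + count_b
    expected = total / 2
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


teal_stat, teal_p = chi_square_equal_split(144, 5)    # Blue W. Teal row
jay_stat, jay_p = chi_square_equal_split(69, 99)      # Blue Jay row
robin_stat, robin_p = chi_square_equal_split(86, 78)  # Amer. Robin row
```

Rounded to two decimals, the three statistics match the table entries for those rows, and the Blue Jay p-value falls between 0.01 and 0.05, in line with its two-star marking.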
These results confirm the operationalization of basic-level
as an intuitive class for the participants. This is critical in
testing the extent to which participants employ basic-level
attributes (e.g., can fly, has feathers for bird) versus
lower-level attributes (e.g., red breast). The greater the
number of sub-basic level attributes, the greater the extent
to which a conceptual model built on the basic level omits
information non-experts are able to provide. To
investigate these issues, the attributes (7,330) in the
Attributes-only condition for the 18 plants and animals
with an agreed-on basic level category were classified
into: sub-basic, basic (and superordinate), or other,
resulting in 6,429 sub-basic, 824 basic, and 77 other
attributes.
As expected, in contrast with the prevalence of basic level
categorization, there were significantly more sub-basic
attributes, with an average p-value approaching zero (see
Table 2). This suggests that including intuitive classes
(which tend to be general for non-experts) in conceptual
models prevents a considerable number of attributes from
being captured.
Species             Sub-basic  Basic  Diff: χ2 p-val  Other attr.
American Robin      362        35     0.000           3
Atlantic salmon     273        45     0.000           19
Blue Jay            397        51     0.000           5
Blue Winged Teal    350        76     0.000           13
Bog Labrador tea    266        3      0.000           5
Calypso orchid      358        3      0.000           3
Caspian Tern        460        47     0.000           4
Common Tern         435        41     0.000           3
False morel         238        9      0.000           1
Fireweed            302        3      0.000           7
Greater Yellowlegs  486        39     0.000           9
Indian pipe         342        6      0.000           3
Killer whale        325        54     0.000           9
Mallard Duck        421        74     0.000           2
Red fox             340        46     0.000           90
Red squirrel        362        105    0.000           36
Sheep laurel        319        4      0.000           3
Spotted Sandpiper   393        44     0.000           1
TABLE 2. Number of basic and subordinate attributes
We now evaluate the same hypothesis with respect to the
species-level classes. Although we demonstrate low
natural frequency of responses at that level, in principle it
may be possible to design a user interface that guides
users to species-level classes because they are valuable to
project sponsors. We argue, however, even these more
specific classes would fail to account for all attributes
non-experts report. Thus, the greater the number of
attributes not captured by species classes, the greater the
degree to which a conceptual model built on species-level
classes misses information non-experts are able to provide.
We compare the attributes provided in the Attributes-only
condition with attributes from the species identification
guides considered standard for identifying at the species-
level (McClane 1978; Newcomb 1977; Peterson 2010;
Phillips 2005; Stokes et al. 2010). One of the authors
matched each attribute provided by participants with
attributes of the organism in the field guide. The
comparison was based on approximate similarity (e.g.,
gray underbelly and whitish underbelly were considered
equivalent), erring on the side of similarity (to increase
conservativeness of the test).
As predicted, while many attributes provided can be
inferred from classifying organisms at the species-level,
participants provide a number of attributes, significantly
greater than zero, that are not accounted for by an applicable
species class (see Table 3). Among those, some are
instance attributes in that they describe a particular object
(e.g., standing on rock, looking sick, dorsal fin is
deformed); some describe features considered not salient
for identification at the species-level (e.g., blue eyes for
American Robin, black feet for Blue Jay); some attributes
are orthogonal to biological taxonomy (e.g., weed-like,
beautiful, scary). As in the case of basic-level categories,
modeling using more specific species-level classes fails to
account for a large number (49.0% of subordinate
attributes) of attributes freely provided by non-experts
when describing common and uncommon instances.
Common name      Sub-basic  Species  Non-species
American Robin   362        180      182
Atlantic Salmon  273        100      173
Blue jay         397        176      221
Blue W. Teal     350        156      194
Calypso Orchid   358        117      241
False morel      238        162      76
Fireweed         302        137      165
G. Yellowlegs    486        362      124
Indian Pipe      342        193      149
Mallard duck     421        238      183
Sheep Laurel     319        122      197
Sp. Sandpiper    393        221      172
TABLE 3. Number of subordinate, species-level and non-species-level attributes
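The overall share of subordinate attributes not covered by species-level classes can be recomputed from the rows of Table 3. The sketch below simply sums the table's columns; the variable names are illustrative.

```python
# (species-level, non-species-level) attribute counts, copied from Table 3.
TABLE_3 = {
    "American Robin": (180, 182),
    "Atlantic Salmon": (100, 173),
    "Blue jay": (176, 221),
    "Blue W. Teal": (156, 194),
    "Calypso Orchid": (117, 241),
    "False morel": (162, 76),
    "Fireweed": (137, 165),
    "G. Yellowlegs": (362, 124),
    "Indian Pipe": (193, 149),
    "Mallard duck": (238, 183),
    "Sheep Laurel": (122, 197),
    "Sp. Sandpiper": (221, 172),
}

species_total = sum(species for species, _ in TABLE_3.values())
non_species_total = sum(non_species for _, non_species in TABLE_3.values())
sub_basic_total = species_total + non_species_total

# Percentage of subordinate attributes that no species class accounts for.
non_species_share = 100 * non_species_total / sub_basic_total
```

The result rounds to the 49.0% quoted in the text.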
Finally, we examine the question: “to what extent do non-
experts agree on the attributes of familiar and unfamiliar
phenomena?” Answering this question is important in
determining whether data collection based on instances
and attributes can generate consistent data. To address the
issue, we assessed agreement on 9,556 attributes provided
by 125 participants for all 24 organisms in the
Attributes-only condition.
To evaluate agreement we employed the theoretically-
driven approach of model testing and compared two
hypothetical models. The null model represents the
absence of statistically significant agreement on a core set
of attributes among participants. Under the null model,
some attributes may be used by more than one participant
(e.g., due to limited domain vocabulary, rudimentary
beliefs about a domain, or by simple chance); yet there is
no “core” set of attributes that many participants agree on.
The corresponding distribution of attribute frequencies is
assumed to be uniform.
The alternative model represents the hypothesized
agreement among observers on a core set of attributes for the
observed instance. The alternative model should
demonstrate, with statistical significance, a non-uniform
distribution of attribute frequencies (e.g., Pareto
distribution). Similar to the null, the alternative model
may contain many idiosyncratic attributes with low
frequencies, signifying individual perceptions of attributes
of instances. Unlike the null model, however, it will also
reflect a small number of highly frequent attributes
reported by a large number of participants –
demonstrating strong agreement on a small number of
“key” attributes.
To test the two models, we computed the maximum
likelihood-ratio G-test. Here, the expected values are
determined assuming the null model of a uniform
distribution and are obtained by dividing the sum of all
frequencies by the number of unique attributes reported.
For example, participants provided 400 total and 85
unique attributes describing American robin (see Figure
1). The expected value for each attribute is 4.71 (which is
less than 5, thereby justifying the G-test technique). The
resulting G-statistic was computed to be 772.11 (p <
0.001 with 84 d.f.). This procedure was repeated for the
other 23 stimuli with similar results: all attribute
frequencies were found to be non-uniformly distributed.
The results were highly significant, with an average
p-value approaching zero.
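The G-test against a uniform null can be sketched as follows. The per-attribute frequencies from the study are not reproduced in the paper, so the frequency vector below is invented purely to match the reported totals for American robin (400 attributes, 85 unique); the function name is also an assumption. The resulting G value is therefore illustrative and is not the reported 772.11.

```python
import math


def g_test_uniform(frequencies):
    """Likelihood-ratio G statistic against a uniform null distribution."""
    total = sum(frequencies)
    k = len(frequencies)
    expected = total / k  # same expected count for every unique attribute
    g = 2 * sum(f * math.log(f / expected) for f in frequencies if f > 0)
    return g, expected, k - 1  # statistic, expected count, degrees of freedom


# Hypothetical long-tailed frequencies: a few popular attributes, many rare ones.
frequencies = [60, 45, 40, 30, 25] + [3] * 40 + [2] * 40

g_stat, expected, dof = g_test_uniform(frequencies)
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom; a perfectly uniform vector yields G = 0.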
Figure 1. Top attributes for American robin in Attributes-
only condition
These results indicate the attribute frequencies are not
uniformly distributed, demonstrating statistical agreement
among non-expert observers of familiar and unfamiliar,
feature-rich and feature-poor (perceptively, based on the
image) natural history instances in the study.
We proceeded with Kolmogorov-Smirnov and Anderson-
Darling goodness-of-fit statistics to fit data to common
distributions. While different distributions exhibited better
fit for different species, the better-fitting distributions
generally belonged to either the power-law or the
lognormal family. Such distributions included Pareto, log-gamma,
Fréchet, log-Pearson, and lognormal. For all 24
species the distributions of attribute frequencies were
skewed and leptokurtic (e.g., Figure 1). This means that,
for each species, participants reported a large number of
non-repeating attributes creating a long tail with a
compact set of frequently agreed-upon attributes.
DISCUSSION: CONCEPTUAL MODELING IS OBSOLETE. LONG LIVE CONCEPTUAL MODELING
An emerging conceptual modeling challenge is modeling
unpredictable and often unique user input. Addressing this
challenge is difficult using traditional abstraction-driven
modeling premised on a priori availability of “complete”
specification of the kinds of data users would be
contributing.
In this paper, we explored the possibility of omitting
conceptual modeling and storing data using flexible
databases, such as an instance-based database. Based on
the empirical evidence presented above, the instance-
based approach with no conceptual modeling appears to
meet the objectives of projects that engage distributed
heterogeneous audiences better than the two class-based
approaches (one based on intuitive and one based on
useful classes). The diversity of attributes provided by
non-experts makes it extremely difficult to a priori
specify classes capable of capturing these attributes. For
example, among reported attributes, some appear to be
applicable to a particular instance (e.g., deformed fin),
while some pertain to emotional evaluation of instances
(e.g., scary). These kinds of attributes are likely to be
unique to each situation and each person.
At the same time, the overall distribution of attributes
resembles a long tail with agreement on the core set of
attributes and a large number of idiosyncratic ones. This
suggests that attributes reflect some underlying
regularities or shared perceptions of domain phenomena.
Hence, using these overlapping attributes, it may be
possible, for example, to infer species – something that
non-experts are generally not capable of.
It is also notable that many attributes provided by
non-experts (here, 51.0%; see Table 3) overlap with those
established for species identification. At the same time, as
seen from the categorical responses (see Table 1)
participants fail to accurately classify at the species-level.
This means that non-experts supply attributes that can be
potentially used to identify instances at the species-level –
a task shown to be mostly unattainable when
classification is elicited directly.
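One way such an inference could be approached is sketched below. This is not a method proposed or evaluated in the paper: the `rank_species` function, the scoring rule (the share of reported attributes that appear in a species profile) and the example profiles are all assumptions made for illustration.

```python
def rank_species(reported, profiles):
    """Rank candidate species by the share of reported attributes in each profile."""
    reported = set(reported)
    if not reported:
        return []
    scores = {
        species: len(reported & attrs) / len(reported)
        for species, attrs in profiles.items()
    }
    # Highest share first; ties keep the order in which profiles were given.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical identification profiles; not taken from the study.
profiles = {
    "species_a": {"red breast", "yellow beak", "grey back"},
    "species_b": {"blue wings", "white belly", "crest"},
}
# Attributes a non-expert might report, including an idiosyncratic one.
reported = ["red breast", "grey back", "scary"]
ranking = rank_species(reported, profiles)
```

In this sketch the idiosyncratic attribute simply fails to match any profile and lowers the score, while the attributes shared with an identification profile carry the inference.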
Based on the evidence presented, there appears to be much
value in avoiding traditional class-based conceptual
modeling, especially for IS aimed at managing distributed
heterogeneous data. Does this spell the end of conceptual
modeling in these settings and a decline in interest in
conceptual modeling? We argue that such a conclusion is
premature, but making conceptual modeling relevant
requires rethinking its role in IS development. Below
we propose three promising approaches for future
research to enhance the value of conceptual modeling.
First, conceptual modeling can be used as a sensitizing
tool rather than a formal specification that directly shapes
physical IS objects. For example, analysts can randomly
sample potential users (e.g., potential citizen scientists)
and ask them to describe instances of interest (e.g., birds,
cosmic bodies, material assets) using attributes. These
attributes can then be analyzed to get an early glimpse
into what actual data may look like. This may reveal
potential data conflicts and suggest ways to handle them.
In this sense attributes become “thick descriptions” (as in
ethnography or case research) that permit communication
about issues in a domain with various stakeholders and
guide design choices. A similar idea, known as
contextualism, has been proposed by Potts and Hsi (1997)
and it also resonates with some aspects of the Soft
Systems Methodology (Checkland and Holwell 1998).
Contextualism in conceptual modeling opens a new
research stream aimed at developing, evaluating and
improving models as sensitizing tools.
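As an illustration of the kind of early analysis described above, the sketch below tallies attributes from a small pilot sample and separates those that many participants agree on from idiosyncratic ones. The function name `split_core_and_tail`, the 50% agreement threshold and the sample responses are hypothetical and are not drawn from the study.

```python
from collections import Counter


def split_core_and_tail(descriptions, min_share=0.5):
    """Separate widely shared attributes from idiosyncratic ones."""
    # Count each attribute at most once per participant.
    counts = Counter(attr for attrs in descriptions for attr in set(attrs))
    cutoff = min_share * len(descriptions)
    core = {attr for attr, n in counts.items() if n >= cutoff}
    tail = set(counts) - core
    return core, tail


# Made-up pilot responses, one list of attributes per participant.
descriptions = [
    ["red breast", "small", "scary"],
    ["red breast", "small"],
    ["red breast", "injured wing"],
    ["small", "hopping"],
]
core, tail = split_core_and_tail(descriptions)
```

The resulting core and tail sets could then serve as material for discussion with stakeholders rather than as a schema to be implemented directly.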
The second major direction for research deals with the
issue of paradigmatic (e.g., ontological) assumptions that
underlie IS development. While flexible database
technology appears well suited for IS implementations,
several questions arise, including (1) how to design
flexible data models and (2) how to choose an appropriate
model for a given project. For instance, any flexible data
model that stores data in a more or less structured manner
(e.g., in terms of instances) adopts (implicitly or
explicitly) ontological, epistemological, axiological and
other paradigmatic assumptions about what reality is
made of, what is valuable to capture, and how to best
capture pertinent aspects of reality. For example, the
instance-based data model follows the philosophy of Mario
Bunge and cognitive theories, and assumes that reality is
made of (unique) instances that possess properties
(Parsons and Wand 2000). Here we experimentally
demonstrate that embedding these assumptions in IS leads
to the attainment of several desirable goals. The question
arises, however, whether different paradigmatic
assumptions are germane to different projects. For
example, if an IS resides in a context characterized by
continuities rather than discrete instances (e.g., see
Mylopoulos 1998), should analysts specify a flexible data
model founded on these assumptions? Conceptual
modeling research has been engaged in a rich and ongoing
discourse on these issues (Guarino and Guizzardi 2006;
Hirschheim et al. 1995; March and Allen 2012;
Mylopoulos 1998; Wand and Weber 2006). Increased
interest in distributed heterogeneous data motivates
continued attention to paradigmatic assumptions in IS
development.
Third, conceptual modeling research can begin addressing
the issue of modeling under hybrid abstraction-based/
no-modeling assumptions. In practice, most IS are likely to
fall at different points on the development continuum,
as some aspects of a system could remain relatively fixed
and amenable to abstraction-driven modeling. For
example, legal, security and reporting considerations could
be embedded in software consistent with some fixed
convention rather than left open to the judgment of individual
users. This raises questions about how to integrate the
no-modeling paradigm with traditional abstraction-driven
modeling. Currently, little is known about these issues, and
much scope exists for research on the appropriate balance
between different modeling approaches.
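A hybrid design of this kind might look like the sketch below, in which a small fixed part is modeled and validated in advance while the descriptive part stays open. The `Observation` class and its fields (`contributor_id`, `observed_on`, `consent_given`) are hypothetical stand-ins for the legal and reporting considerations mentioned above; the paper does not prescribe any particular structure.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Observation:
    # Fixed part: specified in advance and validated (hypothetical fields).
    contributor_id: str
    observed_on: date
    consent_given: bool
    # Open part: any attributes the contributor chooses to report.
    attributes: set = field(default_factory=set)

    def __post_init__(self):
        if not self.consent_given:
            raise ValueError("consent is required before storing an observation")
        if self.observed_on > date.today():
            raise ValueError("observation date cannot be in the future")


obs = Observation("user_a", date(2013, 5, 17), True)
obs.attributes.update({"deformed fin", "scary"})
```

Where the boundary between the fixed and open parts should fall is exactly the open question raised above; the sketch only shows that the two can coexist in one record.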
REFERENCES
1. Abiteboul, S. (1997) Querying semi-structured data,
Database Theory—ICDT'97, Delphi, Greece.
2. Angles, R. and Gutierrez, C. (2008) Survey of graph
database models, ACM Computing Surveys, 40, 1,
1:1-1:39.
3. Brynjolfsson, E., Hu, Y. J. and Simester, D. (2011)
Goodbye pareto principle, hello long tail: The effect
of search costs on the concentration of product sales,
Management Science, 57, 8, 1373-1386.
4. Bunge, M. (1977) Treatise on basic philosophy:
Ontology I: the furniture of the world, Reidel,
Boston, MA.
5. Checkland, P. and Holwell, S. (1998) Information,
systems, and information systems: making sense of
the field, John Wiley & Sons, Inc, Hoboken, NJ.
6. Chen, P. (1976) The entity-relationship model -
toward a unified view of data, ACM Transactions on
Database Systems, 1, 1, 9-36.
7. Coleman, D. J., Georgiadou, Y. and Labonte, J.
(2009) Volunteered Geographic Information: The
Nature and Motivation of Producers, International
Journal of Spatial Data Infrastructures Research, 4,
1, 332-358.
8. Doan, A., Ramakrishnan, R. and Halevy, A. Y.
(2011) Crowdsourcing systems on the World-Wide
Web, Communications of the ACM, 54, 4, 86-96.
9. Erickson, L., Petrick, I. and Trauth, E. (2012)
Hanging with the right crowd: Matching
crowdsourcing need to crowd characteristics, AMCIS
2012 Proceedings.
10. Evermann, J. and Wand, Y. (2001) Towards
ontologically based semantics for UML constructs,
Conceptual Modeling—ER 2001, 354-367.
11. Fry, J. P. and Sibley, E. H. (1976) Evolution of data-
base management systems, ACM Computing Surveys
(CSUR), 8, 1, 7-42.
12. Goodchild, M. (2007) Citizens as sensors: the world
of volunteered geography, GeoJournal, 69, 4, 211-
221.
13. Greenspan, S. J., Mylopoulos, J. and Borgida, A.
(1982) Capturing more world knowledge in the
requirements specification, Proceedings of the 6th
International Conference on Software Engineering,
Tokyo, Japan.
14. Guarino, N. and Guizzardi, G. (2006) In the defense
of ontological foundations for conceptual modeling,
Scandinavian Journal of Information Systems, 18, 1,
115-126.
15. Hand, E. (2010) People power, Nature, 466, 7307,
685-687.
16. Hirschheim, R., Klein, H. K. and Lyytinen, K. (1995)
Information Systems Development and Data
Modeling: Conceptual and Philosophical
Foundations, Cambridge University Press,
Cambridge.
17. Kaldor, N. (1961) Capital Accumulation and
Economic Growth, F. A. Lutz and D. C. Hague
(eds.), The Theory of Capital, Macmillan, London.
18. Kauffman, R., Li, T. and van Heck, E. (2010) Business
Network-Based Value Creation in Electronic
Commerce, International Journal of Electronic
Commerce, 15, 1, 113-144.
19. Leonardi, P. (2011) When flexible routines meet
flexible technologies: Affordance, constraint, and the
imbrication of human and material agencies, MIS
Quarterly, 35, 1, 147-167.
20. Lukyanenko, R. and Komiak, S. X. (2011) Designing
recommendation agents as the extension of individual
users: similarity and identification in web
personalization, International Conference on
Information Systems, Shanghai, China.
21. Lukyanenko, R. and Parsons, J. (2011a) Information
Loss in the Era of User-Generated Data, Pre-ICIS
SIG IQ, Shanghai, China.
22. Lukyanenko, R. and Parsons, J. (2011b) Rethinking
data quality as an outcome of conceptual modeling
choices, 16th International Conference on
Information Quality, Adelaide, Australia.
23. Lukyanenko, R., Parsons, J. and Wiersma, Y. (2011)
Citizen Science 2.0: Data Management Principles to
Harness the Power of the Crowd, Hemant Jain, Atish
Sinha and Padmal Vitharana (eds.), Service-Oriented
Perspectives in Design Science Research, Springer
Berlin / Heidelberg.
24. Lyytinen, K. and Yoo, Y. (2002) Research
commentary: the next wave of nomadic computing,
Information Systems Research, 13, 4, 377-388.
25. Ma, Z. M. and Yan, L. (2008) A Literature Overview
of Fuzzy Database Modeling, Journal of Information
Science and Engineering, 24, 1, 189-202.
26. March, S. and Allen, G. (2012) Toward a social
ontology for conceptual modeling, 11th Symposium
on Research in Systems Analysis and Design,
Vancouver, Canada.
27. March, S., Hevner, A. and Ram, S. (2000) Research
commentary: An agenda for information technology
research in heterogeneous and distributed
environments, Information Systems Research, 11, 4,
327-341.
28. Mason, R. O. and Mitroff, I. I. (1973) A program for
research on management information systems,
Management Science, 19, 5, 475-487.
29. McClane, A. J. (1978) McClane's field guide to
freshwater fishes of North America, Holt Paperbacks,
New York, NY.
30. Mylopoulos, J. (1992) Conceptual Modeling and
Telos, P. Loucopoulos and R. Zicari (eds.),
Conceptual Modeling, Databases, and CASE: An
Integrated View of Information Systems
Development, John Wiley & Sons, Inc., New York,
NY.
31. Mylopoulos, J. (1998) Information modeling in the
time of the revolution, Information Systems, 23, 3–4,
127-155.
32. Mylopoulos, J. and Borgida, A. (2006) Properties of
Information Modeling Techniques for Information
Systems Engineering, Peter Bernus, Kai Mertins and
Günter Schmidt (eds.), Handbook on Architectures of
Information Systems, Springer Berlin Heidelberg.
33. Newcomb, L. (1977) Newcomb's Wildflower Guide:
An Ingenious New Key System for Quick, Positive
Field Identification of the Wildflowers, Flowering
Shrubs and Vines of Northeastern and North Central
North America, Little, Brown and Company, New
York, NY.
34. Olivé, A. (2007) Conceptual modeling of information
systems, Springer, Berlin Heidelberg New York.
35. Panaccio, C. (2005) Nominalism and the Theory of
Concepts, H. Cohen and C. Lefebvre (eds.),
Handbook of Categorization in Cognitive Science, Elsevier
Science, Amsterdam.
36. Parsons, J. (2003) Data Modeling, Handbook on
Data Management in Information Systems, 49.
37. Parsons, J., Lukyanenko, R. and Wiersma, Y. (2011a)
Easier citizen science is better, Nature, 471, 7336,
37-37.
38. Parsons, J., Lukyanenko, R. and Wiersma, Y.
(2011b) Easier citizen science is better, Nature, 471,
7336, 37-37.
39. Parsons, J. and Wand, Y. (2000) Emancipating
Instances from the Tyranny of Classes in Information
Modeling, ACM Transactions on Database Systems,
25, 2, 228–268.
40. Peterson, R. T. (2010) Peterson field guide to birds of
eastern and central North America, Houghton Mifflin
Harcourt, New York, NY.
41. Phillips, R. (2005) Mushrooms & Other Fungi of
North America, Firefly Books, Richmond Hill, ON.
42. Pohl, K. (1994) The three dimensions of
requirements engineering: a framework and its
applications, Information Systems, 19, 3, 243-258.
43. Potts, C. and Hsi, I. (1997) Abstraction and context in
requirements engineering: toward a synthesis, Annals
of Software Engineering, 3, 1, 23-61.
44. Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D.
M. and Boyes-Braem, P. (1976) Basic Objects in
Natural Categories, Cognitive Psychology, 8, 3, 382-
439.
45. Rosenberg, S. and Jones, R. (1972) A method for
investigating and representing a person's implicit
theory of personality: Theodore Dreiser's view of
people, Journal of Personality and Social
Psychology, 22, 3, 372-386.
46. Smith, L. B. (2005) Emerging Ideas about
Categories, L. Gershkoff-Stowe and D. H. Rakison
(eds.), Building Object Categories in Developmental
Time, L. Erlbaum Associates, Mahwah, NJ.
47. Snäll, T., Kindvall, O., Nilsson, J. and Pärt, T. (2011)
Evaluating citizen-based presence data for bird
monitoring, Biological Conservation, 144, 2, 804.
48. Stokes, D. W., Stokes, L. Q. and Lehman, P. E.
(2010) The Stokes Field Guide to the Birds of North
America, Little, Brown, New York, NY.
49. Tanaka, J. W. and Taylor, M. (1991) Object
categories and expertise: Is the basic level in the eye
of the beholder? Cognitive Psychology, 23, 3, 457-
482.
50. Wand, Y. and Weber, R. (2006) On ontological
foundations of conceptual modeling: A response to
Wyssusek, Scandinavian Journal of Information
Systems, 18, 1, 127-138.
51. Wand, Y. and Weber, R. (2002) Research
commentary: Information systems and conceptual
modeling - A research agenda, Information Systems
Research, 13, 4, 363-376.
52. Zuboff, S. (1988) In The Age Of The Smart Machine:
The Future Of Work And Power, Basic Books.
53. Zwass, V. (2010) Co-Creation: Toward a Taxonomy
and an Integrated Research Perspective, International
Journal of Electronic Commerce, 15, 1, 11-48.