
Talking with a robot: understanding human instructions to a guidance autonomous robot

Loreto Susperregi1, Ane Fernandez1, Izaskun Fernandez1, Santiago Fernandez1, and Iñaki Maurtua1

Tekniker-IK4
{lsusperregi,afernandez,ifernandez,sfernandez,imaurtua}@tekniker.es

Abstract. A guide robot must accurately and robustly capture, process and understand a human order, and generate the behaviour that correctly answers the requested task. To perform this natural Human Robot interaction automatically, we propose a solution that combines natural language processing techniques, semantic technologies and autonomous navigation techniques to guide people in a museum environment. We have evaluated the solution in a real scenario with different people and the results are promising.

Keywords: human-robot natural interaction, autonomous navigation, natural language processing, semantic knowledge representation

1 Introduction

Service robots performing tasks as assistants, guides, tutors or social companions in human-populated settings such as museums, hospitals, etc., now and in the near future, pose two main challenges: on the one hand, robots must be able to adapt to complex, unstructured environments and, on the other hand, robots must interact with humans. Currently, much research in robotics deals with the different problems of motion of autonomous mobile robots in unknown environments. While operating in a completely unstructured environment, the robot must navigate, detecting and avoiding obstacles using its own sensors. A requirement for natural Human Robot interaction is the robot's ability to accurately and robustly capture, process and understand the human order and to generate the behaviour that correctly answers the requested task.

In this article, the service provided by the mobile robot is to guide a person to a destination by interpreting the natural-language request he/she has addressed to the robot. For that purpose, the robot should be able to interpret and understand natural orders from humans, evaluate whether it can carry out the requested order and, if so, plan the trajectory needed to reach the requested destination, and finally navigate autonomously through the environment, avoiding the obstacles found along the itinerary until reaching the destination.


2 Related works

A wide variety of robots have been demonstrated to navigate successfully in human environments. A common theme is robotic museum tour guides. RHINO [1] was the first robotic tour guide and was followed soon after by MINERVA [2], which led tours through the Smithsonian Museum in 1998. Other robots have followed, including Mobots [3], Robox [4], and Jinny [5]. These robots have dealt with environments that are often crowded with people.

Winograd [1] developed SHRDLU, a system which processed natural-language instructions and performed actions in a virtual environment. From this, researchers pushed forward trying to extend SHRDLU's capabilities into real-world environments, and soon branched into tackling various sub-problems, including NLP and robotics systems. Research conducted on the robotics-systems side has resulted in frameworks such as ROS, developed by Quigley et al. [2], which has been used in several domains of modern robotics research. NLP research including robotic components has also led to advancements. Notably, MacMahon et al. [4] and Kollar et al. [3] have developed methods of following route instructions given in natural language. Tenorth et al. [7] have developed robotic systems capable of inferring and acting upon implicit commands using knowledge databases. A similar knowledge representation is proposed in [6], using semantic representation standards such as OWL to describe an indoor environment.

3 Proposed approach / System Overview

The approach proposed in this paper aims to create a guide robot that is able to interact with humans, interpret and understand their requests and, when the request is feasible, plan the necessary trajectory to the desired destination and accompany the person or group there. To perform all these tasks automatically, we propose a prototype that combines natural language processing techniques, semantic technologies and autonomous navigation techniques in three main modules:

– A knowledge-base module, where both the environment and the actions are described and managed.

– An order interpretation module: given a request text, this module extracts the main action and the destination from it.

– KTBOT: the module responsible for planning the trajectory and making the robot navigate autonomously up to the destination, adapting the plan to avoid obstacles found along the trajectory.

The autonomous navigation module is integrated on the robot, but the knowledge-base management and order interpretation modules have been implemented as services. This way, the desired functionality is independent of the robot and its operating system, and is therefore portable to any robot that can connect to those services and obtain their responses. In our approach, those services are invoked from ROS, the operating system of KTBOT.

The three main modules are explained in detail in the following subsections.


3.1 Knowledge-base

In our approach we try to describe a museum environment in as much detail as possible: detailing the rooms, the objects in those rooms, and also the people who usually occupy those rooms. We therefore need a representation that gives us the opportunity to represent not only the objects but also the relations among them. For such a logical description we have used the semantic knowledge representation approach, which has been successfully applied to similar tasks, as seen in the related-works section. Concretely, we have described the environment using the Resource Description Framework (RDF), trying to reuse existing and widespread vocabularies like GEO1 or FOAF2. Figure 1 shows a short example describing an object (tmm:maquinaderecubrimiento), a person (:bdiaz) and a room (tmm:salatribologia), together with the relations of the person and the object with the room.

tmm:maquinaderecubrimiento foaf:name "maquina de recubrimiento" .

:bdiaz a foaf:Person ;
    foaf:member tmm:tribologia ;
    foaf:name "beatriz diaz" ;
    foaf:givenName "beatriz" ;
    foaf:familyName "diaz" .

tmm:salatribologia a rooms:Room ;
    dct:conformsTo <http://openorg.ecs.soton.ac.uk/wiki/Places> ;
    foaf:name "sala de tribologia" ;
    rooms:contains tmm:maquinaderecubrimiento ;
    rooms:occupant :bdiaz ;
    tmm:angle "270" ;
    geo:lat "7.06" ;
    geo:long "1.68" .

Fig. 1. Knowledge-base short destination example in RDF (Turtle)

Following this representation we have described the whole experimental environment, more concretely the TMM area at Tekniker-IK4 (described in Section 4), as well as the information about the actions that can happen in the described context. In order to make this semantic repository accessible we have used OWLIM [9], a high-performance Storage and Inference Layer (SAIL) for Sesame which performs reasoning. By loading all this data into OWLIM as an RDF repository, we get an endpoint from which we can easily access and infer information from the KB using the

1 http://www.w3.org/2003/01/geo/wgs84_pos#
2 http://xmlns.com/foaf/0.1/


standard SPARQL3 query language for RDF. This endpoint will be the connection point between the KB and the order interpretation module.
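For illustration, the following minimal sketch (Python) shows how a client could query such an endpoint for a destination's pose. The endpoint URL and the tmm: namespace URI are our own assumptions, and the rooms: prefix is assumed to be the DERI rooms vocabulary used in Figure 1.

# Minimal sketch: look up a destination's coordinates and angle in the KB.
# ENDPOINT and the tmm: namespace URI are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/tmm"  # hypothetical

QUERY = """
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX geo:   <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rooms: <http://vocab.deri.ie/rooms#>
PREFIX tmm:   <http://example.org/tmm#>
SELECT ?room ?lat ?long ?angle WHERE {
  ?room a rooms:Room ;
        foaf:name ?name ;
        geo:lat ?lat ; geo:long ?long ; tmm:angle ?angle .
  FILTER regex(?name, "tribologia", "i")
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["room"]["value"], row["lat"]["value"],
          row["long"]["value"], row["angle"]["value"])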

3.2 Order interpretation

Taking as input a human request in Spanish, in which the person indicates the desired action in natural language, the aim of this module is to analyse the request, trying to understand exactly what the person wants and whether it is possible to perform it. The module is divided into two main steps:

– The first step relies on surface information, in the sense that it does not take the meaning of the words in the guidance context into account. Its only purpose is to extract an action (a verb) and a destination (a place, a person or an object) from the given order.

– The second step attempts to assess whether the action is feasible in the guidance context, in other words whether it is a move action that does not imply any further action (such as break, which implies going to a place and destroying something), and also whether the extracted destination is reachable.

For the first step we apply natural language processing techniques using FreeLing, an open-source language analysis tool suite [5]. With this tool we apply morphosyntactic and dependency parsing to a set of 24 request examples from 6 different people. This way we get the morphosyntactic information of every element and of the request itself. We manually revised the complete information and identified the most frequent morphosyntactic patterns for extracting the action verb and the destination from the sentence. We then implemented those patterns as rules, obtaining a rule-set that, given a FreeLing-tagged sentence, is able to extract the desired information; a sketch of such a rule is shown below. The evaluation section shows how those rules perform.
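As an illustration (not the authors' actual rule-set), the following minimal Python sketch shows the shape of such a pattern rule over FreeLing-style (word, lemma, EAGLES tag) triples: the first verb is taken as the action and the nouns after it as the destination. Both the rule and the example tags are illustrative.

# Illustrative pattern rule: first verb = action, nouns after it = destination.
def apply_pattern(tokens):
    action, destination = None, []
    for word, lemma, tag in tokens:
        if tag.startswith("V") and action is None:
            action = lemma                    # action verb (lemmatized)
        elif action and tag.startswith(("NC", "NP")):
            destination.append(word)          # noun(s) forming the destination
    return action, (" ".join(destination) or None)

# Hypothetical FreeLing-style output for "llevame a la sala de tribologia".
tagged = [("llevame", "llevar", "VMM02S0"), ("a", "a", "SPS00"),
          ("la", "el", "DA0FS0"), ("sala", "sala", "NCFS000"),
          ("de", "de", "SPS00"), ("tribologia", "tribologia", "NCFS000")]
print(apply_pattern(tagged))                  # -> ('llevar', 'sala tribologia')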

To decide whether the action and the destination are possible in the described guidance environment, we exploit the knowledge-base information described above. On the one hand, we verify whether the extracted verb is considered a feasible action by consulting the KB via the SPARQL endpoint. If no match is obtained we apply a deeper analysis combining verbal information in the Spanish WordNet (esWN) [8] with the KB action information. Concretely, when the verb does not match a KB action, we check whether any of the actions described in the KB is related4 in esWN to the extracted verb, and if so we consider it a valid action. Otherwise, the analysis is stopped and the module returns a message indicating that the requested action cannot be carried out.
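A minimal sketch of this two-stage check follows. The tmm:Action class and tmm:verb property are our assumptions about the KB schema, and NLTK's Open Multilingual WordNet (Spanish) is used here as a stand-in for esWN.

# Stage 1: is the verb a KB action? Stage 2: is it WordNet-related to one?
from SPARQLWrapper import SPARQLWrapper, JSON
from nltk.corpus import wordnet as wn   # requires the wordnet and omw-1.4 data

def is_known_action(verb, endpoint):
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
        PREFIX tmm: <http://example.org/tmm#>
        ASK { ?a a tmm:Action ; tmm:verb "%s" }""" % verb)   # hypothetical schema
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["boolean"]

def wordnet_related_action(verb, kb_verbs):
    # Accept the verb if it shares a synset with a KB action verb, or if a
    # KB action verb lies among its hypernyms/hyponyms (never antonyms).
    for syn in wn.synsets(verb, pos=wn.VERB, lang="spa"):
        neighbours = {syn} | set(syn.hypernyms()) | set(syn.hyponyms())
        for kb_verb in kb_verbs:
            if neighbours & set(wn.synsets(kb_verb, pos=wn.VERB, lang="spa")):
                return kb_verb
    return None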

If the action is valid, the module continues with the destination analysis. For the destination verification we consult the KB again using the SPARQL endpoint. This time we take into account the places and the relations among the people and objects described in the KB, in order to infer which is the most suitable option in the KB

3 http://www.w3.org/TR/rdf-sparql-query/
4 The relation should be a synonymy, hyponymy or hypernymy relation, and never an antonymy relation.


according to the extracted destination, if there is one. If there is more than one possible option, we apply a full-text search ranking criterion and select the option with the highest ranking. When no destination match occurs, the order interpretation module sends a message denoting that the desired destination cannot be reached.

When there is a KB concept matching the requested destination, the module extracts the corresponding coordinates and angle to provide them to the robot. All these functionalities have been implemented as web services. There is a master service with which the robot interacts to process the given request and obtain the information necessary to plan the navigation; a sketch of such a service is given after Figure 2. The whole design of the order interpretation module is depicted in Figure 2.

Fig. 2. Order interpretation module design
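As an illustration of this interface, the following minimal sketch (Python, Flask) shows what a master service of this kind could look like. The route, the response fields and the three helper functions (standing for the steps of Section 3.2) are our assumptions, not the deployed implementation.

# Hypothetical master service tying the interpretation steps together.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/interpret")
def interpret():
    text = request.args.get("request", "")
    action, destination = extract_action_and_destination(text)  # step 1 (assumed helper)
    if not is_valid_action(action):              # KB + esWN check (assumed helper)
        return jsonify(status="invalid_action")
    pose = lookup_destination(destination)       # SPARQL lookup (assumed helper)
    if pose is None:
        return jsonify(status="unknown_destination")
    return jsonify(status="ok", lat=pose["lat"],
                   long=pose["long"], angle=pose["angle"])

if __name__ == "__main__":
    app.run()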

3.3 KTBOT

Hardware. Experiments in this paper were performed with a modified SEGWAY RMP200 mobile robot. The SEGWAY 4-wheeled differential-drive base is operated with the ROS operating system. The robot is equipped with a xxx GHz CPU, xxx MB RAM and multiple analog and digital inputs and outputs. The top speed of the SEGWAY base is approximately 0.35 m/s. A Hokuyo URG-04LX


laser scanner provides a measuring area of 240 angular degrees, a depth range from 60 to 4095 mm, and 625 readings per scan. A second Hokuyo laser scanner is mounted on a tilting platform at xxx height, providing a 3D view of the area in front of the robot. In the present work, we use the base laser and the tilting laser to detect obstacles and navigate.

Navigation. ROS is an open-source meta-operating system widely used in the robotics community. It provides services such as hardware abstraction, low-level device control, message-passing between processes, and package management. It also provides tools and libraries for obtaining, building, writing, and running code across multiple computers.

At a high level the navigation system is simple: it takes in data from sensors, odometry and a navigation goal, and outputs velocity commands that are sent to the mobile base. The low-level architecture of this system, however, is complex and consists of many components that must work together. The major components of the navigation system and the relationships between them are presented below.
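As a minimal sketch of this interface, the following Python fragment sends one navigation goal through the standard ROS move_base action interface, which we assume KTBOT uses; the pose values are the ones from the Figure 1 example, where geo:lat/geo:long store map coordinates.

# Send the pose obtained from the order interpretation module to move_base.
import math
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal
from tf.transformations import quaternion_from_euler

rospy.init_node("guide_goal_sender")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = "map"
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 7.06                 # geo:lat from the KB
goal.target_pose.pose.position.y = 1.68                 # geo:long from the KB
qx, qy, qz, qw = quaternion_from_euler(0, 0, math.radians(270))  # tmm:angle
goal.target_pose.pose.orientation.x = qx
goal.target_pose.pose.orientation.y = qy
goal.target_pose.pose.orientation.z = qz
goal.target_pose.pose.orientation.w = qw

client.send_goal(goal)
client.wait_for_result()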

The system architecture comprises several modules: Mapping and Localization, Costmap, Global Planner and Local Planner. A scheme of the architecture is presented in Fig. xxx.

1. Mapping and Localization

The navigation system is initialized with a static map (note: to be related to the knowledge map).

2. Costmap

Because the robots we study are constrained to drive on flat ground, and cannot, for example, step or jump over obstructions, we assemble obstacle data into a planar Costmap on which the planners operate. The Costmap is initialized with the static map (if available), but is updated as new sensor data comes in, to maintain an up-to-date view of the robot's local and global environment. Although the Costmap is used as a two-dimensional structure by the navigation system, its underlying representation of the world actually consists of an efficient three-dimensional voxel grid. A sketch of the obstacle-inflation step follows.
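The following minimal sketch (Python/NumPy) illustrates inflation on such a planar Costmap; the grid size, resolution, inflation radius and cost scale are illustrative values, not KTBOT's actual parameters.

# Mark obstacle cells as lethal and decay cost linearly within the
# inflation radius around them.
import numpy as np
from scipy.ndimage import distance_transform_edt

LETHAL, FREE = 254, 0
resolution = 0.05                # metres per cell (illustrative)
inflation_radius = 0.55          # metres (illustrative)

occupied = np.zeros((200, 200), dtype=bool)
occupied[100, 80:120] = True     # an obstacle segment from sensor data

# Distance (metres) from every free cell to its nearest obstacle cell.
dist = distance_transform_edt(~occupied) * resolution

costmap = np.where(occupied, LETHAL, FREE).astype(np.uint8)
band = (~occupied) & (dist <= inflation_radius)
costmap[band] = ((LETHAL - 1) * (1 - dist[band] / inflation_radius)).astype(np.uint8)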

3. Global Planner

The global planner is given the obstacle and cost information contained in the Costmap, information from the robot's localization system, and a goal in the world. From these, it creates a high-level plan for the robot to follow to reach the goal location. It is important that this planning process be efficient, so that the navigation system can run at a reasonable rate.

Therefore, the global planner used for this navigation system assumes that the robot is circular, uses an A* algorithm that plans directly in the configuration space computed during obstacle inflation in the Costmap, and does not take into account the dynamics or kinematics of the robot. A minimal sketch of such a grid-based A* search follows.
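This is an illustrative Python sketch, not the actual ROS planner: a 4-connected grid, a Manhattan-distance heuristic, and a step cost of one plus the target cell's Costmap cost.

import heapq

def astar(costmap, start, goal):                  # costmap: 2D grid of cell costs
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # admissible heuristic
    frontier = [(h(start), 0, start, None)]       # entries are (f, g, cell, parent)
    parents, best_g = {}, {start: 0}
    while frontier:
        _, g, cell, parent = heapq.heappop(frontier)
        if cell in parents:                       # already expanded with the best g
            continue
        parents[cell] = parent
        if cell == goal:                          # walk the parent chain back to start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nb[0] < len(costmap) and 0 <= nb[1] < len(costmap[0])):
                continue
            if costmap[nb[0]][nb[1]] >= 254:      # lethal obstacle cell
                continue
            ng = g + 1 + costmap[nb[0]][nb[1]]
            if ng < best_g.get(nb, float("inf")):
                best_g[nb] = ng
                heapq.heappush(frontier, (ng + h(nb), ng, nb, cell))
    return None                                   # goal unreachable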

4. Local Planner


The local planner is responsible for generating velocity commands for the mobile base that will safely move the robot towards a goal. The local planner is seeded with the plan produced by the global planner, and attempts to follow it as closely as possible while taking into account the kinematics and dynamics of the robot as well as the obstacle information stored in the Costmap. In order to generate safe velocity commands, the local planner makes use of a technique known as the Dynamic Window Approach (DWA) to forward-simulate and select among potential commands based on a cost function [14].

The cost function combines distance to obstacles, distance to the path produced by the global planner, and the speed at which the robot travels. The behaviour of the robot can be changed drastically by setting different weights on each component of the cost function, as the following sketch illustrates.
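A minimal sketch of this scoring step follows; the weights, the rollout length and the dist_to_obstacle / dist_to_path helpers are illustrative assumptions, not the tuned values of the deployed system (only v_max matches the 0.35 m/s top speed stated above).

# Forward-simulate each candidate (v, w) command and keep the cheapest one.
import math

W_OBST, W_PATH, W_SPEED = 0.8, 1.0, 0.3          # illustrative weights

def rollout(pose, v, w, dt=0.1, steps=15):       # constant-velocity forward simulation
    x, y, th = pose
    traj = []
    for _ in range(steps):
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        traj.append((x, y))
    return traj

def command_cost(traj, v, dist_to_obstacle, dist_to_path, v_max=0.35):
    d_obst = min(dist_to_obstacle(p) for p in traj)
    if d_obst <= 0.0:
        return float("inf")                      # colliding trajectories are rejected
    d_path = sum(dist_to_path(p) for p in traj) / len(traj)
    return W_OBST / d_obst + W_PATH * d_path + W_SPEED * (v_max - v)

def best_command(pose, candidates, dist_to_obstacle, dist_to_path):
    # candidates: iterable of (v, w) pairs sampled from the dynamic window
    return min(candidates, key=lambda vw: command_cost(
        rollout(pose, *vw), vw[0], dist_to_obstacle, dist_to_path))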

Fig. 3. KTBOT state diagram

4 Experiments

4.1 Setup

The experimental scenario, located at Tekniker-IK4, is a real laboratory where machines and humans share the space while performing TMM activities. The laboratory area shown in the figure can be characterized as an xxx, etc.

[Figures: TMM floor plan; photo of the robot]


4.2 Evaluation

We selected people from different departments of Tekniker-IK4, with different profiles, to carry out the evaluation. We met with each person individually, and in every session we explained how the guide robot works and what the experiment is about: imagine that you are in a museum that you do not know and you have to ask a guide (in this experiment, the robot) for help to find a place, a thing or a person. We gave them a map of the place on which the room names, the people who usually occupy them and the names of the most relevant objects are represented, and we asked them to make 5 different tries and, at the end of each try, to evaluate whether the robot had guided them to the desired place/object/person.

After each explanation we switched on the prototype, and the robot introduced itself: "Hello, I am a guide robot, can I help you?". Since no voice recognition was ready in the prototype for this experiment, we transcribed literally what people said; after that manual intervention, the automatic process starts.

We carried out the experiments with 10 people, each one asking the robot roughly 5 or 6 different actions. Table 1 shows the experiment results, describing for each participant how many tries were made (Request Num.), how many times the guide robot successfully interpreted the request and guided the person to the correct place (Correct) and, when the behaviour of the robot was not the expected one, the error cases divided according to the three main possible causes: the extracted verb (Action), the destination (Destiny) or some problem during navigation (Navigation).

        Request Num.  Correct  Action  Destiny  Navigation
P1      5             4        0       0        1
P2      8             4        3       1        0
P3      5             5        0       0        0
P4      6             4        0       2        0
P5      5             5        0       0        0
P6      5             4        1       0        0
P7      5             4        0       1        0
P8      5             2        1       2        0
P9      5             3        1       1        0
P10     4             4        0       0        0
Total   53            38       7       7        1

Table 1. Experiment results

Considering as a success the cases where the request is feasible (the action is possible and the destination is recognized) and the robot reaches the destination correctly, and as a failure the cases where a known destination is not recognized, where a possible action is rejected, or where the correct destination is not reached or the autonomous navigation is stopped, the guide robot achieves 71.7% accuracy (38 correct tries out of 53 requests).


Analysing the errors, we observed that people with a technical profile who are used to working with robots tend to be more demanding with the robot, asking more complex requests and provoking errors in its behaviour. Requests like "Pues ahora llévame donde están las impresoras que tengo que imprimir el acta de la reunión" (So now take me to the printers because I have to print the minutes of the meeting) or "Quiero reunirme con Roberto Oñate" (I want to meet Roberto Oñate) contain verbs that imply more than a simple moving action. These two examples could also be considered successful tries, because the main verbs are not moving verbs; but since they are verbs that can occur in a guidance environment (imagine an enterprise guidance setting), we decided to count them as failures.

In terms of navigation we can say that the robot is very stable: in the whole experimentation the navigation was stopped only once, and that happened because the robot found too many obstacles (people) in the trajectory. The rest of the errors were caused equally by verb and by destination problems (46.7% each).


At the end of the experiment we asked each participant how they felt and what they thought about the guide robot's behaviour. In general everybody was happy with the results; some of them commented that the guide is a bit slow, and that they expected the robot to go inside the room instead of finishing the trajectory at the destination door. The latter occurs because, given the robot's dimensions, it is not possible to go through the doors. That is why we also made the prototype position the robot facing the destination door rather than in any other direction.

5 Conclusions and future work

We have presented a prototype guide robot that interacts with humans in natural language. The prototype aims to understand a person's request and help him/her reach the desired destination in a museum environment.

The prototype takes a person's request and analyses it to interpret and understand its content. It tries to extract the requested action and a destination, which can be a place, an object or a person, and to translate them into a command the robot can understand. For that purpose we combine natural language processing techniques with semantic knowledge representation and reasoning techniques.

For the autonomous navigation, the prototype uses the ROS platform to plan the trajectory needed to reach the desired place and to move along it autonomously.

We have evaluated the prototype in a real scenario with people of different profiles and, for a first approximation, we have obtained promising results (71.7% accuracy).

We have identified that, for a more robust system, we need to improve, on the one hand, the action treatment, in terms of including more actions that can be asked for in a guidance environment, such as meet, speak or similar; and, on the other hand, to include semantic expansion in the destination treatment, as we have done with verbs by exploiting WordNet. We also plan, of course, to include voice recognition, which is a work in progress.

As a more challenging future work, we are planning to revise the prototype to get it working in a real museum environment. We already know the museum, and the robot is able to navigate autonomously in that place. We therefore consider that, by defining the museum environment with the same representation strategy and with a more complete action representation in the knowledge-base, we can get it working.

References

1. Winograd T.: Procedures as a representation for data in a computer program for understanding natural language. Technical report, MIT (1971)

2. Quigley M. and Gerkey B. and Conley K. and Faust J. and Foote T. and Leibs J. and Berger E. and Wheeler R. and Ng A.: ROS: an open-source Robot Operating System. Proceedings of the Open-Source Software Workshop at the International Conference on Robotics and Automation (ICRA) (2009)

3. Kollar T. and Tellex S. and Roy D. and Roy N.: Toward understanding natural language directions. International Conference on Human-Robot Interaction, pages 259-266 (2010)

4. MacMahon M. and Stankiewicz B. and Kuipers B.: Walk the talk: Connecting language, knowledge, and action in route instructions. AAAI (2006)

5. Padro L. and Collado M. and Reese S. and Lloberes M. and Castellon I.: FreeLing 2.1: Five Years of Open-Source Language Processing Tools. Proceedings of the 7th Language Resources and Evaluation Conference (LREC), ELRA (2010)

6. Wang T. and Chen Q.: Object semantic map representation for indoor mobile robots. Proceedings of the International Conference on System Science and Engineering, pages 309-313 (2011)

7. Tenorth M. and Kunze L. and Jain D. and Beetz M.: KNOWROB-MAP - knowledge-linked semantic object maps. Proceedings of the 2010 IEEE-RAS International Conference on Humanoid Robots (2010)

8. Atserias J. and Rigau G. and Villarejo L.: Spanish WordNet 1.6: Porting the Spanish WordNet across Princeton versions. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04), pages 161-164 (2004)

9. Kiryakov A. and Ognyanov D. and Manov D.: OWLIM - A Pragmatic Semantic Repository for OWL. Proceedings of WISE Workshops, pages 182-192 (2005)

