- \documentclass[twoside,openright]{uva-bachelor-thesis}
- \usepackage[english]{babel}
- \usepackage[utf8]{inputenc}
- \usepackage{hyperref,graphicx,tikz,subfigure,float,lipsum}
- % Link colors
- \hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}
- % Title Page
- \title{A generic architecture for gesture-based interaction}
- \author{Taddeüs Kroes}
- \supervisors{Dr. Robert G. Belleman (UvA)}
- \signedby{Dr. Robert G. Belleman (UvA)}
- \begin{document}
- % Title page
- \maketitle
- \begin{abstract}
- % TODO
- \end{abstract}
- % Set paragraph indentation
- \parindent 0pt
- \parskip 1.5ex plus 0.5ex minus 0.2ex
- % Table of content on separate page
- \tableofcontents
- \chapter{Introduction}
- \label{chapter:introduction}
- Surface-touch devices have evolved from pen-based tablets to single-touch
- trackpads, to multi-touch devices like smartphones and tablets. Multi-touch
- devices enable a user to interact with software using hand gestures, making the
- interaction more expressive and intuitive. These gestures are more complex than
- primitive ``click'' or ``tap'' events that are used by single-touch devices.
- Some examples of more complex gestures are ``pinch''\footnote{A ``pinch''
- gesture is formed by performing a pinching movement with multiple fingers on a
- multi-touch surface. Pinch gestures are often used to zoom in or out on an
- object.} and ``flick''\footnote{A ``flick'' gesture is the act of grabbing an
- object and throwing it in a direction on a touch surface, giving it momentum to
- move for some time after the hand releases the surface.} gestures.
- The complexity of gestures is not limited to navigation in smartphones. Some
- multi-touch devices are already capable of recognizing objects touching the
- screen \cite[Microsoft Surface]{mssurface}. In the near future, touch screens
- will possibly be extended or even replaced with in-air interaction (Microsoft's
- Kinect \cite{kinect} and the Leap \cite{leap}).
- The interaction devices mentioned above generate primitive events. In the case
- of surface-touch devices, these are \emph{down}, \emph{move} and \emph{up}
- events. Application programmers who want to incorporate complex, intuitive
- gestures in their application face the challenge of interpreting these
- primitive events as gestures. With the increasing complexity of gestures, the
- complexity of the logic required to detect these gestures increases as well.
This challenge limits, or even deters, the application developer from using
complex gestures in an application.
- The main question in this research project is whether a generic architecture
- for the detection of complex interaction gestures can be designed, with the
- capability of managing the complexity of gesture detection logic. The ultimate
- goal would be to create an implementation of this architecture that can be
- extended to support a wide range of complex gestures. With the existence of
- such an implementation, application developers do not need to reinvent gesture
- detection for every new gesture-based application.
- Application frameworks for surface-touch devices, such as Nokia's Qt \cite{qt},
already include the detection of commonly used gestures such as \emph{pinch}
- gestures. However, this detection logic is dependent on the application
- framework. Consequently, an application developer who wants to use multi-touch
- interaction in an application is forced to use an application framework that
- includes support for multi-touch gestures. Moreover, the set of supported
- gestures is limited by the application framework of choice. To incorporate a
- custom event in an application, the application developer needs to extend the
- framework. This requires extensive knowledge of the framework's architecture.
- Also, if the same gesture is needed in another application that is based on
- another framework, the detection logic has to be translated for use in that
- framework. Nevertheless, application frameworks are a necessity when it comes
- to fast, cross-platform development. A generic architecture design should aim
- to be compatible with existing frameworks, and provide a way to detect and
- extend gestures independent of the framework.
- Application frameworks are written in a specific programming language. To
- support multiple frameworks and programming languages, the architecture should
- be accessible for applications using a language-independent method of
- communication. This intention leads towards the concept of a dedicated gesture
- detection application that serves gestures to multiple applications at the same
- time.
- The scope of this thesis is limited to the detection of gestures on multi-touch
- surface devices. It presents a design for a generic gesture detection
- architecture for use in multi-touch based applications. A reference
- implementation of this design is used in some test case applications, whose
- goal is to test the effectiveness of the design and detect its shortcomings.
- \section{Structure of this document}
% TODO: only once the thesis is finished
- \chapter{Related work}
- \section{Gesture and Activity Recognition Toolkit}
- The Gesture and Activity Recognition Toolkit (GART) \cite{GART} is a
toolkit for the development of gesture-based applications. Its authors
state that the best way to classify gestures is to use machine learning.
The programmer trains a program to recognize gestures using the machine
learning library from the toolkit. The toolkit contains a callback mechanism that
- the programmer uses to execute custom code when a gesture is recognized.
- Though multi-touch input is not directly supported by the toolkit, the
- level of abstraction does allow for it to be implemented in the form of a
- ``touch'' sensor.
The motivation for using machine learning is the claim that gesture detection
``is likely to become increasingly complex and unmanageable'' when a set of
predefined rules is used to decide whether some sensor input constitutes
a specific gesture. This claim is not necessarily true. If the
- programmer is given a way to separate the detection of different types of
- gestures and flexibility in rule definitions, over-complexity can be
- avoided.
- \section{Gesture recognition implementation for Windows 7}
- The online article \cite{win7touch} presents a Windows 7 application,
written in Microsoft's .NET. The application shows detected gestures in a
- canvas. Gesture trackers keep track of stylus locations to detect specific
- gestures. The event types required to track a touch stylus are ``stylus
- down'', ``stylus move'' and ``stylus up'' events. A
- \texttt{GestureTrackerManager} object dispatches these events to gesture
- trackers. The application supports a limited number of pre-defined
- gestures.
- An important observation in this application is that different gestures are
- detected by different gesture trackers, thus separating gesture detection
- code into maintainable parts.
- \section{Analysis of related work}
The simple Processing implementation of multi-touch gesture detection that is
discussed in the chapter on test applications provides most of the
functionality that can be found in existing multi-touch applications.
- In fact, many applications for mobile phones and tablets only use tap and
- scroll events. For this category of applications, using machine learning
- seems excessive. Though the representation of a gesture using a feature
- vector in a machine learning algorithm is a generic and formal way to
- define a gesture, a programmer-friendly architecture should also support
- simple, ``hard-coded'' detection code. A way to separate different pieces
- of gesture detection code, thus keeping a code library manageable and
extendable, is to use different gesture trackers.
- \chapter{Design}
- \label{chapter:design}
- % Diagrams are defined in a separate file
- \input{data/diagrams}
- \section{Introduction}
- % TODO: rewrite intro?
- This chapter describes the realization of a design for the generic
- multi-touch gesture detection architecture. The chapter represents the
- architecture as a diagram of relations between different components.
- Sections \ref{sec:driver-support} to \ref{sec:daemon} define requirements
- for the architecture, and extend the diagram with components that meet
- these requirements. Section \ref{sec:example} describes an example usage of
- the architecture in an application.
- The input of the architecture comes from a multi-touch device driver.
- The task of the architecture is to translate this input to multi-touch
- gestures that are used by an application, as illustrated in figure
- \ref{fig:basicdiagram}. In the course of this chapter, the diagram is
- extended with the different components of the architecture.
- \basicdiagram{A diagram showing the position of the architecture
- relative to the device driver and a multi-touch application. The input
of the architecture is given by a touch device driver. This input is
- translated to complex interaction gestures and passed to the
- application that is using the architecture.}
- \section{Supporting multiple drivers}
- \label{sec:driver-support}
The TUIO protocol \cite{TUIO} is an example of a protocol used by
multi-touch device drivers. TUIO uses ALIVE and SET messages to communicate
- low-level touch events (see appendix \ref{app:tuio} for more details).
- These messages are specific to the API of the TUIO protocol. Other drivers
- may use very different messages types. To support more than one driver in
- the architecture, there must be some translation from driver-specific
- messages to a common format for primitive touch events. After all, the
- gesture detection logic in a ``generic'' architecture should not be
- implemented based on driver-specific messages. The event types in this
- format should be chosen so that multiple drivers can trigger the same
events. If each supported driver were to add its own set of event types to
the common format, the purpose of being ``common'' would be defeated.
- A minimal expectation for a touch device driver is that it detects simple
- touch points, with a ``point'' being an object at an $(x, y)$ position on
- the touch surface. This yields a basic set of events: $\{point\_down,
- point\_move, point\_up\}$.
- The TUIO protocol supports fiducials\footnote{A fiducial is a pattern used
- by some touch devices to identify objects.}, which also have a rotational
- property. This results in a more extended set: $\{point\_down, point\_move,
- point\_up, object\_down, object\_move, object\_up,\\ object\_rotate\}$.
- Due to their generic nature, the use of these events is not limited to the
TUIO protocol. Any other driver that can distinguish rotatable objects from
simple touch points could trigger them as well.
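To make this concrete, a common event format could be sketched as follows.
This is a minimal Python sketch; the \texttt{Event} class and the constant
names are illustrative assumptions, not an existing API.
\begin{verbatim}
# Illustrative sketch of a common event format shared by all event drivers.
POINT_DOWN, POINT_MOVE, POINT_UP = 'point_down', 'point_move', 'point_up'
OBJECT_DOWN, OBJECT_MOVE, OBJECT_UP = 'object_down', 'object_move', 'object_up'
OBJECT_ROTATE = 'object_rotate'

class Event(object):
    def __init__(self, event_type, x, y, object_id, rotation=None):
        self.type = event_type        # one of the constants above
        self.x, self.y = x, y         # position on the touch surface
        self.object_id = object_id    # identifies the touch object (e.g. a TUIO session id)
        self.rotation = rotation      # only used for object_* events
\end{verbatim}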
The component that translates driver-specific messages to common events
will be called the \emph{event driver}. The event driver runs in a loop,
- receiving and analyzing driver messages. When a sequence of messages is
- analyzed as an event, the event driver delegates the event to other
- components in the architecture for translation to gestures. This
- communication flow is illustrated in figure \ref{fig:driverdiagram}.
- \driverdiagram
Support for a new touch driver can be added by writing a corresponding event
driver implementation. Which event driver an application uses depends on the
driver supported by the touch device at hand.
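As an illustration, the skeleton of an event driver could look as follows.
The class and method names are assumptions made for this sketch; subclasses
would implement the driver-specific parts.
\begin{verbatim}
# Hypothetical skeleton of an event driver; subclasses implement the
# driver-specific parts (receive_message and translate).
class EventDriver(object):
    def __init__(self):
        self.listeners = []           # components that receive common events

    def add_listener(self, listener):
        self.listeners.append(listener)

    def delegate(self, event):
        for listener in self.listeners:
            listener.handle_event(event)

    def run(self):
        # Receive and analyze driver-specific messages in a loop.
        while True:
            message = self.receive_message()
            for event in self.translate(message):
                self.delegate(event)
\end{verbatim}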
- Because driver implementations have a common output format in the form of
- events, multiple event drivers can run at the same time (see figure
- \ref{fig:multipledrivers}).
- \multipledriversdiagram
- \section{Restricting events to a screen area}
- \label{sec:restricting-gestures}
% TODO: in introduction: gestures are composed of multiple primitive events
- Touch input devices are unaware of the graphical input widgets rendered on
- screen and therefore generate events that simply identify the screen
- location at which an event takes place. In order to be able to direct a
- gesture to a particular widget on screen, an application programmer must
- restrict the occurrence of a gesture to the area of the screen covered by
that widget. An important question is whether the architecture should offer a
- solution to this problem, or leave it to the application developer to
- assign gestures to a widget.
- The latter case generates a problem when a gesture must be able to occur at
- different screen positions at the same time. Consider the example in figure
- \ref{fig:ex1}, where two squares must be able to be rotated independently
at the same time. If the developer is left with the task of assigning a gesture to
- one of the squares, the event analysis component in figure
- \ref{fig:driverdiagram} receives all events that occur on the screen.
- Assuming that the rotation detection logic detects a single rotation
- gesture based on all of its input events, without detecting clusters of
- input events, only one rotation gesture can be triggered at the same time.
- When a user attempts to ``grab'' one rectangle with each hand, the events
- triggered by all fingers are combined to form a single rotation gesture
- instead of two separate gestures.
- \examplefigureone
- To overcome this problem, groups of events must be separated by the event
- analysis component before any detection logic is executed. An obvious
- solution for the given example is to incorporate this separation in the
- rotation detection logic itself, using a distance threshold that decides if
an event should be added to an existing rotation gesture. However, leaving the
task of separating groups of events to the detection logic leads to duplication of
- code. For instance, if the rotation gesture is replaced by a \emph{pinch}
- gesture that enlarges a rectangle, the detection logic that detects the
- pinch gesture would have to contain the same code that separates groups of
events for different gestures. Also, a pinch gesture can be performed using
the fingers of multiple hands, in which case a simple distance
threshold is insufficient. These examples show that gesture detection logic
- is hard to implement without knowledge about (the position of) the
- widget\footnote{``Widget'' is a name commonly used to identify an element
- of a graphical user interface (GUI).} that is receiving the gesture.
- Therefore, a better solution for the assignment of events to gesture
- detection is to make the gesture detection component aware of the locations
- of application widgets on the screen. To accomplish this, the architecture
- must contain a representation of the screen area covered by a widget. This
- leads to the concept of an \emph{area}, which represents an area on the
- touch surface in which events should be grouped before being delegated to a
- form of gesture detection. Examples of simple area implementations are
rectangles and circles. However, areas could also be made to represent
- more complex shapes.
- An area groups events and assigns them to some piece of gesture detection
- logic. This possibly triggers a gesture, which must be handled by the
- client application. A common way to handle framework events in an
application is a ``callback'' mechanism: the application developer binds a
function to an event, which is then called by the framework when the event
occurs. Because developers are familiar with this concept, the
- architecture uses a callback mechanism to handle gestures in an
- application. Since an area controls the grouping of events and thus the
- occurrence of gestures in an area, gesture handlers for a specific gesture
- type are bound to an area. Figure \ref{fig:areadiagram} shows the position
- of areas in the architecture.
- \areadiagram{Extension of the diagram from figure \ref{fig:driverdiagram},
showing the position of areas in the architecture. An area delegates events
to a gesture detection component that triggers gestures. The area then calls
the handlers that are bound to the gesture type by the application.}
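A minimal sketch of how gesture handlers could be bound to a rectangular
area is given below. All names are illustrative and do not necessarily match
the reference implementation.
\begin{verbatim}
# Illustrative sketch of a rectangular area with gesture handler binding.
class RectangularArea(object):
    def __init__(self, x, y, width, height):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.handlers = {}            # gesture type -> list of callbacks

    def contains(self, event):
        return (self.x <= event.x < self.x + self.width and
                self.y <= event.y < self.y + self.height)

    def bind(self, gesture_type, handler):
        self.handlers.setdefault(gesture_type, []).append(handler)

    def trigger(self, gesture):
        for handler in self.handlers.get(gesture.type, []):
            handler(gesture)
\end{verbatim}
An application could then, for example, call \texttt{area.bind('tap', on\_tap)}
to receive tap gestures that occur within the area.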
- An area can be seen as an independent subset of a touch surface. Therefore,
- the parameters (coordinates) of events and gestures within an area should
- be relative to the area.
- Note that the boundaries of an area are only used to group events, not
- gestures. A gesture could occur outside the area that contains its
- originating events, as illustrated by the example in figure \ref{fig:ex2}.
- \examplefiguretwo
A remark must be made about the use of areas to assign events to the detection
of some gesture. The concept of an ``area'' is based on the assumption that
the set of originating events that form a particular gesture can be
determined based exclusively on the location of the events. This is a
- reasonable assumption for simple touch objects whose only parameter is a
- position, such as a pen or a human finger. However, more complex touch
- objects can have additional parameters, such as rotational orientation or
- color. An even more generic concept is the \emph{event filter}, which
- detects whether an event should be assigned to a particular piece of
- gesture detection based on all available parameters. This level of
- abstraction allows for constraints like ``Use all blue objects within a
widget for rotation, and green objects for tapping''. As mentioned in the
introduction (chapter \ref{chapter:introduction}), the scope of this thesis
- is limited to multi-touch surface based devices, for which the \emph{area}
- concept suffices. Section \ref{sec:eventfilter} explores the possibility of
- areas to be replaced with event filters.
- \subsection{Area tree}
- \label{sec:tree}
The simplest implementation of areas in the architecture is a list of
- areas. When the event driver delegates an event, it is delegated to gesture
- detection by each area that contains the event coordinates.
- If the architecture were to be used in combination with an application
- framework like GTK \cite{GTK}, each GTK widget that must receive gestures
- should have a mirroring area that synchronizes its position with that of
- the widget. Consider a panel with five buttons that all listen to a
- ``tap'' event. If the panel is moved as a result of movement of the
- application window, the position of each button has to be updated.
- This process is simplified by the arrangement of areas in a tree structure.
- A root area represents the panel, containing five subareas which are
- positioned relative to the root area. The relative positions do not need to
- be updated when the panel area changes its position. GUI frameworks, like
GTK, use this kind of tree structure to manage widgets. A recommended first
step when developing an application is therefore to create an area subclass
that automatically synchronizes its position with that of a widget from the
GUI framework.
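The relative positioning described above could be sketched as follows
(illustrative Python; the reference implementation may structure this
differently).
\begin{verbatim}
# Illustrative sketch of relative positioning in an area tree.
class Area(object):
    def __init__(self, x, y):
        self.x, self.y = x, y         # position relative to the parent area
        self.parent = None
        self.children = []

    def add_child(self, area):
        area.parent = self
        self.children.append(area)

    def absolute_position(self):
        # Walk up the tree, summing the relative positions.
        if self.parent is None:
            return self.x, self.y
        px, py = self.parent.absolute_position()
        return px + self.x, py + self.y
\end{verbatim}
Moving the panel then only requires updating the position of the root area;
the relative positions of the buttons remain valid.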
- \section{Detecting gestures from events}
- \label{sec:gesture-detection}
- The events that are grouped by areas must be translated to complex gestures
- in some way. Gestures such as a button tap or the dragging of an object
- using one finger are easy to detect by comparing the positions of
- sequential $point\_down$ and $point\_move$ events.
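For example, a drag movement can be detected by comparing each $point\_move$
event with the previously known position. The following minimal sketch is
illustrative only; the detector name and return value are assumptions.
\begin{verbatim}
# Minimal, illustrative sketch: detecting a drag by comparing
# sequential point events.
class DragDetector(object):
    def __init__(self):
        self.last_position = None

    def handle_event(self, event):
        if event.type == 'point_down':
            self.last_position = (event.x, event.y)
        elif event.type == 'point_move' and self.last_position is not None:
            dx = event.x - self.last_position[0]
            dy = event.y - self.last_position[1]
            self.last_position = (event.x, event.y)
            return ('drag', dx, dy)   # translation since the previous event
        elif event.type == 'point_up':
            self.last_position = None
\end{verbatim}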
A way to detect more complex gestures based on a sequence of input
features is to use machine learning methods, such as Hidden Markov
Models\footnote{A Hidden Markov Model (HMM) is a statistical model in which
the modeled process is assumed to be a Markov chain with unobserved (hidden)
states; it can be used to classify a sequence of observed input features as a
gesture.} \cite{conf/gw/RigollKE97}. A sequence of input states can be
- mapped to a feature vector that is recognized as a particular gesture with
- some probability. This type of gesture recognition is often used in video
- processing, where large sets of data have to be processed. Using an
- imperative programming style to recognize each possible sign in sign
language detection is nearly impossible, and certainly not desirable.
Sequences of events that are triggered by a multi-touch surface are
- often of a manageable complexity. An imperative programming style is
- sufficient to detect many common gestures. The imperative programming style
- is also familiar and understandable for a wide range of application
- developers. Therefore, the aim is to use this programming style in the
- architecture implementation that is developed during this project.
- However, the architecture should not be limited to multi-touch surfaces
- alone. For example, the architecture should also be fit to be used in an
- application that detects hand gestures from video input.
- A problem with the imperative programming style is that the detection of
- different gestures requires different pieces of detection code. If this is
- not managed well, the detection logic is prone to become chaotic and
- over-complex.
- To manage complexity and support multiple methods of gesture detection, the
- architecture has adopted the tracker-based design as described by
- \cite{win7touch}. Different detection components are wrapped in separate
gesture tracking units, or \emph{gesture trackers}. The input of a gesture
- tracker is provided by an area in the form of events. When a gesture
- tracker detects a gesture, this gesture is triggered in the corresponding
- area. The area then calls the callbacks which are bound to the gesture
- type by the application. Figure \ref{fig:trackerdiagram} shows the position
- of gesture trackers in the architecture.
- \trackerdiagram{Extension of the diagram from figure
- \ref{fig:areadiagram}, showing the position of gesture trackers in the
- architecture.}
- The use of gesture trackers as small detection units provides extendability
- of the architecture. A developer can write a custom gesture tracker and
- register it in the architecture. The tracker can use any type of detection
- logic internally, as long as it translates events to gestures.
- An example of a possible gesture tracker implementation is a
- ``transformation tracker'' that detects rotation, scaling and translation
- gestures.
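A gesture tracker could be sketched as follows. The \texttt{GestureTracker}
base class, the \texttt{Gesture} tuple and the tap time threshold are
assumptions made for this sketch, not the exact API of the reference
implementation.
\begin{verbatim}
# Illustrative sketch of a gesture tracker registered with an area.
import time
from collections import namedtuple

Gesture = namedtuple('Gesture', 'type x y')

class GestureTracker(object):
    def __init__(self, area):
        self.area = area              # gestures are triggered in this area

    def handle_event(self, event):
        raise NotImplementedError

class TapTracker(GestureTracker):
    MAX_TAP_TIME = 0.3                # seconds, illustrative threshold

    def __init__(self, area):
        GestureTracker.__init__(self, area)
        self.down_times = {}

    def handle_event(self, event):
        if event.type == 'point_down':
            self.down_times[event.object_id] = time.time()
        elif event.type == 'point_up':
            started = self.down_times.pop(event.object_id, None)
            if started is not None and time.time() - started < self.MAX_TAP_TIME:
                self.area.trigger(Gesture('tap', event.x, event.y))
\end{verbatim}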
- \section{Reserving an event for a gesture}
- \label{sec:reserve-event}
- A problem occurs when areas overlap, as shown by figure
- \ref{fig:eventpropagation}. When the white square is rotated, the gray
- square should keep its current orientation. This means that events that are
- used for rotation of the white square, should not be used for rotation of
- the gray square. To achieve this, there must be some communication between
- the gesture trackers of the two squares. When an event in the white square
- is used for rotation, that event should not be used for rotation in the
- gray square. In other words, the event must be \emph{reserved} for the
- rotation gesture in the white square. In order to reserve an event, the
event needs to be handled by the rotation tracker of the white square before
the rotation tracker of the gray square receives it. Otherwise, the gray square
- has already triggered a rotation gesture and it will be too late to reserve
- the event for rotation of the white square.
- When an object touches the touch surface, the event that is triggered
- should be delegated according to the order in which its corresponding areas
- are positioned over each other. The tree structure in which areas are
- arranged (see section \ref{sec:tree}), is an ideal tool to determine the
- order in which an event is delegated to different areas. Areas in the tree
- are positioned on top of their parent. An object touching the screen is
- essentially touching the deepest area in the tree that contains the
- triggered event. That area should be the first to delegate the event to its
- gesture trackers, and then move the event up in the tree to its ancestors.
- The movement of an event up in the area tree will be called \emph{event
- propagation}. To reserve an event for a particular gesture, a gesture
- tracker can stop its propagation. When propagation of an event is stopped,
it will not be passed on to the ancestor areas, thus reserving the event.
- The diagram in appendix \ref{app:eventpropagation} illustrates the use of
- event propagation, applied to the example of the white and gray squares.
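Event propagation could be sketched as follows. The
\texttt{propagation\_stopped} flag and the \texttt{trackers} attribute are
assumptions made for this illustration.
\begin{verbatim}
# Illustrative sketch of event propagation from the deepest area upwards.
def delegate_event(area, event):
    # Descend to the deepest area in the tree that contains the event.
    for child in area.children:
        if child.contains(event):
            return delegate_event(child, event)
    # Delegate to this area's gesture trackers, then propagate upwards.
    current = area
    while current is not None:
        for tracker in current.trackers:
            tracker.handle_event(event)
        if event.propagation_stopped:  # set by a tracker to reserve the event
            break
        current = current.parent
\end{verbatim}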
- \section{Serving multiple applications}
- \label{sec:daemon}
- The design of the architecture is essentially complete with the components
- specified in this chapter. However, one specification has not yet been
discussed: the ability to address the architecture using a method of
- communication independent of the application programming language.
- If an application must start the architecture instance in a thread within
- the application itself, the architecture is required to be compatible with
- the programming language used to write the application. To overcome the
- language barrier, an instance of the architecture would have to run in a
- separate process.
- A common and efficient way of communication between two separate processes
- is through the use of a network protocol. In this particular case, the
- architecture can run as a daemon\footnote{``daemon'' is a name Unix uses to
- indicate that a process runs as a background process.} process, listening
- to driver messages and triggering gestures in registered applications.
- \vspace{-0.3em}
- \daemondiagram
An advantage of a daemon setup is that it can serve multiple applications
- at the same time. Alternatively, each application that uses gesture
- interaction would start its own instance of the architecture in a separate
- process, which would be less efficient.
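Such a daemon is not part of the reference implementation (see section
\ref{sec:implementation}), but purely as an illustration, a triggered gesture
could be serialized and sent to a registered application as follows; the
message format and function name are hypothetical.
\begin{verbatim}
# Purely illustrative: how a daemon could publish a gesture to a
# registered application over an existing socket connection.
import json

def send_gesture(connection, gesture_type, x, y):
    message = json.dumps({'gesture': gesture_type, 'x': x, 'y': y})
    connection.sendall(message.encode('utf-8') + b'\n')
\end{verbatim}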
- \section{Example usage}
- \label{sec:example}
- This section describes an extended example to illustrate the data flow of
- the architecture. The example application listens to tap events on a button
- within an application window. The window also contains a draggable circle.
- The application window can be resized using \emph{pinch} gestures. Figure
- \ref{fig:examplediagram} shows the architecture created by the pseudo code
- below.
- \begin{verbatim}
initialize GUI framework, creating a window and necessary GUI widgets
- create a root area that synchronizes position and size with the application window
define 'pinch' gesture handler and bind it to the root area
- create an area with the position and radius of the circle
- define 'drag' gesture handler and bind it to the circle area
- create an area with the position and size of the button
- define 'tap' gesture handler and bind it to the button area
- create a new event server and assign the created root area to it
- start the event server in a new thread
- start the GUI main loop in the current thread
- \end{verbatim}
- \examplediagram
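For illustration, the pseudo code could translate to Python roughly as
follows. All class and function names used here are hypothetical and do not
necessarily match the reference implementation.
\begin{verbatim}
# Illustrative Python version of the pseudo code above.
import threading

def on_pinch(gesture):
    window.resize_by(gesture.scale)   # resize the application window

def on_drag(gesture):
    circle_area.move_by(gesture.dx, gesture.dy)

def on_tap(gesture):
    print('button tapped')

window = create_main_window()                       # GUI framework specific
root = RectangularArea(window.x, window.y, window.width, window.height)
root.bind('pinch', on_pinch)

circle_area = CircularArea(100, 100, radius=30)
circle_area.bind('drag', on_drag)
root.add_child(circle_area)

button_area = RectangularArea(20, 20, 80, 30)
button_area.bind('tap', on_tap)
root.add_child(button_area)

server = EventServer(root)                          # receives driver events
threading.Thread(target=server.run).start()
run_gui_main_loop()                                 # blocks in this thread
\end{verbatim}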
- \chapter{Test applications}
- A reference implementation of the design is written in Python. Two test
- applications have been created to test if the design ``works'' in a practical
- application, and to detect its flaws. One application is mainly used to test
- the gesture tracker implementations. The other program uses areas in a tree,
- demonstrating event delegation and propagation.
- To test multi-touch interaction properly, a multi-touch device is required. The
- University of Amsterdam (UvA) has provided access to a multi-touch table from
- PQlabs. The table uses the TUIO protocol \cite{TUIO} to communicate touch
- events. See appendix \ref{app:tuio} for details regarding the TUIO protocol.
- %The reference implementation and its test applications are a Proof of Concept,
- %meant to show that the architecture design is effective.
- %that translates TUIO messages to some common multi-touch gestures.
- \section{Reference implementation}
- \label{sec:implementation}
- % TODO
% a few simple areas and trackers
% No network protocol
- The reference implementation is written in Python and available at
- \cite{gitrepos}. The following component implementations are included:
- \textbf{Event drivers}
- \begin{itemize}
- \item TUIO driver, using only the support for simple touch points with an
- $(x, y)$ position.
- \end{itemize}
- \textbf{Gesture trackers}
- \begin{itemize}
- \item Basic tracker, supports $point\_down,~point\_move,~point\_up$ gestures.
- \item Tap tracker, supports $tap,~single\_tap,~double\_tap$ gestures.
- \item Transformation tracker, supports $rotate,~pinch,~drag$ gestures.
- \end{itemize}
- \textbf{Areas}
- \begin{itemize}
- \item Circular area
- \item Rectangular area
- \item Full screen area
- \end{itemize}
- The implementation does not include a network protocol to support the daemon
setup as described in section \ref{sec:daemon}. Therefore, it is only usable in
Python programs, which is why the two test programs are also written in Python.
- The area implementations contain some geometric functions to determine whether
- an event should be delegated to an area. All gesture trackers have been
- implemented using an imperative programming style. Technical details about the
- implementation of gesture detection are described in appendix
- \ref{app:implementation-details}.
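As an example of such a geometric function, a circular area could test
containment as sketched below (illustrative; the actual implementation may
differ).
\begin{verbatim}
# Illustrative sketch of the geometric test a circular area could use.
class CircularArea(object):
    def __init__(self, x, y, radius):
        self.x, self.y, self.radius = x, y, radius

    def contains(self, event):
        dx, dy = event.x - self.x, event.y - self.y
        return dx * dx + dy * dy <= self.radius * self.radius
\end{verbatim}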
- \section{Full screen Pygame program}
- %The goal of this program was to experiment with the TUIO
- %protocol, and to discover requirements for the architecture that was to be
- %designed. When the architecture design was completed, the program was rewritten
- %using the new architecture components. The original variant is still available
- %in the ``experimental'' folder of the Git repository \cite{gitrepos}.
- An implementation of the detection of some simple multi-touch gestures (single
- tap, double tap, rotation, pinch and drag) using Processing\footnote{Processing
- is a Java-based programming environment with an export possibility for Android.
- See also \cite{processing}.} can be found in a forum on the Processing website
- \cite{processingMT}. The program has been ported to Python and adapted to
- receive input from the TUIO protocol. The implementation is fairly simple, but
- it yields some appealing results (see figure \ref{fig:draw}). In the original
- program, the detection logic of all gestures is combined in a single class
- file. As predicted by the GART article \cite{GART}, this leads to over-complex
- code that is difficult to read and debug.
- The application has been rewritten using the reference implementation of the
- architecture. The detection code is separated into two different gesture
- trackers, which are the ``tap'' and ``transformation'' trackers mentioned in
- section \ref{sec:implementation}.
- The application receives TUIO events and translates them to \emph{point\_down},
- \emph{point\_move} and \emph{point\_up} events. These events are then
- interpreted to be \emph{single tap}, \emph{double tap}, \emph{rotation} or
- \emph{pinch} gestures. The positions of all touch objects are drawn using the
- Pygame library. Since the Pygame library does not provide support to find the
- location of the display window, the root area captures events in the entire
screen's surface. The application can be run either full screen or in windowed
mode. If windowed, screen-wide gesture coordinates are mapped to the size of
the Pygame window. In other words, the Pygame window always represents the
- entire touch surface. The output of the program can be seen in figure
- \ref{fig:draw}.
- \begin{figure}[h!]
\centering
\includegraphics[scale=0.4]{data/pygame_draw.png}
\caption{Output of the experimental drawing program. It draws all touch
points and their centroid on the screen (the centroid is used for rotation
and pinch detection). It also draws a green rectangle which responds to
rotation and pinch events.}
\label{fig:draw}
- \end{figure}
- \section{GTK/Cairo program}
- The second test application uses the GIMP toolkit (GTK+) \cite{GTK} to create
its user interface. Since GTK+ defines a main event loop that must be started in
- order to use the interface, the architecture implementation runs in a separate
- thread. The application creates a main window, whose size and position are
- synchronized with the root area of the architecture.
- % TODO
\emph{TODO: expand and add screenshots (this program is not finished yet)}
- \chapter{Conclusions}
- % TODO
- \chapter{Suggestions for future work}
- \section{A generic way for grouping events}
- \label{sec:eventfilter}
- % TODO
- % - "event filter" ipv "area"
- \section{Using a state machine for gesture detection}
- % TODO
% - use a more formal definition of gestures instead of explicit detection logic,
%   e.g. a state machine
- \section{Daemon implementation}
- % TODO
% - network protocol (ZeroMQ) for multiple languages and simultaneous processes
% - next step: create a library that contains multiple drivers and complex
%   gestures
- \bibliographystyle{plain}
- \bibliography{report}{}
- \appendix
- \chapter{The TUIO protocol}
- \label{app:tuio}
- The TUIO protocol \cite{TUIO} defines a way to geometrically describe tangible
- objects, such as fingers or objects on a multi-touch table. Object information
- is sent to the TUIO UDP port (3333 by default).
- For efficiency reasons, the TUIO protocol is encoded using the Open Sound
- Control \cite[OSC]{OSC} format. An OSC server/client implementation is
- available for Python: pyOSC \cite{pyOSC}.
- A Python implementation of the TUIO protocol also exists: pyTUIO \cite{pyTUIO}.
- However, the execution of an example script yields an error regarding Python's
- built-in \texttt{socket} library. Therefore, the reference implementation uses
- the pyOSC package to receive TUIO messages.
- The two most important message types of the protocol are ALIVE and SET
- messages. An ALIVE message contains the list of session id's that are currently
``active'', which in the case of a multi-touch table means that they are
- touching the screen. A SET message provides geometric information of a session
- id, such as position, velocity and acceleration.
- Each session id represents an object. The only type of objects on the
multi-touch table are what the TUIO protocol calls ``2DCur'', which is an
$(x, y)$ position on the screen.
- ALIVE messages can be used to determine when an object touches and releases the
screen. For example, if a session id was in the previous message but not in the
current one, the object it represents has been lifted from the screen.
SET messages provide information about movement. In the case of simple $(x, y)$ positions,
- only the movement vector of the position itself can be calculated. For more
- complex objects such as fiducials, arguments like rotational position and
- acceleration are also included.
- ALIVE and SET messages can be combined to create ``point down'', ``point move''
- and ``point up'' events (as used by the Windows 7 implementation
- \cite{win7touch}).
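As a sketch, down and up events could be derived from two consecutive ALIVE
messages as follows; the function name is illustrative and the session id
sets are assumed to have been parsed already (this is not the pyOSC or
pyTUIO API).
\begin{verbatim}
# Illustrative sketch: deriving down/up events from two consecutive
# ALIVE messages (given as sets of session ids).
def alive_to_events(previous_ids, current_ids):
    events = []
    for session_id in current_ids - previous_ids:
        events.append(('point_down', session_id))
    for session_id in previous_ids - current_ids:
        events.append(('point_up', session_id))
    return events
\end{verbatim}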
- TUIO coordinates range from $0.0$ to $1.0$, with $(0.0, 0.0)$ being the left
- top corner of the screen and $(1.0, 1.0)$ the right bottom corner. To focus
- events within a window, a translation to window coordinates is required in the
- client application, as stated by the online specification
- \cite{TUIO_specification}:
- \begin{quote}
- In order to compute the X and Y coordinates for the 2D profiles a TUIO
- tracker implementation needs to divide these values by the actual sensor
- dimension, while a TUIO client implementation consequently can scale these
- values back to the actual screen dimension.
- \end{quote}
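In the client application, this translation amounts to scaling the normalized
coordinates and subtracting the window origin, for example (illustrative
helper, not part of an existing API):
\begin{verbatim}
# Illustrative: scaling normalized TUIO coordinates (0.0-1.0) to
# window coordinates.
def tuio_to_window(x, y, screen_width, screen_height, window_x, window_y):
    return x * screen_width - window_x, y * screen_height - window_y
\end{verbatim}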
- \chapter{Diagram demonstrating event propagation}
- \label{app:eventpropagation}
- \eventpropagationfigure
- \chapter{Gesture detection in the reference implementation}
- \label{app:implementation-details}
- % TODO
- Both rotation and pinch use the centroid of all touch points. A \emph{rotation}
- gesture uses the difference in angle relative to the centroid of all touch
- points, and \emph{pinch} uses the difference in distance. Both values are
normalized using division by the number of touch points. A pinch event contains
a scale factor, which is computed as the ratio of the current to the previous
average distance to the centroid.
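A sketch of these computations is given below (illustrative Python; function
names are assumptions and do not necessarily match the reference
implementation).
\begin{verbatim}
# Illustrative sketch of the centroid-based rotation and pinch computations.
from math import atan2, hypot

def centroid(points):
    n = float(len(points))
    return (sum(x for x, y in points) / n, sum(y for x, y in points) / n)

def average_distance(points, center):
    cx, cy = center
    return sum(hypot(x - cx, y - cy) for x, y in points) / len(points)

def rotation_angle(previous, current, center):
    # Average difference in angle relative to the centroid.
    cx, cy = center
    total = 0.0
    for (px, py), (x, y) in zip(previous, current):
        total += atan2(y - cy, x - cx) - atan2(py - cy, px - cx)
    return total / len(current)

def pinch_scale(previous, current, center):
    # Ratio of the current to the previous average distance to the centroid.
    return average_distance(current, center) / average_distance(previous, center)
\end{verbatim}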
- \end{document}
|