\documentclass[twoside,openright]{uva-bachelor-thesis}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{hyperref,graphicx,float,tikz,subfigure}

% Link colors
\hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}

% Title Page
\title{A generic architecture for the detection of multi-touch gestures}
\author{Taddeüs Kroes}
\supervisors{Dr. Robert G. Belleman (UvA)}
\signedby{Dr. Robert G. Belleman (UvA)}

\begin{document}

% Title page
\maketitle

\begin{abstract}
% TODO
\end{abstract}

% Set paragraph indentation
\parindent 0pt
\parskip 1.5ex plus 0.5ex minus 0.2ex
% Table of contents on separate page
\tableofcontents

\chapter{Introduction}

% Rough problem statement
Multi-touch interaction is becoming increasingly common, mostly due to the wide
use of touch screens in phones and tablets. When programming applications using
this method of interaction, the programmer needs an abstraction of the raw data
provided by the touch driver of the device. This abstraction exists in several
multi-touch application frameworks like Nokia's
Qt\footnote{\url{http://qt.nokia.com/}}. However, applications that do not use
these frameworks have no access to their multi-touch events.

% Motivation
This problem was observed during an attempt to create a multi-touch
``interactor'' class for the Visualization Toolkit \cite[VTK]{VTK}. Because VTK
already provides the application framework here, it is undesirable to also use
an entire framework like Qt solely for its multi-touch support.

% Rough goal
The goal of this project is to define a generic multi-touch event triggering
architecture. To test the definition, a reference implementation is written in
Python.
\section{Definition of the problem}

% Main question
The goal of this thesis is to create a generic architecture for a
multi-touch event triggering mechanism for use in multi-touch applications.

% Sub-questions
To design such an architecture properly, the following questions are relevant:
\begin{itemize}
    \item What is the input of the architecture? Different touch drivers
        have different APIs. To be able to support different drivers
        (which is highly desirable), there should be a translation from the
        driver API to a fixed input format.
    \item How can extensibility be accomplished? The set of supported
        events should not be limited to a single implementation; an
        application should be able to define its own custom events.
    \item How can the architecture be used by different programming
        languages? A generic architecture should not be limited to use in
        a single language.
    \item Can events be used by multiple processes at the same time? For
        example, a network implementation could run as a service instead of
        within a single application, triggering events in any application
        that needs them.
\end{itemize}
% Scope
The scope of this thesis includes the design of a generic multi-touch
triggering architecture, a reference implementation of this design, and its
integration into a test case application. To be successful, the design
should allow for extensions to be added to any implementation.
The reference implementation is a Proof of Concept that translates TUIO
events into a number of simple touch gestures, which are used by several test
applications.
%Being a Proof of Concept, the reference implementation itself does not
%necessarily need to meet all the requirements of the design.

\section{Structure of this document}
% TODO: write once the thesis is finished
\chapter{Related work}

\section{Gesture and Activity Recognition Toolkit}

The Gesture and Activity Recognition Toolkit (GART) \cite{GART} is a
toolkit for the development of gesture-based applications. The toolkit
states that the best way to classify gestures is to use machine learning.
The programmer trains a program to recognize gestures using the machine
learning library from the toolkit. The toolkit contains a callback mechanism
that the programmer uses to execute custom code when a gesture is recognized.
Though multi-touch input is not directly supported by the toolkit, the
level of abstraction does allow for it to be implemented in the form of a
``touch'' sensor.

The motivation for using machine learning is the statement that gesture
detection ``is likely to become increasingly complex and unmanageable'' when
a set of predefined rules is used to decide whether some sensor input
constitutes a specific gesture. This statement is not necessarily true: if the
programmer is given a way to separate the detection of different types of
gestures, and flexibility in rule definitions, over-complexity can be
avoided.
% Solution: trackers, e.g. separate TapTracker and TransformationTracker

\section{Gesture recognition software for Windows 7}
% TODO

The online article \cite{win7touch} presents a Windows 7 application,
written in Microsoft's .NET. The application shows detected gestures on a
canvas. Gesture trackers keep track of stylus locations to detect specific
gestures. The event types required to track a touch stylus are ``stylus
down'', ``stylus move'' and ``stylus up'' events. A
\texttt{GestureTrackerManager} object dispatches these events to gesture
trackers. The application supports a limited number of pre-defined
gestures.

An important observation in this application is that different gestures are
detected by different gesture trackers, thus separating gesture detection
code into maintainable parts.
\section{Processing implementation of simple gestures in Android}

An implementation of a detection architecture for some simple multi-touch
gestures (tap, double tap, rotation, pinch and drag) using
Processing\footnote{Processing is a Java-based development environment with
an export possibility for Android. See also \url{http://processing.org/}.}
can be found in a forum on the Processing website
\cite{processingMT}. The implementation is fairly simple, but it yields
some very appealing results. The detection logic of all gestures is
combined in a single class. This does not allow for extensibility, because
the complexity of this class would increase to an undesirable level (as
predicted by the GART article \cite{GART}). However, the detection logic
itself is partially re-used in the reference implementation of the
generic gesture detection architecture.
\section{Analysis of related work}

The simple Processing implementation of multi-touch events provides most of
the functionality that can be found in existing multi-touch applications.
In fact, many applications for mobile phones and tablets only use tap and
scroll events. For this category of applications, using machine learning
seems excessive. Though the representation of a gesture as a feature
vector in a machine learning algorithm is a generic and formal way to
define a gesture, a programmer-friendly architecture should also support
simple, ``hard-coded'' detection code. A way to separate different pieces
of gesture detection code, thus keeping a code library manageable and
extensible, is to use different gesture trackers.
\chapter{Requirements}
\label{chapter:requirements}

% Test implementation with taps, rotation and pinch. This showed:
% - that there are different ways to detect e.g. ``rotation'' (and that it
%   must be possible to distinguish between them)
% - that detection of different kinds of gestures must be separable,
%   otherwise the code becomes chaotic.
% - A number of choices were made while designing the gestures, e.g. that
%   rotation uses ALL fingers for the centroid. Another program might need
%   to use only one hand, and thus to select points that are close together
%   (solution: windows).

% TODO: Move content into the following sections:
% \section{Supporting multiple drivers}
% \section{Restricting gestures to a screen area}
% \section{Separating and extending code}

\section{Introduction}
% TODO
To test multi-touch interaction properly, a multi-touch device is required.
The University of Amsterdam (UvA) has provided access to a multi-touch
table from PQlabs. The table uses the TUIO protocol \cite{TUIO} to
communicate touch events. See appendix \ref{app:tuio} for details regarding
the TUIO protocol.

\section{Experimenting with TUIO and event bindings}
\label{sec:experimental-draw}

When designing a software library, its API should be understandable and
easy to use for programmers. To find out the basic requirements for a usable
API, an experimental program has been written based on the
Processing code from \cite{processingMT}. The program receives TUIO events
and translates them to point \emph{down}, \emph{move} and \emph{up} events.
These events are then interpreted as (double or single) \emph{tap},
\emph{rotation} or \emph{pinch} gestures. A simple drawing program then
draws the current state to the screen using the PyGame library. The output
of the program can be seen in figure \ref{fig:draw}.
\begin{figure}[H]
    \centering
    \includegraphics[scale=0.4]{data/experimental_draw.png}
    \caption{Output of the experimental drawing program. It draws the touch
             points and their centroid on the screen (the centroid is used
             as center point for rotation and pinch detection). It also
             draws a green rectangle which responds to rotation and pinch
             events.}
    \label{fig:draw}
\end{figure}
One of the first observations is the fact that TUIO's \texttt{SET} messages
use the TUIO coordinate system, as described in appendix \ref{app:tuio}.
The test program multiplies these coordinates by its own dimensions, thus
showing the entire screen surface in its window. Also, the implementation only
works with the TUIO protocol; other drivers are not supported.

Though it uses relatively simple math, the rotation and pinch detection works
surprisingly well. Both rotation and pinch use the centroid of all touch
points. A \emph{rotation} gesture uses the difference in angle relative to
the centroid of all touch points, and \emph{pinch} uses the difference in
distance. Both values are normalized by dividing them by the number of touch
points. A pinch event contains a scale factor, and therefore divides the
current by the previous average distance to the centroid.
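
As an illustration of this kind of calculation, the sketch below computes an
angle difference and a scale factor from two successive lists of touch points.
It is not the code of the experimental program itself; the function names and
the use of the current centroid for both point lists are simplifying
assumptions made for this example.

\begin{verbatim}
# Sketch of the rotation/pinch math described above (illustrative only).
from math import atan2, hypot

def centroid(points):
    n = float(len(points))
    return (sum(x for x, y in points) / n, sum(y for x, y in points) / n)

def detect_transformation(previous, current):
    """Return (angle delta, scale factor) between two lists of touch points,
    both measured relative to the centroid of the current points."""
    cx, cy = centroid(current)
    # Average rotation around the centroid, normalized by the number of
    # touch points (wrap-around of the angle is ignored in this sketch)
    angle = sum(atan2(y1 - cy, x1 - cx) - atan2(y0 - cy, x0 - cx)
                for (x0, y0), (x1, y1) in zip(previous, current)) / len(current)
    # Pinch: ratio of current to previous average distance to the centroid
    avg_dist = lambda pts: sum(hypot(x - cx, y - cy) for x, y in pts) / len(pts)
    scale = avg_dist(current) / avg_dist(previous)
    return angle, scale
\end{verbatim}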
There is a flaw in this implementation. Since the centroid is calculated
using all current touch points, there cannot be two or more rotation or
pinch gestures simultaneously. On a large multi-touch table, it is
desirable to support interaction with multiple hands, or multiple persons,
at the same time. Such application-specific requirements should be
defined in the application itself, whereas the experimental implementation
hard-codes detection algorithms tailored to its test program.

Also, the different detection algorithms are all implemented in the same
file, making the code complex to read and debug, and difficult to extend.
\section{Summary of observations}
\label{sec:observations}

\begin{itemize}
    \item The TUIO protocol uses a distinctive coordinate system and set of
        messages.
    \item Touch events also occur outside of the application window.
    \item Gestures that use multiple touch points use all active touch
        points (not a subset of them).
    \item Code complexity increases when detection algorithms are added.
    \item A multi-touch application can have very specific requirements for
        gestures.
\end{itemize}
\section{Requirements}

From the observations in section \ref{sec:observations}, a number of
requirements can be specified for the design of the event mechanism:
\begin{itemize}
    % translating driver-specific events to a common format
    \item To be able to support multiple input drivers, there must be a
        translation from driver-specific messages to some common format
        that can be used in gesture detection algorithms.
    % assigning events to a GUI window (windows)
    \item An application GUI window should be able to receive only events
        occurring within that window, and not outside of it.
    % separating groups of touch points for different gestures (windows)
    \item To support multiple objects that are performing different
        gestures at the same time, the architecture must be able to perform
        gesture detection on a subset of the active touch points.
    % separating detection code for different gesture types
    \item To avoid an increase in code complexity when adding new detection
        algorithms, detection code of different gesture types must be
        separated.
    % extensibility
    \item The architecture should allow for new detection algorithms to be
        added to an implementation. This enables a programmer to define
        custom gestures for an application.
\end{itemize}
\chapter{Design}

\section{Components}

Based on the requirements from chapter \ref{chapter:requirements}, a design
for the architecture has been created. The design consists of a number
of components, each having a specific set of tasks.

% TODO: Rewrite components, use more diagrams

\subsection{Event server}
% translation of driver messages to point down, move, up
% translation to screen pixel coordinates
% TUIO in reference implementation

The \emph{event server} is an abstraction for driver-specific server
implementations, such as a TUIO server. It receives driver-specific
messages and translates these to a common set of events and a common
coordinate system.
A minimal example of a common set of events is $\{point\_down,
point\_move, point\_up\}$. This is the set used by the reference
implementation. Respectively, these events represent an object being
placed on the screen, moving along the surface of the screen, and being
released from the screen.

A more extended set could also contain the same three events for a tangible
object touching the screen. However, such an object can also have a
rotational property, like the ``fiducials'' type in the TUIO protocol.
This results in $\{point\_down, point\_move, point\_up, object\_down,
object\_move, object\_up, object\_rotate\}$.
% TODO: is this convenient? Should point_down/object_down be merged somehow?
An important note here is that similar events triggered by different
event servers must have the same event type and parameters. In other
words, the output of the event servers should be determined by the
gesture server (not the other way around).

The output of an event server implementation should also use a common
coordinate system, namely the coordinate system used by the gesture
server. For example, the reference implementation uses screen
coordinates in pixels, where (0, 0) is the upper left corner and
(\emph{screen width}, \emph{screen height}) the lower right corner of
the screen.

The abstract class definition of the event server should provide some
functionality to detect which driver-specific event server
implementation should be used.
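
To make the role of the event server more concrete, the sketch below shows
what such an abstraction could look like in Python. The class and method names
(\texttt{EventServer}, \texttt{TUIOEventServer}, \texttt{on\_tuio\_set}) are
chosen for illustration only and are not necessarily those of the reference
implementation.

\begin{verbatim}
# Illustrative sketch of an event server abstraction (hypothetical names).
class EventServer(object):
    """Translates driver-specific messages into a common set of events
    (point_down, point_move, point_up) in screen pixel coordinates."""
    def __init__(self, screen_width, screen_height):
        self.screen_size = (screen_width, screen_height)
        self.handlers = {'point_down': [], 'point_move': [], 'point_up': []}

    def bind(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def trigger(self, event_type, point_id, x, y):
        for handler in self.handlers[event_type]:
            handler(event_type, point_id, x, y)

class TUIOEventServer(EventServer):
    """Example of the translation step: TUIO coordinates range from 0.0 to
    1.0, so they are scaled to screen pixels before an event is triggered."""
    def on_tuio_set(self, session_id, x, y):
        w, h = self.screen_size
        self.trigger('point_move', session_id, x * w, y * h)
\end{verbatim}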
\subsection{Gesture trackers}

Like \cite[the .NET implementation]{win7touch}, the architecture uses a
\emph{gesture tracker} to detect whether a sequence of events forms a
particular gesture. A gesture tracker detects and triggers events for a
limited set of gesture types, given a set of touch points. If one group
of touch points is assigned to one tracker and another group to another
tracker, multiple gestures can be detected at the same time. For the
assignment of different groups of touch points to different gesture
trackers, the architecture uses so-called \emph{windows}. These are
described in the next section.

% event binding/triggering
A gesture tracker triggers a gesture event by executing a callback.
Callbacks are ``bound'' to a tracker by the application. Because
multiple gesture types can have very similar detection algorithms, a
tracker can detect multiple different types of gestures. For instance,
the rotation and pinch gestures from the experimental program in
section \ref{sec:experimental-draw} both use the centroid of all touch
points.

If no callback is bound for a particular gesture type, no detection of
that type is needed. A tracker implementation can use this knowledge
for code optimization.

% separation of detection algorithms
A tracker implementation defines the gesture types it can trigger, and
the detection algorithms to trigger them. Consequently, detection
algorithms can be separated into different trackers. Different
trackers can be saved in different files, reducing the complexity of
the code in a single file.

% extensibility
Because a tracker defines its own set of gesture types, the application
developer can define application-specific trackers (by extending a base
\texttt{GestureTracker} class, for example). In fact, any built-in
gesture trackers of an implementation are also created this way. This
allows for a plugin-like way of programming, which is very desirable if
someone would want to build a library of gesture trackers. Such a
library can easily be extended by others.
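
The sketch below illustrates this plugin-like extension mechanism: a
hypothetical \texttt{TapTracker} extends a \texttt{GestureTracker} base class
and only triggers gestures for which a callback has been bound. The names and
the 200~ms tap threshold are illustrative assumptions, not the API of the
reference implementation.

\begin{verbatim}
# Illustrative sketch of the tracker extension mechanism (hypothetical names).
import time

class GestureTracker(object):
    gesture_types = []

    def __init__(self):
        self.callbacks = {}

    def bind(self, gesture_type, callback):
        self.callbacks.setdefault(gesture_type, []).append(callback)

    def trigger(self, gesture_type, **params):
        # Only trigger gestures that the application has bound a callback to
        for callback in self.callbacks.get(gesture_type, []):
            callback(gesture_type, **params)

class TapTracker(GestureTracker):
    """Detects a 'tap': a touch point that is released shortly after it
    appeared. The 200 ms threshold is an arbitrary example value."""
    gesture_types = ['tap']

    def __init__(self):
        GestureTracker.__init__(self)
        self.down_times = {}

    def on_point_down(self, point_id, x, y):
        self.down_times[point_id] = time.time()

    def on_point_up(self, point_id, x, y):
        if time.time() - self.down_times.pop(point_id, 0) < 0.2:
            self.trigger('tap', x=x, y=y)
\end{verbatim}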
\subsection{Windows}

A \emph{window} represents a subset of the entire screen surface. The
goal of a window is to restrict the detection of certain gestures to
certain areas. A window contains a list of touch points, and a list of
trackers. A gesture server (defined in the next section) assigns touch
points to a window, but the window itself defines the functionality to
check whether a touch point is inside the window. This way, new windows
can be defined to fit over any 2D object used by the application.

The first and most obvious use of a window is to restrict touch events
to a single application window. However, windows can be used in far more
powerful ways.

For example, suppose an application contains an image with a transparent
background that can be dragged around, and the user may only drag the image
by touching its foreground. To accomplish this, the application
programmer can define a window type that uses a bitmap to determine
whether a touch point is on the visible image surface. The tracker
which detects drag gestures is then bound to this window, limiting the
occurrence of drag events to the image surface.
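
A minimal sketch of this idea is given below: a rectangular \texttt{Window}
base class and a hypothetical \texttt{BitmapWindow} that overrides the
containment test using an alpha mask. The class names and the representation
of the mask are assumptions made for this example.

\begin{verbatim}
# Illustrative sketch of window types (hypothetical class and method names).
class Window(object):
    def __init__(self, x, y, width, height):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.trackers = []
        self.points = []

    def contains(self, x, y):
        return (self.x <= x < self.x + self.width
                and self.y <= y < self.y + self.height)

class BitmapWindow(Window):
    """Window that only accepts touch points on non-transparent pixels.
    'alpha_mask' is assumed to be a 2D list of booleans (rows of pixels)
    with the same size as the window."""
    def __init__(self, x, y, width, height, alpha_mask):
        Window.__init__(self, x, y, width, height)
        self.alpha_mask = alpha_mask

    def contains(self, x, y):
        if not Window.contains(self, x, y):
            return False
        return self.alpha_mask[int(y - self.y)][int(x - self.x)]
\end{verbatim}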
% Assigning events to a part of the screen:
% TUIO coordinates cover the entire screen and range from 0.0 to 1.0, so
% they have to be translated to pixel coordinates within a ``window''.
% TODO

\subsection{Gesture server}
% listens for point down, move and up events
The \emph{gesture server} delegates events from the event server to the
set of windows that contain the touch points related to the events.

% assignment of a point (down) to window(s)
The gesture server contains a list of windows. When the event server
triggers an event, the gesture server ``asks'' each window whether it
contains the related touch point. If so, the window updates its gesture
trackers, which can then trigger gestures.
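
The sketch below illustrates this delegation. It assumes event server, window
and tracker classes like the ones sketched earlier in this chapter; as before,
the names are illustrative rather than the exact API of the reference
implementation.

\begin{verbatim}
# Illustrative sketch of the delegation performed by the gesture server.
class GestureServer(object):
    def __init__(self, event_server):
        self.windows = []
        # Listen to the common events produced by the event server
        for event_type in ('point_down', 'point_move', 'point_up'):
            event_server.bind(event_type, self.delegate)

    def add_window(self, window):
        self.windows.append(window)

    def delegate(self, event_type, point_id, x, y):
        # Ask each window whether it contains the touch point; if so, let
        # its trackers update their state (and possibly trigger gestures)
        for window in self.windows:
            if window.contains(x, y):
                for tracker in window.trackers:
                    handler = getattr(tracker, 'on_' + event_type, None)
                    if handler:
                        handler(point_id, x, y)
\end{verbatim}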
\section{Diagrams}

\input{data/diagrams}
\simplediagram
\completediagrams

\section{Example usage}

This section describes an example that illustrates the communication
between different components. The example application listens to tap events
in a GUI window.
\begin{verbatim}
# Create a gesture server that will be started later
server = new GestureServer object

# Add a new window to the server, representing the GUI
window = new Window object
set window position and size to that of GUI window
add window to server

# Define a handler that must be triggered when a tap gesture is detected
begin function handler(gesture)
    # Do something
end function

# Create a tracker that detects tap gestures
tracker = new TapTracker object  # Where TapTracker is an implementation of
                                 # the abstract Tracker
add tracker to window
bind handler to tracker.tap

# If the GUI toolkit allows it, bind window movement and resize handlers
# that alter the position and size of the window object

# Start the gesture server (which in turn starts a driver-specific event
# server)
start server
\end{verbatim}
\chapter{Reference implementation}
% TODO
% only window.contains on point down, not on move/up
% a few simple windows and trackers

\chapter{Test applications}
% TODO
% test programs using PyGame

%\chapter{Conclusions}
% TODO
% Windows are a way to assign global events to application windows
% Trackers are an effective way to detect gestures
% Trackers are extensible through object orientation

\chapter{Suggestions for future work}
% TODO
% use a formal definition of gestures in gesture trackers, e.g. a state machine
% Network protocol (ZeroMQ) for multiple languages and simultaneous processes
% Also: an extra layer that creates gesture windows corresponding to the window manager
% Windows in a tree structure for efficiency
\bibliographystyle{plain}
\bibliography{report}{}

\appendix

\chapter{The TUIO protocol}
\label{app:tuio}

The TUIO protocol \cite{TUIO} defines a way to geometrically describe tangible
objects, such as fingers or objects on a multi-touch table. Object information
is sent to the TUIO UDP port (3333 by default).

For efficiency reasons, the TUIO protocol is encoded using the Open Sound
Control \cite[OSC]{OSC} format. An OSC server/client implementation is
available for Python: pyOSC \cite{pyOSC}.

A Python implementation of the TUIO protocol also exists: pyTUIO \cite{pyTUIO}.
However, the execution of an example script yields an error regarding Python's
built-in \texttt{socket} library. Therefore, the reference implementation uses
the pyOSC package to receive TUIO messages.
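
As a rough sketch of how TUIO messages could be received with pyOSC, the
fragment below registers a handler for the \texttt{/tuio/2Dcur} address. The
exact pyOSC interface may differ between versions, so this should be read as
an approximation rather than as working reference code.

\begin{verbatim}
# Rough sketch of receiving TUIO messages with pyOSC; the exact pyOSC API
# may differ between versions.
import OSC

def handle_2dcur(addr, tags, data, client_address):
    # data[0] is the message type: 'alive', 'set' or 'fseq'
    if data[0] == 'alive':
        session_ids = data[1:]          # ids of objects currently touching
    elif data[0] == 'set':
        session_id, x, y = data[1:4]    # normalized position (0.0 - 1.0)

server = OSC.OSCServer(('0.0.0.0', 3333))   # TUIO uses UDP port 3333
server.addMsgHandler('/tuio/2Dcur', handle_2dcur)
server.serve_forever()
\end{verbatim}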
The two most important message types of the protocol are ALIVE and SET
messages. An ALIVE message contains the list of session id's that are currently
``active'', which in the case of a multi-touch table means that they are
touching the screen. A SET message provides geometric information for a session
id, such as position, velocity and acceleration.

Each session id represents an object. The only type of objects on the
multi-touch table are what the TUIO protocol calls ``2DCur'', which is an
(x, y) position on the screen.

ALIVE messages can be used to determine when an object touches and releases the
screen. For example, if a session id was in the previous message but not in the
current one, the object it represents has been lifted from the screen.
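
The difference between two consecutive ALIVE messages can thus be computed
with simple set operations, as in the sketch below (the function name is
illustrative):

\begin{verbatim}
# Sketch of deriving "point down" and "point up" events from two consecutive
# ALIVE messages (the function name is illustrative).
def diff_alive(previous_ids, current_ids):
    """Return (new ids, lifted ids) given the session id lists of the
    previous and current ALIVE message."""
    previous_ids, current_ids = set(previous_ids), set(current_ids)
    down = current_ids - previous_ids   # ids that appeared: point down
    up = previous_ids - current_ids     # ids that disappeared: point up
    return down, up

# Example: session 2 was lifted and session 4 was placed on the screen
down, up = diff_alive([1, 2, 3], [1, 3, 4])   # down == {4}, up == {2}
\end{verbatim}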
SET messages provide information about movement. In the case of simple (x, y)
positions, only the movement vector of the position itself can be calculated.
For more complex objects such as fiducials, additional arguments such as the
rotational position are also included.

ALIVE and SET messages can be combined to create ``point down'', ``point move''
and ``point up'' events (as used by the \cite[.NET application]{win7touch}).

TUIO coordinates range from $0.0$ to $1.0$, with $(0.0, 0.0)$ being the top
left corner of the screen and $(1.0, 1.0)$ the bottom right corner. To restrict
events to a window, a translation to window coordinates is required in the
client application, as stated by the online specification
\cite{TUIO_specification}:
\begin{quote}
    In order to compute the X and Y coordinates for the 2D profiles a TUIO
    tracker implementation needs to divide these values by the actual sensor
    dimension, while a TUIO client implementation consequently can scale these
    values back to the actual screen dimension.
\end{quote}
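
For example, a client could first scale the normalized TUIO coordinates to
screen pixels and then translate them into window-local coordinates, as in the
sketch below. The screen and window dimensions are example values.

\begin{verbatim}
# Sketch of the client-side translation described above: TUIO coordinates
# are scaled to screen pixels and then translated into window-local
# coordinates. Screen and window geometry values are example assumptions.
SCREEN_W, SCREEN_H = 1920, 1080          # screen size in pixels
WIN_X, WIN_Y = 400, 300                  # top left corner of the GUI window

def tuio_to_window(x, y):
    """Map a TUIO (0.0 - 1.0) coordinate pair to coordinates relative to
    the top left corner of the GUI window."""
    screen_x = x * SCREEN_W
    screen_y = y * SCREEN_H
    return screen_x - WIN_X, screen_y - WIN_Y

# A TUIO point in the middle of the screen, expressed in window coordinates:
print(tuio_to_window(0.5, 0.5))          # -> (560.0, 240.0)
\end{verbatim}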
\end{document}