- \documentclass[twoside,openright]{uva-bachelor-thesis}
- \usepackage[english]{babel}
- \usepackage[utf8]{inputenc}
- \usepackage{hyperref,graphicx,float,tikz,subfigure}
- % Link colors
- \hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}
- % Title Page
- \title{A generic architecture for the detection of multi-touch gestures}
- \author{Taddeüs Kroes}
- \supervisors{Dr. Robert G. Belleman (UvA)}
- \signedby{Dr. Robert G. Belleman (UvA)}
- \begin{document}
- % Title page
- \maketitle
- \begin{abstract}
- % TODO
- \end{abstract}
- % Set paragraph indentation
- \parindent 0pt
- \parskip 1.5ex plus 0.5ex minus 0.2ex
- % Table of content on separate page
- \tableofcontents
- \chapter{Introduction}
- % TODO: put Qt link in bibtex
- Multi-touch devices enable a user to interact with software using intuitive
- body gestures, rather than with interaction tools like mouse and keyboard.
- With the upcoming use of touch screens in phones and tablets, multi-touch
- interaction is becoming increasingly common. The driver of a touch device
- provides low-level events. The most basic representation of these low-level
- events consists of \emph{down}, \emph{move} and \emph{up} events.
- Multi-touch gestures must be designed in such a way that they can be
- represented by a sequence of basic events. For example, a ``tap'' gesture can
- be represented as a \emph{down} event that is followed by an \emph{up} event
- within a certain time.
- The translation of driver-specific messages to basic events, and of events
- to multi-touch gestures, is a process that is often embedded in multi-touch
- application frameworks, like Nokia's Qt \cite{qt}. However, there is no
- separate implementation of this translation process. Consequently, an application
- developer who wants to use multi-touch interaction in an application is forced
- to choose an application framework that includes support for multi-touch
- gestures. Moreover, the set of supported gestures is limited by the application
- framework. To incorporate a custom gesture in an application, the chosen
- framework needs to provide a way to extend its existing multi-touch gestures.
- % Main research question
- The goal of this thesis is to create a generic architecture for the support of
- multi-touch gestures in applications. To test the design of the architecture, a
- reference implementation is written in Python. The architecture should
- incorporate the translation process of low-level driver messages to multi-touch
- gestures. It should be able to run alongside an application framework. The
- definition of multi-touch gestures should allow extensions, so that custom
- gestures can be defined.
- % Sub-questions
- To design such an architecture properly, the following questions are relevant:
- \begin{itemize}
- \item What is the input of the architecture? This is determined by the
- output of multi-touch drivers.
- \item How can extendability of the supported gestures be accomplished?
- % TODO: are the questions below still relevant? Perhaps rephrase them as
- % "Design"-related questions?
- \item How can the architecture be used by different programming languages?
- A generic architecture should not be limited to use in a single
- language.
- \item Can events be used by multiple processes at the same time? For
- example, a network implementation could run as a service instead of
- within a single application, triggering events in any application that
- needs them.
- \end{itemize}
- % Scope
- The scope of this thesis includes the design of a generic multi-touch
- gesture detection architecture, a reference implementation of this design, and its
- integration into a test case application. To be successful, the design should
- allow for extensions to be added to any implementation.
- The reference implementation is a proof of concept that translates TUIO
- messages to some simple touch gestures that are used by a test application.
- \section{Structure of this document}
- % TODO: write once the thesis is finished
- \chapter{Related work}
- \section{Gesture and Activity Recognition Toolkit}
- The Gesture and Activity Recognition Toolkit (GART) \cite{GART} is a
- toolkit for the development of gesture-based applications. Its authors
- state that the best way to classify gestures is to use machine learning.
- The programmer trains a program to recognize gestures using the machine learning
- library from the toolkit. The toolkit contains a callback mechanism that
- the programmer uses to execute custom code when a gesture is recognized.
- Though multi-touch input is not directly supported by the toolkit, the
- level of abstraction does allow for it to be implemented in the form of a
- ``touch'' sensor.
- The reason given for using machine learning is the claim that gesture detection
- ``is likely to become increasingly complex and unmanageable'' when using a
- set of predefined rules to detect whether some sensor input can be seen as
- a specific gesture. This claim is not necessarily true. If the
- programmer is given a way to separate the detection of different types of
- gestures and flexibility in rule definitions, over-complexity can be
- avoided.
- % solution: trackers, e.g. separate TapTracker and TransformationTracker
- \section{Gesture recognition software for Windows 7}
- The online article \cite{win7touch} presents a Windows 7 application,
- written in Microsoft's .NET. The application shows detected gestures in a
- canvas. Gesture trackers keep track of stylus locations to detect specific
- gestures. The event types required to track a touch stylus are ``stylus
- down'', ``stylus move'' and ``stylus up'' events. A
- \texttt{GestureTrackerManager} object dispatches these events to gesture
- trackers. The application supports a limited number of pre-defined
- gestures.
- An important observation in this application is that different gestures are
- detected by different gesture trackers, thus separating gesture detection
- code into maintainable parts.
- \section{Processing implementation of simple gestures in Android}
- An implementation of a detection architecture for some simple multi-touch
- gestures (tap, double tap, rotation, pinch and drag) using
- Processing\footnote{Processing is a Java-based development environment with
- an export possibility for Android. See also \url{http://processing.org/}.}
- can be found in a forum on the Processing website
- \cite{processingMT}. The implementation is fairly simple, but it yields
- some very appealing results. The detection logic of all gestures is
- combined in a single class. This does not allow for extendability, because
- the complexity of this class would increase to an undesirable level (as
- predicted by the GART article \cite{GART}). However, the detection logic
- itself is partially re-used in the reference implementation of the
- generic gesture detection architecture.
- \section{Analysis of related work}
- The simple Processing implementation of multi-touch events provides most of
- the functionality that can be found in existing multi-touch applications.
- In fact, many applications for mobile phones and tablets only use tap and
- scroll events. For this category of applications, using machine learning
- seems excessive. Though the representation of a gesture using a feature
- vector in a machine learning algorithm is a generic and formal way to
- define a gesture, a programmer-friendly architecture should also support
- simple, ``hard-coded'' detection code. A way to separate different pieces
- of gesture detection code, thus keeping a code library manageable and
- extendable, is to use different gesture trackers.
- % FIXME: change title below
- \chapter{Design}
- % Diagrams are defined in a separate file
- \input{data/diagrams}
- \section{Introduction}
- % TODO: rewrite intro, reference to experiment appendix
- This chapter describes a design for a generic multi-touch gesture detection
- architecture. The architecture consists of multiple components, each with
- a specific set of tasks. Naturally, the design is based on a number of
- requirements. The first three sections each describe a requirement, and a
- solution that meets the requirement. The following sections show the
- cohesion of the different components in the architecture.
- To test multi-touch interaction properly, a multi-touch device is required.
- The University of Amsterdam (UvA) has provided access to a multi-touch
- table from PQlabs. The table uses the TUIO protocol \cite{TUIO} to
- communicate touch events. See appendix \ref{app:tuio} for details regarding
- the TUIO protocol.
- \subsection*{Position of architecture in software}
- The input of the architecture comes from some multi-touch device
- driver. For example, the table used in the experiments uses the TUIO
- protocol. The task of the architecture is to translate this input to
- multi-touch gestures that are used by an application, as illustrated in
- figure \ref{fig:basicdiagram}. In the course of this chapter, the
- diagram is extended with the different components of the architecture.
- \basicdiagram{A diagram showing the position of the architecture
- relative to the device driver and a multi-touch application.}
- \section{Supporting multiple drivers}
- The TUIO protocol is an example of a protocol that can be used by
- multi-touch devices. Other drivers and protocols do exist, and these should
- also be supported by the architecture. Therefore, there must be some translation of
- driver-specific messages to a common format in the architecture. Messages in
- this common format will be called \emph{events}. Events can be translated
- to multi-touch \emph{gestures}. The most basic set of events is
- $\{point\_down, point\_move, point\_up\}$. Here, a ``point'' is a touch
- object with only an (x, y) position on the screen.
- A more extended set could also contain more complex events. An object can
- also have a rotational property, like the ``fiducials'' type in the TUIO
- protocol. This results in $\{point\_down, point\_move, point\_up,
- object\_down, object\_move, object\_up, object\_rotate\}$.
- The component that translates driver-specific messages to events is called
- the \emph{event driver}. The event driver runs in a loop, receiving and
- analyzing driver messages. The event driver that is used in an application
- depends on which driver the multi-touch device supports.
- When a sequence of messages is analyzed as an event, the event driver
- delegates the event to other components in the architecture for translation
- to gestures.
- \driverdiagram{Extension of the diagram from figure \ref{fig:basicdiagram},
- showing the position of the event driver in the architecture.}
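- As an illustration, the sketch below shows how an event driver might map
- incoming driver messages onto the basic event set. The class and method names
- (\texttt{Event}, \texttt{EventDriver}, \texttt{delegate\_event}) are merely
- assumptions made for this sketch; they are not prescribed by the design.
- \begin{verbatim}
- # Minimal sketch of an event driver (illustrative names only)
- class Event(object):
-     def __init__(self, type, x, y):
-         self.type = type  # 'point_down', 'point_move' or 'point_up'
-         self.x = x
-         self.y = y
- 
- class EventDriver(object):
-     def __init__(self, root_widget):
-         self.root = root_widget
- 
-     def receive_message(self, message):
-         # Translate a driver-specific message to a common event and
-         # delegate it to the rest of the architecture.
-         event = self.translate(message)
-         if event is not None:
-             self.root.delegate_event(event)
- 
-     def translate(self, message):
-         # Driver-specific: implemented by e.g. a TUIO event driver.
-         raise NotImplementedError
- \end{verbatim}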
- \section{Restricting gestures to a screen area}
- An application programmer should be able to bind a gesture handler to some
- element on the screen. For example, a button tap\footnote{A ``tap'' gesture
- is triggered when a touch object releases the screen within a certain time
- and distance from the point where it initially touched the screen.} should
- only occur on the button itself, and not in any other area of the screen. A
- solution to this problem is the use of \emph{widgets}. The button from the
- example can be represented as a rectangular widget with a position and
- size. The position and size are compared with event coordinates to
- determine whether an event should occur within the button.
- \subsection*{Widget tree}
- A problem occurs when widgets overlap. If a button is placed over a
- container and an event occurs inside the button, should the
- button handle the event first? And should the container receive the
- event at all, or should it be reserved for the button?
- The solution to this problem is to save widgets in a tree structure.
- There is one root widget, whose size is limited by the size of the
- touch screen. Being the leaf widget, and thus the widget that is
- actually touched when an object touches the device, the button widget
- should receive an event before its container does. However, events
- occur on a screen-wide level and thus at the root level of the widget
- tree. Therefore, an event is delegated in the tree before any analysis
- is performed. Delegation stops at the ``lowest'' widget in the tree
- containing the event coordinates. That widget then performs some
- analysis of the event, after which the event is released back to the
- parent widget for analysis. This release of an event to a parent widget
- is called \emph{propagation}. To be able to reserve an event to some
- widget or analysis, the propagation of an event can be stopped during
- analysis.
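- A minimal sketch of how delegation and propagation could look in the reference
- implementation is given below. The class and method names (\texttt{Widget},
- \texttt{delegate\_event}, \texttt{contains}) and the propagation flag on the
- event are assumptions made for this sketch, not the final API.
- \begin{verbatim}
- # Sketch of event delegation and propagation in a widget tree
- class Widget(object):
-     def __init__(self, x, y, width, height):
-         self.x, self.y = x, y
-         self.width, self.height = width, height
-         self.parent = None
-         self.children = []
- 
-     def add_child(self, widget):
-         widget.parent = self
-         self.children.append(widget)
- 
-     def contains(self, event):
-         return (self.x <= event.x <= self.x + self.width
-                 and self.y <= event.y <= self.y + self.height)
- 
-     def delegate_event(self, event):
-         # Walk down to the lowest widget containing the event coordinates.
-         for child in self.children:
-             if child.contains(event):
-                 return child.delegate_event(event)
-         self.analyze_event(event)
- 
-     def analyze_event(self, event):
-         # ... event analysis by gesture trackers goes here ...
-         # Propagate the event to the parent widget, unless the analysis
-         # stopped propagation (assumed flag on the event object).
-         if self.parent is not None and not event.propagation_stopped:
-             self.parent.analyze_event(event)
- \end{verbatim}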
- % TODO: inspired by JavaScript DOM
- % TODO: add GTK to bibliography
- Many GUI frameworks, like GTK \cite{GTK}, also use a tree structure to
- manage their widgets. This makes it easy to connect the architecture to
- such a framework. For example, the programmer can define a
- \texttt{GtkTouchWidget} that synchronizes the position of a touch
- widget with that of a GTK widget, using GTK signals.
- \subsection*{Callbacks}
- \label{sec:callbacks}
- When an event is propagated by a widget, it is first used for event
- analysis on that widget. The event analysis can then trigger a gesture
- in the widget, which has to be handled by the application. To handle a
- gesture, the widget should provide a callback mechanism: the
- application binds a handler for a specific type of gesture to a widget.
- When a gesture of that type is triggered after event analysis, the
- widget triggers the callback.
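- A sketch of such a callback mechanism is shown below; the method names
- \texttt{bind} and \texttt{trigger} are assumptions made for this example.
- \begin{verbatim}
- # Sketch of a callback mechanism on a widget (assumed method names)
- class CallbackMixin(object):
-     def __init__(self):
-         self.handlers = {}  # gesture type -> list of handler functions
- 
-     def bind(self, gesture_type, handler):
-         # Called by the application to handle a type of gesture.
-         self.handlers.setdefault(gesture_type, []).append(handler)
- 
-     def trigger(self, gesture):
-         # Called by event analysis when a gesture has been detected.
-         for handler in self.handlers.get(gesture.type, []):
-             handler(gesture)
- \end{verbatim}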
- \subsection*{Position of widget tree in architecture}
- \widgetdiagram{Extension of the diagram from figure
- \ref{fig:driverdiagram}, showing the position of widgets in the
- architecture.}
- \section{Event analysis}
- The events that are delegated to widgets must be analyzed in some way to
- form gestures. This analysis is specific to the type of gesture being
- detected. For example, the detection of a ``tap'' gesture is very different from
- the detection of a ``rotate'' gesture. The .NET implementation
- \cite{win7touch} separates the detection of different gestures
- into different \emph{gesture trackers}. This keeps the different pieces of
- detection code manageable and extendable. Therefore, the architecture also
- uses gesture trackers to separate the analysis of events. A single gesture
- tracker detects a specific set of gesture types, given a sequence of
- events. An example of a possible gesture tracker implementation is a
- ``transformation tracker'' that detects rotation, scaling and translation
- gestures.
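- To illustrate the idea, the sketch below shows what a very simple gesture
- tracker could look like. A ``tap'' tracker is used because it is the smallest
- example; the base class, the \texttt{Gesture} container and all method names
- are assumptions made for this sketch.
- \begin{verbatim}
- # Sketch of a simple gesture tracker (illustrative names only)
- import time
- 
- class Gesture(object):
-     def __init__(self, type, x, y):
-         self.type, self.x, self.y = type, x, y
- 
- class GestureTracker(object):
-     def __init__(self, widget):
-         self.widget = widget  # widget that receives triggered gestures
- 
-     def on_event(self, event):
-         raise NotImplementedError
- 
- class TapTracker(GestureTracker):
-     MAX_TAP_TIME = 0.2  # seconds; arbitrary threshold for this sketch
- 
-     def __init__(self, widget):
-         super(TapTracker, self).__init__(widget)
-         self.down_time = None
- 
-     def on_event(self, event):
-         # A tap is a 'down' followed by an 'up' within a short time
-         # (a distance check is omitted here for brevity).
-         if event.type == 'point_down':
-             self.down_time = time.time()
-         elif event.type == 'point_up' and self.down_time is not None:
-             if time.time() - self.down_time < self.MAX_TAP_TIME:
-                 self.widget.trigger(Gesture('tap', event.x, event.y))
-             self.down_time = None
- \end{verbatim}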
- \subsection*{Assignment of a gesture tracker to a widget}
- As explained in section \ref{sec:callbacks}, events are delegated from
- a widget to some event analysis. The analysis component of a widget
- consists of a list of gesture trackers, each tracking a specific set of
- gestures. No two trackers in the list should be tracking the same
- gesture type.
- When a handler for a gesture is ``bound'' to a widget, the widget
- asserts that it has a tracker that is tracking this gesture. Thus, the
- programmer does not create gesture trackers manually. Figure
- \ref{fig:trackerdiagram} shows the position of gesture trackers in the
- architecture.
- \trackerdiagram{Extension of the diagram from figure
- \ref{fig:widgetdiagram}, showing the position of gesture trackers in
- the architecture.}
- \section{Example usage}
- This section describes an example that illustrates the API of the
- architecture. The example application listens to tap events on a button.
- The button is located inside an application window, which can be resized
- using pinch gestures.
- \begin{verbatim}
- initialize GUI, creating a window
- # Add widgets representing the application window and button
- rootwidget = new rectangular Widget object
- set rootwidget position and size to that of the application window
- buttonwidget = new rectangular Widget object
- set buttonwidget position and size to that of the GUI button
- # Create an event server that will be started later
- server = new EventServer object
- set rootwidget as root widget for server
- # Define handlers and bind them to corresponding widgets
- begin function resize_handler(gesture)
- resize GUI window
- update position and size of root widget
- end function
- begin function tap_handler(gesture)
- # Perform some action that the button is meant to do
- end function
- bind ('pinch', resize_handler) to rootwidget
- bind ('tap', tap_handler) to buttonwidget
- # Start event server (which in turn starts a driver-specific event server)
- start server
- \end{verbatim}
- \examplediagram{Diagram representation of the example above. Dotted arrows
- represent gestures, normal arrows represent events (unless labeled
- otherwise).}
- \chapter{Test applications}
- % TODO
- % test programs with PyGame/Cairo
- \chapter{Suggestions for future work}
- % TODO
- % use a formal definition of gestures in gesture trackers, e.g. a state machine
- % network protocol (ZeroMQ) for multiple languages and simultaneous processes
- % intermediate layer that synchronizes the widget tree with an application framework
- \bibliographystyle{plain}
- \bibliography{report}{}
- \appendix
- \chapter{The TUIO protocol}
- \label{app:tuio}
- The TUIO protocol \cite{TUIO} defines a way to geometrically describe tangible
- objects, such as fingers or objects on a multi-touch table. Object information
- is sent to the TUIO UDP port (3333 by default).
- For efficiency reasons, the TUIO protocol is encoded using the Open Sound
- Control (OSC) format \cite{OSC}. An OSC server/client implementation is
- available for Python: pyOSC \cite{pyOSC}.
- A Python implementation of the TUIO protocol also exists: pyTUIO \cite{pyTUIO}.
- However, the execution of an example script yields an error regarding Python's
- built-in \texttt{socket} library. Therefore, the reference implementation uses
- the pyOSC package to receive TUIO messages.
- The two most important message types of the protocol are ALIVE and SET
- messages. An ALIVE message contains the list of session id's that are currently
- ``active'', which in the case of a multi-touch table means that they are
- touching the screen. A SET message provides geometric information of a session
- id, such as position, velocity and acceleration.
- Each session id represents an object. The only type of object on the
- multi-touch table is what the TUIO protocol calls ``2DCur'', which is an (x, y)
- position on the screen.
- ALIVE messages can be used to determine when an object touches and releases the
- screen. For example, if a session id was in the previous message but not in the
- current one, the object it represents has been lifted from the screen.
- SET messages provide information about movement. In the case of simple (x, y) positions,
- only the movement vector of the position itself can be calculated. For more
- complex objects such as fiducials, arguments like rotational position and
- acceleration are also included.
- ALIVE and SET messages can be combined to create ``point down'', ``point move''
- and ``point up'' events (as used by the .NET application \cite{win7touch}).
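- As an illustration, ``point down'' and ``point up'' events can be derived from
- two consecutive ALIVE messages by comparing their sets of session id's. The
- function below is a sketch of this comparison; it assumes the session id's
- have already been parsed into Python sets (SET messages would similarly
- produce ``point move'' events).
- \begin{verbatim}
- # Derive point_down/point_up events from consecutive ALIVE messages
- def alive_to_events(previous_ids, current_ids):
-     events = []
-     for session_id in current_ids - previous_ids:
-         events.append(('point_down', session_id))  # new id: object touched
-     for session_id in previous_ids - current_ids:
-         events.append(('point_up', session_id))    # id gone: object lifted
-     return events
- 
- # Example: session 3 has touched the screen, session 1 has been lifted
- # alive_to_events({1, 2}, {2, 3}) == [('point_down', 3), ('point_up', 1)]
- \end{verbatim}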
- TUIO coordinates range from $0.0$ to $1.0$, with $(0.0, 0.0)$ being the top
- left corner of the screen and $(1.0, 1.0)$ the bottom right corner. To focus
- events within a window, a translation to window coordinates is required in the
- client application, as stated by the online specification
- \cite{TUIO_specification}:
- \begin{quote}
- In order to compute the X and Y coordinates for the 2D profiles a TUIO
- tracker implementation needs to divide these values by the actual sensor
- dimension, while a TUIO client implementation consequently can scale these
- values back to the actual screen dimension.
- \end{quote}
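- A possible translation from normalized TUIO coordinates to window coordinates
- is sketched below; the window position and the screen dimensions are assumed
- to be known to the client application.
- \begin{verbatim}
- # Scale normalized TUIO coordinates (0.0 - 1.0) to window coordinates
- def tuio_to_window(x, y, win_x, win_y, screen_width, screen_height):
-     # Scale to screen coordinates first, then make them window-relative.
-     screen_x = x * screen_width
-     screen_y = y * screen_height
-     return screen_x - win_x, screen_y - win_y
- \end{verbatim}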
- \chapter{Experimental program}
- \label{app:experiment}
- % TODO: rewrite intro
- When designing a software library, its API should be understandable and easy to
- use for programmers. To explore the basic requirements for a usable API,
- an experimental program has been written based on the Processing code
- from \cite{processingMT}. The program receives TUIO events and translates them
- to point \emph{down}, \emph{move} and \emph{up} events. These events are then
- interpreted to be (double or single) \emph{tap}, \emph{rotation} or
- \emph{pinch} gestures. A simple drawing program then draws the current state to
- the screen using the PyGame library. The output of the program can be seen in
- figure \ref{fig:draw}.
- \begin{figure}[H]
- \centering
- \includegraphics[scale=0.4]{data/experimental_draw.png}
- \caption{Output of the experimental drawing program. It draws the touch
- points and their centroid on the screen (the centroid is used as center
- point for rotation and pinch detection). It also draws a green
- rectangle which responds to rotation and pinch events.}
- \label{fig:draw}
- \end{figure}
- One of the first observations is the fact that TUIO's \texttt{SET} messages use
- the TUIO coordinate system, as described in appendix \ref{app:tuio}. The test
- program multiplies these by its own dimensions, thus showing the entire
- screen in its window. Also, the implementation only works using the TUIO
- protocol. Other drivers are not supported.
- Though they use relatively simple math, the rotation and pinch events work
- surprisingly well. Both rotation and pinch use the centroid of all touch
- points. A \emph{rotation} gesture uses the difference in angle relative to the
- centroid of all touch points, and \emph{pinch} uses the difference in distance.
- Both values are normalized using division by the number of touch points. A
- pinch event contains a scale factor, and therefore uses a division of the
- current by the previous average distance to the centroid.
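- The sketch below restates these calculations. It assumes that corresponding
- previous and current positions are available for each touch point; the
- function names are chosen for this illustration only.
- \begin{verbatim}
- # Sketch of the centroid-based rotation and pinch calculations
- from math import atan2, sqrt
- 
- def centroid(points):
-     n = float(len(points))
-     return (sum(x for x, y in points) / n, sum(y for x, y in points) / n)
- 
- def average_distance(points):
-     cx, cy = centroid(points)
-     return sum(sqrt((x - cx) ** 2 + (y - cy) ** 2)
-                for x, y in points) / len(points)
- 
- def rotation_delta(prev_points, cur_points):
-     # Average change in angle of each point relative to the centroid.
-     pcx, pcy = centroid(prev_points)
-     ccx, ccy = centroid(cur_points)
-     total = 0.0
-     for (px, py), (x, y) in zip(prev_points, cur_points):
-         total += atan2(y - ccy, x - ccx) - atan2(py - pcy, px - pcx)
-     return total / len(cur_points)
- 
- def pinch_scale(prev_points, cur_points):
-     # Scale factor: current average distance to the centroid divided by
-     # the previous average distance.
-     return average_distance(cur_points) / average_distance(prev_points)
- \end{verbatim}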
- There is a flaw in this implementation. Since the centroid is calculated using
- all current touch points, there cannot be two or more rotation or pinch
- gestures simultaneously. On a large multi-touch table, it is desirable to
- support interaction with multiple hands, or multiple persons, at the same time.
- Such application-specific requirements should be defined in the
- application itself, whereas the experimental implementation defines its detection
- algorithms based on the needs of its test program.
- Also, the different detection algorithms are all implemented in the same file,
- making the code hard to read and debug, and difficult to extend.
- \chapter{Reference implementation in Python}
- \label{app:implementation}
- % TODO
- % only window.contains on point down, not on move/up
- % a few simple windows and trackers
- \end{document}