@@ -8,7 +8,7 @@
\hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}
% Title Page
-\title{A generic architecture for the detection of multi-touch gestures}
+\title{A generic architecture for gesture-based interaction}
\author{Taddeüs Kroes}
\supervisors{Dr. Robert G. Belleman (UvA)}
\signedby{Dr. Robert G. Belleman (UvA)}
@@ -30,57 +30,72 @@
\chapter{Introduction}
-% TODO: put Qt link in bibtex
-Multi-touch devices enable a user to interact with software using intuitive
-hand gestures, rather with interaction tools like mouse and keyboard. With the
-increasing use of touch screens in phones and tablets, multi-touch interaction is
-becoming increasingly common.The driver of a touch device provides low-level
-events. The most basic representation of these low-level events consists of
-\emph{down}, \emph{move} and \emph{up} events.
-
-More complex gestures must be designed in such a way, that they can be
-represented by a sequence of basic events. For example, a ``tap'' gesture can
-be represented as a \emph{down} event that is followed by an \emph{up} event
-within a certain time.
-
-The translation process of driver-specific messages to basic events, and events
-to multi-touch gestures is a process that is often embedded in multi-touch
-application frameworks, like Nokia's Qt \cite{qt}. However, there is no
-separate implementation of the process itself. Consequently, an application
-developer who wants to use multi-touch interaction in an application is forced
-to choose an application framework that includes support for multi-touch
-gestures. Moreover, the set of supported gestures is limited by the application
-framework. To incorporate some custom event in an application, the chosen
-framework needs to provide a way to extend existing multi-touch gestures.
-
-% Hoofdvraag
-The goal of this thesis is to create a generic architecture for the support of
-multi-touch gestures in applications. To test the design of the architecture, a
-reference implementation is written in Python. The architecture should
-incorporate the translation process of low-level driver messages to multi-touch
-gestures. It should be able to run beside an application framework. The
-definition of multi-touch gestures should allow extensions, so that custom
-gestures can be defined.
-
-% Deelvragen
-To design such an architecture properly, the following questions are relevant:
-\begin{itemize}
-    \item What is the input of the architecture? This is determined by the
-    output of multi-touch drivers.
-    \item How can extendability of the supported gestures be accomplished?
-    % TODO: zijn onderstaande nog relevant? beter omschrijven naar "Design"
-    % gerelateerde vragen?
-    \item How can the architecture be used by different programming languages?
-    A generic architecture should not be limited to one language.
-    \item How can the architecture serve multiple applications at the same
-    time?
-\end{itemize}
-
-
-% Afbakening
-The scope of this thesis includes the design of a generic multi-touch detection
-architecture, a reference implementation of this design, and the integration of
-the reference implementation in a test case application.
+Surface-touch devices have evolved from pen-based tablets to single-touch
+trackpads, to multi-touch devices like smartphones and tablets. Multi-touch
+devices enable a user to interact with software using hand gestures, making the
+interaction more expressive and intuitive. These gestures are more complex than
+the primitive ``click'' or ``tap'' events used by single-touch devices.
+Some examples of more complex gestures are so-called ``pinch''\footnote{A
+``pinch'' gesture is formed by performing a pinching movement with multiple
+fingers on a multi-touch surface. Pinch gestures are often used to zoom in or
+out on an object.} and ``flick''\footnote{A ``flick'' gesture is the act of
+grabbing an object and throwing it in a direction on a touch surface, giving
+it momentum to move for some time after the hand releases the surface.}
+gestures.
+
+The complexity of gestures is not limited to navigation in smartphones. Some
+multi-touch devices, such as the Microsoft Surface \cite{mssurface}, are
+already capable of recognizing objects touching the screen. In the near
+future, touch screens will possibly be extended or even replaced with in-air
+interaction devices such as Microsoft's Kinect \cite{kinect} and the Leap
+\cite{leap}.
+
+The interaction devices mentioned above generate primitive events. In the case
+of surface-touch devices, these are \emph{down}, \emph{move} and \emph{up}
+events. Application programmers who want to incorporate complex, intuitive
+gestures in their application face the challenge of interpreting these
+primitive events as gestures. With the increasing complexity of gestures, the
+complexity of the logic required to detect these gestures increases as well.
+This challenge limits, or even deters, the application developer from using
+complex gestures in an application.
+
+The main question in this research project is whether a generic architecture
+for the detection of complex interaction gestures can be designed, with the
+capability of managing the complexity of gesture detection logic.
+
+Application frameworks for surface-touch devices, such as Nokia's Qt \cite{qt},
+include the detection of commonly used gestures like \emph{pinch} gestures.
+However, this detection logic is dependent on the application framework.
+Consequently, an application developer who wants to use multi-touch interaction
+in an application is forced to choose an application framework that includes
+support for multi-touch gestures. Therefore, a requirement of the generic
+architecture is that it must not be bound to a specific application framework.
+Moreover, the set of supported gestures is limited by the application framework
+of choice. To incorporate a custom gesture in an application, the application
+developer needs to extend the framework. This requires extensive knowledge of
+the framework's architecture. Also, if the same gesture is used in another
+application that is based on another framework, the detection logic has to be
+translated for use in that framework. Nevertheless, application frameworks are
+a necessity when it comes to fast, cross-platform development. Therefore, the
+architecture design should aim to be compatible with existing frameworks, but
+provide a way to detect and extend gestures independently of the framework.
+
+An application framework is written in a specific programming language. A
+generic architecture should not be limited to a single programming language.
+The ultimate goal of this thesis is to provide support for complex gesture
+interaction in any application. Thus, applications should be able to address
+the architecture using a language-independent method of communication. This
+intention leads towards the concept of a dedicated gesture detection
+application that serves gestures to multiple programs at the same time.
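A language-independent method of communication could, for example, be a plain-text protocol that a client in any programming language can parse. The following sketch (in Python, the language of the reference implementation) illustrates the idea with JSON messages; the message fields are invented for this illustration and are not part of the actual architecture.

```python
import json

def serialize_gesture(gesture_type, x, y, **params):
    """Encode a detected gesture as a JSON message for a client application.

    Hypothetical message format: a gesture type, a position, and optional
    gesture-specific parameters (e.g. the scale of a pinch gesture).
    """
    message = {"gesture": gesture_type, "x": x, "y": y}
    message.update(params)
    return json.dumps(message, sort_keys=True)

def parse_gesture(raw):
    """A client written in any language can decode the same message."""
    return json.loads(raw)
```

Because the messages are plain JSON, a dedicated gesture detection application could serve them over a socket to several client programs at once, regardless of the language those clients are written in.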
+
+The scope of this thesis is limited to the detection of gestures on multi-touch
+surface devices. It presents a design for a generic gesture detection
+architecture for use in multi-touch based applications. A reference
+implementation of this design is used in some test case applications, whose
+goal is to test the effectiveness of the design and detect its shortcomings.
+
+% FIXME: Should this still be in the introduction?
+% How can the input of the architecture be normalized? This is needed, because
+% multi-touch drivers use their own specific message format.

\section{Structure of this document}
@@ -109,9 +124,7 @@ the reference implementation in a test case application.
    gestures and flexibility in rule definitions, over-complexity can be
    avoided.
-    % oplossing: trackers. bijv. TapTracker, TransformationTracker gescheiden
-
-    \section{Gesture recognition software for Windows 7}
+    \section{Gesture recognition implementation for Windows 7}
    The online article \cite{win7touch} presents a Windows 7 application,
    written in Microsoft's .NET. The application shows detected gestures in a
@@ -128,6 +141,7 @@ the reference implementation in a test case application.
    feature by also using different gesture trackers to track different gesture
    types.
+    % TODO: This is not really 'related', move it to somewhere else
    \section{Processing implementation of simple gestures in Android}
    An implementation of a detection architecture for some simple multi-touch
@@ -168,50 +182,69 @@ the reference implementation in a test case application.
    multi-touch gesture detection architecture. The chapter represents the
    architecture as a diagram of relations between different components.
    Sections \ref{sec:driver-support} to \ref{sec:event-analysis} define
-    requirements for the archtitecture, and extend the diagram with components
+    requirements for the architecture, and extend the diagram with components
    that meet these requirements. Section \ref{sec:example} describes an
    example usage of the architecture in an application.
    \subsection*{Position of architecture in software}
-    The input of the architecture comes from some multi-touch device
-    driver. The task of the architecture is to translate this input to
-    multi-touch gestures that are used by an application, as illustrated in
-    figure \ref{fig:basicdiagram}. In the course of this chapter, the
-    diagram is extended with the different components of the architecture.
+    The input of the architecture comes from a multi-touch device driver.
+    The task of the architecture is to translate this input to multi-touch
+    gestures that are used by an application, as illustrated in figure
+    \ref{fig:basicdiagram}. In the course of this chapter, the diagram is
+    extended with the different components of the architecture.
    \basicdiagram{A diagram showing the position of the architecture
-    relative to the device driver and a multi-touch application.}
+    relative to the device driver and a multi-touch application. The input
+    of the architecture is given by a touch device driver. This input is
+    translated to complex interaction gestures and passed to the
+    application that is using the architecture.}
\section{Supporting multiple drivers}
\label{sec:driver-support}
    The TUIO protocol \cite{TUIO} is an example of a touch driver that can be
-    used by multi-touch devices. Other drivers do exist, which should also be
-    supported by the architecture. Therefore, there must be some translation of
-    driver-specific messages to a common format in the arcitecture. Messages in
-    this common format will be called \emph{events}. Events can be translated
-    to multi-touch \emph{gestures}. The most basic set of events is
-    $\{point\_down, point\_move, point\_up\}$. Here, a ``point'' is a touch
-    object with only an (x, y) position on the screen.
-
-    A more extended set could also contain more complex events. An object can
-    also have a rotational property, like the ``fiducials''\footnote{A fiducial
-    is a pattern used by some touch devices to identify objects.} type
-    in the TUIO protocol. This results in $\{point\_down, point\_move,\\
-    point\_up, object\_down, object\_move, object\_up, object\_rotate\}$.
-
-    The component that translates driver-specific messages to events, is called
-    the \emph{event driver}. The event driver runs in a loop, receiving and
-    analyzing driver messages. The event driver that is used in an application
-    is dependent of the support of the multi-touch device.
-
-    When a sequence of messages is analyzed as an event, the event driver
-    delegates the event to other components in the architecture for translation
-    to gestures.
+    used by multi-touch devices. TUIO uses ALIVE- and SET-messages to communicate
+    low-level touch events (see appendix \ref{app:tuio} for more details).
+    These messages are specific to the API of the TUIO protocol. Other touch
+    drivers may use very different message types. To support more than
+    one driver in the architecture, there must be some translation from
+    driver-specific messages to a common format for primitive touch events.
+    After all, the gesture detection logic in a ``generic'' architecture should
+    not be implemented based on driver-specific messages. The event types in
+    this format should be chosen so that multiple drivers can trigger the same
+    events. If each supported driver adds its own set of event types to the
+    common format, the purpose of being ``common'' would be defeated.
+
+    A reasonable expectation for a touch device driver is that it detects
+    simple touch points, with a ``point'' being an object at an $(x, y)$
+    position on the touch surface. This yields a basic set of events:
+    $\{point\_down, point\_move, point\_up\}$.
+
+    The TUIO protocol supports fiducials\footnote{A fiducial is a pattern used
+    by some touch devices to identify objects.}, which also have a rotational
+    property. This results in a more extended set: $\{point\_down, point\_move,
+    point\_up, object\_down, object\_move, object\_up,\\ object\_rotate\}$.
+    Due to their generic nature, the use of these events is not limited to the
+    TUIO protocol. Another driver that can distinguish rotated objects from
+    simple touch points could also trigger them.
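As an illustration, the common event format and the two event sets above could be represented as follows in Python (the language of the reference implementation). The class and attribute names in this sketch are invented and do not reflect the actual implementation.

```python
from dataclasses import dataclass

# Illustrative sketch of the common event format; the names below are
# invented for this example, not taken from the reference implementation.
BASIC_EVENT_TYPES = {"point_down", "point_move", "point_up"}
EXTENDED_EVENT_TYPES = BASIC_EVENT_TYPES | {
    "object_down", "object_move", "object_up", "object_rotate",
}

@dataclass
class Event:
    """A driver-independent primitive touch event."""
    event_type: str
    x: float  # position on the touch surface
    y: float
    angle: float = 0.0  # rotational property, only used by object_* events

    def __post_init__(self):
        # reject driver-specific event types that are not in the common set
        if self.event_type not in EXTENDED_EVENT_TYPES:
            raise ValueError("unknown event type: " + self.event_type)
```

Restricting construction to the common set is one way to guarantee that gesture detection logic never has to deal with driver-specific event types.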
+
+    The component that translates driver-specific messages to common events
+    will be called the \emph{event driver}. The event driver runs in a loop,
+    receiving and analyzing driver messages. When a sequence of messages is
+    analyzed as an event, the event driver delegates the event to other
+    components in the architecture for translation to gestures. This
+    communication flow is illustrated in figure \ref{fig:driverdiagram}.
+
+    A touch device driver can be supported by adding an event driver
+    implementation for it. The event driver implementation that is used in an
+    application depends on the touch device that must be supported.
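The receive-translate-delegate loop of such an event driver can be sketched as follows, in Python. This is a minimal illustration under invented assumptions: the driver-specific message format ("down 0.1 0.2") and all names are made up, and a real event driver would receive messages from the device driver rather than from a list.

```python
class EventDriver:
    """Translates driver-specific messages to common events and delegates
    them to registered analysis components (hypothetical sketch)."""

    # mapping from driver-specific message names to common event types
    TRANSLATION = {"down": "point_down", "move": "point_move", "up": "point_up"}

    def __init__(self):
        self.listeners = []  # analysis components interested in events

    def receive(self, message):
        """Analyze one driver-specific message and delegate the event."""
        name, x, y = message.split()
        event = (self.TRANSLATION[name], float(x), float(y))
        for listener in self.listeners:
            listener(event)

    def run(self, messages):
        """Main loop; in practice this would read from the device driver."""
        for message in messages:
            self.receive(message)
```

Supporting another touch driver then amounts to writing another `EventDriver` subclass with its own `receive` logic, while the listeners (the gesture detection components) remain unchanged.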
    \driverdiagram{Extension of the diagram from figure \ref{fig:basicdiagram},
-    showing the position of the event driver in the architecture.}
+    showing the position of the event driver in the architecture. The event
+    driver translates driver-specific messages to a common set of events, which
+    are delegated to analysis components that will interpret them as more
+    complex gestures.}
\section{Restricting gestures to a screen area}