@@ -8,7 +8,7 @@
\hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}
% Title Page
-\title{A generic architecture for the detection of multi-touch gestures}
+\title{A generic architecture for gesture-based interaction}
\author{Taddeüs Kroes}
\supervisors{Dr. Robert G. Belleman (UvA)}
\signedby{Dr. Robert G. Belleman (UvA)}
@@ -30,57 +30,72 @@
\chapter{Introduction}
-% TODO: put Qt link in bibtex
-Multi-touch devices enable a user to interact with software using intuitive
-hand gestures, rather with interaction tools like mouse and keyboard. With the
-increasing use of touch screens in phones and tablets, multi-touch interaction is
-becoming increasingly common.The driver of a touch device provides low-level
-events. The most basic representation of these low-level events consists of
-\emph{down}, \emph{move} and \emph{up} events.
-
-More complex gestures must be designed in such a way, that they can be
-represented by a sequence of basic events. For example, a ``tap'' gesture can
-be represented as a \emph{down} event that is followed by an \emph{up} event
-within a certain time.
-
-The translation process of driver-specific messages to basic events, and events
-to multi-touch gestures is a process that is often embedded in multi-touch
-application frameworks, like Nokia's Qt \cite{qt}. However, there is no
-separate implementation of the process itself. Consequently, an application
-developer who wants to use multi-touch interaction in an application is forced
-to choose an application framework that includes support for multi-touch
-gestures. Moreover, the set of supported gestures is limited by the application
-framework. To incorporate some custom event in an application, the chosen
-framework needs to provide a way to extend existing multi-touch gestures.
-
-% Hoofdvraag
-The goal of this thesis is to create a generic architecture for the support of
-multi-touch gestures in applications. To test the design of the architecture, a
-reference implementation is written in Python. The architecture should
-incorporate the translation process of low-level driver messages to multi-touch
-gestures. It should be able to run beside an application framework. The
-definition of multi-touch gestures should allow extensions, so that custom
-gestures can be defined.
-
-% Deelvragen
-To design such an architecture properly, the following questions are relevant:
-\begin{itemize}
-    \item What is the input of the architecture? This is determined by the
-    output of multi-touch drivers.
-    \item How can extendability of the supported gestures be accomplished?
-    % TODO: zijn onderstaande nog relevant? beter omschrijven naar "Design"
-    % gerelateerde vragen?
-    \item How can the architecture be used by different programming languages?
-    A generic architecture should not be limited to one language.
-    \item How can the architecture serve multiple applications at the same
-    time?
-\end{itemize}
-
-
-% Afbakening
-The scope of this thesis includes the design of a generic multi-touch detection
-architecture, a reference implementation of this design, and the integration of
-the reference implementation in a test case application.
+Surface-touch devices have evolved from pen-based tablets to single-touch
+trackpads, to multi-touch devices like smartphones and tablets. Multi-touch
+devices enable a user to interact with software using hand gestures, making the
+interaction more expressive and intuitive. These gestures are more complex than
+the primitive ``click'' or ``tap'' events used by single-touch devices.
+Some examples of more complex gestures are so-called ``pinch''\footnote{A
+``pinch'' gesture is formed by performing a pinching movement with multiple
+fingers on a multi-touch surface. Pinch gestures are often used to zoom in or
+out on an object.} and ``flick''\footnote{A ``flick'' gesture is the act of
+grabbing an object and throwing it in a direction on a touch surface, giving
+it momentum to move for some time after the hand releases the surface.}
+gestures.
+
+The complexity of gestures is not limited to navigation in smartphones. Some
+multi-touch devices, such as the Microsoft Surface \cite{mssurface}, are
+already capable of recognizing objects touching the screen. In the near
+future, touch screens will possibly be extended or even replaced with in-air
+interaction devices such as Microsoft's Kinect \cite{kinect} and the Leap
+\cite{leap}.
+
+The interaction devices mentioned above generate primitive events. In the case
+of surface-touch devices, these are \emph{down}, \emph{move} and \emph{up}
+events. Application programmers who want to incorporate complex, intuitive
+gestures in their application face the challenge of interpreting these
+primitive events as gestures. With the increasing complexity of gestures, the
+complexity of the logic required to detect these gestures increases as well.
+This challenge limits, or even deters, the application developer from using
+complex gestures in an application.
+
+The main question in this research project is whether a generic architecture
+for the detection of complex interaction gestures can be designed, with the
+capability of managing the complexity of gesture detection logic.
+
+Application frameworks for surface-touch devices, such as Nokia's Qt \cite{qt},
+include the detection of commonly used gestures like \emph{pinch} gestures.
+However, this detection logic is dependent on the application framework.
+Consequently, an application developer who wants to use multi-touch interaction
+in an application is forced to choose an application framework that includes
+support for multi-touch gestures. Therefore, a requirement of the generic
+architecture is that it must not be bound to a specific application framework.
+Moreover, the set of supported gestures is limited by the application framework
+of choice. To incorporate a custom gesture in an application, the application
+developer needs to extend the framework. This requires extensive knowledge of
+the framework's architecture. Also, if the same gesture is used in another
+application that is based on another framework, the detection logic has to be
+translated for use in that framework. Nevertheless, application frameworks are
+a necessity when it comes to fast, cross-platform development. Therefore, the
+architecture design should aim to be compatible with existing frameworks, but
+provide a way to detect and extend gestures independently of the framework.
+
+An application framework is written in a specific programming language. A
+generic architecture should not be limited to a single programming language.
+The ultimate goal of this thesis is to provide support for complex gesture
+interaction in any application. Thus, applications should be able to address
+the architecture using a language-independent method of communication. This
+intention leads towards the concept of a dedicated gesture detection
+application that serves gestures to multiple programs at the same time.
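A language-independent method of communication could, for example, be a plain-text protocol that a client in any programming language can parse. The following sketch (in Python, the language of the reference implementation) illustrates the idea with JSON messages; the message fields are invented for this illustration and are not part of the actual architecture.

```python
import json

def serialize_gesture(gesture_type, x, y, **params):
    """Encode a detected gesture as a JSON message for a client application.

    Hypothetical message format: a gesture type, a position, and optional
    gesture-specific parameters (e.g. the scale of a pinch gesture).
    """
    message = {"gesture": gesture_type, "x": x, "y": y}
    message.update(params)
    return json.dumps(message, sort_keys=True)

def parse_gesture(raw):
    """A client written in any language can decode the same message."""
    return json.loads(raw)
```

Because the messages are plain JSON, a dedicated gesture detection application could serve them over a socket to several client programs at once, regardless of the language those clients are written in.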
+
+The scope of this thesis is limited to the detection of gestures on multi-touch
+surface devices. It presents a design for a generic gesture detection
+architecture for use in multi-touch based applications. A reference
+implementation of this design is used in some test case applications, whose
+goal is to test the effectiveness of the design and detect its shortcomings.
+
+% FIXME: Should this still be in the introduction?
+% How can the input of the architecture be normalized? This is needed, because
+% multi-touch drivers use their own specific message format.

\section{Structure of this document}
@@ -109,9 +124,7 @@ the reference implementation in a test case application.
    gestures and flexibility in rule definitions, over-complexity can be
    avoided.
-    % oplossing: trackers. bijv. TapTracker, TransformationTracker gescheiden
-
-    \section{Gesture recognition software for Windows 7}
+    \section{Gesture recognition implementation for Windows 7}
    The online article \cite{win7touch} presents a Windows 7 application,
    written in Microsoft's .NET. The application shows detected gestures in a
@@ -128,6 +141,7 @@ the reference implementation in a test case application.
    feature by also using different gesture trackers to track different gesture
    types.
+    % TODO: This is not really 'related', move it to somewhere else
    \section{Processing implementation of simple gestures in Android}
    An implementation of a detection architecture for some simple multi-touch
@@ -168,50 +182,69 @@ the reference implementation in a test case application.
    multi-touch gesture detection architecture. The chapter represents the
    architecture as a diagram of relations between different components.
    Sections \ref{sec:driver-support} to \ref{sec:event-analysis} define
-    requirements for the archtitecture, and extend the diagram with components
+    requirements for the architecture, and extend the diagram with components
    that meet these requirements. Section \ref{sec:example} describes an
    example usage of the architecture in an application.
    \subsection*{Position of architecture in software}
-    The input of the architecture comes from some multi-touch device
-    driver. The task of the architecture is to translate this input to
-    multi-touch gestures that are used by an application, as illustrated in
-    figure \ref{fig:basicdiagram}. In the course of this chapter, the
-    diagram is extended with the different components of the architecture.
+    The input of the architecture comes from a multi-touch device driver.
+    The task of the architecture is to translate this input to multi-touch
+    gestures that are used by an application, as illustrated in figure
+    \ref{fig:basicdiagram}. In the course of this chapter, the diagram is
+    extended with the different components of the architecture.
    \basicdiagram{A diagram showing the position of the architecture
-    relative to the device driver and a multi-touch application.}
+    relative to the device driver and a multi-touch application. The input
+    of the architecture is given by a touch device driver. This input is
+    translated to complex interaction gestures and passed to the
+    application that is using the architecture.}
\section{Supporting multiple drivers}
\label{sec:driver-support}
    The TUIO protocol \cite{TUIO} is an example of a touch driver that can be
-    used by multi-touch devices. Other drivers do exist, which should also be
-    supported by the architecture. Therefore, there must be some translation of
-    driver-specific messages to a common format in the arcitecture. Messages in
-    this common format will be called \emph{events}. Events can be translated
-    to multi-touch \emph{gestures}. The most basic set of events is
-    $\{point\_down, point\_move, point\_up\}$. Here, a ``point'' is a touch
-    object with only an (x, y) position on the screen.
-
-    A more extended set could also contain more complex events. An object can
-    also have a rotational property, like the ``fiducials''\footnote{A fiducial
-    is a pattern used by some touch devices to identify objects.} type
-    in the TUIO protocol. This results in $\{point\_down, point\_move,\\
-    point\_up, object\_down, object\_move, object\_up, object\_rotate\}$.
-
-    The component that translates driver-specific messages to events, is called
-    the \emph{event driver}. The event driver runs in a loop, receiving and
-    analyzing driver messages. The event driver that is used in an application
-    is dependent of the support of the multi-touch device.
-
-    When a sequence of messages is analyzed as an event, the event driver
-    delegates the event to other components in the architecture for translation
-    to gestures.
+    used by multi-touch devices. TUIO uses ALIVE- and SET-messages to communicate
+    low-level touch events (see appendix \ref{app:tuio} for more details).
+    These messages are specific to the API of the TUIO protocol. Other touch
+    drivers may use very different message types. To support more than
+    one driver in the architecture, there must be some translation from
+    driver-specific messages to a common format for primitive touch events.
+    After all, the gesture detection logic in a ``generic'' architecture should
+    not be implemented based on driver-specific messages. The event types in
+    this format should be chosen so that multiple drivers can trigger the same
+    events. If each supported driver adds its own set of event types to the
+    common format, the purpose of being ``common'' would be defeated.
+
+    A reasonable expectation for a touch device driver is that it detects
+    simple touch points, with a ``point'' being an object at an $(x, y)$
+    position on the touch surface. This yields a basic set of events:
+    $\{point\_down, point\_move, point\_up\}$.
+
+    The TUIO protocol supports fiducials\footnote{A fiducial is a pattern used
+    by some touch devices to identify objects.}, which also have a rotational
+    property. This results in a more extended set: $\{point\_down, point\_move,
+    point\_up, object\_down, object\_move, object\_up,\\ object\_rotate\}$.
+    Due to their generic nature, the use of these events is not limited to the
+    TUIO protocol. Another driver that can distinguish rotated objects from
+    simple touch points could also trigger them.
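As an illustration, the common event format and the two event sets above could be represented as follows in Python (the language of the reference implementation). The class and attribute names in this sketch are invented and do not reflect the actual implementation.

```python
from dataclasses import dataclass

# Illustrative sketch of the common event format; the names below are
# invented for this example, not taken from the reference implementation.
BASIC_EVENT_TYPES = {"point_down", "point_move", "point_up"}
EXTENDED_EVENT_TYPES = BASIC_EVENT_TYPES | {
    "object_down", "object_move", "object_up", "object_rotate",
}

@dataclass
class Event:
    """A driver-independent primitive touch event."""
    event_type: str
    x: float  # position on the touch surface
    y: float
    angle: float = 0.0  # rotational property, only used by object_* events

    def __post_init__(self):
        # reject driver-specific event types that are not in the common set
        if self.event_type not in EXTENDED_EVENT_TYPES:
            raise ValueError("unknown event type: " + self.event_type)
```

Restricting construction to the common set is one way to guarantee that gesture detection logic never has to deal with driver-specific event types.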
+
+    The component that translates driver-specific messages to common events
+    will be called the \emph{event driver}. The event driver runs in a loop,
+    receiving and analyzing driver messages. When a sequence of messages is
+    analyzed as an event, the event driver delegates the event to other
+    components in the architecture for translation to gestures. This
+    communication flow is illustrated in figure \ref{fig:driverdiagram}.
+
+    A touch device driver can be supported by adding an event driver
+    implementation for it. The event driver implementation that is used in an
+    application depends on the touch device that must be supported.
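The receive-translate-delegate loop of such an event driver can be sketched as follows, in Python. This is a minimal illustration under invented assumptions: the driver-specific message format ("down 0.1 0.2") and all names are made up, and a real event driver would receive messages from the device driver rather than from a list.

```python
class EventDriver:
    """Translates driver-specific messages to common events and delegates
    them to registered analysis components (hypothetical sketch)."""

    # mapping from driver-specific message names to common event types
    TRANSLATION = {"down": "point_down", "move": "point_move", "up": "point_up"}

    def __init__(self):
        self.listeners = []  # analysis components interested in events

    def receive(self, message):
        """Analyze one driver-specific message and delegate the event."""
        name, x, y = message.split()
        event = (self.TRANSLATION[name], float(x), float(y))
        for listener in self.listeners:
            listener(event)

    def run(self, messages):
        """Main loop; in practice this would read from the device driver."""
        for message in messages:
            self.receive(message)
```

Supporting another touch driver then amounts to writing another `EventDriver` subclass with its own `receive` logic, while the listeners (the gesture detection components) remain unchanged.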
    \driverdiagram{Extension of the diagram from figure \ref{fig:basicdiagram},
-    showing the position of the event driver in the architecture.}
+    showing the position of the event driver in the architecture. The event
+    driver translates driver-specific messages to a common set of events, which
+    are delegated to analysis components that will interpret them as more
+    complex gestures.}
\section{Restricting gestures to a screen area}