Commit d509d424 authored by Taddeüs Kroes's avatar Taddeüs Kroes

Improved 'Introduction' and 'Design - Supporting multiple drivers' in report.

parent 019f0e6d
......@@ -114,3 +114,25 @@
x-fetchedfrom = "Bibsonomy",
year = 2012
}
@misc{mssurface,
author = "Corporation, Microsoft",
howpublished = "\url{http://www.samsunglfd.com/product/feature.do?modelCd=SUR40}",
title = "{Microsoft Surface}",
year = "2011"
}
@misc{kinect,
author = "Corporation, Microsoft",
howpublished = "\url{http://www.microsoft.com/en-us/kinectforwindows/}",
title = "{Microsoft kinect}",
year = "2010"
}
@misc{leap,
author = "{David Holz}, Michael Buckwald (the Leap Motion team)",
howpublished = "\url{http://leapmotion.com/}",
title = "{Leap}",
year = "2012"
}
......@@ -8,7 +8,7 @@
\hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}
% Title Page
\title{A generic architecture for the detection of multi-touch gestures}
\title{A generic architecture for gesture-based interaction}
\author{Taddeüs Kroes}
\supervisors{Dr. Robert G. Belleman (UvA)}
\signedby{Dr. Robert G. Belleman (UvA)}
......@@ -30,57 +30,72 @@
\chapter{Introduction}
% TODO: put Qt link in bibtex
Multi-touch devices enable a user to interact with software using intuitive
hand gestures, rather than with interaction tools like mouse and keyboard.
With the growing use of touch screens in phones and tablets, multi-touch
interaction is becoming increasingly common. The driver of a touch device
provides low-level events. The most basic representation of these low-level
events consists of \emph{down}, \emph{move} and \emph{up} events.
More complex gestures must be designed in such a way that they can be
represented by a sequence of basic events. For example, a ``tap'' gesture can
be represented as a \emph{down} event that is followed by an \emph{up} event
within a certain time.
The translation of driver-specific messages to basic events, and of events
to multi-touch gestures, is often embedded in multi-touch application
frameworks, like Nokia's Qt \cite{qt}. However, there is no separate
implementation of the process itself. Consequently, an application
developer who wants to use multi-touch interaction in an application is
forced to choose an application framework that includes support for
multi-touch gestures. Moreover, the set of supported gestures is limited by
the application framework. To incorporate a custom gesture in an
application, the chosen framework needs to provide a way to extend its
existing multi-touch gestures.
% Main question
The goal of this thesis is to create a generic architecture for the support of
multi-touch gestures in applications. To test the design of the architecture, a
reference implementation is written in Python. The architecture should
incorporate the translation process of low-level driver messages to multi-touch
gestures. It should be able to run alongside an application framework. The
definition of multi-touch gestures should allow extensions, so that custom
gestures can be defined.
% Subquestions
To design such an architecture properly, the following questions are relevant:
\begin{itemize}
\item What is the input of the architecture? This is determined by the
output of multi-touch drivers.
\item How can extensibility of the supported gestures be accomplished?
% TODO: are the questions below still relevant? Perhaps rephrase them as
% "Design"-related questions?
\item How can the architecture be used by different programming languages?
A generic architecture should not be limited to one language.
\item How can the architecture serve multiple applications at the same
time?
\end{itemize}
% Scope
The scope of this thesis includes the design of a generic multi-touch detection
architecture, a reference implementation of this design, and the integration of
the reference implementation in a test case application.
Surface-touch devices have evolved from pen-based tablets to single-touch
trackpads, and on to multi-touch devices like smartphones and tablets.
Multi-touch devices enable a user to interact with software using hand
gestures, making the interaction more expressive and intuitive. These
gestures are more complex than the primitive ``click'' or ``tap'' events
used by single-touch devices.
Some examples of more complex gestures are so-called ``pinch''\footnote{A
``pinch'' gesture is formed by performing a pinching movement with multiple
fingers on a multi-touch surface. Pinch gestures are often used to zoom in or
out on an object.} and ``flick''\footnote{A ``flick'' gesture is the act of
grabbing an object and throwing it in a direction on a touch surface, giving
it momentum to move for some time after the hand releases the surface.}
gestures.
The complexity of gestures is not limited to navigation in smartphones. Some
multi-touch devices are already capable of recognizing objects touching the
screen \cite[Microsoft Surface]{mssurface}. In the near future, touch screens
may well be extended or even replaced by in-air interaction devices such as
Microsoft's Kinect \cite{kinect} and the Leap \cite{leap}.
The interaction devices mentioned above generate primitive events. In the case
of surface-touch devices, these are \emph{down}, \emph{move} and \emph{up}
events. Application programmers who want to incorporate complex, intuitive
gestures in their application face the challenge of interpreting these
primitive events as gestures. With the increasing complexity of gestures, the
complexity of the logic required to detect these gestures increases as well.
This challenge limits, or even deters, the application developer from using
complex gestures in an application.
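To illustrate the kind of logic involved, consider a minimal Python sketch
that interprets primitive \emph{down} and \emph{up} events as a ``tap''
gesture (the class, method names and time threshold are illustrative
assumptions, not part of any driver API):

\begin{verbatim}
import time

TAP_MAX_DURATION = 0.2  # seconds; an assumed threshold

class TapDetector:
    """Interprets primitive down/up events as 'tap' gestures."""

    def __init__(self):
        self.down_times = {}  # touch point id -> time of 'down' event

    def on_down(self, point_id):
        self.down_times[point_id] = time.monotonic()

    def on_up(self, point_id):
        started = self.down_times.pop(point_id, None)
        if started is not None and \
                time.monotonic() - started < TAP_MAX_DURATION:
            print("tap detected for point", point_id)
\end{verbatim}

Even this simplest of gestures requires per-point bookkeeping; gestures
such as ``pinch'' need considerably more state.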
The main question in this research project is whether a generic architecture
for the detection of complex interaction gestures can be designed, with the
capability of managing the complexity of gesture detection logic.
Application frameworks for surface-touch devices, such as Nokia's Qt \cite{qt},
include the detection of commonly used gestures like \emph{pinch} gestures.
However, this detection logic is dependent on the application framework.
Consequently, an application developer who wants to use multi-touch interaction
in an application is forced to choose an application framework that includes
support for multi-touch gestures. Therefore, a requirement of the generic
architecture is that it must not be bound to a specific application framework.
Moreover, the set of supported gestures is limited by the application framework
of choice. To incorporate a custom gesture in an application, the application
developer needs to extend the framework. This requires extensive knowledge of
the framework's architecture. Also, if the same gesture is used in another
application that is based on another framework, the detection logic has to be
translated for use in that framework. Nevertheless, application frameworks are
a necessity when it comes to fast, cross-platform development. Therefore, the
architecture design should aim to be compatible with existing frameworks, but
provide a way to detect and extend gestures independent of the framework.
An application framework is written in a specific programming language. A
generic architecture should not be limited to a single programming language. The
ultimate goal of this thesis is to provide support for complex gesture
interaction in any application. Thus, applications should be able to address
the architecture using a language-independent method of communication. This
intention leads towards the concept of a dedicated gesture detection
application that serves gestures to multiple programs at the same time.
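As an illustration of this concept, an application could subscribe to such
a dedicated gesture detection process over a socket and receive gestures as
language-independent JSON messages. The following Python sketch is purely
hypothetical; the port number and message format are assumptions, not a
defined protocol:

\begin{verbatim}
import json
import socket

def listen_for_gestures(host="localhost", port=5555):
    # Hypothetical subscription to a dedicated gesture detection
    # process that emits one JSON object per line, for example:
    # {"type": "pinch", "scale": 1.2}
    with socket.create_connection((host, port)) as connection:
        for line in connection.makefile():
            gesture = json.loads(line)
            print("received gesture:", gesture["type"])
\end{verbatim}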
The scope of this thesis is limited to the detection of gestures on multi-touch
surface devices. It presents a design for a generic gesture detection
architecture for use in multi-touch based applications. A reference
implementation of this design is used in some test case applications, whose
goal is to test the effectiveness of the design and detect its shortcomings.
% FIXME: Does this still belong in the introduction?
% How can the input of the architecture be normalized? This is needed, because
% multi-touch drivers use their own specific message format.
\section{Structure of this document}
......@@ -109,9 +124,7 @@ the reference implementation in a test case application.
gestures and flexibility in rule definitions, over-complexity can be
avoided.
% solution: trackers, e.g. a separate TapTracker and TransformationTracker
\section{Gesture recognition software for Windows 7}
\section{Gesture recognition implementation for Windows 7}
The online article \cite{win7touch} presents a Windows 7 application,
written in Microsoft's .NET. The application shows detected gestures in a
......@@ -128,6 +141,7 @@ the reference implementation in a test case application.
feature by also using different gesture trackers to track different gesture
types.
% TODO: This is not really 'related'; move it somewhere else
\section{Processing implementation of simple gestures in Android}
An implementation of a detection architecture for some simple multi-touch
......@@ -168,50 +182,69 @@ the reference implementation in a test case application.
multi-touch gesture detection architecture. The chapter represents the
architecture as a diagram of relations between different components.
Sections \ref{sec:driver-support} to \ref{sec:event-analysis} define
requirements for the archtitecture, and extend the diagram with components
requirements for the architecture, and extend the diagram with components
that meet these requirements. Section \ref{sec:example} describes an
example usage of the architecture in an application.
\subsection*{Position of architecture in software}
The input of the architecture comes from some multi-touch device
driver. The task of the architecture is to translate this input to
multi-touch gestures that are used by an application, as illustrated in
figure \ref{fig:basicdiagram}. In the course of this chapter, the
diagram is extended with the different components of the architecture.
The input of the architecture comes from a multi-touch device driver.
The task of the architecture is to translate this input to multi-touch
gestures that are used by an application, as illustrated in figure
\ref{fig:basicdiagram}. In the course of this chapter, the diagram is
extended with the different components of the architecture.
\basicdiagram{A diagram showing the position of the architecture
relative to the device driver and a multi-touch application.}
relative to the device driver and a multi-touch application. The input
of the architecture is given by a touch device driver. This input is
translated to complex interaction gestures and passed to the
application that is using the architecture.}
\section{Supporting multiple drivers}
\label{sec:driver-support}
The TUIO protocol \cite{TUIO} is an example of a touch driver that can be
used by multi-touch devices. Other drivers do exist, which should also be
supported by the architecture. Therefore, there must be some translation of
driver-specific messages to a common format in the architecture. Messages in
this common format will be called \emph{events}. Events can be translated
to multi-touch \emph{gestures}. The most basic set of events is
$\{point\_down, point\_move, point\_up\}$. Here, a ``point'' is a touch
object with only an (x, y) position on the screen.
A more extended set could also contain more complex events. An object can
also have a rotational property, like the ``fiducials''\footnote{A fiducial
is a pattern used by some touch devices to identify objects.} type
in the TUIO protocol. This results in $\{point\_down, point\_move,\\
point\_up, object\_down, object\_move, object\_up, object\_rotate\}$.
The component that translates driver-specific messages to events is called
the \emph{event driver}. The event driver runs in a loop, receiving and
analyzing driver messages. The event driver that is used in an application
is dependent on the support of the multi-touch device.
When a sequence of messages is analyzed as an event, the event driver
delegates the event to other components in the architecture for translation
to gestures.
used by multi-touch devices. TUIO uses ALIVE- and SET-messages to communicate
low-level touch events (see appendix \ref{app:tuio} for more details).
These messages are specific to the API of the TUIO protocol. Other touch
drivers may use very different message types. To support more than
one driver in the architecture, there must be some translation from
driver-specific messages to a common format for primitive touch events.
After all, the gesture detection logic in a ``generic'' architecture should
not be implemented based on driver-specific messages. The event types in
this format should be chosen so that multiple drivers can trigger the same
events. If each supported driver adds its own set of event types to the
common format, the purpose of being ``common'' would be defeated.
A reasonable expectation for a touch device driver is that it detects
simple touch points, with a ``point'' being an object at an $(x, y)$
position on the touch surface. This yields a basic set of events:
$\{point\_down, point\_move, point\_up\}$.
The TUIO protocol supports fiducials\footnote{A fiducial is a pattern used
by some touch devices to identify objects.}, which also have a rotational
property. This results in a more extended set: $\{point\_down, point\_move,
point\_up, object\_down, object\_move, object\_up,\\ object\_rotate\}$.
Due to their generic nature, the use of these events is not limited to the
TUIO protocol. Another driver that can distinguish rotated objects from
simple touch points could also trigger them.
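In a Python-based implementation, such a common event format could be as
simple as the following sketch (the type and attribute names are
illustrative):

\begin{verbatim}
from collections import namedtuple

# One tuple type covers the extended event set; the rotation angle
# is only meaningful for the object_* events.
Event = namedtuple("Event", ["type", "x", "y", "angle"])

EVENT_TYPES = {
    "point_down", "point_move", "point_up",
    "object_down", "object_move", "object_up", "object_rotate",
}

# Example events: a plain touch point and a rotated fiducial object.
touch = Event("point_down", 0.42, 0.58, None)
fiducial = Event("object_rotate", 0.10, 0.25, 45.0)
\end{verbatim}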
The component that translates driver-specific messages to common events
will be called the \emph{event driver}. The event driver runs in a loop,
receiving and analyzing driver messages. When a sequence of messages is
analyzed as an event, the event driver delegates the event to other
components in the architecture for translation to gestures. This
communication flow is illustrated in figure \ref{fig:driverdiagram}.
A touch device driver can be supported by adding an event driver
implementation for it. Which event driver implementation is used in an
application depends on the touch device that must be supported.
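A minimal Python sketch of such an event driver is given below, reusing the
\texttt{Event} tuple from the previous sketch; the class names and the
shape of the incoming TUIO messages are illustrative assumptions:

\begin{verbatim}
class EventDriver:
    """Translates driver-specific messages to common events."""

    def __init__(self):
        self.listeners = []  # analysis components receiving events

    def delegate(self, event):
        # Pass a detected event on to all analysis components.
        for listener in self.listeners:
            listener.handle_event(event)

    def receive_message(self, message):
        raise NotImplementedError  # driver-specific translation

class TuioEventDriver(EventDriver):
    """Illustrative stub of an event driver for the TUIO protocol."""

    def receive_message(self, message):
        # A real implementation would compare ALIVE and SET messages
        # to decide whether a session id is new (down), still present
        # (move) or gone (up); only the 'move' case is sketched here.
        if message.get("kind") == "SET":
            self.delegate(
                Event("point_move", message["x"], message["y"], None))
\end{verbatim}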
\driverdiagram{Extension of the diagram from figure \ref{fig:basicdiagram},
showing the position of the event driver in the architecture.}
showing the position of the event driver in the architecture. The event
driver translates driver-specific messages to a common set of events,
which are delegated to analysis components that interpret them as more
complex gestures.}
\section{Restricting gestures to a screen area}
......