Commit d509d424 authored by Taddeüs Kroes's avatar Taddeüs Kroes

Improved 'Introduction' and 'Design - Supporting multiple drivers' in report.

parent 019f0e6d
......@@ -114,3 +114,25 @@
x-fetchedfrom = "Bibsonomy",
year = 2012
}
@misc{mssurface,
author = "Corporation, Microsoft",
howpublished = "\url{http://www.samsunglfd.com/product/feature.do?modelCd=SUR40}",
title = "{Microsoft Surface}",
year = "2011"
}
@misc{kinect,
author = "Corporation, Microsoft",
howpublished = "\url{http://www.microsoft.com/en-us/kinectforwindows/}",
title = "{Microsoft kinect}",
year = "2010"
}
@misc{leap,
author = "{David Holz}, Michael Buckwald (the Leap Motion team)",
howpublished = "\url{http://leapmotion.com/}",
title = "{Leap}",
year = "2012"
}
......@@ -8,7 +8,7 @@
\hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}
% Title Page
\title{A generic architecture for the detection of multi-touch gestures}
\title{A generic architecture for gesture-based interaction}
\author{Taddeüs Kroes}
\supervisors{Dr. Robert G. Belleman (UvA)}
\signedby{Dr. Robert G. Belleman (UvA)}
......@@ -30,57 +30,72 @@
\chapter{Introduction}
% TODO: put Qt link in bibtex
Multi-touch devices enable a user to interact with software using intuitive
hand gestures, rather than with interaction tools like mouse and keyboard.
With the growing use of touch screens in phones and tablets, multi-touch
interaction is becoming increasingly common. The driver of a touch device
provides low-level events. The most basic representation of these low-level
events consists of \emph{down}, \emph{move} and \emph{up} events.
More complex gestures must be designed in such a way that they can be
represented by a sequence of basic events. For example, a ``tap'' gesture can
be represented as a \emph{down} event that is followed by an \emph{up} event
within a certain time.
The translation of driver-specific messages to basic events, and of events
to multi-touch gestures, is often embedded in multi-touch application
frameworks, like Nokia's Qt \cite{qt}. However, there is no separate
implementation of the process itself. Consequently, an application
developer who wants to use multi-touch interaction in an application is
forced to choose an application framework that includes support for
multi-touch gestures. Moreover, the set of supported gestures is limited by
the application framework. To incorporate a custom gesture in an
application, the chosen framework needs to provide a way to extend its
existing multi-touch gestures.
% Main question
The goal of this thesis is to create a generic architecture for the support of
multi-touch gestures in applications. To test the design of the architecture, a
reference implementation is written in Python. The architecture should
incorporate the translation process of low-level driver messages to multi-touch
gestures. It should be able to run alongside an application framework. The
definition of multi-touch gestures should allow extensions, so that custom
gestures can be defined.
% Subquestions
To design such an architecture properly, the following questions are relevant:
\begin{itemize}
\item What is the input of the architecture? This is determined by the
output of multi-touch drivers.
\item How can extensibility of the supported gestures be accomplished?
% TODO: are the questions below still relevant? Perhaps rephrase them as
% "Design"-related questions?
\item How can the architecture be used by different programming languages?
A generic architecture should not be limited to one language.
\item How can the architecture serve multiple applications at the same
time?
\end{itemize}
% Scope
The scope of this thesis includes the design of a generic multi-touch detection
architecture, a reference implementation of this design, and the integration of
the reference implementation in a test case application.
Surface-touch devices have evolved from pen-based tablets to single-touch
trackpads, and on to multi-touch devices like smartphones and tablets.
Multi-touch devices enable a user to interact with software using hand
gestures, making the interaction more expressive and intuitive. These
gestures are more complex than the primitive ``click'' or ``tap'' events
used by single-touch devices.
Some examples of more complex gestures are so-called ``pinch''\footnote{A
``pinch'' gesture is formed by performing a pinching movement with multiple
fingers on a multi-touch surface. Pinch gestures are often used to zoom in or
out on an object.} and ``flick''\footnote{A ``flick'' gesture is the act of
grabbing an object and throwing it in a direction on a touch surface, giving
it momentum to move for some time after the hand releases the surface.}
gestures.
The complexity of gestures is not limited to navigation in smartphones. Some
multi-touch devices are already capable of recognizing objects touching the
screen \cite[Microsoft Surface]{mssurface}. In the near future, touch screens
may well be extended or even replaced by in-air interaction devices such as
Microsoft's Kinect \cite{kinect} and the Leap \cite{leap}.
The interaction devices mentioned above generate primitive events. In the case
of surface-touch devices, these are \emph{down}, \emph{move} and \emph{up}
events. Application programmers who want to incorporate complex, intuitive
gestures in their application face the challenge of interpreting these
primitive events as gestures. With the increasing complexity of gestures, the
complexity of the logic required to detect these gestures increases as well.
This challenge limits, or even deters, the application developer from using
complex gestures in an application.
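To illustrate the kind of logic involved, consider a minimal Python sketch
that interprets primitive \emph{down} and \emph{up} events as a ``tap''
gesture (the class, method names and time threshold are illustrative
assumptions, not part of any driver API):

\begin{verbatim}
import time

TAP_MAX_DURATION = 0.2  # seconds; an assumed threshold

class TapDetector:
    """Interprets primitive down/up events as 'tap' gestures."""

    def __init__(self):
        self.down_times = {}  # touch point id -> time of 'down' event

    def on_down(self, point_id):
        self.down_times[point_id] = time.monotonic()

    def on_up(self, point_id):
        started = self.down_times.pop(point_id, None)
        if started is not None and \
                time.monotonic() - started < TAP_MAX_DURATION:
            print("tap detected for point", point_id)
\end{verbatim}

Even this simplest of gestures requires per-point bookkeeping; gestures
such as ``pinch'' need considerably more state.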
The main question in this research project is whether a generic architecture
for the detection of complex interaction gestures can be designed, with the
capability of managing the complexity of gesture detection logic.
Application frameworks for surface-touch devices, such as Nokia's Qt \cite{qt},
include the detection of commonly used gestures like \emph{pinch} gestures.
However, this detection logic is dependent on the application framework.
Consequently, an application developer who wants to use multi-touch interaction
in an application is forced to choose an application framework that includes
support for multi-touch gestures. Therefore, a requirement of the generic
architecture is that it must not be bound to a specific application framework.
Moreover, the set of supported gestures is limited by the application framework
of choice. To incorporate a custom gesture in an application, the application
developer needs to extend the framework. This requires extensive knowledge of
the framework's architecture. Also, if the same gesture is used in another
application that is based on another framework, the detection logic has to be
translated for use in that framework. Nevertheless, application frameworks are
a necessity when it comes to fast, cross-platform development. Therefore, the
architecture design should aim to be compatible with existing frameworks, but
provide a way to detect and extend gestures independent of the framework.
An application framework is written in a specific programming language. A
generic architecture should not be limited to a single programming language. The
ultimate goal of this thesis is to provide support for complex gesture
interaction in any application. Thus, applications should be able to address
the architecture using a language-independent method of communication. This
intention leads towards the concept of a dedicated gesture detection
application that serves gestures to multiple programs at the same time.
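As an illustration of this concept, an application could subscribe to such
a dedicated gesture detection process over a socket and receive gestures as
language-independent JSON messages. The following Python sketch is purely
hypothetical; the port number and message format are assumptions, not a
defined protocol:

\begin{verbatim}
import json
import socket

def listen_for_gestures(host="localhost", port=5555):
    # Hypothetical subscription to a dedicated gesture detection
    # process that emits one JSON object per line, for example:
    # {"type": "pinch", "scale": 1.2}
    with socket.create_connection((host, port)) as connection:
        for line in connection.makefile():
            gesture = json.loads(line)
            print("received gesture:", gesture["type"])
\end{verbatim}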
The scope of this thesis is limited to the detection of gestures on multi-touch
surface devices. It presents a design for a generic gesture detection
architecture for use in multi-touch based applications. A reference
implementation of this design is used in some test case applications, whose
goal is to test the effectiveness of the design and detect its shortcomings.
% FIXME: Does this still belong in the introduction?
% How can the input of the architecture be normalized? This is needed, because
% multi-touch drivers use their own specific message format.
\section{Structure of this document}
......@@ -109,9 +124,7 @@ the reference implementation in a test case application.
gestures and flexibility in rule definitions, over-complexity can be
avoided.
% solution: trackers, e.g. a separate TapTracker and TransformationTracker
\section{Gesture recognition software for Windows 7}
\section{Gesture recognition implementation for Windows 7}
The online article \cite{win7touch} presents a Windows 7 application,
written in Microsoft's .NET. The application shows detected gestures in a
......@@ -128,6 +141,7 @@ the reference implementation in a test case application.
feature by also using different gesture trackers to track different gesture
types.
% TODO: This is not really 'related'; move it somewhere else
\section{Processing implementation of simple gestures in Android}
An implementation of a detection architecture for some simple multi-touch
......@@ -168,50 +182,69 @@ the reference implementation in a test case application.
multi-touch gesture detection architecture. The chapter represents the
architecture as a diagram of relations between different components.
Sections \ref{sec:driver-support} to \ref{sec:event-analysis} define
requirements for the archtitecture, and extend the diagram with components
requirements for the architecture, and extend the diagram with components
that meet these requirements. Section \ref{sec:example} describes an
example usage of the architecture in an application.
\subsection*{Position of architecture in software}
The input of the architecture comes from some multi-touch device
driver. The task of the architecture is to translate this input to
multi-touch gestures that are used by an application, as illustrated in
figure \ref{fig:basicdiagram}. In the course of this chapter, the
diagram is extended with the different components of the architecture.
The input of the architecture comes from a multi-touch device driver.
The task of the architecture is to translate this input to multi-touch
gestures that are used by an application, as illustrated in figure
\ref{fig:basicdiagram}. In the course of this chapter, the diagram is
extended with the different components of the architecture.
\basicdiagram{A diagram showing the position of the architecture
relative to the device driver and a multi-touch application.}
relative to the device driver and a multi-touch application. The input
of the architecture is given by a touch device driver. This input is
translated to complex interaction gestures and passed to the
application that is using the architecture.}
\section{Supporting multiple drivers}
\label{sec:driver-support}
The TUIO protocol \cite{TUIO} is an example of a touch driver that can be
used by multi-touch devices. Other drivers do exist, which should also be
supported by the architecture. Therefore, there must be some translation of
driver-specific messages to a common format in the architecture. Messages in
this common format will be called \emph{events}. Events can be translated
to multi-touch \emph{gestures}. The most basic set of events is
$\{point\_down, point\_move, point\_up\}$. Here, a ``point'' is a touch
object with only an (x, y) position on the screen.
A more extended set could also contain more complex events. An object can
also have a rotational property, like the ``fiducials''\footnote{A fiducial
is a pattern used by some touch devices to identify objects.} type
in the TUIO protocol. This results in $\{point\_down, point\_move,\\
point\_up, object\_down, object\_move, object\_up, object\_rotate\}$.
The component that translates driver-specific messages to events is called
the \emph{event driver}. The event driver runs in a loop, receiving and
analyzing driver messages. The event driver that is used in an application
is dependent on the support of the multi-touch device.
When a sequence of messages is analyzed as an event, the event driver
delegates the event to other components in the architecture for translation
to gestures.
used by multi-touch devices. TUIO uses ALIVE- and SET-messages to communicate
low-level touch events (see appendix \ref{app:tuio} for more details).
These messages are specific to the API of the TUIO protocol. Other touch
drivers may use very different message types. To support more than
one driver in the architecture, there must be some translation from
driver-specific messages to a common format for primitive touch events.
After all, the gesture detection logic in a ``generic'' architecture should
not be implemented based on driver-specific messages. The event types in
this format should be chosen so that multiple drivers can trigger the same
events. If each supported driver adds its own set of event types to the
common format, the purpose of being ``common'' would be defeated.
A reasonable expectation for a touch device driver is that it detects
simple touch points, with a ``point'' being an object at an $(x, y)$
position on the touch surface. This yields a basic set of events:
$\{point\_down, point\_move, point\_up\}$.
The TUIO protocol supports fiducials\footnote{A fiducial is a pattern used
by some touch devices to identify objects.}, which also have a rotational
property. This results in a more extended set: $\{point\_down, point\_move,
point\_up, object\_down, object\_move, object\_up,\\ object\_rotate\}$.
Due to their generic nature, the use of these events is not limited to the
TUIO protocol. Another driver that can distinguish rotated objects from
simple touch points could also trigger them.
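In a Python-based implementation, such a common event format could be as
simple as the following sketch (the type and attribute names are
illustrative):

\begin{verbatim}
from collections import namedtuple

# One tuple type covers the extended event set; the rotation angle
# is only meaningful for the object_* events.
Event = namedtuple("Event", ["type", "x", "y", "angle"])

EVENT_TYPES = {
    "point_down", "point_move", "point_up",
    "object_down", "object_move", "object_up", "object_rotate",
}

# Example events: a plain touch point and a rotated fiducial object.
touch = Event("point_down", 0.42, 0.58, None)
fiducial = Event("object_rotate", 0.10, 0.25, 45.0)
\end{verbatim}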
The component that translates driver-specific messages to common events
will be called the \emph{event driver}. The event driver runs in a loop,
receiving and analyzing driver messages. When a sequence of messages is
analyzed as an event, the event driver delegates the event to other
components in the architecture for translation to gestures. This
communication flow is illustrated in figure \ref{fig:driverdiagram}.
A touch device driver can be supported by adding an event driver
implementation for it. Which event driver implementation is used in an
application depends on the touch device that must be supported.
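A minimal Python sketch of such an event driver is given below, reusing the
\texttt{Event} tuple from the previous sketch; the class names and the
shape of the incoming TUIO messages are illustrative assumptions:

\begin{verbatim}
class EventDriver:
    """Translates driver-specific messages to common events."""

    def __init__(self):
        self.listeners = []  # analysis components receiving events

    def delegate(self, event):
        # Pass a detected event on to all analysis components.
        for listener in self.listeners:
            listener.handle_event(event)

    def receive_message(self, message):
        raise NotImplementedError  # driver-specific translation

class TuioEventDriver(EventDriver):
    """Illustrative stub of an event driver for the TUIO protocol."""

    def receive_message(self, message):
        # A real implementation would compare ALIVE and SET messages
        # to decide whether a session id is new (down), still present
        # (move) or gone (up); only the 'move' case is sketched here.
        if message.get("kind") == "SET":
            self.delegate(
                Event("point_move", message["x"], message["y"], None))
\end{verbatim}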
\driverdiagram{Extension of the diagram from figure \ref{fig:basicdiagram},
showing the position of the event driver in the architecture.}
showing the position of the event driver in the architecture. The event
driver translates driver-specific messages to a common set of events,
which are delegated to analysis components that interpret them as more
complex gestures.}
\section{Restricting gestures to a screen area}
......