@@ -84,7 +84,7 @@ detection for every new gesture-based application.
multi-touch surface devices. It presents a design for a generic gesture
detection architecture for use in multi-touch based applications. A
reference implementation of this design is used in some test case
- applications, whose goal is to test the effectiveness of the design and
+ applications, whose purpose is to test the effectiveness of the design and
detect its shortcomings.

Chapter \ref{chapter:related} describes related work that inspired a design
@@ -102,72 +102,82 @@ detection for every new gesture-based application.
\chapter{Related work}
\label{chapter:related}

- % TODO: herstructureren
-
- \section{Existing application frameworks}
-
- Application frameworks for surface-touch devices, such as Nokia's Qt
- \cite{qt}, do already include the detection of commonly used gestures like
- \emph{pinch} gestures. However, this detection logic is dependent on the
- application framework. Consequently, an application developer who wants to
- use multi-touch interaction in an application is forced to use an
- application framework that includes support for multi-touch gestures.
- Moreover, the set of supported gestures is limited by the application
- framework of choice. To incorporate a custom event in an application, the
- application developer needs to extend the framework. This requires
- extensive knowledge of the framework's architecture. Also, if the same
- gesture is needed in another application that is based on another
- framework, the detection logic has to be translated for use in that
- framework.
-
- \section{Gesture and Activity Recognition Toolkit}
-
- The Gesture and Activity Recognition Toolkit (GART) \cite{GART} is a
- toolkit for the development of gesture-based applications. The toolkit
- states that the best way to classify gestures is to use machine learning.
- The programmer trains a program to recognize using the machine learning
- library from the toolkit. The toolkit contains a callback mechanism that
- the programmer uses to execute custom code when a gesture is recognized.
-
- Though multi-touch input is not directly supported by the toolkit, the
- level of abstraction does allow for it to be implemented in the form of a
- ``touch'' sensor.
-
- The reason to use machine learning is the statement that gesture detection
- ``is likely to become increasingly complex and unmanageable'' when using a
- set of predefined rules to detect whether some sensor input can be seen as
- a specific gesture. This statement is not necessarily true. If the
- programmer is given a way to separate the detection of different types of
- gestures and flexibility in rule definitions, over-complexity can be
- avoided.
-
- \section{Gesture recognition implementation for Windows 7}
-
- The online article \cite{win7touch} presents a Windows 7 application,
- written in Microsofts .NET. The application shows detected gestures in a
- canvas. Gesture trackers keep track of stylus locations to detect specific
- gestures. The event types required to track a touch stylus are ``stylus
- down'', ``stylus move'' and ``stylus up'' events. A
- \texttt{GestureTrackerManager} object dispatches these events to gesture
- trackers. The application supports a limited number of pre-defined
- gestures.
-
- An important observation in this application is that different gestures are
- detected by different gesture trackers, thus separating gesture detection
- code into maintainable parts.
+ Applications that use gesture-based interaction need a graphical user
+ interface (GUI) on which gestures can be performed. The creation of a GUI
+ is a platform-specific task; Windows and Linux, for instance, use
+ different window managers. To create a window, a platform-independent
+ application would need a separate implementation for each supported
+ platform. For this reason, GUI-based applications are often built on top
+ of an application framework that abstracts away such platform-specific
+ tasks. Frameworks often include a set of tools and events that help the
+ developer build advanced GUI widgets with little effort.
+
+ % Existing frameworks (and why they're not good enough)
+ Some frameworks, such as Nokia's Qt \cite{qt}, provide support for basic
+ multi-touch gestures like tapping, rotation or pinching. However, the
+ gesture detection logic is embedded inseparably in the framework code.
+ Consequently, an application developer who wants to use multi-touch
+ interaction is forced to adopt an application framework that supports
+ exactly those multi-touch gestures that the application requires. Kivy
+ \cite{kivy} is a GUI framework for Python applications with support for
+ multi-touch gestures. It uses a basic gesture detection algorithm that
+ allows developers to define custom gestures to some degree
+ \cite{kivygesture}, using a set of touch point coordinates. However,
+ these frameworks do not provide support for extension with custom complex
+ gestures.
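+
+ As an illustration of the kind of customization Kivy does allow, the
+ following sketch records a template stroke and matches a new stroke
+ against it. It is a minimal example built on Kivy's documented
+ \texttt{Gesture} and \texttt{GestureDatabase} classes; the coordinates
+ and score threshold are arbitrary.
+
+ \begin{verbatim}
+ from kivy.gesture import Gesture, GestureDatabase
+
+ # Build a template gesture from a list of (x, y) touch coordinates.
+ template = Gesture()
+ template.add_stroke(point_list=[(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)])
+ template.normalize()
+
+ db = GestureDatabase()
+ db.add_gesture(template)
+
+ # Match a newly performed stroke against the database.
+ candidate = Gesture()
+ candidate.add_stroke(point_list=[(0.1, 0.0), (0.6, 0.5), (1.0, 0.9)])
+ candidate.normalize()
+
+ match = db.find(candidate, minscore=0.7)  # None or (score, gesture)
+ if match:
+     print('recognized with score %.2f' % match[0])
+ \end{verbatim}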
+
+ Many frameworks are also device-specific, meaning that they are developed
+ for use on either a tablet, smartphone, PC or other device. OpenNI
+ \cite{OpenNI2010}, for example, provides APIs only for natural interaction
+ (NI) devices such as webcams and microphones. The concept of complex
+ gesture-based interaction, however, is applicable to a much wider set of
+ devices. VRPN \cite{VRPN} provides a software library that abstracts the
+ output of devices, which enables it to support a wide set of devices used
+ in Virtual Reality (VR) interaction. The library makes the low-level
+ events of these devices accessible in a client application using network
+ communication. Gesture detection is not included in VRPN.
+
+ % Methods of gesture detection
+ The detection of high-level gestures from low-level events can be
+ approached in several ways. GART \cite{GART} is a toolkit for the
+ development of gesture-based applications whose authors state that the
+ best way to classify gestures is to use machine learning. The programmer
+ trains an application to recognize gestures using a machine learning
+ library from the toolkit. Though multi-touch input is not directly
+ supported by the toolkit, its level of abstraction does allow for it to
+ be implemented in the form of a ``touch'' sensor. The argument for
+ machine learning is that gesture detection ``is likely to become
+ increasingly complex and unmanageable'' when a predefined set of rules is
+ used to detect whether some sensor input can be classified as a specific
+ gesture.
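+
+ To make the machine learning approach concrete, the sketch below
+ classifies a stroke by comparing its feature vector to per-gesture
+ centroids computed from training examples. This is not GART's actual API;
+ it is a deliberately simplified stand-in for the train/classify cycle
+ that such a toolkit automates, and the chosen features (start-to-end
+ displacement and path length) are an assumption.
+
+ \begin{verbatim}
+ import math
+
+ def features(points):
+     """Map a list of (x, y) touch points to a feature vector."""
+     (x0, y0), (xn, yn) = points[0], points[-1]
+     length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
+     return (xn - x0, yn - y0, length)
+
+ def train(examples):
+     """examples: {label: [stroke, ...]} -> {label: mean feature vector}"""
+     model = {}
+     for label, strokes in examples.items():
+         vectors = [features(s) for s in strokes]
+         model[label] = tuple(sum(c) / len(c) for c in zip(*vectors))
+     return model
+
+ def classify(model, points):
+     """Return the label whose centroid is nearest to the new stroke."""
+     vector = features(points)
+     return min(model, key=lambda label: math.dist(model[label], vector))
+ \end{verbatim}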
+
+ The alternative to machine learning is to define an explicit set of
+ detection rules for each gesture. Manoj Kumar \cite{win7touch} presents a
+ Windows 7 application, written in Microsoft's .NET, which detects a set
+ of basic directional gestures based on the movement of a stylus. The
+ complexity of the code is managed by separating the gesture types into
+ different detection units called ``gesture trackers''. The application
+ shows that predefined gesture detection rules do not necessarily produce
+ unmanageable code.
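+
+ The following sketch illustrates this separation in Python. It is an
+ illustrative reconstruction modelled on the article's description, not
+ the original .NET code: a manager forwards low-level stylus events to
+ independent tracker objects, each of which encapsulates the detection
+ rules for one gesture type.
+
+ \begin{verbatim}
+ class TapTracker(object):
+     """Detects a tap: a press and release without movement."""
+     def __init__(self, on_gesture):
+         self.on_gesture = on_gesture
+         self.start = None
+
+     def on_event(self, name, x, y):
+         if name == 'stylus_down':
+             self.start = (x, y)
+         elif name == 'stylus_up' and self.start == (x, y):
+             self.on_gesture('tap', x, y)
+
+ class GestureTrackerManager(object):
+     """Dispatches stylus events to all registered trackers."""
+     def __init__(self, trackers):
+         self.trackers = trackers
+
+     def dispatch(self, name, x, y):
+         for tracker in self.trackers:
+             tracker.on_event(name, x, y)
+ \end{verbatim}
+
+ A new gesture type is added by registering another tracker, leaving the
+ detection code of existing trackers untouched.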

\section{Analysis of related work}

- The simple Processing implementation of multi-touch events provides most of
- the functionality that can be found in existing multi-touch applications.
- In fact, many applications for mobile phones and tablets only use tap and
- scroll events. For this category of applications, using machine learning
- seems excessive. Though the representation of a gesture using a feature
- vector in a machine learning algorithm is a generic and formal way to
- define a gesture, a programmer-friendly architecture should also support
- simple, ``hard-coded'' detection code. A way to separate different pieces
- of gesture detection code, thus keeping a code library manageable and
- extendable, is to user different gesture trackers.
+ Implementations for the support of complex gesture-based interaction
+ already exist. However, gesture detection in these implementations is
+ device-specific (Nokia Qt and OpenNI) or limited to use within an
+ application framework (Kivy).
+
+ An abstraction of device output allows VRPN and GART to support multiple
+ devices. However, VRPN does not incorporate gesture detection. GART does,
+ but only in the form of machine learning algorithms. Many applications for
+ mobile phones and tablets only use simple gestures such as taps. For this
+ category of applications, machine learning is an excessively complex
+ method of gesture detection. Manoj Kumar shows that, when managed well, a
+ predefined set of gesture detection rules is sufficient to detect simple
+ gestures.
+
+ This thesis explores the possibility of creating an architecture that
+ combines support for multiple input devices with different methods of
+ gesture detection.

\chapter{Design}
\label{chapter:design}
@@ -180,17 +190,17 @@ detection for every new gesture-based application.
Application frameworks are a necessity when it comes to fast,
cross-platform development. A generic architecture design should aim to be
compatible with existing frameworks, and provide a way to detect and extend
- gestures independent of the framework. An application framework is written
- in a specific programming language. To support multiple frameworks and
- programming languages, the architecture should be accessible for
- applications using a language-independent method of communication. This
- intention leads towards the concept of a dedicated gesture detection
- application that serves gestures to multiple applications at the same time.
+ gestures independent of the framework. Since an application framework is
+ written in a specific programming language, the architecture should be
+ accessible for applications using a language-independent method of
+ communication. This intention leads towards the concept of a dedicated
+ gesture detection application that serves gestures to multiple applications
+ at the same time.
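+
+ A minimal sketch of what this language-independent communication could
+ look like is given below. The message format and port number are
+ hypothetical, not part of the design: a client subscribes to the gesture
+ detection daemon over a TCP socket and receives gestures as JSON objects,
+ one per line.
+
+ \begin{verbatim}
+ import json
+ import socket
+
+ # Connect to the (hypothetical) gesture daemon on the local machine.
+ connection = socket.create_connection(('localhost', 7000))
+ stream = connection.makefile('r')
+
+ for line in stream:
+     gesture = json.loads(line)  # e.g. {"type": "tap", "x": 120, "y": 45}
+     if gesture['type'] == 'tap':
+         print('tap at (%d, %d)' % (gesture['x'], gesture['y']))
+ \end{verbatim}
+
+ Any language that can open a socket and parse the messages can act as a
+ client, so the daemon is not tied to a particular application framework.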

This chapter describes a design for such an architecture. The architecture
is represented as diagram of relations between different components.
Sections \ref{sec:multipledrivers} to \ref{sec:daemon} define requirements
- for the architecture, and extend the diagram with components that meet
+ for the architecture, and extend this diagram with components that meet
these requirements. Section \ref{sec:example} describes an example usage of
the architecture in an application.
@@ -255,7 +265,6 @@ detection for every new gesture-based application.
\section{Restricting events to a screen area}
\label{sec:areas}

- % TODO: in introduction: gestures zijn opgebouwd uit meerdere primitieven
Touch input devices are unaware of the graphical input
widgets\footnote{``Widget'' is a name commonly used to identify an element
of a graphical user interface (GUI).} rendered by an application, and
@@ -581,18 +590,20 @@ The reference implementation is written in Python and available at
\item Basic tracker, supports $point\_down,~point\_move,~point\_up$ gestures.
\item Tap tracker, supports $tap,~single\_tap,~double\_tap$ gestures.
\item Transformation tracker, supports $rotate,~pinch,~drag$ gestures.
+ \item Hand tracker, supports $hand\_down,~hand\_up$ gestures.
\end{itemize}

\textbf{Event areas}
\begin{itemize}
\item Circular area
\item Rectangular area
+ \item Polygon area
\item Full screen area
\end{itemize}
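+
+ A sketch of how these building blocks can be combined is given below. The
+ class and method names are illustrative and not necessarily the reference
+ implementation's actual API: a rectangular event area is created, a
+ transformation tracker is attached to it, and a handler reacts to the
+ $rotate$ gesture.
+
+ \begin{verbatim}
+ # Names are illustrative; the actual API may differ.
+ area = RectangularArea(x=0, y=0, width=800, height=600)
+ area.add_tracker(TransformationTracker())
+
+ def on_rotate(gesture):
+     print('rotated by %.2f radians' % gesture.angle)
+
+ area.bind('rotate', on_rotate)
+ \end{verbatim}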

The implementation does not include a network protocol to support the daemon
setup as described in section \ref{sec:daemon}. Therefore, it is only usable in
-Python programs. Thus, the two test programs are also written in Python.
+Python programs. The two test programs are also written in Python.

The event area implementations contain some geometric functions to determine
whether an event should be delegated to an event area. All gesture trackers
@@ -819,8 +830,7 @@ complex objects such as fiducials, arguments like rotational position and
acceleration are also included.

ALIVE and SET messages can be combined to create ``point down'', ``point move''
-and ``point up'' events (as used by the Windows 7 implementation
-\cite{win7touch}).
+and ``point up'' events.
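+
+A sketch of this combination step is given below. The session ID logic
+follows the TUIO semantics described here; the callback names are
+illustrative. Comparing the session IDs of consecutive ALIVE messages
+yields new and removed touch points, while SET messages yield movement.
+
+\begin{verbatim}
+alive = set()  # session IDs seen in the previous ALIVE message
+
+def on_alive(session_ids):
+    global alive
+    current = set(session_ids)
+    for sid in current - alive:
+        point_down(sid)    # new session ID: "point down"
+    for sid in alive - current:
+        point_up(sid)      # disappeared session ID: "point up"
+    alive = current
+
+def on_set(sid, x, y):
+    point_move(sid, x, y)  # position update: "point move"
+\end{verbatim}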

TUIO coordinates range from $0.0$ to $1.0$, with $(0.0, 0.0)$ being the left
top corner of the screen and $(1.0, 1.0)$ the right bottom corner. To focus