
\documentclass[twoside,openright]{uva-bachelor-thesis}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{hyperref,graphicx,float}

% Link colors
\hypersetup{colorlinks=true,linkcolor=black,urlcolor=blue,citecolor=DarkGreen}

% Title Page
\title{A universal detection mechanism for multi-touch gestures}
\author{Taddeüs Kroes}
\supervisors{Dr. Robert G. Belleman (UvA)}
\signedby{Dr. Robert G. Belleman (UvA)}

\begin{document}

% Title page
\maketitle

\begin{abstract}
% TODO
\end{abstract}

% Set paragraph indentation
\parindent 0pt
\parskip 1.5ex plus 0.5ex minus 0.2ex
% Table of contents on separate page
\tableofcontents

\chapter{Introduction}

% Rough problem statement
Multi-touch interaction is becoming increasingly common, mostly due to the wide
use of touch screens in phones and tablets. When programming applications using
this method of interaction, the programmer needs an abstraction of the raw data
provided by the touch driver of the device. This abstraction exists in several
multi-touch application frameworks like Nokia's
Qt\footnote{\url{http://qt.nokia.com/}}. However, applications that do not use
these frameworks have no access to their multi-touch events.
% Motivation
This problem was observed during an attempt to create a multi-touch
``interactor'' class for the Visualization Toolkit (VTK \cite{VTK}). Since
VTK already provides the application framework, it is undesirable to pull in
an entire second framework like Qt solely for its multi-touch support.

% Rough goal
The goal of this project is to define a universal multi-touch event triggering
mechanism. To test the definition, a reference implementation is written in
Python.
% Setting
To test multi-touch interaction properly, a multi-touch device is required.
The University of Amsterdam (UvA) has provided access to a multi-touch table
from PQlabs. The table uses the TUIO protocol \cite{TUIO} to communicate touch
events.
\section{Definition of the problem}

% Main question
The goal of this thesis is to create a multi-touch event triggering mechanism
for use in a VTK interactor. The design of the mechanism must be universal.

% Subquestions
To design such a mechanism properly, the following questions are relevant:
\begin{itemize}
\item What are the requirements for the mechanism to be universal?
\item What is the input of the mechanism? Different touch drivers have
      different APIs. To be able to support different drivers (which is
      highly desirable), there should probably be a translation from the
      driver API to a fixed input format.
\item How can extensibility be accomplished? The set of supported events
      should not be limited to a single implementation; an application
      should be able to define its own custom events.
\item Can events be shared with multiple processes at the same time? For
      example, a network implementation could run as a service instead of
      within a single application, triggering events in any application that
      needs them.
\item Is performance an issue? For example, an event loop with rotation
      detection could consume more processing resources than desired.
\end{itemize}
% Scope
The scope of this thesis includes the design of a multi-touch event
triggering mechanism, a reference implementation of this design, and its
integration into a VTK interactor. To be successful, the design should allow
for extensions to be added to any implementation. The reference implementation
is a proof of concept that translates TUIO events to some simple touch
gestures that are used by a VTK interactor.
\section{Structure of this document}
% TODO

\chapter{Related work}

\section{Gesture and Activity Recognition Toolkit}
The Gesture and Activity Recognition Toolkit (GART) \cite{GART} is a
toolkit for the development of gesture-based applications. Its authors
state that the best way to classify gestures is to use machine learning.
The programmer trains a program to recognize gestures using the machine
learning library from the toolkit. The toolkit contains a callback
mechanism that the programmer uses to execute custom code when a gesture
is recognized. Though multi-touch input is not directly supported by the
toolkit, the level of abstraction does allow for it to be implemented in
the form of a ``touch'' sensor.

The reason to use machine learning is the claim that gesture detection
``is likely to become increasingly complex and unmanageable'' when a set
of predefined rules is used to decide whether some sensor input represents
a specific gesture. This claim is not necessarily true: if the programmer
is given a way to separate the detection of different gesture types, as
well as flexibility in rule definitions, over-complexity can be avoided.
% solution: trackers, e.g. separate TapTracker and TransformationTracker
\section{Gesture recognition software for Windows 7}
% TODO
An online article \cite{win7touch} presents a Windows 7 application,
written in Microsoft's .NET, that shows detected gestures on a canvas.
Gesture trackers keep track of stylus locations to detect specific
gestures. The event types required to track a touch stylus are ``stylus
down'', ``stylus move'' and ``stylus up'' events. A
\texttt{GestureTrackerManager} object dispatches these events to the
gesture trackers. The application supports a limited number of pre-defined
gestures.

An important observation about this application is that different gestures
are detected by different gesture trackers, thus separating gesture
detection code into maintainable parts.
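This separation can be sketched in a few lines of Python. The class and
method names below are invented for illustration and are not taken from the
article:

```python
# Illustrative sketch: a manager forwards low-level "down"/"move"/"up"
# events to independent gesture trackers, so each gesture's detection
# logic lives in its own class.

class TapTracker:
    """Detects a tap: a point that goes down and up without moving."""
    def __init__(self):
        self.moved = {}   # point id -> has the point moved?
        self.taps = []    # detected tap positions

    def on_event(self, kind, point_id, x, y):
        if kind == "down":
            self.moved[point_id] = False
        elif kind == "move":
            self.moved[point_id] = True
        elif kind == "up" and not self.moved.pop(point_id, True):
            self.taps.append((x, y))

class TrackerManager:
    """Dispatches every low-level event to all registered trackers."""
    def __init__(self):
        self.trackers = []

    def register(self, tracker):
        self.trackers.append(tracker)

    def dispatch(self, kind, point_id, x, y):
        for tracker in self.trackers:
            tracker.on_event(kind, point_id, x, y)

manager = TrackerManager()
tap = TapTracker()
manager.register(tap)

# a point goes down and up at the same position: a tap
manager.dispatch("down", 1, 0.5, 0.5)
manager.dispatch("up", 1, 0.5, 0.5)
```

Adding support for a new gesture then amounts to registering another tracker
class, which is the maintainability argument made above.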
\section{The TUIO protocol}
The TUIO protocol \cite{TUIO} defines a way to geometrically describe
tangible objects, such as fingers or fiducials on a multi-touch table. The
table used for this thesis uses the protocol in its driver. Object
information is sent to the TUIO UDP port (3333 by default).

For efficiency reasons, the TUIO protocol is encoded using the Open Sound
Control (OSC)\footnote{\url{http://opensoundcontrol.org/specification}}
format. An OSC server/client implementation is available for Python:
pyOSC\footnote{\url{https://trac.v2.nl/wiki/pyOSC}}.

A Python implementation of the TUIO protocol also exists:
pyTUIO\footnote{\url{http://code.google.com/p/pytuio/}}. However, the
execution of an example script yields an error regarding Python's built-in
\texttt{socket} library. Therefore, the reference implementation uses the
pyOSC package to receive TUIO messages.
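To illustrate the encoding, the sketch below hand-parses a single OSC
message: an address pattern, a type tag string, and arguments, each padded
to four-byte boundaries. It is for illustration only; the reference
implementation leaves this work to pyOSC:

```python
import struct

def read_string(data, pos):
    """Read a null-terminated OSC string padded to a 4-byte boundary."""
    end = data.index(b"\x00", pos)
    s = data[pos:end].decode("ascii")
    pos = end + 1
    pos += (4 - pos % 4) % 4  # skip the padding bytes
    return s, pos

def parse_osc(data):
    """Parse one OSC message into (address, argument list)."""
    address, pos = read_string(data, 0)
    typetags, pos = read_string(data, pos)
    args = []
    for tag in typetags.lstrip(","):
        if tag == "i":    # 32-bit big-endian integer
            args.append(struct.unpack(">i", data[pos:pos + 4])[0])
            pos += 4
        elif tag == "f":  # 32-bit big-endian float
            args.append(struct.unpack(">f", data[pos:pos + 4])[0])
            pos += 4
        elif tag == "s":  # padded string
            s, pos = read_string(data, pos)
            args.append(s)
    return address, args

# A hand-built message resembling a TUIO 2DCur command:
message = b"/tuio/2Dcur\x00" + b",si\x00" + b"set\x00" + struct.pack(">i", 5)
```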
The two most important message types of the protocol are ALIVE and SET
messages. An ALIVE message contains the list of session ids that are
currently ``active'', which in the case of a multi-touch table means that
they are touching the screen. A SET message provides geometric information
about a session id, such as position, velocity and acceleration.

Each session id represents an object. The only type of object on the
multi-touch table is what the TUIO protocol calls ``2DCur'': an $(x, y)$
position on the screen.

ALIVE messages can be used to determine when an object touches and releases
the screen. For example, if a session id was present in the previous message
but not in the current one, the object it represents has been lifted from
the screen.
SET messages provide information about movement. In the case of simple
$(x, y)$ positions, only the movement vector of the position itself can be
calculated. For more complex objects such as fiducials, attributes like
rotational orientation are also included.

ALIVE and SET messages can be combined to create ``point down'', ``point
move'' and ``point up'' events (as used by the .NET application
\cite{win7touch}).
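As an illustration, the sketch below derives such events from incoming ALIVE
and SET messages. The names are invented for this example, and it assumes
that a SET message for a known session id always means the point moved:

```python
class EventGenerator:
    """Derives point down/move/up events from ALIVE and SET messages."""
    def __init__(self):
        self.alive = set()  # session ids currently touching the screen

    def on_alive(self, session_ids):
        current = set(session_ids)
        # ids that appeared touched the screen; ids that vanished left it
        events = [("point down", sid) for sid in sorted(current - self.alive)]
        events += [("point up", sid) for sid in sorted(self.alive - current)]
        self.alive = current
        return events

    def on_set(self, session_id, x, y):
        # illustrative assumption: every SET for a live id is a move
        if session_id in self.alive:
            return [("point move", session_id, x, y)]
        return []
```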
TUIO coordinates range from $0.0$ to $1.0$, with $(0.0, 0.0)$ being the
top left corner of the screen and $(1.0, 1.0)$ the bottom right corner. To
focus events within a window, a translation to window coordinates is
required in the client application, as stated by the online specification
\cite{TUIO_specification}:
\begin{quote}
In order to compute the X and Y coordinates for the 2D profiles a TUIO
tracker implementation needs to divide these values by the actual
sensor dimension, while a TUIO client implementation consequently can
scale these values back to the actual screen dimension.
\end{quote}
In other words, the design of the gesture detection mechanism should
incorporate a translation from driver-specific coordinates to pixel
coordinates.
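A minimal sketch of such a translation, assuming the screen resolution and
the window geometry (position and size in pixels) are known:

```python
def tuio_to_window(x, y, screen_size, window_pos, window_size):
    """Map a normalized (0.0-1.0) TUIO position to pixel coordinates
    relative to a window; returns None if the point falls outside it."""
    px = x * screen_size[0] - window_pos[0]
    py = y * screen_size[1] - window_pos[1]
    if 0 <= px < window_size[0] and 0 <= py < window_size[1]:
        return (px, py)
    return None

# centre of a 1600x1200 screen, window at (400, 300) sized 800x600
inside = tuio_to_window(0.5, 0.5, (1600, 1200), (400, 300), (800, 600))
outside = tuio_to_window(0.1, 0.1, (1600, 1200), (400, 300), (800, 600))
```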
\section{Processing implementation of simple gestures in Android}
An implementation of a detection mechanism for some simple multi-touch
gestures (tap, double tap, rotation, pinch and drag) using
Processing\footnote{Processing is a Java-based development environment with
an export possibility for Android. See also \url{http://processing.org/}.}
can be found in a forum on the Processing website
\cite{processingMT}. The implementation is fairly simple, but it yields
some very appealing results. The detection logic of all gestures is
combined in a single class. This does not allow for extensibility, because
the complexity of this class would increase to an undesirable level (as
predicted by the GART article \cite{GART}). However, the detection logic
itself is partially re-used in the reference implementation of the
universal gesture detection mechanism.
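For example, rotation can be measured as the average change of each point's
angle around the centroid of all current points. The sketch below is written
in the spirit of that detection logic, not copied from the forum post:

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))

def rotation(old_points, new_points):
    """Average angular movement (radians) of points around their centroid;
    assumes both lists contain the same points in the same order."""
    cx, cy = centroid(old_points)
    total = 0.0
    for (ox, oy), (nx, ny) in zip(old_points, new_points):
        d = math.atan2(ny - cy, nx - cx) - math.atan2(oy - cy, ox - cx)
        # normalize the difference into [-pi, pi) to avoid wrap-around
        total += (d + math.pi) % (2 * math.pi) - math.pi
    return total / len(old_points)

# two fingers rotated 0.1 rad around the origin
angle = rotation([(1, 0), (-1, 0)],
                 [(math.cos(0.1), math.sin(0.1)),
                  (-math.cos(0.1), -math.sin(0.1))])
```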
% TODO

\chapter{Experiments}
% test implementation with taps, rotation and pinch. This showed:
% - that there are several ways to detect e.g. ``rotation''
%   (and that it must be possible to distinguish between them)
% - that the detection of different kinds of gestures must be separable,
%   otherwise it becomes chaotic.
% - A number of choices were made while designing the gestures, e.g. that
%   rotation uses ALL fingers for the centroid. In another program it may
%   be necessary to use only one hand, and thus to pick points close to
%   each other (solution: windows).
% Drawing program that draws the current points + centroid, and with which
% transformations can be tested. Link to appendix ``supported events''
% Proof of Concept: VTK interactor
% -------
% Results
% -------

\chapter{Design}

\section{Requirements}
% TODO
% support multiple drivers
% bind gesture detection to a specific part of the screen
% separate detection code for different gesture types
% possibly usable from multiple languages

\section{Input server}
% TODO
% translation from driver events to point down, move, up
% TUIO in reference implementation

\section{Gesture server}
% TODO
% translation to pixel coordinates
% assignment to windows

\section{Windows}
% TODO
% assigning events to a part of the screen:
% TUIO coordinates span the entire screen and range from 0.0 to 1.0, so
% they must be translated to pixel coordinates within a ``window''

\section{Trackers}
% TODO
% event binding/triggering
% extensibility
% TODO: link to appendix with diagram

\chapter{Reference implementation}
% TODO

\chapter{Integration in VTK}
% VTK interactor

\chapter{Conclusions}
% TODO
% Windows are a way to assign global events to application windows
% Trackers are an effective way to detect gestures
% Trackers are extensible through object orientation

\chapter{Suggestions for future work}
% TODO
% Network protocol (ZeroMQ)
% State machine

\bibliographystyle{plain}
\bibliography{report}

\appendix

\chapter{Diagram of mechanism structure}
\label{app:schema}
\begin{figure}[H]
\hspace{-14em}
\includegraphics{data/server_scheme.pdf}
\caption{}
% TODO: caption
\end{figure}

\chapter{Supported events in reference implementation}
\label{app:supported-events}
% TODO

\end{document}