Models for Quasi-Objective Rating of Digital Samples

Project title Models for Quasi-Objective Rating of Digital Samples
Project type Applied research and development
Frascati Yes
Theme IT | Technology
Teaser How can we build machine learning models that can tell good handwriting from bad handwriting?
Status Ongoing
Owner
- Academy Københavns Erhvervsakademi (KEA)
- Contact person Henrik Strøm
Assistant Professor
hstr@kea.dk
Nat./Int. International
Project period 1 June 2019 - 31 May 2023
Project description
- Project summary

A project for researching and developing quasi-objective models to perform a human-like rating of digital samples, more specifically, handwriting samples. The project covers data collection via observational studies, theory-building of machine learning models, system development of prototypes, and finally, a real-world application experiment. The project is currently in an initial phase of data collection.

- Background and purpose

While doing research in machine learning and trying to learn Chinese at the same time, I came up with the idea of making an app, based on machine learning, that would help me practice writing Chinese characters. It turned out to be far from trivial. The aesthetics of handwriting is an intrinsically subjective matter. However, most people would still agree on whether a particular handwriting sample is “good” or “bad,” which makes handwriting a good case study. Machine learning models that rate handwriting will at best be quasi-objective, meaning that they have high external validity in that they align with the consensus of a large number of real humans.
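
One way to make "aligning with the consensus" concrete is to compare model scores with the mean of several human ratings per sample. The following is a minimal sketch under stated assumptions, not the project's method: the 1-5 rating scale, the use of the mean as consensus, the Pearson correlation as agreement measure, and all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def consensus(ratings_per_sample):
    """Return the mean human rating for each sample (assumed consensus measure)."""
    return [mean(ratings) for ratings in ratings_per_sample]


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: three samples, each rated by four people on a 1-5 scale.
human_ratings = [[5, 4, 5, 4], [2, 1, 2, 2], [3, 4, 3, 3]]
# Made-up model outputs on the same scale, one per sample.
model_scores = [4.6, 1.9, 3.1]

consensus_ratings = consensus(human_ratings)
agreement = pearson(model_scores, consensus_ratings)
```

A value of `agreement` close to 1 would indicate that the model ranks and spaces the samples much like the average rater does. Other agreement measures, such as rank correlation, would fit the same idea.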

This Ph.D. project aims to establish data sets and to research quasi-objective machine learning models based on them. The models should rate handwriting in real time and give users immediate feedback on their handwriting.

- Activities and actions

During this project, two data sets will be collected and analyzed:

(1) an augmentation of the MNIST data set with real human ratings, establishing a ground truth for training, evaluating, and testing models;

(2) a data set of handwritten Chinese characters with a temporal component, that is, a time series of the strokes.
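
To illustrate what a record in the second data set might look like, here is a minimal sketch. The project description does not specify a format, so every class name, field, and unit below is an assumption made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    """One pen position; t is seconds since the first pen-down (assumed unit)."""
    x: float
    y: float
    t: float


@dataclass
class Stroke:
    """One continuous pen movement, stored as a time-ordered list of points."""
    points: list[Point] = field(default_factory=list)

    def duration(self) -> float:
        """Time between the first and last point of the stroke."""
        if len(self.points) < 2:
            return 0.0
        return self.points[-1].t - self.points[0].t


@dataclass
class CharacterSample:
    """A handwritten character with its strokes and any human ratings collected."""
    character: str
    strokes: list[Stroke]
    ratings: list[int] = field(default_factory=list)  # hypothetical 1-5 scale


# Hypothetical sample: the two-stroke character 人 with made-up coordinates.
sample = CharacterSample(
    character="人",
    strokes=[
        Stroke([Point(0.5, 0.9, 0.0), Point(0.2, 0.1, 0.25)]),
        Stroke([Point(0.5, 0.6, 0.5), Point(0.9, 0.1, 1.0)]),
    ],
    ratings=[4, 5, 4],
)

stroke_durations = [stroke.duration() for stroke in sample.strokes]
```

Keeping the timestamp on every point preserves stroke order, speed, and pauses between strokes, which a static image of the finished character would lose.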

For each of these data sets, the following activities will take place:

(1) Analysis of how data sets should be collected and analyzed
(2) Collection of data sets
(3) Theory building of machine learning models

- Project method

The two major activities in this project are generating data sets to establish a ground truth, and researching and developing models around these data sets. A multi-methodological approach, as described by Nunamaker et al.,* is therefore taken.

Observation, and more specifically survey studies, is used to generate data sets. These data sets support theory building in the form of machine learning models, and systems development of prototypes* (see figure).

Finally, a field experiment* will implement the best model selected in an app that real users can use. App development is planned to take place in collaboration with a third party.

 

* Jay F. Nunamaker Jr., Minder Chen, and Titus D. M. Purdin. 1990. Systems development in information systems research. Journal of Management Information Systems 7, 3 (1990), 89–106.

- Expected project results

Quasi-objective models for the rating of digital samples with high external validity, two data sets, and at least four published articles.

- Expected project impact

The models can be applied in digital learning scenarios. For example, students can practice handwriting on a tablet with a pen and get immediate feedback, which frees the teacher to do other tasks.

Tags datascience | machinelearning
Participants
- Students
- Staff Københavns Erhvervsakademi (KEA)
Henrik Strøm
- Company representatives
- Others
Partners Aalborg Universitet
Funding
- Internal 100%
- External
Result
Evaluation
Form of dissemination
- Dissemination of the result
- Value of the results
- Target group
- Publications