Thursday, 5 December 2013

Theme 4: Quantitative Research post-reflection


As mentioned in my post-reflection, this week's topic has been very common in my studies during the last year. I have mainly focused on machine learning and audio content analysis, where quantitative research is very common. Machine learning is all about building systems that learn from data in order to predict future outcomes and make decisions accordingly. For example, if you want to classify apples according to their colour and acidity, you show the machine learning application a lot of apples (in the form of measurements of colour and acidity) and tell it which variety each one is. After seeing a variety of apples (the more the better), the application will soon be confident enough to decide for itself whether the next apple is a Royal Gala or a Granny Smith.
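The apple example can be sketched in a few lines. This is only a minimal illustration, assuming a nearest-centroid classifier (one of many possible learning methods) and invented measurements on a made-up 0 to 1 scale; the names `TRAINING`, `train` and `classify` are hypothetical.

```python
from math import dist

# Hypothetical training data: (colour, acidity) measurements with a known variety.
# Both scales are invented for illustration (0 = green / mild, 1 = red / sour).
TRAINING = [
    ((0.85, 0.30), "royal gala"),
    ((0.80, 0.35), "royal gala"),
    ((0.90, 0.25), "royal gala"),
    ((0.15, 0.80), "granny smith"),
    ((0.20, 0.75), "granny smith"),
    ((0.10, 0.85), "granny smith"),
]


def train(samples):
    """Compute the mean feature vector (centroid) for each variety."""
    sums = {}
    for features, label in samples:
        total, count = sums.get(label, ((0.0, 0.0), 0))
        total = tuple(t + f for t, f in zip(total, features))
        sums[label] = (total, count + 1)
    return {
        label: tuple(t / count for t in total)
        for label, (total, count) in sums.items()
    }


def classify(centroids, features):
    """Return the variety whose centroid is closest to the measurements."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING)
print(classify(centroids, (0.82, 0.28)))  # a red, mild apple
print(classify(centroids, (0.18, 0.78)))  # a green, sour apple
```

With more (and more varied) training apples, the centroids become better estimates of each variety, which is the "the more the better" point above.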

The machine learning workflow is in some ways really interesting to relate to theory and quantitative methods. In quantitative methods, the workflow's foundation is data collected through experiments. This data is then analysed in search of patterns that reveal correlations between variables, or that seem especially relevant for predicting future outcomes.
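As a rough illustration of what searching for a correlation can look like, the sketch below computes the Pearson correlation coefficient between two measured variables. The data is invented, and the function name `pearson` is just a label chosen for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements for six apples (made-up 0..1 scales).
colour = [0.85, 0.80, 0.90, 0.15, 0.20, 0.10]
acidity = [0.30, 0.35, 0.25, 0.80, 0.75, 0.85]

r = pearson(colour, acidity)
print(r)  # close to -1: in this toy data, redder apples are less acidic
```

A coefficient near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests none. On its own it says nothing about whether the relationship would hold in new data.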

Then of course, one needs to show statistically that the correlation is relevant and probable, which is also a very important step in the machine learning workflow. Since machine learning applications are self-driven and not human-controlled, the application needs to learn how to ensure that its classification is probable and not obviously false, thereby forming some kind of artificial intelligence. This can be done with statistical measures, i.e. measures that show the correlation between the data variables and the outcome. For example, let's say that the apple classifier suddenly has to classify a rotten apple. The colour is different from anything it has seen before, and the acidity is also quite unusual. In the case of such an "outlier", it is important for the classifier to understand that it does not understand, and to output this uncertainty instead of a probably false classification.
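One simple way to let a classifier say "I don't know" is to reject anything that lies too far from every class it has learned. The sketch below assumes fixed class centres and a hand-picked distance threshold. `CENTROIDS`, `MAX_DISTANCE` and `classify_or_reject` are hypothetical names and values, and real systems typically use more principled uncertainty estimates.

```python
from math import dist

# Hypothetical class centres in (colour, acidity) space, e.g. learned earlier.
CENTROIDS = {
    "royal gala": (0.85, 0.30),
    "granny smith": (0.15, 0.80),
}

# Assumed cut-off: anything farther than this from every centre is unfamiliar.
MAX_DISTANCE = 0.25


def classify_or_reject(features, centroids=CENTROIDS, max_distance=MAX_DISTANCE):
    """Return the nearest variety, or "unknown" if nothing is close enough."""
    label = min(centroids, key=lambda name: dist(centroids[name], features))
    if dist(centroids[label], features) > max_distance:
        return "unknown"  # report uncertainty instead of guessing
    return label


print(classify_or_reject((0.82, 0.28)))  # looks like a familiar apple
print(classify_or_reject((0.45, 0.05)))  # a "rotten apple": far from both centres
```

The threshold is a trade-off: too small and ordinary apples are rejected, too large and the rotten apple is confidently misclassified.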

In the same way, quantitative methods need to be based on a variety of data that clearly reflects the reality they try to measure, and the correlations between variables and results need to be verified.

2 comments:

  1. Hi! Interesting post. How do you think that systems should handle outliers in general? Following your example of a rotten apple, you classify it as an outlier whereas it still belongs to some variety of apple. Would you argue that it is a shortcoming of the system and that it could be fixed by adding more features in addition to colour and acidity?

  2. Depends really on the features and the nature of the rotting process. If rotting means that the taste and colour both decrease, then maybe the system could still see the relation between the two (since both decrease), but if the colour and taste were just different, then the risk of a false classification is high.

    Yes, if you add features that can characterise apples even when they are rotting, then you could get rid of these outliers. This means that what you want is data that is a good representation of the whole apple population, i.e. you want to include rotten apples in your data. Because rotten apples are still apples.
