DM2572 Matts' Portfolio: November 2013

Thursday 28 November 2013

Theme 4: Quantitative research pre-reflection

I have read a paper on pitch extraction from noisy signals by Tetsuya Shimamura and Hajime Kobayashi, in which they try to improve the "standard" autocorrelation function (ACF) used to estimate the frequency at which the vocal folds oscillate during speech. Such pitch estimates can be used, for example, in karaoke games (such as SingStar) or when analysing speech in order to recognise language.
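To make the idea of ACF-based pitch estimation concrete, here is a minimal sketch of the plain ("standard") autocorrelation method, not the weighted variant the paper proposes. The function name `acf_pitch`, the 50-400 Hz search range and the 8 kHz sampling rate are my own assumptions for illustration; they are not taken from the paper.

```python
import math

def acf_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from the peak of its autocorrelation.

    fmin/fmax bound the lag search range; both are illustrative assumptions.
    """
    n = len(frame)
    lag_min = int(fs / fmax)               # shortest period considered
    lag_max = min(int(fs / fmin), n - 1)   # longest period considered
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag

# A clean 200 Hz test tone, one 23 ms frame at an assumed 8 kHz sampling rate
fs = 8000
frame = [math.sin(2 * math.pi * 200.0 * t / fs) for t in range(int(0.023 * fs))]
print(acf_pitch(frame, fs))  # 200.0 for this clean test tone
```

On clean periodic input the strongest autocorrelation peak sits at the pitch period. The difficulty the paper addresses is that added noise can move or mask that peak.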

reference: Shimamura, Tetsuya, and Hajime Kobayashi. "Weighted autocorrelation for pitch extraction of noisy speech." Speech and Audio Processing, IEEE Transactions on 9.7 (2001): 727-730.

In the paper the authors propose an improved estimation algorithm, which they test and evaluate on eight 10 s long speech tracks (4 male and 4 female). This makes a total of 80 s of data, which is divided into frames of 23 ms each, resulting in ca. 3470 data points. Each data point has a true pitch value assigned to it. The testing is done by adding noise at different levels (SNR ∞, 10 dB, 5 dB, 0 dB and -5 dB) to the speech tracks and then measuring how well the algorithm estimates the pitch for each data point at each SNR level. The authors calculate the absolute distance between the estimated pitch and the true pitch and tolerate a difference of ± 10 Hz. They also compare their algorithm's results to those of three other algorithms in order to show the improved performance.
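The evaluation procedure described above can be sketched roughly as follows. This is only my reading of it: I assume white Gaussian noise and summarise the result as the fraction of frames that fall outside the ± 10 Hz tolerance. The helper names `add_noise` and `error_rate` and the example numbers are hypothetical, not from the paper.

```python
import math
import random

def add_noise(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to the requested SNR (dB).

    White Gaussian noise is an assumption made for this sketch.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Scale the noise so that p_signal / (gain^2 * p_noise) = 10^(snr_db / 10)
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(signal, noise)]

def error_rate(estimates, truths, tol_hz=10.0):
    """Fraction of frames whose estimate is more than tol_hz away from the true pitch."""
    misses = sum(abs(e - t) > tol_hz for e, t in zip(estimates, truths))
    return misses / len(truths)

# Made-up per-frame values, only to show how the tolerance is applied
estimated = [101.0, 125.0, 198.0]
true_pitch = [100.0, 110.0, 200.0]
print(error_rate(estimated, true_pitch))  # one of the three frames is off by more than 10 Hz
```

In a full experiment, each noisy track would be cut into 23 ms frames, a pitch estimator would be run on every frame, and `error_rate` would be reported per SNR level and per algorithm.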

The method used in the paper is quite "standard" in the field, and since I have been working on a project in almost the same field, I didn't learn that much from it. On the other hand, it is a solid method for evaluating the algorithm, since it evaluates the factors that are relevant to the aim of the paper. The authors try to improve the algorithm and therefore compare the improved version with the existing ones to ensure that they have succeeded. They also estimate the performance at different noise levels, which is exactly the situation in which they show mathematically that the improved algorithm should perform better than the others.

The criticism I have of the method concerns the tolerance and the data they use. Since frequency is not perceived linearly by our ears, a 10 Hz deviation on an estimated 200 Hz pitch is less perceivable than a 10 Hz deviation on a 100 Hz pitch; these are roughly the mean pitches for women and men, respectively. The data might also be biased, since they only use eight Japanese 10 s sentences. The human speech system works in the same way universally, but languages differ. Maybe the algorithm only works on Japanese? There is no discussion about this in the paper.
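The point about the fixed tolerance can be illustrated by expressing the same 10 Hz deviation on a logarithmic scale. The sketch below uses cents (1200 cents per octave), a common musical measure of pitch distance. Using cents as a stand-in for perception is my own simplification, and the function name is made up for the example.

```python
import math

def deviation_in_cents(f_true, f_est):
    """Pitch deviation in cents (1200 cents per octave, 100 per semitone)."""
    return 1200.0 * math.log2(f_est / f_true)

for f0 in (100.0, 200.0):
    print(f0, round(deviation_in_cents(f0, f0 + 10.0), 1))
# prints: 100.0 165.0
#         200.0 84.5
```

So the same ± 10 Hz window is more than one and a half semitones wide at 100 Hz but less than one semitone at 200 Hz, which means the tolerance is effectively looser for low-pitched voices.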

Physical Activity, Stress, and Self-Reported Upper Respiratory Tract Infection by Olle Bälter et al.

The study is a population-based study performed in a middle-sized county with a "normal" rate of urbanisation. The researchers sent out web questionnaires by email that the subjects were asked to fill in. The subjects were also given follow-up questionnaires to see if their circumstances changed. The questionnaires included questions on physical activity, perceived stress, age, gender and other relevant information. From these data, the researchers tried to find patterns in which factors could affect how much we suffer from upper respiratory tract infection (URTI). This is done by calculating the risk of developing URTI and comparing it between different groups in the population (men, women, young, old etc.). They also fit the data to Poisson regression models, i.e. they try to model how the number of infections depends on these factors, which can also be used for prediction.
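As a rough illustration of this kind of group comparison, the sketch below computes incidence rates and their ratio for two groups. All counts are invented for the example and are not from the study, and the function names are hypothetical. In the simplest case, a Poisson regression with one binary group indicator and log follow-up time as offset, the exponentiated coefficient equals this rate ratio; the models in the paper include more factors than that.

```python
def incidence_rate(cases, person_weeks):
    """Number of infections per person-week of follow-up."""
    return cases / person_weeks

def rate_ratio(cases_a, time_a, cases_b, time_b):
    """Incidence rate in group A relative to group B (below 1 means fewer infections in A)."""
    return incidence_rate(cases_a, time_a) / incidence_rate(cases_b, time_b)

# Invented numbers, for illustration only:
# group A = high physical activity, group B = low physical activity
ratio = rate_ratio(cases_a=30, time_a=1000, cases_b=50, time_b=1000)
print(round(ratio, 2))  # 0.6
```

A ratio below 1 would suggest fewer infections per unit of follow-up time in the more active group. Whether such a difference is more than chance is what the statistical modelling is meant to assess.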

The quantitative method used in the paper is based on questionnaires in which subjects report how they perceive themselves. These data might well be biased by what people like to think about themselves and might not reflect how they actually are. This uncertainty can of course be reduced by using many subjects, but it could still have an impact on the research. Even though the questionnaires form the basis of the analysis, the authors use statistical methods to validate the data and to look for correlations between different groups in the population. In the end they conclude that there was a connection between physical activity, stress and URTI, and this conclusion is reached with the help of such measures and methods.

Theme 3: Research and Theory Post-reflection

The previous week's seminars included interesting assignments and discussions leading to an improved understanding of what theory is and is not. Through these seminars I broadened my understanding of what the different kinds of theory (according to Gregor) mean in practice and how to classify theories into different categories.
It was also interesting to see that the definition of theory in the course-collective text varied between participants. Even though the text has been corrected by course participants from different years, it still contained some sentences that were questionable and some that had been “corrected”. This shows the relevance of the discussion of theory and its characteristics, and the diversity in people's understanding of the theory about “theory”.

So why is this important to do? In the same manner as Gregor aims to understand and categorize different theory, it is of course important for me to do the same when faced with a scientific task. Understanding what the theory aims to do will ultimately help me in understanding what I am doing when I apply certain theory to my research.

Apart from a deeper understanding of theory, I also learned to analyse texts in search of theory and to critically analyse this theory in its context. By reading articles and identifying the different theories and their categories, one can more easily analyse the relevance of the theory in the context of the paper, and also see how the theory is applied and applicable to the method and results. This is a good exercise for improving one's analytical skills as well as one's own writing skills.

Friday 22 November 2013

Theme 3: Research and Theory post-reflection

Journal Description - IEEE Transactions on Audio, Speech, and Language Processing

The journal covers research in the field of audio, speech and language processing and publishes papers concerning the design, development and evaluation of such applications and their associated theory. Papers are mainly application-oriented; papers on applied machine learning or pattern recognition analysis are also welcome, even though the journal does not mention this in its description.

Impact factor: 1.675

Critical Review on "Machine Learning Paradigms for Speech Recognition: An Overview"

The paper's main goal is to present an overview of the machine learning paradigms used for speech or speaker recognition. It gives an overview of the mathematical notation and terminology commonly used in the field. This explanatory exercise is done through a fundamental literature search, after which the mathematical notation is "standardised" and the terminology is categorised and explained. The main hypothesis is that the two scientific fields, machine learning and speaker recognition, should be more involved with each other, since machine learning is not only a tool for speech recognisers; speech recognition is also a source of new machine learning research.

reference:

Li Deng; Xiao Li, "Machine Learning Paradigms for Speech Recognition: An Overview," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 5, pp. 1060-1089, May 2013

doi: 10.1109/TASL.2013.2244083

What is Theory?

Theory can be of different kinds in different fields of knowledge/science. The author Shirley Gregor presents a taxonomy (a classification) in which she divides theory into five types: "Analysis", "Prediction", "Explanation", "Explanation and Prediction" and "Design and Action". The difference between these lies in what they aim to do. Analysis, for example, is theory that aims to analyse something and answer the question "what is it?". Such a theory could be used to determine whether somebody is sick or not, based on certain symptoms. Theory for predicting or explaining also includes analysis, but additionally tries to explain either why something is so or what will follow from it, e.g. why the symptoms are such and such, or what the symptoms will lead to if the patient is not medicated. Design and action is also linked with the others, but is more like a recipe for success. A patient could receive such theory from a doctor in the form of a prescription or a guide to getting better.

In other words, theory is not just a collection of data or of other people's theories; theory is based on observation and on other people's theories. The way from observation, through others' theories, to one's own theory is a logical path where one has to justify each step taken in order to be sure that the theory is in fact not false.

What Kind of Theory is Found in the Article?

The article reviewed above consists of a lot of theory, since it is an overview of the field of speech recognition and speaker recognition. But all these theories, and their nature, also bring the authors to a conclusion, i.e. their own theory. The theory they present is a mix of what Gregor calls "Explanation and Prediction" and "Design and Action", meaning that it aims to analyse what the field is and how to use it, and how the field could benefit from this action. The theory is not the data they collected nor the diagrams they show; it is what they can see in the diagrams and the data.

Benefits and limitations

The clear problem with using theory types as they do in the article is that the theories are only touched on at the surface; they are just mentioned and briefly explained. To clearly understand the outlined theories, one has to dig deeper into the references supplied in the text in order to understand why such theories are to be used. The article only gives an overview of the field and should not be the only source of theory when trying to include machine learning in one's speech/speaker recognition application. If one is on a mission to understand the correlation and the relevance of including machine learning research in the speech community (and vice versa), as suggested in the text, then the article suffices.

Thursday 21 November 2013

Theme 2: Critical Media Theory post-reflection

As opposed to last week, this week involved both a reflection and a seminar, which for my part made more sense. I had some questions on this week's text by Adorno and Horkheimer and was hoping to get these answered and discussed at the seminar. My main concern with the text was why they (and the questions we had to prepare answers to) focused on what they describe as "old" and "new" media. Reading the sections from the book, I didn't really get why they made this distinction. Was it a technical statement, i.e. that new media enables things that old media doesn't, or something else? We discussed this topic at the seminar and I got my question answered.

In the historical context in which the book is written, new media such as radio and television had recently emerged in society. As opposed to old media such as magazines, these media were not driven by e.g. a political party but by big industrial companies. This is why the distinction is important in the text, given their theory about what they call the "culture industry".

By and large the topic of this week was ok. I found it more interesting than last week, but it still didn't raise my interest level through the roof, and I think this is mainly because of the format and content of the readings. I and other course friends constantly asked ourselves whether there weren't any more modern or less philosophical texts to read about critical media theory. Maybe one that is more linked to what we can recognise in our daily lives? Surely there must be something more relatable than a text from the 1940s? Of course, there is a point in understanding the historical perspective and seeing the similarities between then and now, but to what extent? Personally, if I narrow the week down to one line describing what I've learned, it would be something like: "Some people might be and were critical of media. I should maybe be too" ...

Friday 15 November 2013

Theme 2: Critical media studies pre-reflection

Enlightenment

In the book “Dialectic of Enlightenment”, the authors Theodor Adorno and Max Horkheimer write about enlightenment and the historical period of the 17th and 18th centuries with the same name. The historical period is known for people's striving to regain control over nature, moving away from strict religion and other mythical beliefs and instead trying to learn from and control nature. When Adorno and Horkheimer write about “enlightenment”, however, they seem not to be as excited about it as the history books. By the word “enlightenment” they mean the situation in which people start to distinguish between nature and myth, realized in the separation between church and science. By thinking outside the “box”, people start to question current authorities, develop their own theories and try to take back control over their surroundings, instead of being governed by them.

Myth

In the authors' description of enlightenment, the word “myth” or “mythological” is frequently used to differentiate between what is based on belief and driven by fear of the unknown, and what can actually be observed and studied through experience. The move away from myth to what we today call science is the essence of the enlightenment.

“Old” and “new” media

To understand the terms “old” and “new” media, one has to understand the historical context in which the book is written. The 1940s were turbulent times in the world, and especially in Germany. By then radio, cinema, magazines and also music were the most common media for communication. Radio and magazines streamed propaganda to the people without any interaction with their consumers. The consumers were therefore passive, without any possibility to respond or develop the debate. “New” media, on the other hand, was television, where different media merged together to form a new medium.

Culture industry

Through this “old” and “new” media and the lack of cooperation from the passive consumers, culture is not as “free” as one would think culture in an enlightened society would be. Since culture is run by the media industry or by governments (as was the case in WW2 Germany), the consumers, who are products of enlightenment, find themselves governed by authorities beyond their control. This is the argument and the antithesis that the book tries to communicate.

Mass media and mass deception

As mentioned before, the culture industry in which the media (in the historical context of the book) operate is, according to the authors, not mass media but rather mass deception. The masses are deceived into believing that they are free and thinking individuals, but the reality is that they are as governed and fooled as before the enlightenment.

Interesting concepts

What I found interesting are the concepts of mass deception and culture industry. A couple of years ago, three of my friends and I made a short film named “ghettopop”, which is a cultural dystopia concerned with the same concepts. In the film, we envision a future where one genre of music has taken over the whole world and transformed it into a mono-culture, where everything that is not “ghettopop” is not accepted. The parallel to culture industry and mass deception is that in our film, “ghettopop” is the only acceptable thing to produce and hence the only musical communication allowed, deceiving the public into believing there is nothing else. What is interesting about this phenomenon is that we can see examples today of media monopoly and of the striving to keep the consumer passive, rather than one who explores other, more “open” media channels.

Thursday 14 November 2013

Theme 1: Theory of Science post-reflection

This week’s theme was (as written above) Theory of Science. As I mentioned in my pre-reflection, I have some familiarity with the theme since I have studied philosophy before. So I was looking forward to discussing the topic even more, getting input from others and possibly widening my understanding of the topic. Unfortunately, both the lecture and the seminar were canceled and hence the discussions never happened. I tried to read as many of the others’ reflections as I could, but after a while I noticed that since the questions didn’t involve that much reflection, the answers to the questions were quite similar. What was I supposed to comment on? I find it hard to comment on something unless there is a difference between the reflection in question and my own understanding.

So unfortunately, I have to say I haven’t learned that much this week. I didn’t find the texts that interesting, maybe because I didn’t understand them well enough or maybe because they were just not up my alley. Hopefully the next weeks’ texts, lectures and seminars will be better, and I look forward to the next seminar…