DM2572 Matts' Portfolio: Theme 4: Quantitative research pre-reflection

I have read a paper on pitch extraction from noisy signals by Tetsuya Shimamura and Hajime Kobayashi, in which they try to improve the "standard" autocorrelation function (ACF) used to estimate the frequency at which the vocal folds oscillate during speech. Such pitch estimates can be used, for example, in karaoke games (such as SingStar) or when analysing speech for language recognition.
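To give a rough idea of what the "standard" ACF approach does, here is a minimal sketch in Python. It is not the weighted method from the paper. It assumes NumPy, a synthetic 200 Hz sine instead of real speech, and a hypothetical helper `acf_pitch` with search limits (50 to 500 Hz), sampling rate and frame length that I picked for illustration only.

```python
import numpy as np

def acf_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch (Hz) of one frame from the peak of its autocorrelation.

    fmin/fmax are assumed search limits, not values from the paper.
    """
    frame = frame - np.mean(frame)                  # remove DC offset
    # Autocorrelation for non-negative lags only
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)                        # shortest period searched
    lag_max = min(int(fs / fmin), len(acf) - 1)     # longest period searched
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / best_lag

# Synthetic test signal: a clean 200 Hz tone (assumed values)
fs = 8000                                           # sampling rate in Hz
t = np.arange(int(0.05 * fs)) / fs                  # one 50 ms frame
frame = np.sin(2 * np.pi * 200.0 * t)
estimate = acf_pitch(frame, fs)
print(f"Estimated pitch: {estimate:.1f} Hz")
```

On a clean tone the strongest autocorrelation peak falls at a lag of one period. With noise added the peak becomes less distinct, which is the situation the paper addresses.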

Reference: Shimamura, Tetsuya, and Hajime Kobayashi. "Weighted autocorrelation for pitch extraction of noisy speech." IEEE Transactions on Speech and Audio Processing 9.7 (2001): 727-730.

In the paper the authors propose an improved estimation algorithm, which they test and evaluate on eight 10 s long speech tracks (4 male and 4 female). This makes a total of 80 s of data, which is divided into parts of 23 ms each, resulting in ca. 3470 data points. Each data point has a true pitch value assigned to it. The testing is done by adding noise at different levels (SNR ∞, i.e. no added noise, 10 dB, 5 dB, 0 dB and -5 dB) to the speech tracks and then measuring how well the algorithm estimates the pitch for each data point at each SNR level. The authors calculate the absolute distance between the estimated pitch and the true pitch, and tolerate a difference of ±10 Hz. They also compare their algorithm's results to those of three other algorithms in order to show the improved performance.
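The evaluation step described above (absolute error per data point, with a ±10 Hz tolerance) could be sketched as follows. This is my own illustration, not the authors' code: the function name `error_rate` is hypothetical, the pitch values are invented, and I assume the result is reported as the share of data points outside the tolerance.

```python
import numpy as np

def error_rate(estimated_hz, true_hz, tolerance_hz=10.0):
    """Fraction of data points whose absolute pitch error exceeds the tolerance."""
    estimated = np.asarray(estimated_hz, dtype=float)
    true = np.asarray(true_hz, dtype=float)
    abs_error = np.abs(estimated - true)            # distance in Hz per data point
    return float(np.mean(abs_error > tolerance_hz))

# Invented example values, one entry per 23 ms data point
true_pitch = [100.0, 120.0, 200.0, 210.0]
estimated_pitch = [104.0, 60.0, 195.0, 231.0]

rate = error_rate(estimated_pitch, true_pitch)
print(f"Share of data points outside ±10 Hz: {rate:.2f}")
```

Running the same function once per SNR level and once per algorithm would give a table of error rates that can be compared directly.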

The method used in the paper is quite "standard" in the field, and since I have been working on a project in almost the same field, I didn't learn that much from it. On the other hand, it is a solid method for evaluating the algorithm, since it evaluates the factors that are relevant to the aim of the paper. The authors try to improve the algorithm, and therefore they compare the improved version with the other ones to verify that they have succeeded. They also estimate the performance at different levels of noise, which is exactly the condition under which they mathematically show that the improved algorithm should perform better than the others.

The criticism I have of the method concerns the tolerance and the data they use. Since frequency is not perceived linearly by our ears, a 10 Hz deviation at a 200 Hz pitch is less perceivable than a 10 Hz deviation at a 100 Hz pitch; these are roughly the mean pitches for women and men respectively. The data might also be biased, since they only use eight 10 s sentences in Japanese. The human speech system works in the same way universally, but languages differ. Maybe the algorithm only works on Japanese? There is no discussion of this in the paper.
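One way to put numbers on the tolerance issue is to express the 10 Hz deviation in cents, a logarithmic unit of musical interval (1200 cents per octave). The sketch below is my own back-of-the-envelope check, not something taken from the paper; the function name `tolerance_in_cents` is made up.

```python
import math

def tolerance_in_cents(pitch_hz, tolerance_hz=10.0):
    """Size of an upward deviation of tolerance_hz at pitch_hz, in cents."""
    return 1200.0 * math.log2((pitch_hz + tolerance_hz) / pitch_hz)

cents_100 = tolerance_in_cents(100.0)   # 10 Hz above a 100 Hz pitch
cents_200 = tolerance_in_cents(200.0)   # 10 Hz above a 200 Hz pitch
print(f"100 Hz: {cents_100:.0f} cents, 200 Hz: {cents_200:.0f} cents")
```

At 100 Hz the same 10 Hz corresponds to roughly twice as large an interval as at 200 Hz, so the fixed tolerance is considerably more generous for low voices than for high ones.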

Physical Activity, Stress, and Self-Reported Upper Respiratory Tract Infection by Olle Bälter et al.

The study is a population-based study performed in a middle-sized county with a "normal" rate of urbanisation. The researchers sent out web questionnaires by email that the subjects were asked to fill in. The subjects were also given follow-up questionnaires to see if their circumstances had changed. The questionnaires included questions on how physically active they were, perceived stress, age, gender and other relevant information. From these data, the researchers tried to find patterns in which factors could affect how much we suffer from upper respiratory tract infection (URTI). This is done by calculating the risk of somebody developing URTI and comparing it between different groups in the population (men, women, young, old etc.). They also fit the data to Poisson regression models, i.e. they model how the occurrence of URTI depends on these factors, which in principle could also be used to predict future outcomes.
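To make the group comparison concrete, here is a small sketch of how a rate of URTI per person-year could be compared between two groups. All numbers are invented and the function names are hypothetical; the actual study uses Poisson regression with several factors, which this does not reproduce. With a single two-level factor, though, the comparison reduces to a simple ratio of rates like the one below.

```python
def incidence_rate(cases, person_years):
    """Number of URTI episodes per person-year of follow-up."""
    return cases / person_years

def rate_ratio(cases_a, person_years_a, cases_b, person_years_b):
    """Rate in group A divided by the rate in group B."""
    return (incidence_rate(cases_a, person_years_a)
            / incidence_rate(cases_b, person_years_b))

# Invented example data, NOT taken from the study
high_activity = {"cases": 90, "person_years": 100.0}
low_activity = {"cases": 120, "person_years": 100.0}

ratio = rate_ratio(high_activity["cases"], high_activity["person_years"],
                   low_activity["cases"], low_activity["person_years"])
print(f"Rate ratio (high vs. low activity): {ratio:.2f}")
```

A ratio below 1 would mean fewer episodes per person-year in the first group; whether such a difference is statistically meaningful is what the regression and its confidence intervals are for.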

The quantitative method used in the paper is based on questionnaires in which subjects report how they perceive themselves. These data might well be biased by what people like to think about themselves and might not reflect how they actually are. Using many subjects can of course reduce this uncertainty, but it could still have an impact on the research. Even though the analysis is based on these questionnaires, the authors use statistical methods to validate the data and to look for correlations between different groups in the population. In the end they conclude that there was a connection between physical activity, stress and URTI, and this conclusion is reached with the help of such measures and methods.

1 comment:

Unknown, 29 November 2013, 02:50
Are you regarding the 3470 data points or the eight persons as the population of this study? I find it quite hard to decide on which is "right".

I think I would have the same critique regarding the use of a single language. However, unless they have a more robust technique ready it is advisable to start in a simple manner and then add more complexity. One may argue that this is something they should address in their discussion.

Regarding the linear approach to frequency, I think it is more advisable to use percent or cents. A semitone is ~5.9% or 100 cents across the entire spectrum.

Thursday, 28 November 2013
