I have read a paper on pitch extraction from noisy signals by Tetsuya Shimamura
and Hajime Kobayashi, in which they try to improve the "standard"
autocorrelation function (ACF) used to estimate the frequency at which the
vocal folds oscillate during speech. This can be used, for example, in karaoke
games (such as SingStar) or for analysing speech in order to recognise
language.
Reference: Shimamura, Tetsuya, and Hajime Kobayashi. "Weighted autocorrelation
for pitch extraction of noisy speech." IEEE Transactions on Speech and Audio
Processing 9.7 (2001): 727-730.
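To make the idea of ACF-based pitch estimation concrete, here is a minimal sketch of the plain, "standard" ACF approach, not the weighted variant the paper proposes. The function name `acf_pitch`, the 60-400 Hz search range, the 8 kHz sample rate and the synthetic sine frame are all my own assumptions for illustration.

```python
import math


def acf_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from the peak of its autocorrelation."""
    n = len(frame)
    # Only search lags that correspond to plausible speech pitches (assumed range).
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)

    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag: similarity of the frame to a shifted copy of itself.
        value = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # The lag of the strongest peak is taken as the pitch period in samples.
    return sample_rate / best_lag


# Synthetic 125 Hz tone standing in for a voiced speech frame (not data from the paper).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 125.0 * i / sample_rate) for i in range(400)]
estimate = acf_pitch(frame, sample_rate)
print(f"estimated pitch: {estimate:.1f} Hz")
```

On a clean tone like this the peak is easy to find; the difficulty the paper addresses is that noise can make a wrong peak win.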
In the paper the authors propose an improved estimation algorithm, which they
test and evaluate on eight 10 s long speech tracks (4 male and 4 female). This
makes a total of 80 s of data, which is divided into frames of 23 ms each,
resulting in ca. 3470 data points. Every data point has a known true pitch
value. The testing is done by adding noise at different levels (SNR ∞, 10 dB,
5 dB, 0 dB, -5 dB) to the speech tracks and then measuring how well the
algorithm estimates the pitch for each data point at each SNR level. The
authors calculate the absolute distance between the estimated pitch and the
true pitch, and tolerate a difference of ±10 Hz. They also compare their
algorithm's results with those of three other algorithms in order to show the
improved performance.
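The evaluation procedure can be sketched roughly as follows. This is only my own illustration: I assume white Gaussian noise (I have not stated which noise type the authors used), the names `add_noise` and `error_rate` are hypothetical, and the pitch values in the example are made up rather than taken from the paper.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def error_rate(estimated, true, tolerance_hz=10.0):
    """Fraction of frames whose estimate is more than tolerance_hz from the true pitch."""
    misses = sum(1 for e, t in zip(estimated, true) if abs(e - t) > tolerance_hz)
    return misses / len(true)


rng = random.Random(0)
# One second of a synthetic 150 Hz tone at 8 kHz, degraded to an SNR of 5 dB.
clean = [math.sin(2 * math.pi * 150 * i / 8000) for i in range(8000)]
noisy = add_noise(clean, snr_db=5.0, rng=rng)

# Made-up per-frame pitch values: two of the four estimates miss by more than 10 Hz.
estimated = [104.0, 135.0, 195.0, 105.0]
true = [100.0, 120.0, 200.0, 210.0]
rate = error_rate(estimated, true)  # 0.5
```

In a full evaluation, `estimated` would come from running each pitch estimator on every frame of `noisy`, once per SNR level.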
The method used in the paper is quite "standard" in the field, and since I
have been working on a project in almost the same field, I didn't learn that
much from it. On the other hand, it is a solid method for evaluating the
algorithm, since it evaluates exactly the factors that the paper aims to
improve. The authors try to improve the algorithm, and they therefore compare
the improved version with the other ones to verify that they have succeeded.
They also measure the performance at different noise levels, which is exactly
what they mathematically prove the improved algorithm to handle better than the
other ones.
The criticism I have of the method concerns the tolerance and the data they
use. Since frequency is not perceived linearly by our ears, a 10 Hz tolerance
on an estimated 200 Hz pitch is less perceivable than a 10 Hz tolerance on a
100 Hz pitch; these are roughly the mean pitches of women and men
respectively. The data might also be biased, since they only use 8 Japanese
10 s sentences. The human speech system works in the same way universally, but
languages differ. Maybe the algorithm only works on Japanese? There is no
discussion about this in the paper.
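To put numbers on the tolerance argument, the same 10 Hz deviation can be expressed as a musical interval in cents (100 cents is one semitone). This is my own back-of-the-envelope calculation, not something from the paper, and `deviation_in_cents` is a name I made up.

```python
import math


def deviation_in_cents(pitch_hz, error_hz):
    """Musical interval, in cents, between pitch_hz and pitch_hz + error_hz."""
    return 1200 * math.log2((pitch_hz + error_hz) / pitch_hz)


for pitch in (100.0, 200.0):
    cents = deviation_in_cents(pitch, 10.0)
    print(f"{pitch:.0f} Hz + 10 Hz = {cents:.0f} cents")
# prints:
# 100 Hz + 10 Hz = 165 cents
# 200 Hz + 10 Hz = 84 cents
```

So the same tolerance allows an error of more than one and a half semitones at 100 Hz, but less than one semitone at 200 Hz. A tolerance defined in cents or as a percentage would treat low and high voices equally.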
Physical Activity, Stress, and Self-Reported Upper Respiratory Tract Infection
by Olle Bälter et al.
The study is a population-based study performed in a middle-sized county with
a "normal" rate of urbanisation. The researchers sent out web questionnaires
by email that the subjects were asked to fill in. The subjects were also given
follow-up questionnaires to see if their circumstances had changed. The
questionnaires included questions on how much physical activity they do,
perceived stress, age, gender and other relevant information. From these data,
the researchers tried to find patterns in which factors could affect how much
we suffer from upper respiratory tract infection (URTI). This is done by
calculating the risk of somebody developing URTI and comparing it between
different groups within the population (men, women, young, old etc.). They
also fit the data to Poisson regression models, i.e. they model how the number
of infections depends on these factors, which can in turn be used for
prediction.
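The group comparison can be illustrated with a small calculation of incidence rates and their ratio. The counts below are hypothetical numbers I invented for the example, not results from the study, and the helper `incidence_rate` is my own naming.

```python
import math


def incidence_rate(cases, person_years):
    """Number of infections per person-year of follow-up."""
    return cases / person_years


# Hypothetical counts for illustration only, NOT results from the study.
low_activity = {"cases": 150, "person_years": 100.0}
high_activity = {"cases": 120, "person_years": 100.0}

rate_low = incidence_rate(**low_activity)
rate_high = incidence_rate(**high_activity)

# A ratio below 1 means fewer infections per person-year in the high-activity group.
rate_ratio = rate_high / rate_low
log_rate_ratio = math.log(rate_ratio)
print(f"rate ratio: {rate_ratio:.2f}")
```

For a single two-level factor, a Poisson regression with a log link and the logarithm of the follow-up time as offset gives a coefficient equal to `log_rate_ratio`. The regression becomes useful when several factors (activity, stress, age, gender) have to be adjusted for at the same time.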
The quantitative method used in the paper is based on questionnaires where
subjects report how they perceive themselves. These data might well be biased
by what people like to think about themselves and might not reflect how they
actually are. This uncertainty can of course be evened out by using many
subjects, but could still have an impact on the research. Even though the
questionnaires are the basis for the analysis, the authors use statistical
methods to validate the data and seek correlations between different groups in
the population. In the end they conclude that there was a connection between
physical activity, stress and URTI, and this conclusion is reached with the
help of such measures and methods.