Modelling an Auditory System
Signal Processing – Modelling an Auditory System
An auditory system mimics the behaviour of the biological cochlea found in humans and other mammals. The system converts a 1D discrete-time audio signal into a 2D time-frequency signal called an auditory spectrogram. From this spectrogram, the audio information listed in Table 1 can be extracted. Its applications include hearing aids, speech and musical information retrieval, audio multimedia systems, and brain modelling.
The objective of this project is to model an auditory system using MATLAB. Note that this project is an individual task.
Index | Audio Information | Responsible For |
1 | Intensity | Sound loudness. |
2 | Direction | Indicates where a sound is coming from. |
3 | Pitch | Difference between musical notes and also male and female voices. |
4 | Timbre | Sound colour and shape indicating from which specific source a sound is coming from. |
Table 1: Information extractable from an auditory spectrogram.
To convert a one-dimensional (1D) sound signal into a two-dimensional (2D) time-frequency representation, a cochlear filterbank is used. A cochlear filterbank comprises multiple gammatone filters either in parallel or cascaded form. The bandwidth of each gammatone filter increases with increasing frequency so that a high centre frequency filter has a higher bandwidth than a filter with low centre frequency as shown in Figure 1(a).
A gammatone filter behaves like a bandpass filter. Each gammatone filter is tuned to a specific centre frequency so that it responds mainly to input components near that frequency. So, when the input signal contains energy close to the centre frequency of the filter, the filter outputs a signal that resonates at its centre frequency. Hence, a filterbank is a bank of gammatone filters whose centre frequencies are tuned from low to high to cover the entire spectrum of a sound signal.
Figure 1: Increasing bandwidth with increasing centre frequency in the gain response of gammatone filters. (a) x-axis is linearly scaled where intervals between frequencies are the same; (b) x-axis is logarithmically-scaled where intervals between frequencies are nonlinear.
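As a rough illustration of the bandwidth trend described above, the sketch below evaluates the Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation, ERB(f) = 24.7(4.37f/1000 + 1) Hz, at a few example centre frequencies. The choice of this formula and the example frequencies are assumptions made for illustration; the bandwidth rule used by the supplied sample code may differ.

```matlab
% Assumption: Glasberg and Moore ERB approximation. The supplied filterbank
% code may use a different bandwidth rule.
fc  = [100 500 1000 4000];             % example centre frequencies in Hz
erb = 24.7 * (4.37 * fc / 1000 + 1);   % equivalent rectangular bandwidth in Hz

% Print one line per centre frequency to show the bandwidth growing with fc.
for k = 1:numel(fc)
    fprintf('fc = %4d Hz -> ERB ~ %.1f Hz\n', fc(k), erb(k));
end
```

The bandwidth grows steadily with centre frequency, which is why the high-frequency curves in Figure 1(a) look wider on a linear axis.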
Ideally, the varying filters tuned differently will react to the different frequencies in the input signal and will output multiple signals. These signals are then half-wave rectified, where all negative values are set to 0 and only positive values are maintained. They can be visualised as a 2D image known as an auditory spectrogram, as shown in Figure 2.
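The half-wave rectification step described above can be sketched in one line of base MATLAB. The matrix y is a hypothetical stand-in for the filterbank output (one row per channel), not data from the sample code.

```matlab
% Hypothetical filterbank output: rows = channels, columns = time samples.
y = [ 0.5 -0.2  0.1;
     -0.3  0.4 -0.6];

% Half-wave rectification: negative values become 0, positive values are kept.
r = max(y, 0);

disp(r);
```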
An alternative method of showing an auditory spectrogram is by calculating the short-time Fourier transform (STFT) on a sound signal.
Figure 2: Block diagram of auditory system
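The STFT alternative mentioned above can be sketched in base MATLAB by windowing overlapping frames and taking the FFT of each one. The sampling rate, frame length, hop size, Hann window and the synthetic 1 kHz tone are all assumptions chosen for illustration; the sample code may use different settings or a toolbox function instead.

```matlab
sr = 16000;                         % assumed sampling rate in Hz
t  = (0:sr-1) / sr;                 % one second of sample times
x  = sin(2*pi*1000*t);              % synthetic 1 kHz test tone

nfft = 256;                         % frame length in samples (assumed)
hop  = 128;                         % hop between frames in samples (assumed)
w    = 0.5 - 0.5*cos(2*pi*(0:nfft-1)/nfft);   % Hann window built by hand

nFrames = floor((numel(x) - nfft) / hop) + 1;
S = zeros(nfft/2 + 1, nFrames);     % rows = frequency bins, columns = frames
for m = 1:nFrames
    seg = x((m-1)*hop + (1:nfft)) .* w;    % windowed frame
    X   = fft(seg);
    S(:, m) = abs(X(1:nfft/2 + 1)).';      % keep the non-negative frequencies
end

f = (0:nfft/2) * sr / nfft;         % frequency of each row in Hz
[~, iPeak] = max(mean(S, 2));       % row with the most energy on average
fprintf('Strongest bin at %.1f Hz\n', f(iPeak));
```

Unlike the gammatone filterbank, every row of S has the same bandwidth (sr/nfft Hz per bin), so the frequency axis is uniformly spaced.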
MATLAB Model Tasks
Implement an auditory system in MATLAB using the following steps:
- Modify the sample code in tgz according to your specifications from Table 2, which are selected by the right-most digit of your student number. After making your changes, ensure the following:
- The heights of the two spectrograms are the same as the number of channels in your settings.
- The lowest centre frequency in your gain response display is within ±8 Hz of your lowest centre frequency from Table 2.
Right-most index of your student number | Gammatone filter with lowest centre frequency | Number of channels, 𝑁 (gammatone filters) | Gammatone filter order, 𝑝 |
0 | 60 Hz | 90 | 2 |
1 | 70 Hz | 92 | 3 |
2 | 80 Hz | 94 | 4 |
3 | 90 Hz | 96 | 5 |
4 | 100 Hz | 98 | 6 |
5 | 110 Hz | 100 | 2 |
6 | 120 Hz | 102 | 3 |
7 | 130 Hz | 104 | 4 |
8 | 140 Hz | 106 | 5 |
9 | 150 Hz | 108 | 6 |
Table 2: Cochlear filterbank specification
- In the same script file, generate a time vector (a vector is also known as an array) 𝑡1 that contains numbers from [0 to (𝑇 − 1)]/𝑠𝑟. Ensure the division by 𝑠𝑟 is done after generating the vector 0 to 𝑇 − 1. Note that 𝑇 is the length (in number of samples, not time duration) of the sound signal stored in wav that is found in gammatonegram.tgz and 𝑠𝑟 is the sampling rate of the sound signal.
- Use the sound signal found in sa2.wav provided in tgz, as input to your model. In MATLAB, display the waveform of the sound signal with respect to time vector 𝑡1 in figure 1 and label the x-axis to reflect time in seconds and y-axis to reflect amplitude (unitless).
- In the original figure 1 from tgz, two spectrograms are shown. Redesignate these spectrograms to figure 2. Identify the spectrogram generated by a gammatone filterbank and change its title to reflect this. Also, identify the spectrogram by the short-time Fourier transform (STFT) and change its title to reflect this.
- Generate a time vector 𝑡2 that contains 𝑛 samples spanning the range from 0 to the time duration of the sound signal in wav. Here, 𝑛 is fixed by the length (number of samples) of the auditory spectrogram.
- Calculate the average power (in Watts) of the STFT and auditory spectrograms and plot them in the top half and bottom half, respectively, of MATLAB figure 3. Here, average power is to be computed independently for each column of the two spectrograms. Label the axes and title the graphs. Hint: See the online MathWorks help page on the bandpower command. Also, the 𝑡2 time vector is helpful for displaying the graphs and axis labels.
- In Figure 1 displayed above, only the gain response of every 5th channel of the gammatone filterbank is displayed. Generate and display the gain response 𝑔1 (the equation has already been implemented for you in the second argument of the plot line in demo_gammatone.m) of all the channels in the gammatone filterbank on a linearly scaled x-axis, and the same response on a logarithmically scaled x-axis, in figure 4. Plot the linearly-scaled gain response at the top of MATLAB figure 4 and the log-scaled gain response below it. In the graph, your settings from Table 2 can be checked by inspecting the peak of the first filter (left-most curve). The frequency of this peak should be within ±8 Hz of your setting from Table 2. The peak of the last filter (right-most curve) should be close to but less than 8 kHz.
- Display the centre frequencies of every 5th channel from the gain response in the command window using fprintf. The centre frequency of a channel is the frequency at which its gain response reaches its maximum. The following string should be displayed on a new line in the command window for every 5th channel: “Centre frequency of channel 𝑛: 𝑓𝑐 Hz” where 𝑛 is the channel number and 𝑓𝑐 is the centre frequency.
- Generate two temporal profiles – one from the auditory spectrogram and another from the STFT spectrogram. A temporal profile can be generated by summing all the rows of a spectrogram.
- Generate two spectral profiles – one from the auditory spectrogram and another from the STFT spectrogram. A spectral profile can be generated by summing all the columns of a spectrogram.
- Display two temporal profiles and two spectral profiles in figure 5. The x-axis of each temporal profile should be displayed with respect to 𝑡2 (in seconds). The x-axis of each spectral profile should be displayed with respect to the 𝐹 and 𝐹2 vectors (in Hertz) that correspond to the two spectrograms – these vectors have been automatically generated for you in m. The amplitude (y-axis) of all four graphs is unitless. Display:
- The spectral profile from the STFT spectrogram on the top-left corner in MATLAB figure 5;
- The spectral profile from the auditory spectrogram on the top-right corner in MATLAB figure 5;
- The temporal profile from the STFT spectrogram on the bottom-left corner in MATLAB figure 5;
- The temporal profile from the auditory spectrogram on the bottom-right corner in MATLAB figure 5.
- Use the 2D correlation coefficient (CC) to quantify the difference in each of the following comparisons (note that only one CC should be generated per comparison). Use fprintf to display the comparisons below one line at a time in your command window.
- Auditory spectrogram versus STFT spectrogram
- Auditory spectrogram bandpower versus STFT spectrogram bandpower
- Auditory spectrogram temporal profile versus STFT spectrogram temporal profile
- Auditory spectrogram spectral profile versus STFT spectrogram spectral profile
- Use symbolic variables to display the impulse response of a 𝑝-order gammatone filter for channel 10, where 𝑝 can be found from Table 2 based on the right-most index of your student number. Also substitute the numeric centre frequency of channel 10 for 𝑓𝑐. Display the equation to 6 significant figures. The impulse response equation is defined by 𝑔[𝑛] in the Auditory Signal Processing.pdf slides.
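Several of the steps in the task list above (the two time vectors, the temporal and spectral profiles, the per-column average power and the correlation coefficient) can be sketched on synthetic data as follows. This is a minimal sketch, not the expected solution: sr, T and the matrices A and B are hypothetical stand-ins for the values produced by the sample code, average power is taken as the mean of the squared values in each column, and the correlation coefficient is written out by hand so that no toolbox is required.

```matlab
% All sizes and data below are synthetic stand-ins, not the assignment's values.
sr = 16000;                        % assumed sampling rate in Hz
T  = 32000;                        % assumed signal length in samples
t1 = (0:T-1) / sr;                 % build 0..T-1 first, then divide by sr

rng(0);                            % reproducible random data
A = rand(8, 50);                   % hypothetical auditory spectrogram
B = A + 0.1 * rand(8, 50);         % hypothetical STFT spectrogram, same size

n  = size(A, 2);                   % number of spectrogram columns
t2 = linspace(0, T / sr, n);       % n points from 0 s to the signal duration

tempA = sum(A, 1);  tempB = sum(B, 1);   % temporal profiles: rows summed together
specA = sum(A, 2);  specB = sum(B, 2);   % spectral profiles: columns summed together

powA = mean(A.^2, 1);              % average power of each column
powB = mean(B.^2, 1);

% Correlation coefficient over all elements, written out in base MATLAB.
dev = @(X) X(:) - mean(X(:));
cc  = @(X, Y) sum(dev(X) .* dev(Y)) / sqrt(sum(dev(X).^2) * sum(dev(Y).^2));

fprintf('CC of spectrograms:       %.4f\n', cc(A, B));
fprintf('CC of column powers:      %.4f\n', cc(powA, powB));
fprintf('CC of temporal profiles:  %.4f\n', cc(tempA, tempB));
fprintf('CC of spectral profiles:  %.4f\n', cc(specA, specB));
```

Where the Image Processing Toolbox is available, corr2 computes the same coefficient for two equally sized matrices. A value close to 1 indicates that the two inputs vary together almost identically.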
Add comments to the code you have modified or introduced in MATLAB. Submit only the MATLAB script files that you have modified, using the vUWS submission link under “Assessment 2”.
Progress Report (25%)
You are expected to complete up to task 4 from the MATLAB model section. Prepare a 2-page progress report on the tasks. Describe what you have done to complete the tasks. If you are unable to complete any task, explain what you are experiencing. Use any online English grammar and vocabulary checking application to ensure that your report is coherent and clear, e.g. Grammarly – marks will be given if you are able to convey your ideas clearly and concisely.
Submit your progress report and your MATLAB script files that you have modified using the Turnitin link in vUWS under “Assessment 1”.
Final Report (50%)
Prepare the final report with the results above using a standard format. The final report should include the images as well as the correlation coefficient results and the gammatone filter impulse response (screen capture – do not use your phone to capture any images). Use any online English grammar and vocabulary checking application to ensure that your report is coherent and clear, e.g. Grammarly – marks will be given if you are able to convey your ideas clearly and concisely.
The sections to be included in the final report are:
- Objectives – alternatively, you can include a motivation statement on why this project is important.
- Components of the auditory
- Modelling the auditory system in MATLAB, with the specification from Table 2 clearly described. Also mention the filter order from Table 2 that is required to show the gammatone impulse response. Address the following questions in your report for additional marks:
- The inner hair cell response has all its negative values set to 0. How can these values be converted from negative to positive instead? Suggest a computational method to distinguish the original positive values from the positive values converted from negative values.
- Suggest a computational method to attain temporal and spectral profiles with their amplitudes between 0 to
- Results (screen capture of all the; screen capture of MATLAB command window showing correlation coefficients (CC) and the symbolic equation). Comment on the CC results to indicate the degree of difference between pairs of vectors and matrices in task 12.
- Address which CC result is highest and thus, most
- Conversely, address which CC result is lowest and thus, least
- Conclusion (discuss your experience in using MATLAB for modelling of the auditory model, its usefulness, and difficulties).
- Either Harvard-style or IEEE-style referencing is acceptable – See the last slide in Auditory Signal Processing.pdf as an example.
Submit your final report and your MATLAB script files that you have modified using the Turnitin link in vUWS under “Assessment 3”.