Hints for Python Challenge 2
The easiest way to manipulate audio data is as a '.wav' file.
Packages
In this solution, pygame can be used to play sound. The scipy package has a module for loading '.wav' sound files within scipy.io (scipy.io.wavfile), and scipy.signal has a function to create a spectrogram.
Within matplotlib, the contourf function is used to plot the spectrogram above. Try experimenting with different colourmaps (using the cmap argument in matplotlib).
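As a rough sketch of how these packages fit together, the example below writes a synthetic 440 Hz tone to a temporary '.wav' file, reads it back, and plots its spectrogram. The tone, the file names, the number of contour levels and the 'viridis' colourmap are all illustrative assumptions rather than part of the challenge, and playback with pygame is left out.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
from scipy.io import wavfile

# Write a short synthetic tone so the sketch needs no external file (assumption).
rate = 8000                      # samples per second
t = np.arange(rate * 2) / rate   # sample times for two seconds of audio
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
out_dir = tempfile.mkdtemp()
wav_path = os.path.join(out_dir, "tone.wav")
wavfile.write(wav_path, rate, tone)

# wavfile.read returns the sample rate and the array of samples.
rate_in, data = wavfile.read(wav_path)

# Frequencies (Hz), segment times (s) and the spectrogram values.
freqs, times, spect = signal.spectrogram(data.astype(float), fs=rate_in)

# Filled contour plot; change cmap to try other colourmaps.
fig, ax = plt.subplots()
contours = ax.contourf(times, freqs, np.log10(spect + 1), levels=20, cmap="viridis")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Frequency (Hz)")
fig.colorbar(contours)
fig.savefig(os.path.join(out_dir, "spectrogram.png"))
```

With a real song, only the path passed to wavfile.read would change; the stereo and file-size points in the hints then apply.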
Hints
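- It can be quite tricky to find the path to a file sometimes. On a first run, the easiest way to open a file may be to specify the full path directly, but it is better practice to build the path with os.path (for example os.path.join) than to hard-code it. Note that sys.path only controls where Python looks for modules to import, not where data files are opened from.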
- Often a wav file will be in 'stereo', so the sound is in two channels (left and right). A single signal for analysis can be found either by averaging the two channels (for example using numpy.mean along the channel axis) or by choosing one channel on its own.
- The data that is read in from scipy.io is for a whole song. How much time does each entry of the array represent? (The sample rate, which is returned along with the data, tells you.) It is helpful to select only a few seconds of the song to analyse, because each file is large.
- The output of the spectrogram may be in the form of 'complex numbers', depending on the function and mode used. To find the 'magnitude' of the response, which is what we are plotting, use the 'abs' function within numpy (np.abs).
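- When plotting the spectrogram, the range of values is sometimes too large to give a clear picture, so it helps to plot the logarithm of the magnitude. Adding 1 to every point before taking the logarithm (for example, Z = np.log10(np.abs(spect) + 1)) often makes figures clearer, because very small responses then map to values near zero rather than to large negative numbers.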
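The hints above might be combined along the following lines. This is only a sketch, not a reference solution: the helper names (to_mono, take_seconds, log_magnitude) are hypothetical, and a synthetic array stands in for the data that scipy.io.wavfile.read would return.

```python
import numpy as np


def to_mono(data):
    """Average the channels of an (n_samples, n_channels) array; pass mono through."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 2:
        return data.mean(axis=1)
    return data


def take_seconds(data, rate, start, duration):
    """Return the samples between `start` and `start + duration` seconds.

    Each entry lasts 1 / rate seconds, so time t sits at index t * rate.
    """
    first = int(start * rate)
    last = int((start + duration) * rate)
    return data[first:last]


def log_magnitude(spect):
    """Magnitude of a (possibly complex) spectrogram on a log10 scale."""
    return np.log10(np.abs(spect) + 1)


# Synthetic stand-in for a stereo file: 10 s at 44.1 kHz, two channels (assumption).
rate = 44100
stereo = np.zeros((rate * 10, 2))
stereo[:, 0] = 1.0  # left channel
stereo[:, 1] = 3.0  # right channel

mono = to_mono(stereo)                                # average of left and right
clip = take_seconds(mono, rate, start=2, duration=3)  # three seconds, from t = 2 s
```

Here clip holds 3 * 44100 samples, which is far quicker to analyse than the whole song.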
Notes
Pitch is perceived on a logarithmic frequency scale. That is, doubling the frequency raises the pitch of a note by one octave. Plotting the spectrogram with a logarithmic frequency axis (using, for example, plt.gca().set_yscale('log')) may therefore make the spectrogram more intelligible, and it also shows low notes more clearly than a linear scale from 0 to 22 kHz does (human hearing spans approximately 20 Hz to 20 kHz).
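A minimal sketch of the logarithmic frequency axis follows, using a random array in place of a real spectrogram. The array sizes, the 'magma' colourmap and the 20 Hz lower limit are illustrative assumptions. A logarithmic axis cannot show 0 Hz, so the zero-frequency row is dropped before plotting.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Doubling the frequency is one octave, so octaves are counted with log2.
octaves = np.log2(880 / 440)  # 440 Hz to 880 Hz is one octave

# Synthetic stand-in for spectrogram magnitudes (assumption).
freqs = np.linspace(0, 22050, 513)  # Hz
times = np.linspace(0, 5, 100)      # s
rng = np.random.default_rng(0)
Z = rng.random((freqs.size, times.size))

# Drop the 0 Hz row, which has no position on a logarithmic axis.
keep = freqs > 0

fig, ax = plt.subplots()
ax.contourf(times, freqs[keep], Z[keep, :], cmap="magma")
ax.set_yscale("log")
ax.set_ylim(20, freqs[-1])  # start near the lower limit of hearing
ax.set_xlabel("Time (s)")
ax.set_ylabel("Frequency (Hz)")
```

On this axis each octave takes up the same vertical distance, so the low notes are no longer squeezed into the bottom of the figure.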