How to use VocalPy with scikit-learn to fit supervised learning models to acoustic features#
Many analyses in bioacoustics and communication rely on machine learning models. For example, it is common to use supervised machine learning models to support the idea that vocalizations contain information about individual identity or emotional valence. This is done by showing that a classifier can successfully predict the identity or valence of vocalizations when fit to acoustic features extracted from the sounds. See for example [1] or [2].
scikit-learn is one of the most widely used libraries in the Python programming language for fitting supervised machine learning models. Here we will show you how to extract acoustic features from sounds using VocalPy, and then fit a model to those features with scikit-learn. The example we will walk through is classifying individual zebra finches using acoustic parameters extracted from their calls. The material here is adapted in part from the BioSound tutorial from the Theunissen lab (https://github.com/theunissenlab/BioSoundTutorial) that demonstrates how to use their library soundsig (https://github.com/theunissenlab/soundsig).
import numpy as np
import pandas as pd
import sklearn
import vocalpy as voc
For this how-to, we use a subset of data from the dataset shared by Elie and Theunissen, as used in their 2016 paper and their 2018 paper. To get the subset of data we will use, we can call the vocalpy.example() function (which, under the hood, "fetches" the data using the excellent library pooch).
zblib = voc.example("zblib", return_path=True)
Downloading file 'Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip' from 'doi:10.5281/zenodo.10685639/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip' to '/home/docs/.cache/vocalpy'.
Unzipping contents of '/home/docs/.cache/vocalpy/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip' to '/home/docs/.cache/vocalpy/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip.unzip'
Since this example data consists of more than one file, when we call vocalpy.example() we get back an instance of the ExampleData class. This class is like a dict where you can get the values by using dot notation instead of keys, e.g. by writing zblib.sound instead of zblib["sound"].
print(zblib)
ExampleData(sound=[PosixPath('/h...11-DC-01.wav'), PosixPath('/h...18-DC-02.wav'), PosixPath('/h...21-DC-04.wav'), PosixPath('/h...21-DC-06.wav'), PosixPath('/h...21-DC-07.wav'), PosixPath('/h...21-DC-08.wav'), ...])
By default, the vocalpy.example() function gives us back an ExampleData that contains VocalPy data types like vocalpy.Sound, but in this case we want the paths to the files. We want the paths because the filenames contain the ID of the zebra finch that made the sound, and below we will need to extract those IDs from the filenames so we can train a model to classify IDs. To get back a list of pathlib.Path instances, instead of vocalpy.Sound instances, we set the argument return_path to True. We confirm that we got back paths by printing the first element of the list.
print(zblib.sound[0])
/home/docs/.cache/vocalpy/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip.unzip/Elie-Theunissen-2016-zebra-finch-song-library-subset/WhiLbl0010_110411-DC-01.wav
We make a helper function to get the bird IDs from the filenames.
We will use this below when we want to predict the bird ID from the extracted features.
def bird_id_from_path(wav_path):
    """Helper function that gets a bird ID from a path"""
    return wav_path.name.split('_')[0]
We run a quick test to confirm this works as we expect.
bird_id_from_path(zblib.sound[0])
'WhiLbl0010'
Then we use a list comprehension to get the ID from all 91 files.
bird_ids = [
    bird_id_from_path(wav_path)
    for wav_path in zblib.sound
]
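Since we will train a classifier to predict these IDs, it can help to sanity-check how many unique birds are in this subset (there should be four, as noted below when we build the class labels). A minimal check:
# count the unique bird IDs we just extracted from the filenames
unique_ids = sorted(set(bird_ids))
print(len(unique_ids), "unique birds:", unique_ids)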
Feature extraction#
Now we extract the acoustic features we will use to classify the birds.
To extract the acoustic features, we will use the function vocalpy.feature.biosound().
We will extract a subset of the features used by Elie and Theunissen in the articles linked above (and related work), which are depicted schematically in figure 1 of their 2016 paper.
For this example we use only the features extracted from the temporal and spectral envelope, since those are relatively quick to extract. For an example that uses fundamental frequency estimation, see the notebook that this is adapted from: https://github.com/theunissenlab/BioSoundTutorial/blob/master/BioSound4.ipynb
Here we are going to use the FeatureExtractor class. This works like other pipeline classes in VocalPy: we tell the FeatureExtractor what callback we want to use, and we explicitly declare a set of parameters params to use with the callback function. This design is meant to help us document the methods we use more clearly and concisely.
callback = voc.feature.biosound
params = dict(ftr_groups=("temporal", "spectral"))
extractor = voc.FeatureExtractor(callback, params)
We are going to use only one channel of the audio to extract features. The function we will use to extract features, vocalpy.feature.biosound(), works on audio with multiple channels, but for demonstration purposes we just need one. To get just the first channel from each sound, we can use indexing (for more detail and examples of how this works, see the API documentation for vocalpy.Sound).
sounds = []
for wav_path in zblib.sound:
    sounds.append(
        voc.Sound.read(wav_path)[0]  # indexing with `[0]` gives us the first channel
    )
Now finally we can pass this list of (single-channel) sounds into the extract() method, and get back a list of Features, one for every sound.
features_list = extractor.extract(sounds, parallelize=True)
[ ] | 0% Completed | 155.60 us
[############ ] | 30% Completed | 102.91 ms
[######################### ] | 63% Completed | 203.89 ms
[#################################### ] | 92% Completed | 305.68 ms
[########################################] | 100% Completed | 406.74 ms
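If you want to peek at what was extracted for a single sound before going further, you can convert the data attribute of one Features instance to a pandas.DataFrame, just as we do for all of them in the next section. A quick look at the first one:
# the `data` attribute of a Features instance can be converted to a DataFrame,
# with one row of scalar features for this (single-channel) sound
print(features_list[0].data.to_pandas())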
Data preparation#
Now what we want to get from our extracted features is two NumPy arrays, X and y. These represent the samples \(X_i\) in our dataset, each described by a vector of features, and the labels \(y_i\) for those samples. In this case we have a total of \(m = 91\) samples (where \(i \in 1, 2, \ldots, m\)).
We get these arrays as follows (noting there are always multiple ways to do things when you’re programming):
1. Take the data attribute of the Features we got back from the FeatureExtractor and convert it to a pandas.DataFrame with one row: the scalar set of features for exactly one sound.
2. Use pandas to concatenate all those DataFrames, so we end up with 91 rows.
3. Add a column to this DataFrame with the IDs of the birds; we then have \(X\) and \(y\) in a single table we could save to a csv file, to do further analysis on later.
4. Get \(X\) by using the values attribute of the DataFrame, which is a NumPy array.
5. Get \(y\) using pandas.factorize(), which converts the unique set of strings in the "id" column into integer class labels: i.e., since there are 4 birds, for every row we get a value from \(\{0, 1, 2, 3\}\).
df = pd.concat(
    [features.data.to_pandas()
     for features in features_list]
)
df.head()
| channel | mean_t | std_t | skew_t | kurtosis_t | entropy_t | max_amp | mean_s | std_s | skew_s | kurtosis_s | entropy_s | q1 | q2 | q3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.086408 | 0.046053 | 0.082465 | 1.730271 | 0.989797 | 3398.840054 | 3445.838000 | 1128.554969 | 0.365371 | 6.485988 | 0.705099 | 2670.117188 | 3789.843750 | 4048.242188 |
| 0 | 0.106894 | 0.053288 | -0.115429 | 1.800985 | 0.991144 | 3324.856389 | 3718.839185 | 900.491254 | 1.804076 | 15.690687 | 0.608491 | 3273.046875 | 3875.976562 | 3962.109375 |
| 0 | 0.117597 | 0.053064 | -0.256712 | 1.898931 | 0.986517 | 2853.960603 | 3499.235181 | 1094.553821 | 1.233182 | 10.790179 | 0.652285 | 3057.714844 | 3402.246094 | 4091.308594 |
| 0 | 0.107487 | 0.055891 | -0.015197 | 1.769766 | 0.992411 | 3784.152991 | 3798.428046 | 815.427984 | 0.979291 | 11.873697 | 0.656394 | 3445.312500 | 3488.378906 | 4478.906250 |
| 0 | 0.102632 | 0.057484 | 0.042124 | 1.749709 | 0.994080 | 4240.745245 | 3867.019529 | 788.750213 | 0.413044 | 8.460579 | 0.645909 | 3445.312500 | 3531.445312 | 4565.039062 |
df["id"] = pd.array(bird_ids, dtype="str")
y, _ = df["id"].factorize()
X = df.values[:, :-1] # -1 because we don't want 'id' column
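Before fitting a model, it is worth confirming the arrays have the shapes we expect: 91 rows (one per call), one column per extracted feature (14 here), and one integer label per row.
# X should be (91, 14): one row per call, one column per acoustic feature;
# y should be (91,): one integer class label per call, one value per bird
print(X.shape, y.shape)
print(np.unique(y))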
Fitting a Random Forest classifier#
Finally we will train a classifier from scikit-learn to classify these individuals. The original paper uses Linear Discriminant Analysis, but here we fit a random forest classifier, simply because random forest models are fast to fit.
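If you want to stay closer to the original analysis, you could swap in scikit-learn's Linear Discriminant Analysis instead; a minimal sketch (not run here), using the same X and y we just built:
# hypothetical alternative: score LDA with 5-fold cross-validation
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

lda_scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(f"Mean LDA accuracy across folds: {lda_scores.mean() * 100:0.2f}%")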
import sklearn.model_selection
First we split the data into training and test splits using sklearn.model_selection.train_test_split().
X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(
    X, y, stratify=y, train_size=0.8
)
from sklearn.ensemble import RandomForestClassifier
Finally we can instantiate our model and fit it.
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)
RandomForestClassifier(max_depth=2, random_state=0)
And we can evaluate the model’s performance on a test set.
print(
    f"Accuracy: {clf.score(X_val, y_val) * 100:0.2f}%"
)
Accuracy: 73.68%
Looks pretty good!
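To see which birds the classifier confuses with which, one thing you could do is inspect a confusion matrix on the same held-out split; a minimal sketch:
# rows are true bird IDs (as integer labels), columns are predicted bird IDs
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_val)
print(confusion_matrix(y_val, y_pred))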
Now you have seen a simple example of how to extract acoustic features from your data with VocalPy, and fit a model to them with scikit-learn.