How to use VocalPy with scikit-learn to fit supervised learning models to acoustic features

Many analyses in bioacoustics and communication rely on machine learning models. For example, it is common to use supervised machine learning models to support the idea that vocalizations contain information about individual identity or emotional valence. This is done by showing that a classifier can successfully predict the identity or valence of vocalizations when fit to acoustic features extracted from the sounds. See for example [1] or [2].

scikit-learn is one of the most widely used libraries in the Python programming language for fitting supervised machine learning models. Here we will show you how to extract acoustic features from sounds using VocalPy, and then fit a model to those features with scikit-learn. The example we will walk through is of classifying individual zebra finches using acoustic parameters extracted from their calls. The material here is adapted in part from the BioSound tutorial from the Theunissen lab (https://github.com/theunissenlab/BioSoundTutorial) that demonstrates how to use their library soundsig (https://github.com/theunissenlab/soundsig).

import numpy as np
import pandas as pd
import sklearn
import vocalpy as voc

For this how-to, we use a subset of data from this dataset shared by Elie and Theunissen, as used in their 2016 paper and their 2018 paper. To get the subset of data we will use, we can call the vocalpy.example() function (that, under the hood, “fetches” the data using the excellent library pooch).

zblib = voc.example("zblib", return_path=True)
Downloading file 'Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip' from 'doi:10.5281/zenodo.10685639/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip' to '/home/docs/.cache/vocalpy'.
Unzipping contents of '/home/docs/.cache/vocalpy/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip' to '/home/docs/.cache/vocalpy/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip.unzip'

Since this example data is more than one file, when we call vocalpy.example() we will get back an instance of the ExampleData class. This class is like a dict where you can get the values by using dot notation instead of keys, e.g. by writing zblib.sound instead of zblib["sound"].

print(zblib)
ExampleData(sound=[PosixPath('/h...11-DC-01.wav'), PosixPath('/h...18-DC-02.wav'), PosixPath('/h...21-DC-04.wav'), PosixPath('/h...21-DC-06.wav'), PosixPath('/h...21-DC-07.wav'), PosixPath('/h...21-DC-08.wav'), ...])

By default, the vocalpy.example() function gives us back an ExampleData that contains VocalPy data types like vocalpy.Sound, but in this case we want the paths to the files. We want the paths because the filenames contain the ID of the zebra finch that made the sound, and below we will need to extract those IDs from the filenames so we can train a model to classify IDs. To get back a list of pathlib.Path instances instead of vocalpy.Sound instances, we set the argument return_path to True. We confirm that we got back paths by printing the first element of the list.

print(zblib.sound[0])
/home/docs/.cache/vocalpy/Elie-Theunissen-2016-vocalization-library-zebra-finch-subset.zip.unzip/Elie-Theunissen-2016-zebra-finch-song-library-subset/WhiLbl0010_110411-DC-01.wav
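
For comparison: if we call vocalpy.example() without return_path=True, we get back vocalpy.Sound instances rather than paths, as described above. A minimal sketch (the variable name is ours; the data is already cached, so this call will not re-download):

zblib_default = voc.example("zblib")
# Each element is now a vocalpy.Sound instead of a pathlib.Path
print(type(zblib_default.sound[0]))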

We make a helper function to get the bird IDs from the filenames. We will use this below when we want to predict the bird ID from the extracted features.

def bird_id_from_path(wav_path):
    """Helper function that gets a bird ID from a path"""
    return wav_path.name.split('_')[0]

We run a quick test to confirm this works as we expect.

bird_id_from_path(zblib.sound[0])
'WhiLbl0010'

Then we use a list comprehension to get the ID from all 91 files.

bird_ids = [
    bird_id_from_path(wav_path)
    for wav_path in zblib.sound
]
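
As a quick sanity check (our addition, not part of the original tutorial), we can confirm we have 91 files and, as we will see below, 4 unique bird IDs:

# We expect 91 files and 4 unique bird IDs
print(len(bird_ids))
print(sorted(set(bird_ids)))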

Feature extraction

Now we extract the acoustic features we will use to classify the calls, using the function vocalpy.feature.biosound().

We will extract a subset of the features used by Elie and Theunissen in the articles linked above (and related work), that are depicted schematically below (figure 1 from their 2016 paper):

[Figure 1 from Elie and Theunissen, 2016]

For this example we use only the features extracted from the temporal and spectral envelope, since those are relatively quick to extract. For an example that uses fundamental frequency estimation, see the notebook that this is adapted from: https://github.com/theunissenlab/BioSoundTutorial/blob/master/BioSound4.ipynb

Here we are going to use the FeatureExtractor class. This works like other pipeline classes in VocalPy, where we tell the FeatureExtractor what callback we want to use, and we explicitly declare a set of parameters params to use with the callback function. This design is meant to help us document the methods we use more clearly and concisely.

callback = voc.feature.biosound
params = dict(ftr_groups=("temporal", "spectral"))
extractor = voc.FeatureExtractor(callback, params)

We are going to use only one channel of the audio to extract features. The function we will use, vocalpy.feature.biosound(), works on audio with multiple channels, but for demonstration purposes we just need one. To get just the first channel from each sound, we can use indexing (for more detail and examples of how this works, see the API documentation for vocalpy.Sound).

sounds = []
for wav_path in zblib.sound:
    sounds.append(
        voc.Sound.read(wav_path)[0]  # indexing with `[0]` gives us the first channel
    )
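
To confirm the indexing gave us single-channel sounds, we can inspect the data attribute; a quick check we add here, assuming (per the VocalPy docs) that Sound.data has shape (channels, samples):

# The first dimension of the shape should be 1, i.e., one channel
print(sounds[0].data.shape)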

Now finally we can pass this list of (single-channel) sounds into the extract() method, and get back a list of Features, one for every sound.

features_list = extractor.extract(sounds, parallelize=True)
[########################################] | 100% Completed | 406.74 ms

Data preparation

Now what we want to get from our extracted features is two NumPy arrays, X and y.

These represent the samples in our dataset, where row \(X_i\) holds the feature vector for sample \(i\), and \(y_i\) is the label for that sample. In this case we have a total of \(m = 91\) samples (where \(i \in \{1, 2, \dots, m\}\)).

We get these arrays as follows (noting there are always multiple ways to do things when you’re programming):

  • Take the data attribute of the Features we got back from the FeatureExtractor and convert it to a pandas.DataFrame with one row: the scalar set of features for exactly one sound

  • Use pandas to concatenate all those DataFrames, so we end up with 91 rows

  • Add a column to this DataFrame with the IDs of the birds; we then have \(X\) and \(y\) in a single table that we could save to a CSV file for further analysis later

  • We get \(X\) from the values attribute of the DataFrame, a NumPy array, dropping the last column, which holds the IDs

  • We get \(y\) using pandas.factorize(), which converts the unique set of strings in the "id" column into integer class labels: i.e., since there are 4 birds, for every row we get a value from \(\{0, 1, 2, 3\}\)

df = pd.concat(
    [
        features.data.to_pandas()
        for features in features_list
    ]
)
df.head()
mean_t std_t skew_t kurtosis_t entropy_t max_amp mean_s std_s skew_s kurtosis_s entropy_s q1 q2 q3
channel
0 0.086408 0.046053 0.082465 1.730271 0.989797 3398.840054 3445.838000 1128.554969 0.365371 6.485988 0.705099 2670.117188 3789.843750 4048.242188
0 0.106894 0.053288 -0.115429 1.800985 0.991144 3324.856389 3718.839185 900.491254 1.804076 15.690687 0.608491 3273.046875 3875.976562 3962.109375
0 0.117597 0.053064 -0.256712 1.898931 0.986517 2853.960603 3499.235181 1094.553821 1.233182 10.790179 0.652285 3057.714844 3402.246094 4091.308594
0 0.107487 0.055891 -0.015197 1.769766 0.992411 3784.152991 3798.428046 815.427984 0.979291 11.873697 0.656394 3445.312500 3488.378906 4478.906250
0 0.102632 0.057484 0.042124 1.749709 0.994080 4240.745245 3867.019529 788.750213 0.413044 8.460579 0.645909 3445.312500 3531.445312 4565.039062
df["id"] = pd.array(bird_ids, dtype="str")
y, _ = df["id"].factorize()
X = df.values[:, :-1]  # -1 because we don't want 'id' column
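
At this point it is worth confirming the arrays line up: with 91 sounds and the 14 feature columns shown above, we expect X to have shape (91, 14) and y to have shape (91,). A quick check (our addition):

# Expect X.shape == (91, 14) and y.shape == (91,)
print(X.shape, y.shape)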

Fitting a Random Forest classifier

Finally we will train a classifier from scikit-learn to classify these individuals. The original paper uses Linear Discriminant Analysis, but here we fit a random forest classifier, simply because random forest models are fast to fit.

import sklearn.model_selection

First we split the data into training and validation splits using sklearn.model_selection.train_test_split().

X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(
    X, y, stratify=y, train_size=0.8
)
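
Because we pass stratify=y, the per-bird class proportions are (approximately) preserved in both splits. An optional check, sketched here:

# Counts per integer class label in each split;
# the proportions should be roughly the same
print(np.bincount(y_train))
print(np.bincount(y_val))
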
from sklearn.ensemble import RandomForestClassifier

Finally we can instantiate our model and fit it.

clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)
RandomForestClassifier(max_depth=2, random_state=0)

And we can evaluate the model’s performance on the held-out validation set.

print(
    f"Accuracy: {clf.score(X_val, y_val) * 100:0.2f}%"
)
Accuracy: 73.68%

Looks pretty good!
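
If you want a fuller picture than overall accuracy, scikit-learn can report per-class precision and recall. This is an optional extension to the how-to, sketched here:

from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 for the four birds
print(classification_report(y_val, clf.predict(X_val)))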

Now you have seen a simple example of how to extract acoustic features from your data with VocalPy, and fit a model to them with scikit-learn.