Quickstart: VocalPy 🐍 💬 in 15 minutes ⏲️#
This tutorial will introduce you to VocalPy, a core Python package for acoustic communication research.
Set up#
First we import vocalpy.
import vocalpy as voc
Then we get some example data, from the Bengalese Finch song repository.
bfsongrepo = voc.example('bfsongrepo', return_path=True)
This gives us back an ExampleData instance with sound and annotation attributes.
bfsongrepo
ExampleData(sound=[PosixPath('/h...2_0836.3.wav'), PosixPath('/h...2_0837.4.wav'), PosixPath('/h...2_0837.6.wav'), PosixPath('/h...2_0838.8.wav'), PosixPath('/h...2_0839.9.wav'), PosixPath('/h..._0840.10.wav'), ...], annotation=[PosixPath('/h...36.3.wav.csv'), PosixPath('/h...37.4.wav.csv'), PosixPath('/h...37.6.wav.csv'), PosixPath('/h...38.8.wav.csv'), PosixPath('/h...39.9.wav.csv'), PosixPath('/h...0.10.wav.csv'), ...])
The ExampleData is just a Python dict that lets us access the values through dot notation, so we can write bfsongrepo.sound as well as bfsongrepo["sound"], like the Bunch class returned by functions in the scikit-learn datasets module.
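For instance, both access styles point at the same underlying value. A quick check (assuming the dict-style lookup works as just described):
# dot notation and key lookup should give us the same list of paths
assert bfsongrepo.sound == bfsongrepo["sound"]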
Since we set the argument return_path=True, these attributes are each a list of pathlib.Path instances. The default for return_path is False, and when it is False, we get back the data types built into VocalPy that we will introduce below. Here we want the paths so we can show how to load data in with VocalPy.
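If you just want the built-in data types, you can rely on the default. A minimal sketch, assuming the default behavior described above (the attributes then hold VocalPy objects rather than paths):
# with the default return_path=False, we get VocalPy data types directly
bfsongrepo_loaded = voc.example('bfsongrepo')
first_sound = bfsongrepo_loaded.sound[0]  # a vocalpy.Sound, not a pathlib.Path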
Data types for acoustic communication#
One of the main goals of VocalPy is to make it easier to read and write code for bioacoustics and acoustic communication. One way VocalPy achieves that is by providing data types that map onto concepts from those research domains. The benefit of these data types is that they let us as researchers write and read code with the same words we use when we talk to each other. Another benefit of these data types is that they make our code more succinct.
Before we walk through the data types, we show two snippets of code.
The first is written in standard scientific Python.
import soundfile
from scipy.signal import spectrogram

# we write a helper function to compute a spectrogram
def spect(data, fs, nperseg=1024, window='hann'):
    f, t, s = spectrogram(data, fs, window=window, nperseg=nperseg)
    return f, t, s

# notice that we need two variables for one sound
data_bird1, fs_bird1 = soundfile.read('./path/to/bird1.wav')
# that turn into three more variables for the spectrogram
f_bird1, t_bird1, s_bird1 = spect(data_bird1, fs_bird1)
# and another two variables for another sound
data_bird2, fs_bird2 = soundfile.read('./path/to/bird2.wav')
# and that again turns into three more variables for the spectrogram
f_bird2, t_bird2, s_bird2 = spect(data_bird2, fs_bird2)
# these variables are cluttering up our code!
# of course, it's common for most audio signals in your data to have the same sampling rate
# but this is definitely not always true!
# and likewise it's common to generate spectrograms all with the same frequency bins
# but still we need to do all this book-keeping with variables

# definitions of extract_features and stats_helper are not shown in this snippet
ftrs_bird1 = extract_features(s_bird1, t_bird1, f_bird1)
ftrs_bird2 = extract_features(s_bird2, t_bird2, f_bird2)
rejected_h0, pval = stats_helper(ftrs_bird1, ftrs_bird2)
The second snippet is written with VocalPy.
import vocalpy as voc
from scipy.signal import spectrogram

# we write a helper function to compute a spectrogram
# but notice we get rid of one of the arguments:
# instead of "data" and "sampling rate" we just have a Sound
# we'll see below that the sound "encapsulates" the `data` and `samplerate` attributes
def spect(sound, nperseg=1024, window='hann'):
    f, t, s = spectrogram(sound.data, sound.samplerate,
                          window=window, nperseg=nperseg)
    # instead of returning three variables we just return one Spectrogram instance
    # that again encapsulates the spectrogram matrix, the frequencies, and the times
    return voc.Spectrogram(data=s, frequencies=f, times=t)

# we can also reduce some duplication using a dictionary that maps IDs to variables
ftrs = {}
for bird in ('bird1', 'bird2'):
    # here we load the Sound with the data and samplerate attributes
    # so we only have one variable instead of two
    sound = voc.Sound.read(f'./path/to/{bird}.wav')
    # note we use a different name than the helper function `spect`,
    # so we don't shadow it inside the loop
    a_spect = spect(sound)
    ftrs[bird] = extract_features(a_spect)
rejected_h0, pval = stats_helper(ftrs['bird1'], ftrs['bird2'])
As the comments indicate, using VocalPy makes the code more succinct, and more readable.
To learn more about the design and goals of VocalPy, please check out our Forum Acusticum 2023 Proceedings Paper, “Introducing VocalPy”, this PyData Global 2023 talk and this Python Exchange talk.
Now let’s look at the data types that VocalPy provides for acoustic communication.
Data type for sound: vocalpy.Sound#
The first data type we’ll learn about is one that represents a sound, not surprisingly named vocalpy.Sound.
We start here since all our analyses start with sound.
We can load an audio signal from a file using the vocalpy.Sound.read() method.
wav_path = bfsongrepo.sound[0] # we write this out just to make it explicit that we have a pathlib.Path pointing to a wav audio file
a_sound = voc.Sound.read(wav_path)
print(a_sound)
vocalpy.Sound(data=array([[-0.00...-0.00753784]]), samplerate=32000, channels=1, samples=290519, duration=9.079)
A sound has two attributes:
1. data, the audio signal itself, with two dimensions: (channels, samples)
print(a_sound.data)
[[-0.00637817 -0.00668335 -0.00747681 ... -0.00714111 -0.00769043
-0.00753784]]
2. samplerate, the sampling rate for the audio
print(a_sound.samplerate)
32000
A Sound also has three properties, derived from its data: channels, the number of channels; samples, the number of samples; and duration, the number of samples divided by the sampling rate.
print(
f"This sound comes from an audio file with {a_sound.channels} channel, "
f"{a_sound.samples} samples, and a duration of {a_sound.duration:.3f} seconds"
)
This sound comes from an audio file with 1 channel, 290519 samples, and a duration of 9.079 seconds
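Since duration is derived from the other two properties, we can also compute it ourselves; a quick check (with 290519 samples at 32000 Hz this works out to roughly 9.079 seconds):
# duration is just the number of samples divided by the sampling rate
print(a_sound.samples / a_sound.samplerate)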
One of the reasons VocalPy provides this data type, and the others we’re about to show you here, is that it helps you write more succinct code that’s easier to read: for you, when you come back to your code months from now, and for others that want to read the code you share.
When you are working with your own data, instead of example data built into VocalPy, you will do something like:
1. Get all the paths to the sound files in a directory, using a convenience function that VocalPy gives us in its paths module, vocalpy.paths.from_dir()
2. Read all the sound files into memory using the method vocalpy.Sound.read()
This is shown in the snippet below.
data_dir = 'data/bfsongrepo/gy6or6/032312/'
wav_paths = voc.paths.from_dir(data_dir, 'wav')
sounds = [
voc.Sound.read(wav_path) for wav_path in wav_paths
]
We’ll demonstrate this now, using the parent attribute of one of the paths to the wav files in our example bfsongrepo data. In this case, the parent is the directory that the wav file is in. We can be sure that all the wav files are in this directory, because when you call vocalpy.example() with the name of the example dataset, 'bfsongrepo', VocalPy uses the library pooch (https://www.fatiando.org/pooch/latest/index.html) to “fetch” that dataset off of Zenodo and download it into a local “cache” directory.
data_dir = bfsongrepo.sound[0].parent
print(data_dir)
/home/docs/.cache/vocalpy/bfsongrepo.tar.gz.untar
We then use the vocalpy.paths.from_dir() function to get all the wav files from that directory.
wav_paths = voc.paths.from_dir(data_dir, 'wav')
Not surprisingly, these are the wav files we already have in our bfsongrepo example data.
sorted(wav_paths) == sorted(bfsongrepo.sound)
True
(We’re just showing how you would do this with a directory of your data.)
Finally we can load all these files, as was shown in the last line of the snippet.
sounds = [
voc.Sound.read(wav_path) for wav_path in wav_paths
]
Next we’ll show how to work with sound in a pipeline for processing data.
For more detail on how to use the vocalpy.Sound class, please see the “examples” section of the API documentation (that you can go to by clicking on the name of the class in this sentence).
Classes for steps in pipelines for processing data in acoustic communication#
In addition to data types for acoustic communication, VocalPy provides you with classes that represent steps in pipelines for processing that data. These classes are also written with readability and reproducibility in mind.
Let’s use one of those classes, SpectrogramMaker, to make a spectrogram from each one of the wav files that we loaded above.
We’ll write a brief snippet to do so, and then we’ll explain what we did.
params = {'n_fft': 512, 'hop_length': 64}
callback = voc.spectrogram
spect_maker = voc.SpectrogramMaker(callback=callback, params=params)
spects = spect_maker.make(sounds, parallelize=True)
[########################################] | 100% Completed | 1.42 s
Notice a couple of things about this snippet:
1. In line 1, you declare the parameters that you use to generate spectrograms explicitly, as a dictionary. This helps with reproducibility by encouraging you to document those parameters.
2. In line 2, you also decide what function you will use to generate the spectrograms. Here we use the helper function vocalpy.spectrogram().
3. In line 3, you create an instance of the SpectrogramMaker class with the function you want to use to generate spectrograms, and the parameters to use with that function. We refer to the function we pass in as a callback, because the SpectrogramMaker will “call back” to this function when it makes a spectrogram.
4. In line 4, you make the spectrograms, with a single call to the method vocalpy.SpectrogramMaker.make(). You pass in the sounds we loaded earlier, and you tell VocalPy that you want to parallelize the generation of the spectrograms. This is done for you, using the library dask.
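Any function that builds a spectrogram from a Sound can be used as the callback. Here is a minimal sketch showing how you might pass your own helper instead of vocalpy.spectrogram(); it assumes a custom function like the one in the earlier snippet, wrapping scipy.signal.spectrogram:
from scipy.signal import spectrogram
# a hypothetical custom callback: it takes a Sound plus keyword arguments
# and returns a vocalpy.Spectrogram, like the helper we wrote earlier
def my_spect(sound, nperseg=512, window='hann'):
    f, t, s = spectrogram(sound.data, sound.samplerate,
                          window=window, nperseg=nperseg)
    return voc.Spectrogram(data=s, frequencies=f, times=t)
# the SpectrogramMaker "calls back" to our function with these parameters
my_spect_maker = voc.SpectrogramMaker(callback=my_spect, params={'nperseg': 512, 'window': 'hann'})
my_spects = my_spect_maker.make(sounds, parallelize=True)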
Data type: vocalpy.Spectrogram#
As you might have guessed, when we call vocalpy.SpectrogramMaker.make(), we get back a list of spectrograms.
This is the next data type we’ll look at.
We inspect the first spectrogram we made.
a_spect = spects[0]
print(a_spect)
vocalpy.Spectrogram(data=array([[[ -1....3.01612196]]]), frequencies=array([ 0....7.5, 16000. ]), times=array([0.000e... 9.078e+00]))
As before, we’ll walk through the attributes of this class. But since the whole point of a spectrogram is to let us see sound, let’s actually look at the spectrogram, instead of staring at arrays of numbers.
We do so by calling vocalpy.plot.spectrogram().
voc.plot.spectrogram(
a_spect,
tlim = [2.6, 4],
flim=[500,12500],
)
(<Figure size 640x480 with 1 Axes>, <Axes: >)

We see that we have a spectrogram of Bengalese finch song.
Now that we know what we’re working with, let’s actually inspect the attributes of the vocalpy.Spectrogram instance.
There are three attributes we care about here.
1. data: this is the spectrogram itself. As with the other data types, like vocalpy.Sound, the attribute name data indicates the main data we care about.
print(a_spect.data)
[[[ -1.80471366 -1.91454266 -6.29578053 ... -9.11453071 -5.11814549
-3.03580712]
[ -2.05547237 -1.14437889 -2.08431091 ... -5.57148069 -2.0905174
-1.95497683]
[ -3.15048083 -2.5644456 -4.74032384 ... -0.93339209 -0.76606516
-1.95909653]
...
[-53.01612196 -53.01612196 -53.01612196 ... -53.01612196 -53.01612196
-49.54066399]
[-52.93058939 -52.15275056 -51.2442843 ... -52.89431799 -51.76220215
-51.55653316]
[-51.90524123 -53.01612196 -53.01612196 ... -53.01612196 -53.01612196
-53.01612196]]]
Let’s look at the shape of data. It’s really just a NumPy array, so we inspect the array’s shape attribute.
print(a_spect.data.shape)
(1, 257, 4540)
We see that we have an array with dimensions (channels, frequencies, times). The last two dimensions correspond to the next two attributes we will look at.
2. frequencies, a vector of the frequency for each row of the spectrogram.
print(a_spect.frequencies[:10])
[ 0. 62.5 125. 187.5 250. 312.5 375. 437.5 500. 562.5]
And we can confirm it has a length equal to the number of rows in the spectrogram.
print(a_spect.frequencies.shape)
(257,)
3. times, a vector of the time for each column in the spectrogram.
print(a_spect.times[:10])
[0. 0.002 0.004 0.006 0.008 0.01 0.012 0.014 0.016 0.018]
We can likewise see that the times vector has a shape equal to the number of columns in the spectrogram.
print(a_spect.times.shape)
(4540,)
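Putting this together, we can check that the frequencies and times vectors line up with the last two dimensions of data; a quick sanity check using the shapes we just printed:
# data has shape (channels, frequencies, times),
# so its last two dimensions should match the two vectors
print(a_spect.data.shape[1] == a_spect.frequencies.shape[0])
print(a_spect.data.shape[2] == a_spect.times.shape[0])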
Just like with the Sound class, VocalPy gives us the ability to conveniently read and write spectrograms to and from files. This saves us from generating spectrograms over and over. Computing spectrograms can be computationally expensive if your audio has a high sampling rate or you are using methods like multi-taper spectrograms. Saving spectrograms to files also makes it easier for you to share your data in the exact form you used it, so that it’s easier to replicate your analyses.
To see this in action, let’s write our spectrograms to files.
import pathlib

DATA_DIR = pathlib.Path('./data')
DATA_DIR.mkdir(exist_ok=True)

for spect, wav_path in zip(spects, wav_paths):
    spect.write(
        DATA_DIR / (wav_path.name + '.spect.npz')
    )
Notice that the extension is 'npz'; this is a file format that NumPy uses to save multiple arrays in a single file. By convention we include the file extension of the source audio, and another “extension” that indicates this is a spectrogram, so that the file name ends with '.wav.spect.npz'.
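To make the naming convention concrete, here is a small illustration with a hypothetical file name (not one of the example files):
# a hypothetical source file "bird1.wav" gets a spectrogram file named "bird1.wav.spect.npz"
example_wav = pathlib.Path('bird1.wav')
print(example_wav.name + '.spect.npz')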
We can confirm that reading and writing spectrograms to disk works as we expect using the method vocalpy.Spectrogram.read().
spect_paths = voc.paths.from_dir(DATA_DIR, '.spect.npz')
spects_loaded = [
voc.Spectrogram.read(spect_path)
for spect_path in spect_paths
]
We compare with the equality operator to confirm we loaded what we saved.
all([
spect == spect_loaded
for spect, spect_loaded in zip(spects, spects_loaded)
])
True
Notice that we can be sure that spects and spects_loaded are in the same order, because vocalpy.paths.from_dir() calls sorted() on the paths that it finds, and our spectrogram files will be in the same order as the audio files because of the naming convention we used: the name of the audio file, plus the extension “.spect.npz”. If you used a different naming convention, you’d need to make sure both lists are in the same order a different way (you can tell sorted() how to sort using its key argument).
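For example, you could sort both lists by file name yourself; a minimal sketch of using the key argument (with a trivial key, just to show the mechanics):
# sort spectrogram paths by file name so they line up with the sorted audio paths
spect_paths_sorted = sorted(spect_paths, key=lambda path: path.name)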
Data type: vocalpy.Annotation#
The last data type we’ll look at is for annotations. Such annotations are important for analysis of acoustic communication and behavior. Under the hood, VocalPy uses the pyOpenSci package crowsetta (vocalpy/crowsetta).
import vocalpy as voc
annots = [voc.Annotation.read(annot_path, format='simple-seq')
          for annot_path in bfsongrepo.annotation]
We inspect one of the annotations. Again as with other data types, we can see there is a data attribute. In this case it contains the crowsetta.Annotation.
print(annots[1])
Annotation(data=Annotation(annot_path=PosixPath('/home/docs/.cache/vocalpy/bfsongrepo.tar.gz.untar/gy6or6_baseline_220312_0837.4.wav.csv'), notated_path=None, seq=<Sequence with 65 segments>), path=PosixPath('/home/docs/.cache/vocalpy/bfsongrepo.tar.gz.untar/gy6or6_baseline_220312_0837.4.wav.csv'))
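Because the data attribute holds a crowsetta.Annotation, we can drill down into the annotated sequence itself. A brief sketch, assuming the standard crowsetta sequence attributes (onsets_s, offsets_s, and labels):
# the crowsetta.Annotation wraps a Sequence of annotated segments;
# each segment has an onset time, an offset time, and a label
a_seq = annots[1].data.seq
print(a_seq.onsets_s[:5])
print(a_seq.labels[:5])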
We plot the spectrogram along with the annotations.
voc.plot.annotated_spectrogram(
spect=spects[1],
annot=annots[1],
tlim = [3.2, 3.9],
flim=[500,12500],
);

This crash course in VocalPy has introduced you to the key features and goals of the library.
To learn more and see specific examples of usage, please see the how-to section of the user guide. For example, there you can find walkthroughs of how to use VocalPy to extract acoustic features that you can fit a classifier to with scikit-learn, and how to use VocalPy to prepare your data for dimensionality reduction and clustering with UMAP and HDBSCAN.
We are actively developing the library to meet your needs and would love to hear your feedback in our forum.