Quickstart: VocalPy 🐍 💬 in 15 minutes ⏲️#
This tutorial will introduce you to VocalPy, a core Python package for acoustic communication research.
Set up#
First we import vocalpy.
import vocalpy as voc
Then we get some example data, from the Bengalese Finch song repository.
bfsongrepo = voc.example('bfsongrepo', return_path=True)
This gives us back an ExampleData instance with sound and annotation attributes.
bfsongrepo
ExampleData(sound=[PosixPath('/h...2_0836.3.wav'), PosixPath('/h...2_0837.4.wav'), PosixPath('/h...2_0837.6.wav'), PosixPath('/h...2_0838.8.wav'), PosixPath('/h...2_0839.9.wav'), PosixPath('/h..._0840.10.wav'), ...], annotation=[PosixPath('/h...36.3.wav.csv'), PosixPath('/h...37.4.wav.csv'), PosixPath('/h...37.6.wav.csv'), PosixPath('/h...38.8.wav.csv'), PosixPath('/h...39.9.wav.csv'), PosixPath('/h...0.10.wav.csv'), ...])
The ExampleData is just a Python dict that lets us access the values through dot notation, so we can write bfsongrepo.sound as well as bfsongrepo["sound"], like the Bunch class returned by functions in the scikit-learn datasets module.
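For instance, both access styles point at the same underlying value. A quick check (assuming the dict-style lookup works as just described):
# dot notation and key lookup should give us the same list of paths
assert bfsongrepo.sound == bfsongrepo["sound"]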
Since we set the argument return_path=True, these attributes are each a list of pathlib.Path instances. The default for return_path is False, and when it is False, we get back the data types built into VocalPy that we will introduce below. Here we want the paths so we can show how to load data in with VocalPy.
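If you just want the built-in data types, you can rely on the default. A minimal sketch, assuming the default behavior described above (the attributes then hold VocalPy objects rather than paths):
# with the default return_path=False, we get VocalPy data types directly
bfsongrepo_loaded = voc.example('bfsongrepo')
first_sound = bfsongrepo_loaded.sound[0]  # a vocalpy.Sound, not a pathlib.Path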
Data types for acoustic communication#
One of the main goals of VocalPy is to make it easier to read and write code for bioacoustics and acoustic communication. One way VocalPy achieves that is by providing data types that map onto concepts from those research domains. The benefit of these data types is that they let us as researchers write and read code with the same words we use when we talk to each other. Another benefit of these data types is that they make our code more succinct.
Before we walk through the data types, we show two snippets of code.
The first is written in standard scientific Python.
import soundfile
from scipy.signal import spectrogram

# we write a helper function to compute a spectrogram
def spect(data, fs, nperseg=1024, window='hann'):
    f, t, s = spectrogram(data, fs, window=window, nperseg=nperseg)
    return f, t, s

# notice that we need two variables for one sound
data_bird1, fs_bird1 = soundfile.read('./path/to/bird1.wav')
# that turn into three more variables for the spectrogram
f_bird1, t_bird1, s_bird1 = spect(data_bird1, fs_bird1)
# and another two variables for another sound
data_bird2, fs_bird2 = soundfile.read('./path/to/bird2.wav')
# and that again turns into three more variables for the spectrogram
f_bird2, t_bird2, s_bird2 = spect(data_bird2, fs_bird2)
# these variables are cluttering up our code!
# of course, it's common for most audio signals in your data to have the same sampling rate
# but this is definitely not always true!
# and likewise it's common to generate spectrograms all with the same frequency bins
# but still we need to do all this book-keeping with variables

# definitions of extract_features and stats_helper are not shown in this snippet
ftrs_bird1 = extract_features(s_bird1, t_bird1, f_bird1)
ftrs_bird2 = extract_features(s_bird2, t_bird2, f_bird2)
rejected_h0, pval = stats_helper(ftrs_bird1, ftrs_bird2)
The second snippet is written with VocalPy.
import vocalpy as voc
from scipy.signal import spectrogram

# we write a helper function to compute a spectrogram
# but notice we get rid of one of the arguments:
# instead of "data" and "sampling rate" we just have a Sound
# we'll see below that the sound "encapsulates" the `data` and `samplerate` attributes
def spect(sound, nperseg=1024, window='hann'):
    f, t, s = spectrogram(sound.data, sound.samplerate,
                          window=window, nperseg=nperseg)
    # instead of returning three variables we just return one Spectrogram instance
    # that again encapsulates the spectrogram matrix, the frequencies, and the times
    return voc.Spectrogram(data=s, frequencies=f, times=t)

# we can also reduce some duplication using a dictionary that maps IDs to variables
ftrs = {}
for bird in ('bird1', 'bird2'):
    # here we load the Sound with the data and samplerate attributes
    # so we only have one variable instead of two
    sound = voc.Sound.read(f'./path/to/{bird}.wav')
    # note we use a different name than the helper function `spect`,
    # so we don't shadow it inside the loop
    a_spect = spect(sound)
    ftrs[bird] = extract_features(a_spect)
rejected_h0, pval = stats_helper(ftrs['bird1'], ftrs['bird2'])
As the comments indicate, using VocalPy makes the code more succinct, and more readable.
To learn more about the design and goals of VocalPy, please check out our Forum Acusticum 2023 Proceedings Paper, “Introducing VocalPy”, this PyData Global 2023 talk and this Python Exchange talk.
Now let’s look at the data types that VocalPy provides for acoustic communication.
Data type for sound: vocalpy.Sound#
The first data type we’ll learn about is one that represents a sound, not surprisingly named vocalpy.Sound.
We start here since all our analyses start with sound.
We can load an audio signal from a file using the vocalpy.Sound.read() method.
wav_path = bfsongrepo.sound[0] # we write this out just to make it explicit that we have a pathlib.Path pointing to a wav audio file
a_sound = voc.Sound.read(wav_path)
print(a_sound)
vocalpy.Sound(data=array([[-0.00...-0.00753784]]), samplerate=32000, channels=1, samples=290519, duration=9.079)
A sound has two attributes:
1. data, the audio signal itself, with two dimensions: (channels, samples)
print(a_sound.data)
[[-0.00637817 -0.00668335 -0.00747681 ... -0.00714111 -0.00769043
-0.00753784]]
2. samplerate, the sampling rate for the audio
print(a_sound.samplerate)
32000
A Sound also has three properties, derived from its data: channels, the number of channels; samples, the number of samples; and duration, the number of samples divided by the sampling rate.
print(
f"This sound comes from an audio file with {a_sound.channels} channel, "
f"{a_sound.samples} samples, and a duration of {a_sound.duration:.3f} seconds"
)
This sound comes from an audio file with 1 channel, 290519 samples, and a duration of 9.079 seconds
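Since duration is derived from the other two properties, we can also compute it ourselves; a quick check (with 290519 samples at 32000 Hz this works out to roughly 9.079 seconds):
# duration is just the number of samples divided by the sampling rate
print(a_sound.samples / a_sound.samplerate)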
One of the reasons VocalPy provides this data type, and the others we’re about to show you here, is that it helps you write more succinct code that’s easier to read: for you, when you come back to your code months from now, and for others that want to read the code you share.
When you are working with your own data, instead of example data built into VocalPy, you will do something like:
1. Get all the paths to the sound files in a directory, using a convenience function that VocalPy gives us in its paths module, vocalpy.paths.from_dir()
2. Read all the sound files into memory using the method vocalpy.Sound.read()
This is shown in the snippet below.
data_dir = 'data/bfsongrepo/gy6or6/032312/'
wav_paths = voc.paths.from_dir(data_dir, 'wav')
sounds = [
voc.Sound.read(wav_path) for wav_path in wav_paths
]
We’ll demonstrate this now, using the parent attribute of one of the paths to the wav files in our example bfsongrepo data. In this case, the parent is the directory that the wav file is in. We can be sure that all the wav files are in this directory, because when you call vocalpy.example() with the name of the example dataset, 'bfsongrepo', VocalPy uses the library pooch (https://www.fatiando.org/pooch/latest/index.html) to “fetch” that dataset off of Zenodo and download it into a local “cache” directory.
data_dir = bfsongrepo.sound[0].parent
print(data_dir)
/home/docs/.cache/vocalpy/bfsongrepo.tar.gz.untar
We then use the vocalpy.paths.from_dir() function to get all the wav files from that directory.
wav_paths = voc.paths.from_dir(data_dir, 'wav')
Not surprisingly, these are the wav files we already have in our bfsongrepo example data.
sorted(wav_paths) == sorted(bfsongrepo.sound)
True
(We’re just showing how you would do this with a directory of your data.)
Finally we can load all these files, as was shown in the last line of the snippet.
sounds = [
voc.Sound.read(wav_path) for wav_path in wav_paths
]
Next we’ll show how to work with sound in a pipeline for processing data.
For more detail on how to use the vocalpy.Sound class, please see the “examples” section of the API documentation (that you can go to by clicking on the name of the class in this sentence).
Classes for steps in pipelines for processing data in acoustic communication#
In addition to data types for acoustic communication, VocalPy provides you with classes that represent steps in pipelines for processing that data. These classes are also written with readability and reproducibility in mind.
Let’s use one of those classes, SpectrogramMaker, to make a spectrogram from each one of the wav files that we loaded above.
We’ll write a brief snippet to do so, and then we’ll explain what we did.
params = {'n_fft': 512, 'hop_length': 64}
callback = voc.spectrogram
spect_maker = voc.SpectrogramMaker(callback=callback, params=params)
spects = spect_maker.make(sounds, parallelize=True)
[########################################] | 100% Completed | 1.42 s
Notice a couple of things about this snippet:
1. In line 1, you declare the parameters that you use to generate spectrograms explicitly, as a dictionary. This helps with reproducibility by encouraging you to document those parameters.
2. In line 2, you also decide what function you will use to generate the spectrograms. Here we use the helper function vocalpy.spectrogram().
3. In line 3, you create an instance of the SpectrogramMaker class with the function you want to use to generate spectrograms, and the parameters to use with that function. We refer to the function we pass in as a callback, because the SpectrogramMaker will “call back” to this function when it makes a spectrogram.
4. In line 4, you make the spectrograms, with a single call to the method vocalpy.SpectrogramMaker.make(). You pass in the sounds we loaded earlier, and you tell VocalPy that you want to parallelize the generation of the spectrograms. This is done for you, using the library dask.
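Any function that builds a spectrogram from a Sound can be used as the callback. Here is a minimal sketch showing how you might pass your own helper instead of vocalpy.spectrogram(); it assumes a custom function like the one in the earlier snippet, wrapping scipy.signal.spectrogram:
from scipy.signal import spectrogram
# a hypothetical custom callback: it takes a Sound plus keyword arguments
# and returns a vocalpy.Spectrogram, like the helper we wrote earlier
def my_spect(sound, nperseg=512, window='hann'):
    f, t, s = spectrogram(sound.data, sound.samplerate,
                          window=window, nperseg=nperseg)
    return voc.Spectrogram(data=s, frequencies=f, times=t)
# the SpectrogramMaker "calls back" to our function with these parameters
my_spect_maker = voc.SpectrogramMaker(callback=my_spect, params={'nperseg': 512, 'window': 'hann'})
my_spects = my_spect_maker.make(sounds, parallelize=True)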
Data type: vocalpy.Spectrogram#
As you might have guessed, when we call vocalpy.SpectrogramMaker.make(), we get back a list of spectrograms.
This is the next data type we’ll look at.
We inspect the first spectrogram we made.
a_spect = spects[0]
print(a_spect)
vocalpy.Spectrogram(data=array([[[ -1....3.01612196]]]), frequencies=array([ 0....7.5, 16000. ]), times=array([0.000e... 9.078e+00]))
As before, we’ll walk through the attributes of this class. But since the whole point of a spectrogram is to let us see sound, let’s actually look at the spectrogram, instead of staring at arrays of numbers.
We do so by calling vocalpy.plot.spectrogram().
voc.plot.spectrogram(
a_spect,
tlim = [2.6, 4],
flim=[500,12500],
)
(<Figure size 640x480 with 1 Axes>, <Axes: >)

We see that we have a spectrogram of Bengalese finch song.
Now that we know what we’re working with, let’s actually inspect the attributes of the vocalpy.Spectrogram instance.
There are three attributes we care about here.
1. data: this is the spectrogram itself. As with the other data types, like vocalpy.Sound, the attribute name data indicates the main data we care about.
print(a_spect.data)
[[[ -1.80471366 -1.91454266 -6.29578053 ... -9.11453071 -5.11814549
-3.03580712]
[ -2.05547237 -1.14437889 -2.08431091 ... -5.57148069 -2.0905174
-1.95497683]
[ -3.15048083 -2.5644456 -4.74032384 ... -0.93339209 -0.76606516
-1.95909653]
...
[-53.01612196 -53.01612196 -53.01612196 ... -53.01612196 -53.01612196
-49.54066399]
[-52.93058939 -52.15275056 -51.2442843 ... -52.89431799 -51.76220215
-51.55653316]
[-51.90524123 -53.01612196 -53.01612196 ... -53.01612196 -53.01612196
-53.01612196]]]
Let’s look at the shape of data. It’s really just a NumPy array, so we inspect the array’s shape attribute.
print(a_spect.data.shape)
(1, 257, 4540)
We see that we have an array with dimensions (channels, frequencies, times). The last two dimensions correspond to the next two attributes we will look at.
2. frequencies, a vector of the frequency for each row of the spectrogram.
print(a_spect.frequencies[:10])
[ 0. 62.5 125. 187.5 250. 312.5 375. 437.5 500. 562.5]
And we can confirm it has a length equal to the number of rows in the spectrogram.
print(a_spect.frequencies.shape)
(257,)
3. times, a vector of the time for each column in the spectrogram.
print(a_spect.times[:10])
[0. 0.002 0.004 0.006 0.008 0.01 0.012 0.014 0.016 0.018]
We can likewise see that the times vector has a shape equal to the number of columns in the spectrogram.
print(a_spect.times.shape)
(4540,)
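Putting this together, we can check that the frequencies and times vectors line up with the last two dimensions of data; a quick sanity check using the shapes we just printed:
# data has shape (channels, frequencies, times),
# so its last two dimensions should match the two vectors
print(a_spect.data.shape[1] == a_spect.frequencies.shape[0])
print(a_spect.data.shape[2] == a_spect.times.shape[0])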
Just like with the Sound class, VocalPy gives us the ability to conveniently read and write spectrograms to and from files. This saves us from generating spectrograms over and over. Computing spectrograms can be computationally expensive if your audio has a high sampling rate or you are using methods like multi-taper spectrograms. Saving spectrograms to files also makes it easier for you to share your data in the exact form you used it, so that it’s easier to replicate your analyses.
To see this in action, let’s write our spectrograms to files.
import pathlib

DATA_DIR = pathlib.Path('./data')
DATA_DIR.mkdir(exist_ok=True)

for spect, wav_path in zip(spects, wav_paths):
    spect.write(
        DATA_DIR / (wav_path.name + '.spect.npz')
    )
Notice that the extension is 'npz'; this is a file format that NumPy uses to save multiple arrays in a single file. By convention we include the file extension of the source audio, and another “extension” that indicates this is a spectrogram, so that the file name ends with '.wav.spect.npz'.
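To make the naming convention concrete, here is a small illustration with a hypothetical file name (not one of the example files):
# a hypothetical source file "bird1.wav" gets a spectrogram file named "bird1.wav.spect.npz"
example_wav = pathlib.Path('bird1.wav')
print(example_wav.name + '.spect.npz')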
We can confirm that reading and writing spectrograms to disk works as we expect using the method vocalpy.Spectrogram.read().
spect_paths = voc.paths.from_dir(DATA_DIR, '.spect.npz')
spects_loaded = [
voc.Spectrogram.read(spect_path)
for spect_path in spect_paths
]
We compare with the equality operator to confirm we loaded what we saved.
all([
spect == spect_loaded
for spect, spect_loaded in zip(spects, spects_loaded)
])
True
Notice that we can be sure that spects and spects_loaded are in the same order, because vocalpy.paths.from_dir() calls sorted() on the paths that it finds, and our spectrogram files will be in the same order as the audio files because of the naming convention we used: the name of the audio file, plus the extension “.spect.npz”. If you used a different naming convention, you’d need to make sure both lists are in the same order a different way (you can tell sorted() how to sort using its key argument).
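For example, you could sort both lists by file name yourself; a minimal sketch of using the key argument (with a trivial key, just to show the mechanics):
# sort spectrogram paths by file name so they line up with the sorted audio paths
spect_paths_sorted = sorted(spect_paths, key=lambda path: path.name)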
Data type: vocalpy.Annotation#
The last data type we’ll look at is for annotations. Such annotations are important for analysis of acoustic communication and behavior. Under the hood, VocalPy uses the pyOpenSci package crowsetta (vocalpy/crowsetta).
import vocalpy as voc
annots = [voc.Annotation.read(annot_path, format='simple-seq')
          for annot_path in bfsongrepo.annotation]
We inspect one of the annotations. Again as with other data types, we can see there is a data attribute. In this case it contains the crowsetta.Annotation.
print(annots[1])
Annotation(data=Annotation(annot_path=PosixPath('/home/docs/.cache/vocalpy/bfsongrepo.tar.gz.untar/gy6or6_baseline_220312_0837.4.wav.csv'), notated_path=None, seq=<Sequence with 65 segments>), path=PosixPath('/home/docs/.cache/vocalpy/bfsongrepo.tar.gz.untar/gy6or6_baseline_220312_0837.4.wav.csv'))
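Because the data attribute holds a crowsetta.Annotation, we can drill down into the annotated sequence itself. A brief sketch, assuming the standard crowsetta sequence attributes (onsets_s, offsets_s, and labels):
# the crowsetta.Annotation wraps a Sequence of annotated segments;
# each segment has an onset time, an offset time, and a label
a_seq = annots[1].data.seq
print(a_seq.onsets_s[:5])
print(a_seq.labels[:5])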
We plot the spectrogram along with the annotations.
voc.plot.annotated_spectrogram(
spect=spects[1],
annot=annots[1],
tlim = [3.2, 3.9],
flim=[500,12500],
);

This crash course in VocalPy has introduced you to the key features and goals of the library.
To learn more and see specific examples of usage, please see the how-to section of the user guide. For example, there you can find walkthroughs of how to use VocalPy to extract acoustic features that you can fit a classifier to with scikit-learn, and how to use VocalPy to prepare your data for dimensionality reduction and clustering with UMAP and HDBSCAN.
We are actively developing the library to meet your needs and would love to hear your feedback in our forum.