vocalpy.Sound

class vocalpy.Sound(data: ndarray[tuple[Any, ...], dtype[_ScalarT]], samplerate: int)

Bases: object

Class that represents a sound.

Attributes:

data (numpy.ndarray): The audio signal as a numpy.ndarray, with dimensions (channels, samples).
samplerate (int): The sampling rate the audio signal was acquired at, in Hertz.
channels (int): The number of channels in the audio signal. Determined from the first dimension of data.
samples (int): The number of samples in the audio signal. Determined from the last dimension of data.
duration (float): Duration of the sound in seconds. Determined from the last dimension of data and the samplerate.
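
The derived attributes follow from the array shape and the sampling rate. The following is a minimal sketch of that arithmetic using only NumPy, with a synthetic array standing in for real audio. The variable names are illustrative, and this is not vocalpy's implementation.

```python
import numpy as np

samplerate = 32000  # Hz
# Synthetic two-channel signal with shape (channels, samples)
data = np.zeros((2, 48000))

channels = data.shape[0]         # first dimension
samples = data.shape[-1]         # last dimension
duration = samples / samplerate  # seconds

print(channels, samples, duration)
```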

Methods

`clip`([start, stop])	Make a clip from this `Sound` that starts at time `start` in seconds and ends at time `stop`.
`read`(path[, dtype])	Read audio from `path`.
`segment`(segments)	Segment a sound, using a set of line segments given as a `Segments` instance.
`to_mono`()	Convert a `Sound` to mono by averaging samples across channels.
`write`(path, **kwargs)	Write audio data to a file.

Examples

A Sound can be read from an audio file by calling read() on the class.

>>> import vocalpy as voc
>>> sound_path = voc.example("bl26lb16.wav", return_path=True)
>>> sound = voc.Sound.read(sound_path)
>>> sound
vocalpy.Sound(data=array([[-0.00... 0.00912476]]), samplerate=32000)

The Sound class is designed as a domain-specific data container with attributes that help us avoid cluttering up code with variables that track the sampling rate, number of channels, and duration of the file.

>>> sound = voc.example("bl26lb16.wav")
>>> print(sound.samplerate)
32000
>>> print(sound.channels)
1
>>> print(sound.duration)
5.76446875

You can print() a Sound to see all the properties that are derived from the sampling rate and the shape of the underlying data array: the number of channels, the number of samples, and the duration in seconds.

>>> sound = voc.example("bl26lb16.wav")
>>> print(sound)
vocalpy.Sound(data=array([[-0.00... 0.00912476]]), samplerate=32000), channels=1, samples=184463, duration=5.764)

The vocalpy package tries to provide functions that take Sound instances as inputs, and return other domain-specific types as outputs, such as Segments, Spectrogram, and Features. If instead you need to work with the digital audio signal directly as a numpy array, you can access it through the data attribute.

>>> sound = voc.example("bl26lb16.wav")
>>> sound_arr = sound.data

A Sound can also be written to a file, in any format supported by soundfile.

>>> sound = voc.example("bl26lb16.wav")
>>> sound.write("bl26lb16-copy.wav")

We can clip a sound to an arbitrary duration using the clip() method. This is useful if there are long, relatively silent periods before or after the animal sounds that we are interested in.

>>> sound = voc.example("bl26lb16.wav")
>>> sound_clip = sound.clip(0.1, 1.5)
>>> print(sound_clip.duration)
1.4

If we want to clip from a start time to the end of the sound, we can just specify a time for start.

>>> sound = voc.example("bl26lb16.wav")
>>> sound_clip = sound.clip(0.5)
>>> print(sound_clip.duration)
5.26446875

Likewise, if we want to clip from the start of the sound we can just specify a time for stop. Notice that we need to use a keyword argument here, since start is the first argument to clip().

>>> sound = voc.example("bl26lb16.wav")
>>> sound_clip = sound.clip(stop=0.5)
>>> print(sound_clip.duration)
0.5

If we want to segment an audio file into periods of animal sounds and periods of background, we can do that with one of the algorithms in vocalpy.segment. This will give us a Segments instance that we can then pass into the segment() method to get back a list of Sound instances, one for each segment.

>>> sound = voc.example("bl26lb16.wav")
>>> segments = voc.segment.meansquared(sound, threshold=1000, min_dur=0.0002, min_silent_dur=0.004)
>>> syllables = sound.segment(segments)
>>> len(syllables)
26
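
Conceptually, segmenting slices the sample axis once per segment. The sketch below uses plain NumPy and hypothetical onset and offset times in seconds. It does not use the Segments class, and it is not meant to reproduce how segment() works internally.

```python
import numpy as np

samplerate = 1000  # Hz, chosen so the arithmetic is easy to follow
data = np.arange(2000, dtype=float).reshape(1, -1)  # shape (channels, samples)

# Hypothetical segment boundaries in seconds: (onset, offset)
boundaries_s = [(0.1, 0.4), (0.5, 1.25)]

pieces = []
for onset, offset in boundaries_s:
    # Convert times to sample indices (rounding is an assumption of this sketch)
    start = int(round(onset * samplerate))
    stop = int(round(offset * samplerate))
    pieces.append(data[:, start:stop])

print([piece.shape for piece in pieces])
```

Each piece keeps the (channels, samples) layout, so it could in principle be wrapped in its own container together with the unchanged sampling rate.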

You can also index a Sound as you would a numpy.ndarray, and this gives you back a new Sound. This is useful, for example, when you have multi-channel audio and want only one channel, or want to iterate over the channels.

>>> sound = voc.example("fruitfly-song-multichannel.wav")
>>> a_channel = sound[0, :]
>>> print(a_channel)
vocalpy.Sound(data=array([[-0.00...-0.00723267]]), samplerate=10000), channels=1, samples=15000, duration=1.500)
>>> for channel in sound:
...     print(channel)
vocalpy.Sound(data=array([[-0.00...-0.00723267]]), samplerate=10000), channels=1, samples=15000, duration=1.500)
vocalpy.Sound(data=array([[ 0.01... 0.00268555]]), samplerate=10000), channels=1, samples=15000, duration=1.500)
vocalpy.Sound(data=array([[ 0.00...-0.00100708]]), samplerate=10000), channels=1, samples=15000, duration=1.500)
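
In the printed output above, each selected channel still has a two-dimensional data array with channels=1. With a bare NumPy array, integer indexing drops that axis. The sketch below, on a synthetic array, shows two ways to keep the (channels, samples) layout when working with arrays directly. It illustrates NumPy behavior only and makes no claim about how Sound restores the channel axis.

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(3, 4)  # 3 channels, 4 samples

dropped = data[0, :]  # integer index: the channel axis is gone
kept = data[0:1, :]   # slice: the channel axis is preserved

print(dropped.shape, kept.shape)

# Iterating over a 2-D array yields 1-D rows; np.newaxis restores the channel axis
for row in data:
    channel = row[np.newaxis, :]
    print(channel.shape)
```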

This works with other methods of indexing, as shown below.

>>> sound = voc.example("bl26lb16.wav")
>>> print(sound.data.shape)
(1, 184463)
>>> # Keep every 10th sample. This is not true downsampling: the sampling rate is unchanged.
>>> decimated = sound[:, ::10]

Note that indexing is passed directly to the underlying numpy.ndarray; Sound does not re-implement the indexing API.
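
Because indexing leaves the sampling rate unchanged, keeping every 10th sample shrinks the duration computed as samples / samplerate by roughly a factor of ten. This sketch uses a synthetic NumPy array with the same number of samples as the example sound (an assumption based on the samples=184463 value printed earlier), not a Sound instance.

```python
import numpy as np

samplerate = 32000  # Hz, unchanged by indexing
data = np.zeros((1, 184463))

decimated = data[:, ::10]  # keep every 10th sample

print(decimated.shape)
print(data.shape[-1] / samplerate)       # duration implied by the original array
print(decimated.shape[-1] / samplerate)  # duration implied after decimation
```

For true downsampling, the signal would need to be low-pass filtered and the sampling rate updated to match.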

__init__(data: ndarray[tuple[Any, ...], dtype[_ScalarT]], samplerate: int)

clip(start: float = 0.0, stop: float | None = None) → Sound

Make a clip from this Sound that starts at time start in seconds and ends at time stop.

Parameters:

start (float): Start time for clip, in seconds. Default is 0.
stop (float, optional): Stop time for clip, in seconds. Default is None, in which case the value is set to the duration of this Sound.

Returns:

clip (vocalpy.Sound): A new Sound with duration stop - start.
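
A rough sketch of how start and stop times in seconds can map onto sample indices. It uses plain NumPy with a synthetic array; `clip_array` is a hypothetical helper, the rounding rule is an assumption, and clip() itself may convert times to indices differently.

```python
import numpy as np


def clip_array(data, samplerate, start=0.0, stop=None):
    """Return the samples of `data` between `start` and `stop` (in seconds).

    Hypothetical helper for illustration; not part of vocalpy.
    """
    n_samples = data.shape[-1]
    start_ind = int(round(start * samplerate))
    # stop=None means "until the end of the signal"
    stop_ind = n_samples if stop is None else int(round(stop * samplerate))
    if not 0 <= start_ind < stop_ind <= n_samples:
        raise ValueError("need 0 <= start < stop <= duration")
    return data[..., start_ind:stop_ind]


samplerate = 32000
data = np.zeros((1, 184463))  # shape (channels, samples)

clipped = clip_array(data, samplerate, 0.1, 1.5)
print(clipped.shape[-1] / samplerate)
```

Passing only `start` clips to the end of the array, and passing only `stop` as a keyword clips from the beginning, mirroring the defaults in the signature above.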
