vocalpy.Sound

class vocalpy.Sound(data: ndarray[tuple[Any, ...], dtype[_ScalarT]], samplerate: int)

Bases: object

Class that represents a sound.

Attributes:
data : numpy.ndarray

The audio signal as a numpy.ndarray, where the dimensions are (channels, samples).

samplerate : int

The sampling rate the audio signal was acquired at, in Hertz.

channels : int

The number of channels in the audio signal. Determined from the first dimension of data.

samples : int

The number of samples in the audio signal. Determined from the last dimension of data.

duration : float

Duration of the sound in seconds. Determined from the last dimension of data and the samplerate.
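
As a rough sketch of how the three derived attributes relate to the array shape and the sampling rate, the snippet below recomputes them with plain NumPy. It assumes that duration is simply samples divided by samplerate, which is what the descriptions above suggest. The variable names and the zero-filled array are illustrative only and are not part of the vocalpy API.

```python
import numpy as np

# Stand-in for Sound.data: 1 channel, 184463 samples (illustrative values)
data = np.zeros((1, 184463))
samplerate = 32000  # Hz

channels = data.shape[0]         # first dimension of data
samples = data.shape[-1]         # last dimension of data
duration = samples / samplerate  # seconds, assuming duration = samples / samplerate

print(channels, samples, duration)
```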

Methods

clip([start, stop])

Make a clip from this Sound that starts at time start in seconds and ends at time stop.

read(path[, dtype])

Read audio from path.

segment(segments)

Segment a sound, using a set of line Segments.

to_mono()

Convert a Sound to mono by averaging samples across channels.

write(path, **kwargs)

Write audio data to a file.

Examples

A Sound is read from a file.

>>> import vocalpy as voc
>>> sound_path = voc.example("bl26lb16.wav", return_path=True)
>>> sound = voc.Sound.read(sound_path)
>>> sound
vocalpy.Sound(data=array([[-0.00... 0.00912476]]), samplerate=32000)

The Sound class is designed as a domain-specific data container with attributes that help us avoid cluttering up code with variables that track the sampling rate, number of channels, and duration of the file.

>>> sound = voc.example("bl26lb16.wav")
>>> print(sound.samplerate)
32000
>>> print(sound.channels)
1
>>> print(sound.duration)
5.76446875

You can print() a Sound to see all the properties that are derived from the sampling rate and the shape of the underlying data array: the number of channels, the number of samples, and the duration in seconds.

>>> sound = voc.example("bl26lb16.wav")
>>> print(sound)
vocalpy.Sound(data=array([[-0.00... 0.00912476]]), samplerate=32000), channels=1, samples=184463, duration=5.764)

The vocalpy package tries to provide functions that take Sound instances as inputs, and return other domain-specific types as outputs, such as Segments, Spectrogram, and Features. If instead you need to work with the digital audio signal directly as a numpy array, you can access it through the data attribute.

>>> sound = voc.example("bl26lb16.wav")
>>> sound_arr = sound.data

A Sound can also be written to a file, in any format supported by soundfile.

>>> sound = voc.example("bl26lb16.wav")
>>> sound.write("bl26lb16-copy.wav")

We can clip a sound to an arbitrary duration using the clip() method. This is useful if there are long, relatively silent periods before or after the animal sounds that we are interested in.

>>> sound = voc.example("bl26lb16.wav")
>>> sound_clip = sound.clip(0.1, 1.5)
>>> print(sound_clip.duration)
1.4

If we want to clip from a start time to the end of the sound, we can just specify a time for start.

>>> sound = voc.example("bl26lb16.wav")
>>> sound_clip = sound.clip(0.5)
>>> print(sound_clip.duration)
5.26446875

Likewise, if we want to clip from the start of the sound we can just specify a time for stop. Notice that we need to use a keyword argument here, since start is the first argument to clip().

>>> sound = voc.example("bl26lb16.wav")
>>> sound_clip = sound.clip(stop=0.5)
>>> print(sound_clip.duration)
0.5
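
To make the relationship between clip times and samples concrete, here is a minimal sketch of the same three cases on a bare NumPy array. The helper `clip_array` is hypothetical, and rounding times to the nearest sample index is an assumption. The actual clip() method may convert times to indices differently.

```python
import numpy as np


def clip_array(data, samplerate, start=0.0, stop=None):
    """Hypothetical helper: slice a (channels, samples) array between two times in seconds."""
    n_samples = data.shape[-1]
    # Convert times to sample indices; rounding to the nearest sample is an assumption
    start_ind = round(start * samplerate)
    stop_ind = n_samples if stop is None else round(stop * samplerate)
    return data[:, start_ind:stop_ind]


samplerate = 32000
data = np.zeros((1, 184463))  # stand-in for Sound.data

both = clip_array(data, samplerate, 0.1, 1.5)          # start and stop given
from_start_time = clip_array(data, samplerate, 0.5)    # only start given
to_stop_time = clip_array(data, samplerate, stop=0.5)  # only stop given

for clipped in (both, from_start_time, to_stop_time):
    print(clipped.shape, clipped.shape[-1] / samplerate)
```

When only start is given, the clip runs to the end of the array, so its duration is the original duration minus start.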

If we want to segment an audio file into periods of animal sounds and periods of background, we can do that with one of the algorithms in vocalpy.segment. This will give us a Segments instance that we can then pass into the segment() method to get back a list of Sound instances, one for each segment.

>>> sound = voc.example("bl26lb16.wav")
>>> segments = voc.segment.meansquared(sound, threshold=1000, min_dur=0.0002, min_silent_dur=0.004)
>>> syllables = sound.segment(segments)
>>> len(syllables)
26
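
Conceptually, segmenting means cutting the array once per segment. The sketch below shows that idea on a toy array. It assumes each segment is described by a start index and a length in samples. The helper `split_by_segments` and the boundary values are made up for illustration, and this page does not show how Segments stores its boundaries.

```python
import numpy as np


def split_by_segments(data, start_inds, lengths):
    """Hypothetical helper: cut a (channels, samples) array into one array per segment."""
    return [data[:, start:start + length] for start, length in zip(start_inds, lengths)]


data = np.arange(20).reshape(1, 20)  # toy single-channel signal
start_inds = [2, 10]  # made-up segment onsets, in samples
lengths = [3, 5]      # made-up segment lengths, in samples

pieces = split_by_segments(data, start_inds, lengths)
print([piece.tolist() for piece in pieces])
```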

You can also index a Sound as you would a numpy.ndarray, and this will give you back a new Sound. One place where this is useful is when you have multi-channel audio and you want only one channel, or you want to iterate over the channels.

>>> sound = voc.example("fruitfly-song-multichannel.wav")
>>> a_channel = sound[0, :]
>>> print(a_channel)
vocalpy.Sound(data=array([[-0.00...-0.00723267]]), samplerate=10000), channels=1, samples=15000, duration=1.500)
>>> for channel in sound:
...     print(channel)
vocalpy.Sound(data=array([[-0.00...-0.00723267]]), samplerate=10000), channels=1, samples=15000, duration=1.500)
vocalpy.Sound(data=array([[ 0.01... 0.00268555]]), samplerate=10000), channels=1, samples=15000, duration=1.500)
vocalpy.Sound(data=array([[ 0.00...-0.00100708]]), samplerate=10000), channels=1, samples=15000, duration=1.500)

This works with other methods of indexing, as shown below.

>>> sound = voc.example("bl26lb16.wav")
>>> print(sound.data.shape)
(1, 184463)
>>> decimated = sound[:, ::10]  # keep every 10th sample; not true downsampling, since the sampling rate is unchanged

Note that we are just passing indexing directly to the underlying numpy.ndarray, not re-implementing the API.
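
The effect of these indexing patterns on the array shape can be checked with NumPy alone. This is a sketch on a zero-filled stand-in array, not on a real Sound. It uses a slice (`0:1`) for the channel so that the channel dimension is kept, since a plain integer index on a bare array would drop it. How Sound restores that dimension for `sound[0, :]` is not shown here.

```python
import numpy as np

samplerate = 10000
data = np.zeros((3, 15000))  # stand-in for a 3-channel Sound.data

first_channel = data[0:1, :]  # slice keeps the channel dimension
every_tenth = data[:, ::10]   # keeps every 10th sample of every channel

# With the sampling rate unchanged, the apparent duration shrinks tenfold
duration_before = data.shape[-1] / samplerate
duration_after = every_tenth.shape[-1] / samplerate
print(first_channel.shape, every_tenth.shape, duration_before, duration_after)
```

This is why striding over samples is not true downsampling: the samplerate still describes the original signal, so the derived duration no longer matches the recording.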

__init__(data: ndarray[tuple[Any, ...], dtype[_ScalarT]], samplerate: int)

clip(start: float = 0.0, stop: float | None = None) Sound

Make a clip from this Sound that starts at time start in seconds and ends at time stop.

Parameters:
start : float

Start time for clip, in seconds. Default is 0.

stop : float, optional

Stop time for clip, in seconds. Default is None, in which case the value will be set to the duration of this Sound.

Returns:
clip : vocalpy.Sound

A new Sound with duration stop - start.

See also

Sound.segment

Notes

The clip() method is used to clip a Sound at arbitrary times. If you need to segment an audio file into periods of animal sounds and periods of background, use one of the functions in vocalpy.segment to get an instance of Segments that you can then pass to the Sound.segment() method.

Examples

>>> sound = voc.example('bl26lb16.wav')
>>> clip = sound.clip(1.5, 2.5)
>>> clip.duration
1.0

classmethod read(path: str | pathlib.Path, dtype: npt.DTypeLike = <class 'numpy.float64'>, **kwargs) Self

Read audio from path.

Parameters:
path : str, pathlib.Path

Path to file from which audio data should be read.

dtype : npt.DTypeLike, optional

Data type to use for the audio data that is read. Default is numpy.float64.

**kwargs : dict, optional

Other arguments to soundfile.read(). Refer to the soundfile documentation for details. Note that vocalpy.Sound.read() always passes the argument always_2d=True, because Sound.data must always have a “channel” dimension.

Returns:
sound : vocalpy.Sound

A vocalpy.Sound instance with data read from path.

segment(segments: Segments) list[Sound]

Segment a sound, using a set of line Segments.

Parameters:
segments : vocalpy.Segments

A Segments instance, the output of a segmenting function in vocalpy.segment.

Returns:
sounds : list

A list of Sound instances, one for every segment in segments.

Notes

The Sound.segment() method is used with the output of functions from vocalpy.segment, an instance of Segments. If you need to clip a Sound at arbitrary times, use the clip() method.

Examples

>>> sound = voc.example("bells.wav")
>>> segments = voc.segment.meansquared(sound)
>>> syllables = sound.segment(segments)
>>> len(syllables)
10

to_mono()

Convert a Sound to mono by averaging samples across channels.

Notes

This method uses the librosa.to_mono() function.
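
The averaging itself amounts to a mean over the channel axis. The sketch below shows that operation with plain NumPy on a small made-up stereo array. Keeping a leading channel dimension of size 1 with `keepdims=True` is an assumption here, based on the requirement that Sound.data always has a channel dimension.

```python
import numpy as np

# Made-up stereo signal with shape (channels=2, samples=3)
stereo = np.array([[1.0, 3.0, 5.0],
                   [3.0, 5.0, 7.0]])

# Average across the channel axis; keepdims=True preserves a (1, samples) shape
mono = stereo.mean(axis=0, keepdims=True)
print(mono.shape, mono.tolist())
```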

Examples

>>> sound = voc.example("WhiLbl0010")
>>> print(sound.channels)
2
>>> sound_mono = sound.to_mono()
>>> print(sound_mono.channels)
1

Note that feature extraction functions operate on channels independently, so converting multi-channel audio to mono may speed up your analysis if you do not need to consider channels independently.

>>> import timeit
>>> import numpy as np
>>> sound = voc.example("WhiLbl0010")
>>> sound_mono = sound.to_mono()
>>> np.mean(timeit.repeat("voc.feature.biosound(sound)", number=5, globals=globals()))
np.float64(19.713963174959645)
>>> np.mean(timeit.repeat("voc.feature.biosound(sound_mono)", number=5, globals=globals()))
np.float64(9.917085491772742)

write(path: str | Path, **kwargs) AudioFile

Write audio data to a file.

Parameters:
path : str, pathlib.Path

Path to file that audio data should be saved in.

**kwargs : dict, optional

Extra arguments to soundfile.write(). Refer to the soundfile documentation for details.