vocalpy.segment.meansquared
- vocalpy.segment.meansquared(sound: Sound, threshold: int = 5000, min_dur: float = 0.02, min_silent_dur: float = 0.002, freq_cutoffs: Iterable = (500, 10000), smooth_win: int = 2, scale: bool = True, scale_val: int | float = 32768, scale_dtype: npt.DTypeLike = <class 'numpy.int16'>) -> Segments
Segment audio by thresholding the mean squared signal.
Converts audio to the mean squared of the signal (using vocalpy.signal.audio.meansquared()). Then finds all continuous periods in the mean squared signal above threshold. These periods are considered candidate segments. Candidates with a duration less than min_dur are removed; then, any two segments with a silent gap between them less than min_silent_dur are merged into a single segment. The segments remaining after this post-processing are returned as a vocalpy.Segments instance.
Note that vocalpy.signal.audio.meansquared() first filters the audio, with vocalpy.signal.audio.bandpass_filtfilt(), using freq_cutoffs, and then computes a running average of the squared signal by convolving with a window of size smooth_win milliseconds.
- Parameters:
- sound: vocalpy.Sound
An audio signal.
- threshold: int
Value above which mean squared signal is considered part of a segment. Default is 5000.
- min_dur: float
Minimum duration of a segment, in seconds. Default is 0.02, i.e. 20 ms.
- min_silent_dur: float
Minimum duration of silent gap between segments, in seconds. Default is 0.002, i.e. 2 ms.
- freq_cutoffs: Iterable
Cutoff frequencies for bandpass filter. List or tuple with two elements, default is (500, 10000).
- smooth_win: int
Size of smoothing window in milliseconds. Default is 2.
- scale: bool
If True, scale the sound.data. Default is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, this function defaults to multiplying the sound.data by 2**15, and then casting to the int16 dtype. This replicates the behavior of the evsonganaly function, given data with dtype float64. If you have loaded a sound with a dtype of int16, then set this to False.
- scale_val: int | float
Value to multiply the sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.
- scale_dtype: numpy.dtype
NumPy dtype to cast sound.data to, after scaling. Default is np.int16. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.
- Returns:
- segments: vocalpy.Segments
Instance of vocalpy.Segments representing the segments found.
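As a rough illustration of what the scale, scale_val, and scale_dtype parameters describe, the sketch below multiplies float64 samples by 2**15 and casts them to int16. It assumes a plain multiply-then-cast with no clipping or rounding step, which may not match the library's internals exactly. The array `data` is a hypothetical stand-in for sound.data.

```python
import numpy as np

# Hypothetical stand-in for sound.data loaded as float64 in the range [-1.0, 1.0).
data = np.array([0.0, 0.5, -0.25, -1.0], dtype=np.float64)

scale_val = 2**15        # default listed for scale_val
scale_dtype = np.int16   # default listed for scale_dtype

# Multiply first, then cast. Casting float to int with astype drops the
# fractional part (truncation toward zero).
scaled = (data * scale_val).astype(scale_dtype)

print(scaled.dtype, scaled.tolist())
```

Under this assumption, a sample of exactly 1.0 would map to 32768, which is one past the largest int16 value (32767), so the threshold default of 5000 is best read as a value on this 16-bit integer scale.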
Examples
>>> import vocalpy as voc
>>> sounds = voc.examples('bfsongrepo', return_type='sound')
>>> segments = voc.segment.meansquared(sounds[0])
>>> print(segments)
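The thresholding and post-processing steps in the description above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: `segment_sketch`, `meansq`, and `samplerate` are hypothetical names, the input is assumed to be an already filtered and smoothed mean squared signal, the steps follow the order given in the description, and the result is returned as two plain arrays of onset and offset times rather than a vocalpy.Segments instance.

```python
import numpy as np


def segment_sketch(meansq, samplerate, threshold=5000,
                   min_dur=0.02, min_silent_dur=0.002):
    """Illustrative only: find segments in a precomputed mean squared signal."""
    above = np.asarray(meansq) > threshold
    # Pad with False so runs touching either end still produce an edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(np.int8))
    # +1 marks the first sample above threshold, -1 the first sample back below.
    onsets = np.flatnonzero(edges == 1) / samplerate
    offsets = np.flatnonzero(edges == -1) / samplerate

    # Remove candidates shorter than min_dur.
    keep = (offsets - onsets) >= min_dur
    onsets, offsets = onsets[keep], offsets[keep]

    # Merge neighbors whose silent gap is shorter than min_silent_dur.
    if onsets.size > 1:
        gaps = onsets[1:] - offsets[:-1]
        long_gap = gaps >= min_silent_dur
        onsets = np.concatenate((onsets[:1], onsets[1:][long_gap]))
        offsets = np.concatenate((offsets[:-1][long_gap], offsets[-1:]))
    return onsets, offsets


# Toy signal sampled at 1 kHz: two bursts separated by a 1 ms gap,
# plus a 5 ms blip that is too short to count as a segment.
samplerate = 1000
meansq = np.zeros(1000)
meansq[100:200] = 6000
meansq[201:300] = 6000
meansq[500:505] = 6000

onsets, offsets = segment_sketch(meansq, samplerate)
print(onsets, offsets)
```

With the default parameters, the 5 ms blip is discarded and the two bursts are merged, leaving a single segment from 0.1 s to 0.3 s.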