vocalpy.segment.AvaParams


class vocalpy.segment.AvaParams(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: type[~typing.Any] | ~numpy.dtype[~typing.Any] | ~numpy._typing._dtype_like._SupportsDType[~numpy.dtype[~typing.Any]] | tuple[~typing.Any, ~typing.Any] | list[~typing.Any] | ~numpy._typing._dtype_like._DTypeDict | str | None = <class 'numpy.int16'>)

Bases: Params

Data class that represents parameters for vocalpy.segment.ava().

Constants in this module are instances of this class that represent parameters used in papers.

Attributes:

nperseg (int): Number of samples per segment for the Short-Time Fourier Transform. Default is 1024.
noverlap (int): Number of samples to overlap between segments for the Short-Time Fourier Transform. Default is 512.
min_freq (float): Minimum frequency. The spectrogram is “cropped” below this frequency (instead of, e.g., bandpass filtering). Default is 20e3.
max_freq (float): Maximum frequency. The spectrogram is “cropped” above this frequency (instead of, e.g., bandpass filtering). Default is 125e3.
spect_min_val (float, optional): Expected minimum value of the spectrogram after transforming to the log of the magnitude. Used for min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\), where spect_min_val is \(s_{min}\). Default is 0.8. If None, the minimum value of the spectrogram is used.
spect_max_val (float, optional): Expected maximum value of the spectrogram after transforming to the log of the magnitude. Used for min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\), where spect_max_val is \(s_{max}\). Default is 6.0. If None, the maximum value of the spectrogram is used.
thresh_max (float): Threshold used to find local maxima. Default is 0.35.
thresh_min (float): Threshold used to find local minima, in relation to local maxima. Used to find onsets and offsets of segments. Default is 0.3.
thresh_lowest (float): Lowest threshold used to find onsets and offsets of segments. Default is 0.3.
min_dur (float): Minimum duration of a segment, in seconds. Default is 0.015.
max_dur (float): Maximum duration of a segment, in seconds. Default is 1.0.
min_isi_dur (float, optional): Minimum duration of inter-segment intervals, in seconds. If specified, any inter-segment interval shorter than this value is removed, and the adjacent segments are merged. Default is None.
use_softmax_amp (bool): If True, compute summed spectral power from the spectrogram with a softmax operation on each column. Default is False.
temperature (float): Temperature for softmax. Only used if use_softmax_amp is True. Default is 0.01.
smoothing_timescale (float): Timescale to use when smoothing summed spectral power with a Gaussian filter. The window size will be dt - smoothing_timescale / samplerate, where dt is the size of a time bin in the spectrogram. Default is 0.00025.
scale (bool): If True, scale the sound.data. Default is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, the default is to multiply the sound.data by 2**15 and then cast it to the int16 dtype. This replicates the behavior of the ava function, given data with dtype float64. If you have loaded a sound with a dtype of int16, then set this to False.
scale_val (int or float): Value to multiply the sound.data by, to scale the data. Default is 2**15 (32768). Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.
scale_dtype (numpy.dtype): Numpy dtype that the sound.data is cast to after scaling. Default is numpy.int16. Only used if scale is True.
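
Two of the numeric transformations described in the attribute list can be sketched with plain NumPy. This is only an illustration of the formulas in the text, not the code that vocalpy.segment.ava() runs: the helper names min_max_scale and scale_audio are hypothetical, the input arrays are made-up toy values, and clipping the scaled spectrogram to [0, 1] is an assumption of this sketch.

```python
import numpy as np


def min_max_scale(spect, spect_min_val, spect_max_val):
    """Apply (s - s_min) / (s_max - s_min) to log-magnitude spectrogram values."""
    scaled = (spect - spect_min_val) / (spect_max_val - spect_min_val)
    # Clipping to [0, 1] is an assumption of this sketch, not stated in the docs above.
    return np.clip(scaled, 0.0, 1.0)


def scale_audio(data, scale_val=2**15, scale_dtype=np.int16):
    """Multiply floating-point audio by scale_val, then cast to scale_dtype."""
    return (data * scale_val).astype(scale_dtype)


# Toy log-magnitude values; 0.8 and 6.0 are the defaults for
# spect_min_val and spect_max_val in the signature above.
spect = np.array([[0.8, 3.4], [6.0, 7.0]])
scaled_spect = min_max_scale(spect, spect_min_val=0.8, spect_max_val=6.0)

# Toy float64 audio samples in the range [-1, 1).
audio = np.array([0.0, 0.5, -0.25, -1.0])
scaled_audio = scale_audio(audio)
```

Note that a float sample of exactly 1.0 would map to 32768, which does not fit in int16, so the sketch assumes samples stay below 1.0.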

Examples

>>> import vocalpy as voc
>>> jourjineetal2023paths = voc.example('jourjine-et-al-2023')
>>> wav_path = jourjineetal2023paths[0]
>>> sound = voc.Sound.read(wav_path)
>>> onsets, offsets = voc.segment.ava(sound, **voc.segment.ava.JOURJINEETAL2023)
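
The example above passes a set of parameters to the function with ** unpacking. In Python, an object can be unpacked this way if it provides a keys() method and item access by key. The sketch below shows one way a data class can support that pattern. ToyParams and toy_segment are hypothetical stand-ins written for illustration; they are not vocalpy's Params or ava implementation.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyParams:
    """Hypothetical parameter data class; not vocalpy's implementation."""

    min_dur: float = 0.015
    max_dur: float = 1.0

    def keys(self):
        # ``**obj`` calls keys() to get the keyword argument names...
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        # ...and then looks up each value with obj[key].
        return getattr(self, key)


def toy_segment(sound, min_dur, max_dur):
    """Hypothetical function that accepts the parameters as keyword arguments."""
    return {"min_dur": min_dur, "max_dur": max_dur}


# Override one default, then unpack the instance into keyword arguments.
params = ToyParams(max_dur=0.2)
result = toy_segment("sound", **params)
```

Keeping parameters in a data class like this gives each set of values a name and a single place to document defaults, while the function itself still takes ordinary keyword arguments.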

__init__(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: type[~typing.Any] | ~numpy.dtype[~typing.Any] | ~numpy._typing._dtype_like._SupportsDType[~numpy.dtype[~typing.Any]] | tuple[~typing.Any, ~typing.Any] | list[~typing.Any] | ~numpy._typing._dtype_like._DTypeDict | str | None = <class 'numpy.int16'>) → None

Methods

`__init__`([nperseg, noverlap, min_freq, ...])
`keys`()

Attributes

`max_dur`
`max_freq`
`min_dur`
`min_freq`
`min_isi_dur`
`noverlap`
`nperseg`
`scale`
`scale_val`
`smoothing_timescale`
`spect_max_val`
`spect_min_val`
`temperature`
`thresh_lowest`
`thresh_max`
`thresh_min`
`use_softmax_amp`

scale_dtype: alias of int16
