vocalpy.segment.AvaParams
- class vocalpy.segment.AvaParams(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: type[~typing.Any] | ~numpy.dtype[~typing.Any] | ~numpy._typing._dtype_like._SupportsDType[~numpy.dtype[~typing.Any]] | tuple[~typing.Any, ~typing.Any] | list[~typing.Any] | ~numpy._typing._dtype_like._DTypeDict | str | None = <class 'numpy.int16'>)
Bases: Params
Data class that represents parameters for vocalpy.segment.ava().
Constants in this module are instances of this class that represent parameters used in papers.
- Attributes:
- nperseg : int
Number of samples per segment for the Short-Time Fourier Transform. Default is 1024.
- noverlap : int
Number of samples to overlap between segments for the Short-Time Fourier Transform. Default is 512.
- min_freq : float
Minimum frequency. Spectrogram is “cropped” below this frequency (instead of, e.g., bandpass filtering). Default is 20e3.
- max_freq : float
Maximum frequency. Spectrogram is “cropped” above this frequency (instead of, e.g., bandpass filtering). Default is 125e3.
- spect_min_val : float, optional
Expected minimum value of spectrogram after transforming to the log of the magnitude. Used for a min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\) where spect_min_val is \(s_{min}\). Default is 0.8. If None, the minimum value of the spectrogram is used.
- spect_max_val : float, optional
Expected maximum value of spectrogram after transforming to the log of the magnitude. Used for a min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\) where spect_max_val is \(s_{max}\). Default is 6.0. If None, the maximum value of the spectrogram is used.
- thresh_max : float
Threshold used to find local maxima.
- thresh_min : float
Threshold used to find local minima, in relation to local maxima. Used to find onsets and offsets of segments.
- thresh_lowest : float
Lowest threshold used to find onsets and offsets of segments.
- min_dur : float
Minimum duration of a segment, in seconds.
- max_dur : float
Maximum duration of a segment, in seconds.
- min_isi_dur : float, optional
Minimum duration of inter-segment intervals, in seconds. If specified, any inter-segment intervals shorter than this value will be removed, and the adjacent segments merged. Default is None.
- use_softmax_amp : bool
If True, compute summed spectral power from spectrogram with a softmax operation on each column. Default is False.
- temperature : float
Temperature for softmax. Only used if use_softmax_amp is True.
- smoothing_timescale : float
Timescale to use when smoothing summed spectral power with a Gaussian filter. The window size will be dt - smoothing_timescale / samplerate, where dt is the size of a time bin in the spectrogram.
- scale : bool
If True, scale the sound.data. Default is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, this function defaults to multiplying the sound.data by 2**15, and then casting to the int16 dtype. This replicates the behavior of the ava function, given data with dtype float64. If you have loaded a sound with a dtype of int16, then set this to False.
- scale_val : int or float
Value to multiply the sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.
- scale_dtype : numpy.dtype
Signed integer type, compatible with C short. Default is numpy.int16.
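The scaling described for scale, scale_val, and scale_dtype can be sketched with plain NumPy. This is a minimal illustration, not vocalpy's implementation: the function name scale_to_int16 and the sample values are assumptions made for the example.

```python
import numpy as np


def scale_to_int16(data, scale_val=2**15, scale_dtype=np.int16):
    """Multiply floating-point audio by scale_val, then cast to scale_dtype.

    Hypothetical helper for illustration only; it is not part of vocalpy.
    """
    return (data * scale_val).astype(scale_dtype)


# Illustrative float64 samples in the range [-1.0, 1.0).
samples = np.array([0.0, 0.5, -0.5, -1.0])
scaled = scale_to_int16(samples)
```

A sample of exactly +1.0 would map to 32768, which lies outside the int16 range, so the illustrative samples above avoid that value.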
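The min-max scaling controlled by spect_min_val and spect_max_val can be written out directly from the formula given above. The sketch below assumes a small made-up array standing in for a log-magnitude spectrogram; the function name minmax_scale is hypothetical, and any further processing the library applies afterwards is not shown.

```python
import numpy as np


def minmax_scale(log_spect, spect_min_val, spect_max_val):
    """Apply (s - s_min) / (s_max - s_min) element-wise.

    Hypothetical helper for illustration only; it is not part of vocalpy.
    """
    return (log_spect - spect_min_val) / (spect_max_val - spect_min_val)


# Made-up values standing in for a log-magnitude spectrogram.
log_spect = np.array([[0.8, 3.4], [6.0, 2.1]])
scaled_spect = minmax_scale(log_spect, spect_min_val=0.8, spect_max_val=6.0)
```

With these inputs, a value equal to spect_min_val maps to 0 and a value equal to spect_max_val maps to 1.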
Examples
>>> import vocalpy as voc
>>> jourjine2023paths = voc.example('jourjine-et-al-2023')
>>> wav_path = jourjine2023paths[0]
>>> sound = voc.Sound.read(wav_path)
>>> onsets, offsets = voc.segment.ava(sound, **voc.segment.ava.JOURJINEETAL2023)
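The example above unpacks a parameters instance with `**`. In Python, any object that provides keys() and `__getitem__` can be unpacked this way. The sketch below shows that mechanism with a hypothetical stand-in class, DemoParams; it is an assumption about how such a class could work, not the actual Params base class.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoParams:
    """Hypothetical stand-in for a params data class (not vocalpy's code)."""

    min_dur: float = 0.015
    max_dur: float = 1.0

    def keys(self):
        # Names of all dataclass fields; ``**`` unpacking calls this first.
        return [field.name for field in fields(self)]

    def __getitem__(self, key):
        # ``**`` unpacking then looks up each key with item access.
        return getattr(self, key)


def describe(min_dur, max_dur):
    """Toy function that receives the unpacked parameters as keyword arguments."""
    return f"segments between {min_dur} s and {max_dur} s"


params = DemoParams(max_dur=0.5)
message = describe(**params)
```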
- __init__(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: type[~typing.Any] | ~numpy.dtype[~typing.Any] | ~numpy._typing._dtype_like._SupportsDType[~numpy.dtype[~typing.Any]] | tuple[~typing.Any, ~typing.Any] | list[~typing.Any] | ~numpy._typing._dtype_like._DTypeDict | str | None = <class 'numpy.int16'>) -> None
Methods
- __init__([nperseg, noverlap, min_freq, ...])
- keys()
Attributes
- max_dur
- max_freq
- min_dur
- min_freq
- min_isi_dur
- noverlap
- nperseg
- scale
- scale_val
- smoothing_timescale
- spect_max_val
- spect_min_val
- temperature
- thresh_lowest
- thresh_max
- thresh_min
- use_softmax_amp
- scale_dtype
alias of int16
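The effect of min_isi_dur (removing short inter-segment intervals and merging the adjacent segments) can be sketched in plain Python. This is one possible reading of the description, with a hypothetical function name and made-up onset and offset times; the library's own implementation may differ in details such as how ties at the threshold are handled.

```python
def merge_short_intervals(onsets, offsets, min_isi_dur):
    """Merge segments separated by a gap shorter than min_isi_dur (seconds).

    Hypothetical helper for illustration only; it is not part of vocalpy.
    Assumes onsets and offsets are sorted and have the same length.
    """
    if not onsets:
        return [], []
    merged_onsets = [onsets[0]]
    merged_offsets = [offsets[0]]
    for onset, offset in zip(onsets[1:], offsets[1:]):
        gap = onset - merged_offsets[-1]
        if gap < min_isi_dur:
            # Gap too short: extend the previous segment to this offset.
            merged_offsets[-1] = offset
        else:
            # Gap long enough: start a new segment.
            merged_onsets.append(onset)
            merged_offsets.append(offset)
    return merged_onsets, merged_offsets


# Made-up segment boundaries, in seconds.
onsets = [0.10, 0.30, 0.90]
offsets = [0.25, 0.50, 1.20]
merged = merge_short_intervals(onsets, offsets, min_isi_dur=0.1)
```

Here the first two segments are separated by about 0.05 s, so they are merged into one; the 0.4 s gap before the third segment is kept.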