vocalpy.metrics.segmentation.ir.find_hits
- vocalpy.metrics.segmentation.ir.find_hits(hypothesis: ndarray[Any, dtype[_ScalarType_co]], reference: ndarray[Any, dtype[_ScalarType_co]], tolerance: float | int | None = None, decimals: int | None = None) → tuple[ndarray[Any, dtype[_ScalarType_co]], ndarray[Any, dtype[_ScalarType_co]], ndarray[Any, dtype[_ScalarType_co]]]
Find hits in arrays of event times.
This is a helper function used to compute information retrieval metrics. Specifically, this function is called by precision_recall_fscore().
An element in hypothesis is considered a hit if its value \(t_h\) falls within an interval around any value \(t_0\) in reference, plus or minus the tolerance \(\Delta t\):
\(t_0 - \Delta t < t_h < t_0 + \Delta t\)
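The interval test above can be sketched in a few lines. This is an illustration only, not the library's implementation: it uses plain Python lists instead of NumPy arrays so that it runs without dependencies, the helper name within_tolerance is hypothetical, and the example times are made up.

```python
def within_tolerance(t_h, t_0, tolerance):
    """Return True if t_0 - tolerance < t_h < t_0 + tolerance (strict, as written above)."""
    return t_0 - tolerance < t_h < t_0 + tolerance


# Illustrative values only (assumed to be in seconds).
reference = [1.0, 2.0, 3.0]        # ground-truth boundary times
hypothesis = [1.004, 2.5, 2.995]   # estimated boundary times
tolerance = 0.01                   # 10 ms

# For each hypothesized boundary, check whether it lies near ANY reference boundary.
is_candidate = [
    any(within_tolerance(t_h, t_0, tolerance) for t_0 in reference)
    for t_h in hypothesis
]
print(is_candidate)  # [True, False, True]
```

Here 1.004 and 2.995 fall within 10 ms of the reference boundaries at 1.0 and 3.0, while 2.5 is not close to any reference boundary.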
This function allows zero or one hit for each element in reference, but not more than one. If the condition \(|ref_i - hyp_j| < tolerance\) is true for multiple values \(hyp_j\) in hypothesis, then the value with the smallest difference from \(ref_i\) is considered a hit.
Both hypothesis and reference must be 1-dimensional arrays of non-negative, strictly increasing values. If you have two arrays, onsets and offsets, you can concatenate them into a single valid array of boundary times using concat_starts_and_stops(), which you can then pass to this function.
- Parameters:
- hypothesis : numpy.ndarray
Boundaries, e.g., onsets or offsets of segments, as computed by some method.
- reference : numpy.ndarray
Ground truth boundaries that the hypothesized boundaries
hypothesis
are compared to.
- tolerance : float or int
Tolerance, in seconds. An element in hypothesis is considered a true positive if it is within a time interval around any reference boundary \(t_0\) in reference, plus or minus the tolerance, i.e., if a hypothesized boundary \(t_h\) is within the interval \(t_0 - \Delta t < t_h < t_0 + \Delta t\). Default is None, in which case it is set to 0 (either float or int, depending on the dtype of hypothesis and reference). See notes for more detail.
- decimals : int
The number of decimal places to round both hypothesis and reference to, using numpy.round(). This mitigates inflated error rates due to floating point error. Rounding is only applied if both hypothesis and reference are floating point values. To avoid rounding, e.g. to compute strict precision and recall, pass in the value False. Default is 3, which assumes that the values are in seconds and should be rounded to milliseconds.
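To see why rounding can matter, consider two boundary times that are equal on paper but differ in binary floating point. The sketch below is an assumption-laden illustration: it uses Python's built-in round() as a stand-in for numpy.round() so that it runs without dependencies, and the helper name round_times is hypothetical.

```python
def round_times(times, decimals=3):
    """Round each time to `decimals` places (3 places = milliseconds, if times are in seconds)."""
    return [round(t, decimals) for t in times]


reference = [0.3, 1.25]
hypothesis = [0.1 + 0.2, 1.25]  # 0.1 + 0.2 is 0.30000000000000004 in floating point

# Exact comparison before and after rounding both arrays.
exact_before = [h == r for h, r in zip(hypothesis, reference)]
exact_after = [
    h == r for h, r in zip(round_times(hypothesis), round_times(reference))
]
print(exact_before)  # [False, True]
print(exact_after)   # [True, True]
```

Without rounding, the first pair would be counted as a miss under an exact comparison even though the two times differ only by floating point error.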
- Returns:
- hits_ref : numpy.ndarray
The indices of hits in reference.
- hits_hyp : numpy.ndarray
The indices of hits in hypothesis.
- diffs : numpy.ndarray
Absolute differences \(|hit^{ref}_i - hit^{hyp}_i|\), i.e., np.abs(reference[hits_ref] - hypothesis[hits_hyp]).
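As a rough illustration of how the three return values relate to the matching rule described above (at most one hit per element of reference, with the closest hypothesized boundary winning), here is a minimal pure-Python sketch. It is not the library's implementation: the name find_hits_sketch is hypothetical, it returns plain lists rather than NumPy arrays, it applies no rounding, and it does not check whether the same hypothesized boundary is matched to more than one reference boundary.

```python
def find_hits_sketch(hypothesis, reference, tolerance):
    """Return (hits_ref, hits_hyp, diffs) as plain lists, following the rule described above."""
    hits_ref, hits_hyp, diffs = [], [], []
    for i, ref_i in enumerate(reference):
        # All hypothesized boundaries strictly within the tolerance of this reference boundary.
        candidates = [
            (abs(ref_i - hyp_j), j)
            for j, hyp_j in enumerate(hypothesis)
            if abs(ref_i - hyp_j) < tolerance
        ]
        if candidates:
            diff, j = min(candidates)  # the smallest difference is the hit
            hits_ref.append(i)
            hits_hyp.append(j)
            diffs.append(diff)
    return hits_ref, hits_hyp, diffs


# Illustrative values only (assumed to be in seconds).
reference = [1.0, 2.0, 3.0]
hypothesis = [0.98, 1.01, 2.5, 3.02]

hits_ref, hits_hyp, diffs = find_hits_sketch(hypothesis, reference, tolerance=0.05)
print(hits_ref)  # [0, 2]
print(hits_hyp)  # [1, 3]
```

Both 0.98 and 1.01 lie within the tolerance of the reference boundary at 1.0, but only 1.01, the closer of the two, is recorded as the hit. The reference boundary at 2.0 has no hit, so its index does not appear in hits_ref.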