vocalpy.metrics.segmentation.ir.precision

vocalpy.metrics.segmentation.ir.precision(hypothesis: ndarray[Any, dtype[_ScalarType_co]], reference: ndarray[Any, dtype[_ScalarType_co]], tolerance: float | int | None = None, decimals: int | bool | None = None) → tuple[float, int, IRMetricData]

Compute precision \(P\) for a segmentation.

Computes the metric from a hypothesized vector of boundaries hypothesis returned by a segmentation algorithm and a reference vector of boundaries reference, e.g., boundaries cleaned by a human expert or boundaries from a benchmark dataset.

Precision is defined as the number of true positives (\(T_p\)) over the number of true positives plus the number of false positives (\(F_p\)).

\(P = \frac{T_p}{T_p+F_p}\).

The number of true positives n_tp is computed by calling vocalpy.metrics.segmentation.ir.compute_true_positives(). This function then computes the precision as precision = n_tp / hypothesis.size.
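Conceptually, the computation can be sketched as a greedy one-to-one matching of hypothesized boundaries to reference boundaries within the tolerance. The sketch below is illustrative only, not vocalpy's actual implementation, and the helper name `naive_precision` is hypothetical:

```python
import numpy as np

def naive_precision(hypothesis, reference, tolerance=0):
    """Illustrative sketch: greedy one-to-one matching within a tolerance.

    Not vocalpy's implementation; vocalpy calls compute_true_positives()
    internally. Each reference boundary can be matched at most once.
    """
    matched_ref = set()
    hits_hyp = []
    for i, t_h in enumerate(hypothesis):
        # reference boundaries within tolerance that are not yet matched
        candidates = [
            j for j, t_0 in enumerate(reference)
            if abs(t_h - t_0) <= tolerance and j not in matched_ref
        ]
        if candidates:
            # match to the closest unmatched reference boundary
            j = min(candidates, key=lambda j: abs(t_h - reference[j]))
            matched_ref.add(j)
            hits_hyp.append(i)
    n_tp = len(hits_hyp)
    return n_tp / hypothesis.size, n_tp, hits_hyp

hypothesis = np.array([1, 6, 10, 16])
reference = np.array([0, 5, 10, 15])
prec, n_tp, hits = naive_precision(hypothesis, reference, tolerance=0)
# with tolerance=0, only the boundary at 10 matches exactly
```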

Both hypothesis and reference must be 1-dimensional arrays of non-negative, strictly increasing values. If you have two arrays onsets and offsets, you can concatenate those into a single valid array of boundary times using concat_starts_and_stops() that you can then pass to this function.
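For illustration, assuming segments that do not overlap (so that the result is strictly increasing, as required), the sorted boundary vector that concat_starts_and_stops() produces should be equivalent to concatenating and sorting with plain NumPy; the segment times below are made up:

```python
import numpy as np

# Hypothetical segment times; segments must not overlap for the
# resulting boundary vector to be strictly increasing.
onsets = np.array([0.0, 3.0, 6.0])
offsets = np.array([1.5, 4.5, 7.5])

# One sorted vector of boundary times, suitable to pass to precision()
boundaries = np.sort(np.concatenate([onsets, offsets]))
```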

Parameters:
hypothesis : numpy.ndarray

Boundaries, e.g., onsets or offsets of segments, as computed by some method.

reference : numpy.ndarray

Ground truth boundaries that the hypothesized boundaries hypothesis are compared to.

tolerance : float or int, optional

Tolerance, in seconds. An element of hypothesis is counted as a true positive if it lies within the tolerance \(\Delta t\) of some boundary \(t_0\) in reference, i.e., if a hypothesized boundary \(t_h\) satisfies \(t_0 - \Delta t \leq t_h \leq t_0 + \Delta t\). Default is None, in which case it is set to 0 (either float or int, depending on the dtype of hypothesis and reference).

decimals : int or bool, optional

The number of decimal places to round both hypothesis and reference to, using numpy.round(). This mitigates inflated error rates due to floating point error. Rounding is only applied if both hypothesis and reference are floating point values. To avoid rounding, e.g. to compute strict precision and recall, pass in the value False. Default is 3, which assumes that the values are in seconds and should be rounded to milliseconds.
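To see why rounding helps, consider a boundary computed as 0.1 + 0.2, which is not exactly equal to 0.3 in floating point; with a strict tolerance of 0 the hit would be missed. Rounding both arrays to the same number of decimals, as the decimals parameter does via numpy.round(), restores the match. A minimal sketch using only NumPy:

```python
import numpy as np

# 0.1 + 0.2 evaluates to 0.30000000000000004 in IEEE-754 doubles,
# so a strict comparison against 0.3 fails.
hypothesis = np.array([0.1 + 0.2, 5.0])
reference = np.array([0.3, 5.0])

strict_equal = hypothesis == reference
# Rounding both arrays to 3 decimals (milliseconds, if values are in
# seconds) makes the first boundary an exact hit again.
rounded_equal = np.round(hypothesis, 3) == np.round(reference, 3)
```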

Returns:
precision : float

Value for precision, computed as described above.

n_tp : int

The number of true positives.

metric_data : IRMetricData

Instance of IRMetricData with indices of hits in both hypothesis and reference, and the absolute difference between times in hypothesis and reference for the hits.

Notes

The addition of a tolerance parameter is based on [1]. This is also sometimes known as a “collar” [2] or “forgiveness collar” [3]. The value for the tolerance can be determined by visual inspection of the distribution; see for example [4].

References

[1]

Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000, June). Strategies for automatic segmentation of audio data. In 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 00CH37100) (Vol. 3, pp. 1423-1426). IEEE.

[2]

Jordán, P. G., & Giménez, A. O. (2023). Advances in Binary and Multiclass Sound Segmentation with Deep Learning Techniques.

[3]

NIST. (2009). The 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan. https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf

[4]

Du, P., & Troyer, T. W. (2006). A segmentation algorithm for zebra finch song at the note level. Neurocomputing, 69(10-12), 1375-1379.

Examples

>>> import numpy as np
>>> import vocalpy
>>> hypothesis = np.array([1, 6, 10, 16])
>>> reference = np.array([0, 5, 10, 15])
>>> prec, n_tp, ir_metric_data = vocalpy.metrics.segmentation.ir.precision(hypothesis, reference, tolerance=0)
>>> print(prec)
0.25
>>> print(ir_metric_data.hits_hyp)
[2]
>>> hypothesis = np.array([0, 1, 5, 10])
>>> reference = np.array([0, 5, 10])
>>> prec, n_tp, ir_metric_data = vocalpy.metrics.segmentation.ir.precision(hypothesis, reference, tolerance=1)
>>> print(prec)
0.75
>>> print(ir_metric_data.hits_hyp)
[0 2 3]