Our objective is to detect anomalies in video while also automatically explaining the reason behind the detector's response, e.g. "these frames are anomalous because people are fighting".
We see explainability as crucial for this task: systems used in practical settings need to be transparent to prevent bias, and the required response to an anomaly depends on its nature and severity. We introduce the first dataset for evaluating anomaly explanations, show how to build interpretable features for anomaly detection, and illustrate that understanding interactions between objects in the scene is important for explainable anomaly detection.
See the video below for a demo of the dataset and of the results achieved by our method.
You can download the X-MAN dataset here: Download
The dataset consists of 3 files, one for each of the 3 common public anomaly detection datasets: UCSD Ped2, Avenue and ShanghaiTech. Each file is a JSON containing explanations for each anomalous frame; the top-level keys are formed from the anomalous video folder name and the frame number. The value associated with each frame is a list of explanations of the anomalies in that frame. Each explanation is a 2-element list containing (1) a coarse explanation, for example 'action', and (2) a fine-grained explanation, for example 'run / jog'. To evaluate the explanations provided by a system, output a list of explanation scores for every frame in the dataset. Each list should consist of the scores for each of the possible anomalous classes, where the score for class X on frame Y signifies the system's confidence that X is the correct anomaly explanation for frame Y. The evaluation code will appear on this page later.
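As a quick illustration, here is a minimal Python sketch of how an annotation file could be loaded and inspected. The file name and the exact formatting of the keys are assumptions for the example, not guarantees about the released files:

```python
import json

# Hypothetical file name; check the downloaded archive for the actual names.
with open("x_man_avenue.json") as f:
    annotations = json.load(f)

# Each key combines the anomalous video folder name and the frame number
# (the exact separator is an assumption here). The value is a list of
# [coarse_explanation, fine_explanation] pairs for that frame.
for frame_key, explanations in list(annotations.items())[:3]:
    for coarse, fine in explanations:
        print(f"{frame_key}: coarse='{coarse}', fine='{fine}'")
```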
In the meantime, evaluate the per-class Average Precision (AP) for each anomalous class, using the dataset as ground truth. Finally, average the per-class APs to obtain the mAP, which you can compare to the results in our paper.
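The sketch below shows one way to compute this metric, assuming your system's outputs have already been collected into a frame-by-class score matrix aligned with a binary ground-truth matrix built from the dataset; it relies on scikit-learn's average_precision_score:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Compute mAP over anomaly explanation classes.

    y_true:  (n_frames, n_classes) binary ground-truth matrix from the dataset.
    y_score: (n_frames, n_classes) per-class confidence scores from your system.
    """
    per_class_ap = []
    for c in range(y_true.shape[1]):
        # Skip classes with no positive frames, where AP is undefined.
        if y_true[:, c].sum() == 0:
            continue
        per_class_ap.append(average_precision_score(y_true[:, c], y_score[:, c]))
    return float(np.mean(per_class_ap))
```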