
Title:
Suspicious Event Detection in Surveillance Video (SCISS 2005)
Authors:
Gal Lavee (UT Dallas)
Lei Wang (UT Dallas)
Latifur Khan (UT Dallas)
Bhavani Thuraisingham (UT Dallas)
Abstract:
Video surveillance is ubiquitous in today's society. Office buildings, schools and even busy intersections have video cameras rolling at all times, covering many scenes, sometimes from several angles. Video surveillance has proved very effective in catching criminals after the crime (e.g., convenience store or bank robberies). However, because of the vast amount of surveillance data accumulated each day, and because it is usually monitored by very few "human eyes" relative to the number of cameras, if at all, it becomes almost impossible to detect and respond to an abnormal event as it is happening.

Analysis also often takes place without knowledge of when, where or even whether an event has occurred. In this kind of analysis the analyst is often interested in "something that deviates from the norm." Without the appropriate tools this can be a daunting task that consists of sequentially viewing all raw video data and using human judgment to determine whether an event is peculiar and/or requires action.

This paper proposes a tool to aid in this process. Using user-defined events (both suspicious and normal), the system can determine whether a new video sequence contains any events that might be deemed suspicious and require further attention from a human user. This should reduce the user's job to determining whether machine-flagged segments indeed require action and taking that action. The time spent browsing through raw footage would be greatly reduced through use of this tool, thus increasing the analyst's efficiency.

The system proposed in this paper uses a combination of event information (general behavior) and object information (specific actors and entities) to offer a robust description of a video sequence. Video sequences are broken up into key frames. From the frames we extract low-level features. We use these features to detect objects in the scene as well as to represent the scene as a whole (event detection). These events are represented as a collection of normalized gradient histograms in the x, y and t dimensions over several different temporal scales. This representation is compared with previously user-defined events by means of a histogram comparison function in order to classify the new event. This classification, along with the objects detected within the scene, is used to compile a video data ontology language description of the event within the video sequence. Comparison of these event descriptions with the dynamically defined description of suspicious activity will allow the system to annotate a new video sequence appropriately.
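
The event representation and comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the bin count, the gradient operator, how temporal scales are formed, or which histogram comparison function is used, so everything below is an assumption made for illustration: forward differences for the gradients, 8 bins, temporal scales obtained by keeping every 2^k-th frame, a chi-square distance, and nearest-neighbor classification against labeled example events. The names gradient_histograms, describe_event, chi_square and classify are hypothetical.

```python
from typing import List, Tuple

# Assumed layout: a grayscale clip indexed as video[t][y][x], values in [0, 1].
Video = List[List[List[float]]]


def gradient_histograms(video: Video, bins: int = 8) -> List[float]:
    """Normalized histograms of |d/dx|, |d/dy|, |d/dt| at one temporal scale."""
    hists = [[0.0] * bins for _ in range(3)]
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    for t in range(frames - 1):
        for y in range(height - 1):
            for x in range(width - 1):
                v = video[t][y][x]
                # Forward differences along x, y and t (an assumed operator).
                grads = (
                    video[t][y][x + 1] - v,
                    video[t][y + 1][x] - v,
                    video[t + 1][y][x] - v,
                )
                for axis, g in enumerate(grads):
                    b = min(int(abs(g) * bins), bins - 1)
                    hists[axis][b] += 1.0
    flat: List[float] = []
    for h in hists:
        total = sum(h)
        # Normalize each histogram to sum to 1 (all zeros if the clip is too short).
        flat.extend(c / total if total else 0.0 for c in h)
    return flat


def describe_event(video: Video, scales: int = 2, bins: int = 8) -> List[float]:
    """Concatenate the histograms over several temporal scales."""
    desc: List[float] = []
    for k in range(scales):
        # Assumed scale construction: keep every 2**k-th frame.
        desc.extend(gradient_histograms(video[:: 2 ** k], bins))
    return desc


def chi_square(p: List[float], q: List[float]) -> float:
    """Chi-square distance, one possible histogram comparison function."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def classify(video: Video, examples: List[Tuple[str, List[float]]]) -> str:
    """Return the label of the closest user-defined example event."""
    desc = describe_event(video)
    return min(examples, key=lambda ex: chi_square(desc, ex[1]))[0]
```

Here examples would hold (label, descriptor) pairs built by calling describe_event on the user-defined normal and suspicious clips. A clip labeled suspicious by classify would then be flagged for a human analyst.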

Bridging the semantic gap between machine-understandable low-level features of video data and the high-level semantic events taking place in that video is the great undertaking of this paper. Defining an event representation schema and an event comparison function that enables similar events to be assigned the same class, as well as building a database of manually defined events for classification, are major steps towards a solution to this complex problem.
