The amount of information on the World Wide Web has grown enormously since its creation in 1990. By February 2000, the web had over one billion uniquely indexed pages and 30 million audio, video and image links. Because there is no central management on the web, duplication of content is inevitable. A 1998 study estimated that about 46% of all text documents on the web have at least one “near-duplicate”, that is, a document that is identical except for low-level details such as formatting. The problem is likely to be more severe for web video clips, because they are often stored in multiple locations and compressed with different algorithms and bitrates to facilitate downloading and streaming. Partial or complete versions of the same video also appear on the web when users modify original content and combine it with their own productions. Identifying such similar content benefits many web video applications:

1. Because users typically do not look beyond the first screen of search-engine results, having “near-duplicate” entries clutter the top results is detrimental. It is better to group similar entries together before presenting the retrieval results to users.
2. When a particular web video becomes unavailable or suffers from slow network transmission, users can opt for a more accessible version among the similar videos identified by the video search engine.
3. Similarity detection algorithms can also be used for content identification when conventional techniques such as watermarking are not applicable. For example, multimedia content brokers may use similarity detection to check for copyright violations, since they have no right to insert watermarks into original material.
| Title of host publication | Handbook of Video Databases |
| Subtitle of host publication | Design and Applications |
| Number of pages | 32 |
| State | Published - Jan 1 2003 |
ASJC Scopus subject areas
- Computer Science (all)
- Engineering (all)