* Torrent: Torrent is a peer-to-peer download method in which content is not downloaded directly from a specific server but is instead obtained from other torrent users who hold the same content. A torrent site therefore provides a seed file or magnet link for use with the torrent protocol.
Torrent sites provide the torrent files or magnet links that are essential for sharing data through a BitTorrent client. These sites make it possible to illegally download various kinds of works, such as music, films, TV content (such as programmes and dramas), publications, games, and software.
* Download link (for seed file): The site provides a seed file for use with a torrent client. The site may host the seed file itself, or it may indirectly provide a link for downloading the seed file through a file sharing site, such as filetender.
* Proper noun: Unlike other content, webtoons are visual (text and pictures) and static. On a torrent site, the content becomes available only after the download through a torrent client is complete, so it cannot be inspected until then. On streaming sites, the content is delivered to the user progressively as playback advances. In the case of webtoons, however, all content is posted visibly within the site itself. Proper nouns that are used only in a given webtoon (such as names of characters) therefore appear on the page and can be easily identified.
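The page-level features described above (magnet links, seed file links, and webtoon-specific proper nouns) could be extracted along the following lines. This is only a minimal sketch using the Python standard library, not the crawler developed in this study: the names `LinkCollector` and `extract_features`, the `.torrent` suffix check, and the list of proper nouns passed in are all illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags and all text data found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract_features(html, webtoon_names):
    """Return simple feature counts for one page (hypothetical helper).

    webtoon_names is an assumed, externally supplied collection of proper
    nouns (e.g. character names) that occur only in known webtoons.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    # Magnet links use the "magnet:" URI scheme.
    magnet_links = sum(
        1 for h in parser.hrefs if h.lower().startswith("magnet:")
    )
    # Assumption: seed files are linked with a ".torrent" path suffix.
    seed_file_links = sum(
        1 for h in parser.hrefs
        if urlparse(h).path.lower().endswith(".torrent")
    )
    # Proper nouns that literally appear in the page text.
    text = " ".join(parser.text_parts)
    proper_nouns = sorted(n for n in webtoon_names if n in text)

    return {
        "magnet_links": magnet_links,
        "seed_file_links": seed_file_links,
        "proper_nouns": proper_nouns,
    }
```

A page with a non-zero `magnet_links` or `seed_file_links` count would be a candidate torrent site, and a page with several matched proper nouns a candidate webtoon posting site. Any thresholds for such a decision would have to be chosen separately.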
For torrent site detection, the obtained accuracy is approximately 95% and the precision is 92%. In particular, since the recall is 100%, it can be confirmed that no torrent site went undetected. The false alarm rate is 8%.
For webtoon posting sites, the accuracy is 90% and the precision is 83%, which is slightly lower than for torrent and video streaming sites. However, the 100% recall shows that no webtoon posting site went undetected. On the other hand, the false alarm rate, that is, the rate at which a legal site is detected as an illegal webtoon posting site, is 20%.
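For reference, the four measures reported above can be computed from a confusion matrix as sketched below. The function name `detection_metrics` is hypothetical, and the false alarm rate is assumed here to mean the false positive rate FP / (FP + TN), which matches the description of legal sites being detected as illegal. The counts in the usage example are illustrative only and are not the study's data; they were chosen because they are consistent with the percentages reported for webtoon posting sites.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and false alarm rate.

    tp: illegal sites detected as illegal
    fp: legal sites detected as illegal
    tn: legal sites detected as legal
    fn: illegal sites missed by the detector
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        # Assumption: "false alarm" is the false positive rate.
        "false_alarm": fp / (fp + tn),
    }


# Illustrative counts only (not taken from the study).
example = detection_metrics(tp=10, fp=2, tn=8, fn=0)
print({name: round(value, 2) for name, value in example.items()})
```

With these illustrative counts, the recall is 1.0 because `fn` is zero, while two of the ten legal sites are flagged, which gives the 0.2 false alarm rate and lowers the precision to about 0.83.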
In this study, we analyzed the trends and characteristics of piracy sites that cause copyright infringement. Based on the analyzed features, we developed a detection crawler for torrent, video streaming, and webtoon posting sites that cause copyright infringement. As a result, we found that the overall accuracy of the crawler in detecting torrent, video streaming, and webtoon posting sites is over 90%. In particular, since the recall is 1.0 for torrent and webtoon posting sites, it was confirmed that no such site went undetected. However, the results also showed that the rate at which the crawler detected legal sites as illegal was somewhat higher for webtoon posting sites than for the other types. Therefore, in order to reduce such false positives, we plan to conduct further research that characterizes the features of webtoon posting sites more precisely.