Related manuals

  • simhash - file similarity hash tool

    simhash [ -s nshingles ] [ -f nfeatures ] [ file ]
    simhash [ -s nshingles ] [ -f nfeatures ] -w [ file ] ...
    simhash -c hashfile hashfile

    This program is used to compute and compare similarity hashes of files.
    A similarity hash is a chunk of data that has the property that some
    distance metric between files is proportional to some distance metric
    between the hashes. Typically the similarity hash will be much smaller
    than the file itself.
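
To make the idea of a similarity hash concrete, here is a minimal Python sketch of a Charikar-style simhash compared by Hamming distance. It is only an illustration under stated assumptions: it is not taken from the `simhash` tool above, whose internal algorithm is not described in this excerpt, and the function names, whitespace tokenization, MD5 token hash, and 64-bit width are all choices made for the example.

```python
import hashlib

BITS = 64  # fingerprint width; an assumption made for this sketch


def token_hash(token: str) -> int:
    """Map a token to a stable 64-bit integer (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """Charikar-style simhash over whitespace-separated tokens (illustrative only)."""
    votes = [0] * BITS
    for token in text.split():
        h = token_hash(token)
        for i in range(BITS):
            # Each token votes +1 for bit i if its hash has that bit set, else -1.
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:  # bit i is set when the votes for it are positive
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = simhash("the quick brown fox jumps over the lazy dog")
    b = simhash("the quick brown fox jumped over the lazy dog")
    print(hamming_distance(a, b))
```

Texts that share most of their tokens tend to produce fingerprints that differ in only a few bits, so a small Hamming distance can serve as a cheap signal that two documents are near-duplicates. The threshold to use depends on the data and is not specified here.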

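Related articles

  • Similarity computation on massive data: simhash and Hamming distance

    Our collection system has gathered a large amount of text data, but the text contains many duplicates that distort our analysis of the results. Before analysis we need to deduplicate this data, so how should a text deduplication algorithm be chosen and designed? Common options include cosine similarity, Euclidean distance, Jaccard similarity, longest ...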

    2013-08-30 09:48     
