经典协同过滤算法:Slope One 的原理及实现(Java)
本文首先介绍了Slope One算法的原理,然后给出了算法的Java版实现。
一. Slope One 算法的原理介绍
用户 对事物A打分 对事物B打分 X 3 4 Y 2 4 Z 4 ?
用户Z对事物B的打分可能是多少呢?股票上有个说法是平均值可以掩盖一切异常波动,所以股票上的各个技术指标收拾不同时间段的平均值的曲线图或者柱状图等。同样的,Slope One算法也认为:平均值也可以代替某两个未知个体之间的打分差异,事物A对事物B的平均很差是:((3 - 4) + (2 - 4)) / 2 = -1.5,也就是说人们对事物B的打分一般比事物A的打分要高1.5,于是Slope one算法就猜测Z对事物B的打分是4 + 1.5 = 5.5
是不是非常的简单?
加权算法
有n个人对事物A和事物B打分了,R(A->B)表示这n个人对A和对B打分的平均差(A-B),有m个人对事物B和事物C打分 了,R(B->C)表示这m个人对B和对C打分的平均差(B-C),注意都是平均差而不是平方差,现在某个用户对A的打分是ra,对C的打分是 rc,那么A对B的打分可能是:
rb = (n * (ra - R(A->B)) + m * (rc + R(B->C)))/(m+n)
二. Slope One 算法的实现
我自己实现的Slope One:
01 |
package com.liangtee.slopeone.recommender.impl;
|
02 |
03 |
import java.util.HashMap;
|
04 |
import java.util.List;
|
05 |
import java.util.Map;
|
06 |
import java.util.Set;
|
07 |
08 |
import org.slf4j.Logger;
|
09 |
import org.slf4j.LoggerFactory;
|
10 |
11 |
import com.liangtee.slopeone.recommender.DataSetModel;
|
12 |
import com.liangtee.slopeone.recommender.Recommender;
|
13 |
import com.liangtee.slopeone.vo.RecommendedItem;
|
14 |
15 |
/** |
16 |
* Weighted Slope One Algorithm
|
17 |
*
|
18 |
* Just for Fun
|
19 |
*
|
20 |
* @author LiangTE
|
21 |
*
|
22 |
*/
|
23 |
24 |
public class SlopeOneRecommender implements Recommender {
|
25 |
26 |
private static final Logger log = LoggerFactory.getLogger(SlopeOneRecommender. class );
|
27 |
|
28 |
private DataSetModel dataSet = null ;
|
29 |
|
30 |
public SlopeOneRecommender(DataSetModel dataSet) {
|
31 |
this .dataSet = dataSet;
|
32 |
}
|
33 |
34 |
@Override
|
35 |
public float preferenceOnItem( long userID, long ItemID) throws Exception {
|
36 |
|
37 |
if (!dataSet.containsUser(userID)) {
|
38 |
throw new Exception( "not contains this User : " + userID);
|
39 |
}
|
40 |
|
41 |
Set<Long> UIDs = dataSet.getUserIDs();
|
42 |
Map<Long, Float> targetUserRatings = dataSet.getUserRatings(userID);
|
43 |
Set<Long> ratedItems = targetUserRatings.keySet();
|
44 |
|
45 |
Map<Long, Float> diffCache = new HashMap<Long, Float>();
|
46 |
Map<Long, Integer> counter = new HashMap<Long, Integer>();
|
47 |
|
48 |
log.info( "Slope One start to work..." );
|
49 |
|
50 |
for ( long UID : UIDs) { //计算每个用户的差值
|
51 |
if (UID != userID) {
|
52 |
Map<Long, Float> ratings = dataSet.getUserRatings(UID);
|
53 |
for ( long iid : ratedItems) { // iid = item id
|
54 |
if (ratings.containsKey(ItemID) && ratings.containsKey(iid)) {
|
55 |
float diff = ratings.get(ItemID) - ratings.get(iid);
|
56 |
if (diffCache.containsKey(iid)) {
|
57 |
diffCache.put(iid, diffCache.get(iid)+diff);
|
58 |
} else {
|
59 |
diffCache.put(iid, diff);
|
60 |
}
|
61 |
if (counter.containsKey(iid)) {
|
62 |
counter.put(iid, counter.get(iid)+ 1 );
|
63 |
} else {
|
64 |
counter.put(iid, 1 );
|
65 |
}
|
66 |
}
|
67 |
}
|
68 |
}
|
69 |
|
70 |
}
|
71 |
|
72 |
int K = 0 ;
|
73 |
float total = 0F;
|
74 |
for ( long iid : ratedItems) {
|
75 |
if (diffCache.containsKey(iid)) {
|
76 |
float bias = diffCache.get(iid)/counter.get(iid);
|
77 |
float targetUserRating = targetUserRatings.get(iid);
|
78 |
float p = targetUserRating + bias;
|
79 |
total += p;
|
80 |
K++;
|
81 |
}
|
82 |
}
|
83 |
|
84 |
|
85 |
return total/K;
|
86 |
}
|
开源的Slope one的程序包
- Python
http://www.serpentine.com/blog/2006/12/12/collaborative-filtering-made-easy/ - Java
http://taste.sourceforge.net/
http://www.daniel-lemire.com/fr/documents/publications/SlopeOne.java
http://www.nongnu.org/cofi/ - PHP
http://sourceforge.net/projects/vogoo
http://www.drupal.org/project/cre
http://www.daniel-lemire.com/fr/documents/publications/webpaper.txt Slope one算法作者写的,简单明了,强烈推荐。 -