Why doesn't this function work for my input?

Problem description:

I have a default example dictionary that looks like this:

critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 
'The Night Listener': 3.0}, 
      'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0, 
'You, Me and Dupree': 3.5}, 
      'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0, 
'Superman Returns': 3.5, 'The Night Listener': 4.0}, 
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 
'The Night Listener': 4.5, 'Superman Returns': 4.0, 
'You, Me and Dupree': 2.5}, 
      'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0, 
'You, Me and Dupree': 2.0}, 
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5}, 
      'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}} 

I use a function based on the Pearson correlation coefficient to find the most similar people in the dictionary. It looks like this:

from math import sqrt

def sim_pearson(prefs, p1, p2):
    # Collect the items rated by both people
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    # Number of shared items
    n = len(si)
    # If they have no items in common, return 0
    if n == 0:
        return 0
    # Add up all the ratings
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # Sum the squares
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # Sum the products
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # Compute the Pearson coefficient
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0:
        return 0
    r = num / den
    return r

It works. For example, calling print sim_pearson(critics, 'Toby', 'Lisa Rose') gives me a coefficient of 0.991240707162.

However, when I try the same function with my own dictionary:

tests = {'dzam': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     'kex': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     'rokoko': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     '[email protected]': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     'seljak': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, }} 

I always get 1.0, no matter which entries of the dictionary I compare. Why is that?

By the way, I am using hashes, so my dictionary has to have these long strings. :)

+0

Integers and division don't get along well. Please use `from __future__ import division` to see whether that is the problem. – 2011-02-02 12:32:10
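A minimal sketch of the integer-division pitfall that comment refers to. The numbers are hypothetical and chosen only for illustration; the ratings in the question are floats, so this may well not be the cause here. Under Python 2, `/` between two integers truncates, which `//` reproduces below on any Python version:

```python
# Hypothetical integer sums, for illustration only (not taken from the question's data)
sum1, sum2, n = 7, 9, 2

# Floor division: what Python 2's `/` does when both operands are ints
truncated = (sum1 * sum2) // n

# True division, forced here by converting n to float so the result is the
# same on Python 2 and 3. In Python 2, placing `from __future__ import division`
# at the top of the file makes a plain `/` behave this way.
exact = (sum1 * sum2) / float(n)

print(truncated)  # 31
print(exact)      # 31.5
```

If the ratings were stored as integers, the `sum1*sum2/n` term in sim_pearson would be truncated in the same way under Python 2.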

+2

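All of the strings in your failing test are the same. Is that what you intended? If so, that is why you get 1.0, which indicates perfect correlation, because everything is identical. – payne 2011-02-02 12:33:29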

You are most likely being fooled by the long keys, which make it hard to see that the key strings actually differ.

Try setting all the values in the test 'seljak' to 0 and running the correlation against it. You will see a correlation of 0:

print sim_pearson(tests, '[email protected]', 'seljak') 

Change the last value in the test 'seljak' to 1, rerun the script, and you will see a negative correlation.
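The experiment described above can be sketched in a self-contained form. The short keys `k1`, `k2`, `k3` and the helper name `pearson` are hypothetical stand-ins for the long hash keys and for sim_pearson; the helper follows the same formula but takes two rating dictionaries directly:

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the keys shared by rating dicts a and b."""
    shared = [k for k in a if k in b]
    n = len(shared)
    if n == 0:
        return 0
    sum_a = sum(a[k] for k in shared)
    sum_b = sum(b[k] for k in shared)
    sum_a_sq = sum(a[k] ** 2 for k in shared)
    sum_b_sq = sum(b[k] ** 2 for k in shared)
    p_sum = sum(a[k] * b[k] for k in shared)
    num = p_sum - sum_a * sum_b / n
    den = sqrt((sum_a_sq - sum_a ** 2 / n) * (sum_b_sq - sum_b ** 2 / n))
    if den == 0:
        return 0
    return num / den

# Hypothetical short keys standing in for the long hashes
full = {'k1': 5.0, 'k2': 1.0, 'k3': 3.0}
same = dict(full)                   # identical ratings
zeros = {'k1': 0.0, 'k2': 0.0}      # every value set to 0
flipped = {'k1': 0.0, 'k2': 1.0}    # last value changed to 1

print(pearson(full, same))     # 1.0: identical ratings correlate perfectly
print(pearson(full, zeros))    # 0: constant ratings make the denominator 0
print(pearson(full, flipped))  # -1.0: the ratings move in opposite directions
```

Only when the ratings themselves differ between users does the coefficient move away from 1.0, which is consistent with the behaviour reported in the question.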