Starting off, I added the chapter 2 files to my Lib directory and downloaded feedparser to the same directory before finishing the chapter 2 exercises.
pydelicious imported correctly and get_popular(tags='programming') returned results as expected. I then continued to build the dataset by adding the user tsegaran to the list, filling in the items for his username, and asking for similar users and recommended links:
- wlyndon
- sillyputty1967
- shashashasha
- rufous
- www.rollingstone.com.news.story.5939600/steve_jobs_the_rolling_stone_interview
- http://www.woopra.com/
- www.youtube.com/watch?v=eRkmhJR10Ec
- zenhabits.net
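For readers following along, here is a minimal sketch of the kind of dataset and ranking behind the two lists above. It does not call pydelicious; the users, links, and the helper names build_user_dict, sim_distance, top_matches and recommend_links are all made up for illustration, and the similarity used is a simple inverse of the squared Euclidean distance, which is an assumption rather than necessarily what produced the results in this post.

```python
# Hypothetical data: which links each user has posted (not real del.icio.us results).
posts = {
    'alice': ['http://a.example', 'http://b.example'],
    'bob': ['http://a.example', 'http://b.example', 'http://c.example'],
    'carol': ['http://c.example'],
}


def build_user_dict(posts):
    """Give every user a 1.0/0.0 entry for every link seen in the dataset."""
    all_links = set(link for links in posts.values() for link in links)
    return dict(
        (user, dict((link, 1.0 if link in links else 0.0) for link in all_links))
        for user, links in posts.items()
    )


def sim_distance(prefs, p1, p2):
    """Inverse of the squared Euclidean distance over shared items."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        return 0.0
    sum_sq = sum((prefs[p1][i] - prefs[p2][i]) ** 2 for i in shared)
    return 1.0 / (1.0 + sum_sq)


def top_matches(prefs, person, n=5):
    """Rank the other users by similarity to person, best first."""
    scores = [(sim_distance(prefs, person, other), other)
              for other in prefs if other != person]
    scores.sort(reverse=True)
    return scores[:n]


def recommend_links(prefs, person):
    """Score links the person has not posted by similarity-weighted votes."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = sim_distance(prefs, person, other)
        if sim <= 0:
            continue
        for link, value in prefs[other].items():
            if prefs[person].get(link, 0.0) == 0.0:
                totals[link] = totals.get(link, 0.0) + value * sim
                sim_sums[link] = sim_sums.get(link, 0.0) + sim
    rankings = [(totals[link] / sim_sums[link], link) for link in totals]
    rankings.sort(reverse=True)
    return rankings


user_dict = build_user_dict(posts)
print(top_matches(user_dict, 'alice'))
print(recommend_links(user_dict, 'alice'))
```

With 0/1 data like this, two users are close when they posted mostly the same links, and a link scores highly when the users most similar to you posted it.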
I then used the function calculateSimilarItems in Recommendations, and the results for the movies Lady in the Water, Snakes on a Plane, You, Me and Dupree, and The Night Listener were displayed properly.
I attempted to download the MovieLens data sets but could not get the files into a form that the loadMovieLens() function could read.
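As a rough illustration of what an item-based similarity table involves, the sketch below flips a person-to-movie ratings dictionary into a movie-to-person one and then ranks, for each movie, the other movies rated most alike. The ratings are invented, and the function names (transform_prefs, item_similarity, calculate_similar_items) are my own stand-ins, not the ones in the chapter's module.

```python
def transform_prefs(prefs):
    """Flip {person: {item: rating}} into {item: {person: rating}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            result.setdefault(item, {})[person] = rating
    return result


def item_similarity(item_prefs, a, b):
    """Inverse of the squared Euclidean distance over people who rated both."""
    shared = [p for p in item_prefs[a] if p in item_prefs[b]]
    if not shared:
        return 0.0
    sum_sq = sum((item_prefs[a][p] - item_prefs[b][p]) ** 2 for p in shared)
    return 1.0 / (1.0 + sum_sq)


def calculate_similar_items(prefs, n=10):
    """For every item, list the n most similar other items, best first."""
    item_prefs = transform_prefs(prefs)
    result = {}
    for item in item_prefs:
        scores = [(item_similarity(item_prefs, item, other), other)
                  for other in item_prefs if other != item]
        scores.sort(reverse=True)
        result[item] = scores[:n]
    return result


# Made-up ratings, only to show the shape of the input and output.
ratings = {
    'u1': {'Snakes on a Plane': 4.0, 'The Night Listener': 4.0, 'Lady in the Water': 2.0},
    'u2': {'Snakes on a Plane': 3.0, 'The Night Listener': 3.0, 'Lady in the Water': 1.0},
}
similar = calculate_similar_items(ratings)
print(similar['Snakes on a Plane'])
```

The table only needs rebuilding when the ratings change substantially, which is the usual argument for precomputing item similarities.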
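In case it helps anyone stuck at the same point, here is a hedged sketch of a loader. It assumes the 100k MovieLens archive has been unpacked so that u.item (pipe-separated, movie id then title) and u.data (tab-separated user, movie id, rating, timestamp) sit in one directory, and that u.item is Latin-1 encoded. The function name load_movielens and the path argument are my own; it is not the chapter's loadMovieLens().

```python
import io
import os


def load_movielens(path):
    """Return {user_id: {movie_title: rating}} from u.item and u.data in path."""
    # Map movie ids to titles; only the first two pipe-separated fields are used.
    movies = {}
    with io.open(os.path.join(path, 'u.item'), encoding='latin-1') as f:
        for line in f:
            movie_id, title = line.split('|')[0:2]
            movies[movie_id] = title

    # Each ratings row is: user id, movie id, rating, timestamp (tab-separated).
    prefs = {}
    with io.open(os.path.join(path, 'u.data'), encoding='latin-1') as f:
        for line in f:
            user, movie_id, rating, _timestamp = line.rstrip('\n').split('\t')
            prefs.setdefault(user, {})[movies[movie_id]] = float(rating)
    return prefs
```

Passing the directory explicitly and joining paths with os.path.join avoids depending on a hard-coded default location, which is one plausible reason a loader fails to find the files.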
exercise 1:
"The Tanimoto similarity score is a method of calculating the similarity between two ligand fingerprints. It is determined as shown in the equation below where: T is the Tanimoto Score, Na and Nb are the number of bits set to 1 in fingerprint of ligand a and b respectively and Nc is the total number of bits set to 1 found in fingerprints of both ligand a and b."
T = Nc / (Na + Nb - Nc)
http://www-mitchell.ch.cam.ac.uk/pld/background_simil_lig.html
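Read as set arithmetic, the definition quoted above is the size of the intersection divided by the size of the union. A minimal sketch, treating each fingerprint as the set of positions whose bit is 1 (the function name tanimoto_sets is made up for illustration):

```python
def tanimoto_sets(a, b):
    """Tanimoto score for two collections of 'on' bits: Nc / (Na + Nb - Nc)."""
    a, b = set(a), set(b)
    n_c = len(a & b)                  # bits set in both fingerprints
    denom = len(a) + len(b) - n_c     # bits set in either fingerprint
    if denom == 0:
        return 0.0                    # both empty: define the score as 0
    return float(n_c) / denom


print(tanimoto_sets([1, 2, 3], [2, 3, 4]))
```

The same function works for the del.icio.us data if each user is represented by the set of links they posted.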
my similarity function:
def tanimoto(prefs,p1,p2):
    # Collect the items rated by both people
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item]=1
    # Nothing in common: return 0
    if len(si)==0: return 0
    # Sum the per-item ratio a*b/(a^2+b^2-a*b) over the shared items
    distance=sum([prefs[p1][item]*prefs[p2][item] / (pow(prefs[p1][item],2)+pow(prefs[p2][item],2)-prefs[p1][item]*prefs[p2][item]) for item in prefs[p1] if item in prefs[p2]])
    return 1.0/(1+distance)
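One thing worth noting about the function above: it adds up a ratio per shared item and then returns 1/(1+sum), so for positive ratings the result shrinks as two people share more items. A possible alternative, offered only as a sketch, is the vector form of the Tanimoto coefficient, A·B / (|A|² + |B|² - A·B), computed once over the whole rating vectors. Restricting the vectors to items both people rated is my assumption, and the name tanimoto_ratings is hypothetical.

```python
def tanimoto_ratings(prefs, p1, p2):
    """Vector Tanimoto coefficient over the items both people have rated."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        return 0.0
    dot = sum(prefs[p1][i] * prefs[p2][i] for i in shared)
    norm1 = sum(prefs[p1][i] ** 2 for i in shared)
    norm2 = sum(prefs[p2][i] ** 2 for i in shared)
    denom = norm1 + norm2 - dot
    if denom == 0:
        return 0.0                    # only happens when all shared ratings are 0
    return float(dot) / denom         # 1.0 when the shared ratings are identical


# Made-up ratings to show the behaviour.
sample = {
    'a': {'x': 1.0, 'y': 2.0},
    'b': {'x': 1.0, 'y': 2.0},
    'c': {'x': 2.0, 'y': 1.0},
    'd': {'z': 3.0},
}
print(tanimoto_ratings(sample, 'a', 'b'))
print(tanimoto_ratings(sample, 'a', 'c'))
```

Because higher means more similar here, the result can be passed straight to a ranking function without the 1/(1+x) inversion.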
I'm still working on the Weka section, but I wanted to publish this part before midnight!
The code looks pretty close to what I've seen on Wikipedia; however, please use some indenting. Python code is all about indenting.