Sunday, January 25, 2009

Portfolio Assignment 1

After installing Python and going through the first few pages of the online tutorial, I began to read Collective Intelligence and work through the code on pages 8-18.

I added recommendations.py to my Lib folder and then did the first few exercises:
critics['Lisa Rose']['Lady in the Water'] resulted in 2.5, and critics['Toby'] resulted in {'Snakes on a Plane': 4.5, 'You, Me, and Dupree': 1.0, 'Superman Returns': 4.0}

After importing sqrt from math, the examples were correctly calculated:
1.1180339887498949
0.47213595499957939
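
For anyone following along, here is a minimal sketch that reproduces those two numbers. The inputs are not recorded above, so the two points used here, (4.5, 1) and (4, 2), are an assumption that happens to give the same results.

```python
from math import sqrt

# Assumed inputs: two points in "preference space", (4.5, 1) and (4, 2).
distance = sqrt(pow(4.5 - 4, 2) + pow(1 - 2, 2))

# Invert the distance so that closer points score higher;
# adding 1 avoids dividing by zero when the points coincide.
similarity = 1 / (1 + distance)

print(distance)
print(similarity)
```

The first value is the plain Euclidean distance and the second is that distance turned into a score between 0 and 1.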

I then edited recommendations.py to include a function that finds a similarity score for two different critics.

recommendations.sim_distance(recommendations.critics, 'Lisa Rose', 'Gene Seymour') initially resulted in 0.29429805508554946, which differs from the book's answer.

I then changed the return line of the sim_distance function to return 1/(1+sum_of_squares) and reloaded recommendations.py to get the correct answer of 0.148148148148
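
The sketch below shows both variants side by side. It is not a copy of my recommendations.py: the ratings are a two-critic subset of the sample data written out here as an assumption, and the use_sqrt flag is a hypothetical addition for comparison. Taking the square root in the denominator is one formula that reproduces the first value I got, though other causes are possible.

```python
from math import sqrt

# Assumed ratings for the two critics compared above (subset of the sample data).
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me, and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me, and Dupree': 3.5},
}

def sim_distance(prefs, person1, person2, use_sqrt=False):
    """Distance-based similarity in (0, 1]; returns 0 if nothing is shared."""
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0
    sum_of_squares = sum(pow(prefs[person1][item] - prefs[person2][item], 2)
                         for item in shared)
    if use_sqrt:
        # Hypothetical variant: true Euclidean distance in the denominator.
        return 1 / (1 + sqrt(sum_of_squares))
    # Squared distance in the denominator, matching the corrected return line.
    return 1 / (1 + sum_of_squares)

squared = sim_distance(critics, 'Lisa Rose', 'Gene Seymour')
rooted = sim_distance(critics, 'Lisa Rose', 'Gene Seymour', use_sqrt=True)
print(squared)
print(rooted)
```

With these ratings the sum of squared differences is 5.75, so the two scores are 1/6.75 and 1/(1+sqrt(5.75)).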

Next I added the Pearson correlation coefficient function to recommendations.py and ran it for Lisa Rose and Gene Seymour, resulting in 0.39605901719066977
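
A minimal sketch of a Pearson score over the items two critics share is below. The function name sim_pearson and the ratings are assumptions for illustration (the same two-critic subset of the sample data), not a copy of the file I edited.

```python
from math import sqrt

# Assumed ratings for the two critics compared above (subset of the sample data).
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me, and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me, and Dupree': 3.5},
}

def sim_pearson(prefs, p1, p2):
    """Pearson correlation of the ratings both critics gave to shared items."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0
    # Sums, sums of squares, and sum of products over the shared items.
    sum1 = sum(prefs[p1][item] for item in shared)
    sum2 = sum(prefs[p2][item] for item in shared)
    sum1_sq = sum(pow(prefs[p1][item], 2) for item in shared)
    sum2_sq = sum(pow(prefs[p2][item], 2) for item in shared)
    p_sum = sum(prefs[p1][item] * prefs[p2][item] for item in shared)
    num = p_sum - (sum1 * sum2 / n)
    den = sqrt((sum1_sq - pow(sum1, 2) / n) * (sum2_sq - pow(sum2, 2) / n))
    if den == 0:
        return 0
    return num / den

score = sim_pearson(critics, 'Lisa Rose', 'Gene Seymour')
print(score)
```

Unlike the distance score, this one is not thrown off when one critic simply rates everything a bit higher than the other.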

The function topMatches was added, and the top 3 matches for Toby were identified as Lisa Rose (.99124...), Mick LaSalle (.92447...), and Claudia Puig (.893405...).
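
Here is a rough sketch of how such a ranking can be built. The ratings are an assumption: they are the sample data cut down to the three films Toby rated, since only shared films affect a critic's similarity to Toby. The helper sim_pearson is likewise an illustrative stand-in rather than my exact code.

```python
from math import sqrt

# Assumed sample ratings, restricted to the three films Toby rated.
critics = {
    'Lisa Rose': {'Snakes on a Plane': 3.5, 'Superman Returns': 3.5, 'You, Me, and Dupree': 2.5},
    'Gene Seymour': {'Snakes on a Plane': 3.5, 'Superman Returns': 5.0, 'You, Me, and Dupree': 3.5},
    'Michael Phillips': {'Snakes on a Plane': 3.0, 'Superman Returns': 3.5},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Superman Returns': 4.0, 'You, Me, and Dupree': 2.5},
    'Mick LaSalle': {'Snakes on a Plane': 4.0, 'Superman Returns': 3.0, 'You, Me, and Dupree': 2.0},
    'Jack Matthews': {'Snakes on a Plane': 4.0, 'Superman Returns': 5.0, 'You, Me, and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me, and Dupree': 1.0, 'Superman Returns': 4.0},
}

def sim_pearson(prefs, p1, p2):
    """Pearson correlation over the items both critics rated."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0
    sum1 = sum(prefs[p1][item] for item in shared)
    sum2 = sum(prefs[p2][item] for item in shared)
    sum1_sq = sum(pow(prefs[p1][item], 2) for item in shared)
    sum2_sq = sum(pow(prefs[p2][item], 2) for item in shared)
    p_sum = sum(prefs[p1][item] * prefs[p2][item] for item in shared)
    num = p_sum - (sum1 * sum2 / n)
    den = sqrt((sum1_sq - pow(sum1, 2) / n) * (sum2_sq - pow(sum2, 2) / n))
    return num / den if den != 0 else 0

def topMatches(prefs, person, n=5, similarity=sim_pearson):
    """Score everyone else against `person` and keep the n highest."""
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    scores.sort(reverse=True)
    return scores[:n]

top3 = topMatches(critics, 'Toby', n=3)
for score, name in top3:
    print('%s %.5f' % (name, score))
```

Passing a different function as similarity swaps the metric without touching the ranking code.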

I added the function getRecommendations and searched for Toby, first using Pearson and then using Euclidean distance; both methods listed The Night Listener, Lady in the Water, and Just My Luck as the top three recommendations, with the predicted scores varying slightly from the first method to the second.
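
The idea is a weighted average: each critic's rating for a film Toby has not seen is weighted by that critic's similarity to Toby. The sketch below is one way to write it, assuming the sample ratings shown (typed out here for illustration) and a stand-in sim_pearson; it is not a copy of my file.

```python
from math import sqrt

# Assumed sample ratings. Film order per critic:
FILMS = ['Lady in the Water', 'Snakes on a Plane', 'Just My Luck',
         'Superman Returns', 'You, Me, and Dupree', 'The Night Listener']
RAW = {
    'Lisa Rose': [2.5, 3.5, 3.0, 3.5, 2.5, 3.0],
    'Gene Seymour': [3.0, 3.5, 1.5, 5.0, 3.5, 3.0],
    'Michael Phillips': [2.5, 3.0, None, 3.5, None, 4.0],
    'Claudia Puig': [None, 3.5, 3.0, 4.0, 2.5, 4.5],
    'Mick LaSalle': [3.0, 4.0, 2.0, 3.0, 2.0, 3.0],
    'Jack Matthews': [3.0, 4.0, None, 5.0, 3.5, 3.0],
    'Toby': [None, 4.5, None, 4.0, 1.0, None],
}
# Build the nested dictionary, skipping films a critic did not rate.
critics = dict((name, dict((f, r) for f, r in zip(FILMS, row) if r is not None))
               for name, row in RAW.items())

def sim_pearson(prefs, p1, p2):
    """Pearson correlation over the items both critics rated."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0
    sum1 = sum(prefs[p1][item] for item in shared)
    sum2 = sum(prefs[p2][item] for item in shared)
    sum1_sq = sum(pow(prefs[p1][item], 2) for item in shared)
    sum2_sq = sum(pow(prefs[p2][item], 2) for item in shared)
    p_sum = sum(prefs[p1][item] * prefs[p2][item] for item in shared)
    num = p_sum - (sum1 * sum2 / n)
    den = sqrt((sum1_sq - pow(sum1, 2) / n) * (sum2_sq - pow(sum2, 2) / n))
    return num / den if den != 0 else 0

def getRecommendations(prefs, person, similarity=sim_pearson):
    """Rank unseen items by the similarity-weighted average of others' ratings."""
    totals = {}
    sim_sums = {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue  # ignore critics with zero or negative similarity
        for item, rating in prefs[other].items():
            if item not in prefs[person]:
                totals[item] = totals.get(item, 0) + rating * sim
                sim_sums[item] = sim_sums.get(item, 0) + sim
    rankings = [(total / sim_sums[item], item) for item, total in totals.items()]
    rankings.sort(reverse=True)
    return rankings

recs = getRecommendations(critics, 'Toby')
for predicted, title in recs:
    print('%s %.4f' % (title, predicted))
```

Dividing by the sum of similarities keeps a film from ranking high just because more critics reviewed it.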

Finally I used the transformPrefs function to find the top matches for Superman Returns and to recommend critics for the movie Just My Luck. All calculations matched those given in the book.
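
The transform just flips the dictionary so that films are the outer keys and critics the inner ones. A small sketch, using Toby's ratings as printed earlier plus an assumed three-film subset for Lisa Rose:

```python
# Toby's ratings are the ones printed earlier in this post; Lisa Rose's are
# an assumed subset of the sample data, included only for illustration.
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Superman Returns': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me, and Dupree': 1.0,
             'Superman Returns': 4.0},
}

def transformPrefs(prefs):
    """Swap the two levels: {person: {item: rating}} -> {item: {person: rating}}."""
    result = {}
    for person in prefs:
        for item, rating in prefs[person].items():
            result.setdefault(item, {})[person] = rating
    return result

movies = transformPrefs(critics)
print(movies['Superman Returns'])
```

Once the data is flipped, the same matching and recommendation functions work on films instead of critics, which is why no new similarity code is needed for this step.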

Lastly I wrote a function to determine the Manhattan Distance between two critics:

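def sim_manhattan(prefs, p1, p2):
    # Collect the items rated by both critics
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    # No ratings in common: similarity is 0
    if len(si) == 0:
        return 0
    # Absolute rating differences over the shared items
    manhattan = [abs(prefs[p1][item] - prefs[p2][item]) for item in si]
    # 1.0 forces float division; identical ratings give a score of 1
    return 1.0 / (1 + sum(manhattan))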

The returned score for critics Lisa Rose and Gene Seymour was 0.18181818181818182

-----------------------------------------------------------------------------------------

After concluding these exercises I have become very interested in learning more about Python and am very encouraged by the style of teaching that Collective Intelligence employs (it is a much better read than our Data Mining textbook!)

This post would have gone up yesterday had it not been for my 6:30AM to 10:30PM trip to Liberty University as part of the Track & Field team. It was a loooong day! Looking forward to class on Thursday.
