
Ask a silly question

September 6, 2013


So I was just browsing translation companies with a view to getting my bachelor’s degree, which is in Latin, translated into English. I’ve no clue which company is good, so I just randomly mailed one from an ad posted online. I’m thinking they’re a safe bet for quality service:

Dear Claire,
Thanks for mail. We want to know what is exact language of you mentioned Latin, is it Spanish, French ,Italian or something else ? We have diffeent price for different language. So could you please let us know the exact language and how many pages for your certificate, and then we can give you exact quotation.
Best regards,

Think I’m gonna go another way on this one…


Python & Machine learning problems with sparse matrices

May 31, 2013

I am currently working on an assignment for a course about Gestures and since I didn’t pay too much attention to the lectures (or do much of the reading for that matter) I decided to do a machine learning assignment as my project for this course. While it would have been great to do some experiments with multimodal agents/Kinect/cross-cultural gesture usage, I didn’t really feel like the support was there from the department and the work involved would have taken time away from other projects I am currently working on (more on those later).

So, in a naive intermediate machine learning student way, I thought doing some gesture prediction experiment would be a doddle. Had I not just completed a course in Statistical methods for Machine Learning? Did I not have at my disposal the full range of doodads available from the sklearn library? Was I not provided with a labeled subset of the AMI corpus to mess around with?

My idea, or rather Jina Lee’s idea that I was heavily referencing, was to extract some features, plug em into some range of classifiers, report high accuracy of predicting nods and shakes and the job was oxo. Or so I thought.

It was quite fun initially to extract features from the annotated XML corpus. Our lecturer had kindly given us a filtered set to work on that contained only around 6000 dialogue acts and only the head movement annotation. I played around with xml.dom.minidom and, after much googling of guides on how to use the thing, got out what I wanted. I had even more fun learning a bit about SQL with this tutorial and sticking all the features into a database for handy viewing. Although, in retrospect, the time I spent doing that was probably wasted, as A. I never remember how to do these things that I learn on the fly without referring back to guides, and B. it really wasn’t necessary, as I had to split up all the data again anyway.
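For anyone else fumbling with minidom, here is a minimal sketch of pulling gesture labels and durations out of an annotation file with `xml.dom.minidom`. The element and attribute names (`head`, `type`, `starttime`, `endtime`) and the `extract_gestures` helper are hypothetical placeholders; the real AMI files use their own schema.

```python
from xml.dom import minidom

# Hypothetical annotation snippet; the real AMI head-movement files differ.
SAMPLE = """<annotations>
  <head type="nod" starttime="1.20" endtime="1.85"/>
  <head type="shake" starttime="4.00" endtime="4.60"/>
</annotations>"""


def extract_gestures(xml_text):
    """Return (label, duration) pairs from hypothetical <head> elements."""
    doc = minidom.parseString(xml_text)
    gestures = []
    for node in doc.getElementsByTagName("head"):
        label = node.getAttribute("type")
        start = float(node.getAttribute("starttime"))
        end = float(node.getAttribute("endtime"))
        # round to tidy up floating-point noise in the subtraction
        gestures.append((label, round(end - start, 2)))
    return gestures


print(extract_gestures(SAMPLE))
```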

The first problem I encountered, and probably the cause of my current problem, is that my features are a mix of text (the sentences that co-occur with the gestures) and features like duration of gesture, intension of gesture etc. Text was a no-brainer: fit_transform it with the ol’ tf-idf vectorizer and throw it at a pile of classifiers. But I had never had to combine tf-idf vectors with other feature vectors and, as you can probably tell from my poor description of the problem, googling wasn’t helping.
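One common way to glue the two kinds of features together in memory, without a round trip through files, is `scipy.sparse.hstack`, which places the tf-idf matrix and a matrix of the other features side by side. The sentences and the two extra feature columns below are invented placeholders; this is a sketch of the general technique, not the file-based approach described in this post.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder data: the words spoken during each gesture, plus two
# non-text features per gesture (e.g. duration and a mapped category id).
sentences = ["yeah that's right", "no I don't think so", "okay sure"]
other = np.array([[0.65, 2.0],
                  [0.60, 1.0],
                  [1.10, 2.0]])

vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(sentences)  # sparse, one column per vocabulary term

# Put the extra columns to the right of the tf-idf columns, keeping it sparse.
X = hstack([X_text, csr_matrix(other)]).tocsr()
print(X.shape)  # (3, vocabulary size + 2)
```

On the test set, call `vectorizer.transform` rather than `fit_transform`, so the vocabulary, and therefore the column count, matches the training matrix.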

My cry for help, despite my poor description of the problem, was answered by my professor with this little trick. Basically, you dump the tf-idf vectors and class labels into a file using dump_svmlight_file and then add the other feature values to that file.
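Roughly, the round trip with scikit-learn's `dump_svmlight_file` and `load_svmlight_file` looks like the sketch below. Each line of the file holds the label followed by `index:value` pairs for the non-zero entries only, which is why appended features have to use fresh column indices. The toy matrix, labels and temporary file name are placeholders.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy stand-in for a tf-idf matrix: 2 samples, 3 features, lots of zeros.
X = csr_matrix(np.array([[0.5, 0.0, 1.0],
                         [0.0, 0.0, 2.0]]))
y = np.array([1, 0])  # placeholder class labels, e.g. 1 = nod, 0 = shake

path = os.path.join(tempfile.mkdtemp(), "features.svmlight")
dump_svmlight_file(X, y, path, zero_based=True)

with open(path) as f:
    content = f.read()
print(content)  # one line per sample: label, then index:value for non-zeros only

X_back, y_back = load_svmlight_file(path, zero_based=True)
print(X_back.shape)
```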

So I mapped my other features to unique integer values and dumped those into a file too so that they would be in the correct format. I made sure nothing mapped to zero, as zero entries aren’t stored in scipy sparse matrices (and aren’t written out by dump_svmlight_file).

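```python
import numpy as np
from collections import OrderedDict as odict

def getMapDict(X):
    """Return a dictionary mapping each unique string in X to an int."""
    x = sorted(set(X))  # unique values, sorted alphabetically
    # start at 1 rather than 0: zero values aren't stored in sparse matrices
    label = [i + 1 for i in range(len(x))]
    dic = odict(zip(x, label))
    return dic

def mapToInt(X, dic):
    """Convert a list of strings into a numpy array of ints."""
    # list comprehension instead of map(), which is lazy in Python 3,
    # and no in-place modification of the caller's list
    return np.array([int(dic[s]) for s in X])
```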

(I return the dictionary because I need to use the same mapping for my training and test sets.)
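A swapped-in alternative that does this bookkeeping automatically is scikit-learn's `DictVectorizer`: fit it on the training set, call `transform` on the test set, and the same feature-to-column mapping is reused, with category values never seen in training silently dropped. It also one-hot encodes string values instead of turning them into arbitrary integers, which avoids implying an ordering between categories. The feature names (`speaker`, `duration`) and values below are placeholders.

```python
from sklearn.feature_extraction import DictVectorizer

# Placeholder non-text features: strings are one-hot encoded, numbers kept as-is.
train = [{"speaker": "A", "duration": 0.65},
         {"speaker": "B", "duration": 0.60}]
test = [{"speaker": "A", "duration": 1.10},
        {"speaker": "C", "duration": 0.30}]  # speaker "C" never seen in training

vec = DictVectorizer()  # outputs a scipy sparse matrix by default
X_train = vec.fit_transform(train)  # learns the feature -> column mapping
X_test = vec.transform(test)        # reuses it; unseen "speaker=C" is dropped

print(vec.feature_names_)  # ['duration', 'speaker=A', 'speaker=B']
print(X_test.toarray())
```

Because the output is already a scipy sparse matrix, it can be stacked next to tf-idf columns directly.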

Then with some very hacky and ugly code (well, the whole thing is really) I read in the text files, mashed the other features onto the end of the tf-idf vectors, wrote them back to file, and all seemed well. Until I actually tried to plug em into my classification algorithm and got the following reply:

ValueError: Expected input with 2133 features, got 2132 instead

What??? Where the hell did one of my features go??? After a day of debugging I’m still stumped and hoping for one of these moments…
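One possible culprit, not confirmed for this case: `load_svmlight_file` infers the number of features from the largest column index that actually appears in the file. If the last column happens to be zero for every row of one file (say, the test set), that file comes back one column short. Passing `n_features` explicitly, or loading train and test together with `load_svmlight_files`, keeps the shapes consistent. The default `zero_based='auto'` guess can also shift indices by one when a file has no entry in column 0, so setting `zero_based` explicitly is worth doing too. The toy matrices below are placeholders chosen to trigger the mismatch.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

tmp = tempfile.mkdtemp()
train_path = os.path.join(tmp, "train.svmlight")
test_path = os.path.join(tmp, "test.svmlight")

# Three features; the test set has no non-zero value in the last column.
X_train = csr_matrix(np.array([[1.0, 0.0, 3.0], [0.0, 2.0, 1.0]]))
X_test = csr_matrix(np.array([[1.0, 2.0, 0.0]]))
dump_svmlight_file(X_train, np.array([1, 0]), train_path, zero_based=True)
dump_svmlight_file(X_test, np.array([1]), test_path, zero_based=True)

# Letting the loader infer n_features: the test matrix loses a column.
Xtr, _ = load_svmlight_file(train_path, zero_based=True)
Xte, _ = load_svmlight_file(test_path, zero_based=True)
print(Xtr.shape, Xte.shape)  # (2, 3) (1, 2)

# Fix: tell the loader how many features there really are.
Xte_fixed, _ = load_svmlight_file(test_path, zero_based=True, n_features=Xtr.shape[1])
print(Xte_fixed.shape)  # (1, 3)
```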


sandwich

April 13, 2013

It’s a lovely sunny day and I am stuck in my silly office with no work to do.
Tick tock.

Here is a picture of some delicious birds to pass the time 🙂

om nom

yeah but no but yeah

April 12, 2013

I’m having a very slow brain couple of days. Left my swipe card in work on Friday, left my keys in my door yesterday and picked up a fork to eat my yoghurt at lunch today. I’m steering clear of any complicated tasks for the next 24-hour period.

Here is a graphic demonstrating the internet in China

*cough*

“All reactionaries are plastic bottles”

April 11, 2013

I like recycling.

So did Mao. It was one of the main tenets of his philosophy, as depicted in this well-known propaganda poster.

Paint the town red

Awareness

April 11, 2013

I’m at work and skiving a bit under the guise of “researching website design and usability” and I came across this. I’m no expert on Thatcher’s policies or what went on under her iron fist, but I have great respect for her strength. Various people have commented that she was the most important figure in British politics since Churchill, no small feat. And while there is a lot of funny stuff on the web at the moment about foreign press mistaking her for someone else, young people should take this opportunity to find out who she was and educate themselves a bit. Tweets like:

Don’t know or care who Margaret Thatcher is tbh

in my opinion, make you look like a bit of a tit.

Moo

April 10, 2013

According to the interweb, you see at least one cow every day. YAY!
Here are several rude Chinese cows, one on a T-shirt!

Enjoy cows responsibly.