Sunday 28 September 2008

Abandon Ship!

It is now over a month since I last posted, and if you have been looking in you might have formed the impression that this blog has become the Mary Celeste - totally unmanned!

Actually my title for this post reflects abandonment of that great ship of research, HMS Statistic, otherwise known as SPSS. After spending considerable time trying to get anything sensible out of the software using cluster analysis, I finally realised that the solution to my question doesn't come from analysis of values but from comparison. The reason is that all my data gathered from the various questionnaires are categorical (or have been converted to that form), so they do not yield meaningful numerical values.

I spent quite a lot of time casting around for other methods of comparing data values. For a while I was drawn towards the principles behind DNA and protein analysis. These are represented as strings of letters standing for the different components (sorry, the exact description evades me). The software compares the strings, which can be lengthy, to arrive at matches between them. Needless to say, the software that does the analysis is hugely complex and expensive. However, I did find a simple version that had been written for Excel. Sadly it was long out of circulation and the author (with the activation code) could not be traced. Actually this was a bit of a blind alley, as the spreadsheet compared protein strings with published ones - not helpful for me, but I was impressed with the basic principles.

What this highlighted for me is that what I want to know is whether sub-groups exist within the cohort of students. The members of a sub-group need not share exactly the same characteristics but should be close enough to be distinct from other groups. Taking the principles of DNA analysis, I then looked for solutions which compared strings of letters as if they were words, and found a simple Excel add-in that looks through words to spot typos. It is based on fuzzy matching, an approach also used in Internet search engines to find similar spellings.

The add-in, called Fuzzy Duplicate Finder, cost £20 (so not expensive) and will look through a list of words and group them into similar clusters. It can be set to identify groups where from one up to six letters are different. I have now got the spreadsheet set up to sort students and their responses, with the categorical values converted to letters and strung together into single 'words'. Using the add-in, the spreadsheet will then create seven sets of clusters: identical words, words with one letter different, two different and so on up to six letters different.
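The idea of turning responses into 'words' and grouping them by the number of differing letters can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Fuzzy Duplicate Finder actually works internally: the names `encode`, `letters_different` and `cluster` are hypothetical, the response codes and student words are invented, and the comparison is a simple position-by-position count, which assumes every word has the same length.

```python
from typing import Dict, List


def encode(responses: List[str], codes: Dict[str, str]) -> str:
    """Turn a list of categorical answers into a single 'word'."""
    return "".join(codes[r] for r in responses)


def letters_different(a: str, b: str) -> int:
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster(words: Dict[str, str], max_diff: int) -> List[List[str]]:
    """Greedy grouping: each student joins the first cluster whose
    first member is within max_diff letters, else starts a new one.
    (An assumed strategy for illustration only.)"""
    clusters: List[List[str]] = []
    for student, word in words.items():
        for group in clusters:
            if letters_different(words[group[0]], word) <= max_diff:
                group.append(student)
                break
        else:
            clusters.append([student])
    return clusters


# Invented example data: three answer categories mapped to letters.
codes = {"never": "A", "sometimes": "B", "often": "C"}
print(encode(["never", "often", "sometimes"], codes))  # ACB

words = {"S1": "ABCA", "S2": "ABCB", "S3": "ABDB", "S4": "CCAA"}
for max_diff in range(0, 3):
    print(max_diff, cluster(words, max_diff))
```

With these invented words, allowing one letter of difference groups S1 with S2, and allowing two also pulls in S3. Because each student is compared only with the first member of a cluster, listing S2 before S1 would place S3 in the same cluster at one letter of difference, so the result of a greedy approach like this depends on the order of the list.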

There is a caution in using this approach, as the sequence in which the data is searched can influence the clusters. I need to do more work on this, but as the students are listed in random order this should be a useful starting point. To start with I am looking at responses to related questions, e.g. responses to questions about social life, domestic circumstances etc. Once the clusters are created it will be a case of looking at the individual students to find any shared features. To help with this I have also spent time setting up a spreadsheet to analyse pairs of responses. These are counted and shown as values in a bubble chart. The advantage of bubble charts is that the size of the bubble indicates the scale of the values, so the result is very visual.
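The counting behind a bubble chart of paired responses can be sketched as follows. This is a rough Python illustration rather than the spreadsheet itself: the function name `pair_counts`, the question labels and the student answers are all invented for the example.

```python
from collections import Counter
from typing import Dict, List


def pair_counts(students: List[Dict[str, str]], q1: str, q2: str) -> Counter:
    """Count how many students gave each combination of answers
    to two questions. Each count would be one bubble's size."""
    return Counter((s[q1], s[q2]) for s in students)


# Invented responses, already converted to letters.
students = [
    {"social": "A", "domestic": "B"},
    {"social": "A", "domestic": "B"},
    {"social": "C", "domestic": "B"},
    {"social": "A", "domestic": "A"},
]

counts = pair_counts(students, "social", "domestic")
for (x, y), size in sorted(counts.items()):
    # x and y give the bubble's position, size gives its area.
    print(x, y, size)
```

Each distinct pair of answers becomes one bubble, positioned by the two answers and sized by the number of students who gave that combination.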

The final piece of progress to report is that I circulated the last of the end-of-module questionnaires last week. This will bring to three the sets of data included in the longitudinal study, and for this one I have included specific questions about time out for holidays and for preparing for exams (which come up in a couple of weeks). I have sent the questionnaire to 34 students and so far 25 have replied. If I can get 5 more to respond I will be happy!

As you can see, the last month has not been without progress, but I am at a point where I am starting to think about presenting the findings in 45,000 words. I suspect that I may already have more than I need, so I hope to meet up with my supervisor in the next couple of weeks to discuss this.

All being well I aim to post the next entry sooner rather than later!