Tweetalysis Twitter Messages for 2009-12-11

December 12th, 2009
  • It turns out that going from the database to CSV rather than XLS->CSV works a lot better, WEKA recognizes my files now. 09:35:13
  • Got TagHelper to run an SMO test, still having a hard time with the LibSVM and SVMreg tests at this point. 10:19:41
  • Hoping to take some of the things I learned earlier this week about the options in TagHelper to get the model to work better. 10:20:20

Tweetalysis Twitter Messages for 2009-12-06

December 7th, 2009
  • Having trouble with the LibSVM WEKA/TagHelper filter. It's telling me "can not handle binary class" 10:55:40
  • Spent about 2 hours trying to figure out how to format my data so WEKA can understand it but no success. CSV->ARFF should be easy but isn't 12:37:10
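
A minimal sketch of the CSV->ARFF step, assuming a two-column export with a header row: one free-text column and one label column. The function name `csv_to_arff`, the column names `text` and `label`, and the sample rows are made up for illustration; they are not taken from the Tweetalysis database. It relies on ARFF accepting single-quoted strings with backslash-escaped quotes, and it assumes the labels themselves contain no spaces or commas.

```python
import csv
import io


def csv_to_arff(csv_text, relation, text_col, label_col):
    """Convert CSV text (with a header row) into ARFF text.

    Assumes one string attribute (text_col) and one nominal class
    attribute (label_col) whose values need no quoting.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # The nominal class attribute must list every label that occurs.
    labels = sorted({row[label_col] for row in rows})

    def quote(value):
        # Flatten newlines, then escape backslashes and single quotes
        # so the value fits on one line inside single quotes.
        value = value.replace("\n", " ")
        value = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + value + "'"

    lines = [
        "@relation " + relation,
        "",
        "@attribute " + text_col + " string",
        "@attribute " + label_col + " {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for row in rows:
        lines.append(quote(row[text_col]) + "," + row[label_col])
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not real collected tweets.
SAMPLE_CSV = (
    "text,label\n"
    '"Love the new latte, thanks",not_negative\n'
    '"Line was too long, it\'s slow",negative\n'
)

if __name__ == "__main__":
    print(csv_to_arff(SAMPLE_CSV, "tweets", "text", "label"))
```

Going through the `csv` module rather than splitting on commas matters here because tweets often contain commas and quotes of their own.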

Tweetalysis Twitter Messages for 2009-12-04

December 5th, 2009
  • Trying to figure out this .arff format that WEKA uses and how I can automate MySQL exports then convert them to .arffs for WEKA to process 10:22:46
  • Working on my paper a bit. 21:19:00

Tweetalysis Twitter Messages for 2009-11-28

November 29th, 2009

Tweetalysis Twitter Messages for 2009-11-27

November 28th, 2009
  • Found the issue with Tag Helper crashing on my .xls, I used the wrong .xls file format. Analyzing 750 entries now. 08:39:32
  • The 750 tweet training set got decent (75%+) accuracy but is making me re-think what the most useful classification is for now. 12:16:57
  • Suggestions vs. Not may be more useful to a business than Negative vs. Not. Positive vs. Not would be easy but might not say much 12:18:01

Top 10 Widget

November 25th, 2009

I installed a widget that allows PHP code to run and stuck the Top10 script in there. The data is reasonably accurate, but sometimes my collect script stops for a few hours at a time. In other news, I’m running into an error in Tag Helper: an array-out-of-bounds exception. My sample may be too big, since other documents run through the trial fine.

Thanksgiving Update

November 25th, 2009

This post has nothing to do with Thanksgiving but I’m running out of post titles. I’ve been doing most of my updates through Twitter because it is very convenient, but I realize I need a real update. Since last week I have collected 30k+ users and 40k+ tweets. Early last week I got an interface for WEKA called Tag Helper and started playing with that.

My initial thought on rating posts was that there would be some type of scale, maybe 5 ratings. It has come to my attention that most of the time these sorts of things are done with just two ratings, and if more are needed then you do layered filtering. So if you wanted Positive, Neutral, Negative it would be a two-stage process: first separate Negative from Not Negative, then split the Not Negative set into Positive and Neutral.

With the project due date drawing closer, I’ve decided that to keep in line with the main goal of this project I need to settle for something that works rather than what a full development team might end up with. So I’ve chosen to pick out posts that are distinctly negative from the rest. This set is particularly interesting to businesses because, at least in the case of Starbucks, most negative comments tend to be packed with suggestions or alternatives to current practices. If I could not only identify what % of tweets are negative, but also pick out trends within those posts, it would serve as a digital suggestion box. I have 750 tweets labeled, but have not run them through anything yet. That will be my goal for this week, along with starting the paper.
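
The layered filtering described above can be sketched as a cascade of two binary classifiers. This is only an illustration of the control flow: the keyword lists and the functions `is_negative`, `is_positive`, and `classify` are hypothetical stand-ins, where a real setup would plug in models trained in WEKA/Tag Helper.

```python
# Hypothetical keyword lists standing in for trained binary models.
NEGATIVE_WORDS = {"slow", "burnt", "rude"}
POSITIVE_WORDS = {"love", "great", "thanks"}


def _words(tweet):
    # Lowercase and strip basic punctuation from each token.
    return {w.strip(".,!?") for w in tweet.lower().split()}


def is_negative(tweet):
    """Stage 1 stand-in: Negative vs. Not Negative."""
    return bool(_words(tweet) & NEGATIVE_WORDS)


def is_positive(tweet):
    """Stage 2 stand-in: Positive vs. Neutral."""
    return bool(_words(tweet) & POSITIVE_WORDS)


def classify(tweet):
    """Two-stage cascade: filter out Negative first, then split
    the remaining Not Negative tweets into Positive and Neutral."""
    if is_negative(tweet):
        return "negative"
    if is_positive(tweet):
        return "positive"
    return "neutral"


if __name__ == "__main__":
    for t in ["The line was so slow today",
              "Love the new latte!",
              "Heading to Starbucks at noon"]:
        print(classify(t), "<-", t)
```

Because stage 1 runs first, a tweet that mixes praise and complaint lands in the negative set, which fits the suggestion-box goal of not losing complaints.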

Tweetalysis Twitter Messages for 2009-11-24

November 25th, 2009
  • My goal for today is to work on a MUCH bigger sample set. Instead of Positive/Negative I will do Positive/Not Positive. 08:49:17
  • I'd also like a search for user function and maybe get a google chart working with one of the various PHP libraries available. 08:50:12
  • It is quickly coming to my attention that "negative" is not as simple as it seems. Negative about Starbucks, or negative feelings in general? 11:38:32
  • Relevant to my interests -> Social Media Analytics: Twitter: Quantitative & Qualitative Analysis – http://bit.ly/5ti2io 20:19:34
  • Tweetalysis now has over 30K user records and 40k tweet records. Today I labeled about 750 tweets, maybe I will run a quick test tonight… 20:22:18

Tweetalysis Twitter Messages for 2009-11-22

November 23rd, 2009
  • Working on a larger test set for Tag Helper/ WEKA, I need some help picking out what settings to use. 15:00:18
  • I feel that 140 characters makes sentiment analysis quite a bit more tricky than normal 15:00:23

Tweetalysis Twitter Messages for 2009-11-21

November 22nd, 2009
  • Playing with TagHelper, creating some test sets to see what kind of accuracy they can get on tweets. So far, not so good but prob my fault 11:04:36