Tweetalysis Twitter Messages for 2009-11-28

November 29th, 2009

Tweetalysis Twitter Messages for 2009-11-27

November 28th, 2009
  • Found the issue with Tag Helper crashing on my .xls: I used the wrong .xls file format. Analyzing 750 entries now. 08:39:32
  • The 750 tweet training set got decent (75%+) accuracy but is making me re-think what the most useful classification is for now. 12:16:57
  • Suggestions vs. Not may be more useful to a business than Negative vs. Not. Positive vs. Not would be easy but might not say much. 12:18:01

Top 10 Widget

November 25th, 2009

I installed a widget that allows PHP code to run and stuck the Top10 script in there.  The data is reasonably accurate, but sometimes my collect script stops for a few hours at a time.  In other news, I’m running into an error in Tag Helper: an array-out-of-bounds exception.  My sample may be too large, since other documents run through the trial fine.

Thanksgiving Update

November 25th, 2009

This post has nothing to do with Thanksgiving, but I’m running out of post titles.  I’ve been doing most of my updates through Twitter because it is very convenient, but I realize I need a real update.  Since last week I have collected 30k+ users and 40k+ tweets.  Early last week I got an interface for WEKA called Tag Helper and started playing with it.  My initial thought on rating posts was that there would be some type of scale, maybe 5 ratings.  It has come to my attention that most of the time these sorts of things are done with just two ratings, and if more are needed you do layered filtering.  So if you wanted Positive, Neutral, and Negative, it would be a two-stage process: first separate Negative from Not Negative, then split the Not Negative set into Positive and Neutral.  With the project due date drawing closer, I’ve decided that, to keep in line with the main goal of this project, I need to settle for something that works rather than what a full development team might end up with.  So I’ve chosen to pick out posts that are distinctly negative from the rest.  This set is particularly interesting to businesses because, at least in the case of Starbucks, most negative comments tend to be packed with suggestions or alternatives to current practices.  If I could not only identify what % of tweets are negative but also pick out trends within those posts, it would serve as a digital suggestion box.  I have 750 tweets labeled but have not run them through anything yet.  That will be my goal for this week, along with starting the paper.
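
The two-stage filtering described above can be sketched in a few lines of Python. This is only an illustration of the cascade idea, not the Tag Helper/WEKA setup: the keyword lists and the `is_negative`, `is_positive`, and `classify` functions are hypothetical stand-ins for trained binary classifiers.

```python
# Hypothetical word lists standing in for two trained binary classifiers.
# A real system would learn these decisions from labeled tweets.
NEGATIVE_WORDS = {"hate", "awful", "worst", "burnt"}
POSITIVE_WORDS = {"love", "great", "best", "delicious"}


def tokens(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return {word.strip(".,!?").lower() for word in text.split()}


def is_negative(text):
    """Stage 1 (assumed): Negative vs. Not Negative."""
    return bool(tokens(text) & NEGATIVE_WORDS)


def is_positive(text):
    """Stage 2 (assumed): Positive vs. Neutral, run only on Not Negative."""
    return bool(tokens(text) & POSITIVE_WORDS)


def classify(text, stage1=is_negative, stage2=is_positive):
    """Cascade two binary decisions into three labels."""
    if stage1(text):
        return "Negative"
    # Only tweets that pass stage 1 as Not Negative reach stage 2.
    return "Positive" if stage2(text) else "Neutral"


if __name__ == "__main__":
    for tweet in ["I hate the burnt coffee",
                  "I love the new latte!",
                  "Going to Starbucks at noon"]:
        print(classify(tweet), "<-", tweet)
```

Because stage 1 runs first, a tweet with both kinds of words ends up Negative, which fits the decision to prioritize distinctly negative posts.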

Tweetalysis Twitter Messages for 2009-11-24

November 25th, 2009
  • My goal for today is to work on a MUCH bigger sample set. Instead of Positive/Negative I will do Positive/Not Positive. 08:49:17
  • I'd also like a search-for-user function and maybe get a Google chart working with one of the various PHP libraries available. 08:50:12
  • It is quickly coming to my attention that "negative" is not as simple as it seems. Negative about Starbucks, or negative feelings in general? 11:38:32
  • Relevant to my interests -> Social Media Analytics: Twitter: Quantitative & Qualitative Analysis – http://bit.ly/5ti2io 20:19:34
  • Tweetalysis now has over 30K user records and 40k tweet records. Today I labeled about 750 tweets, maybe I will run a quick test tonight… 20:22:18

Tweetalysis Twitter Messages for 2009-11-22

November 23rd, 2009
  • Working on a larger test set for Tag Helper/WEKA, I need some help picking out what settings to use. 15:00:18
  • I feel that 140 characters makes sentiment analysis quite a bit more tricky than normal 15:00:23

Tweetalysis Twitter Messages for 2009-11-21

November 22nd, 2009
  • Playing with Tag Helper, creating some test sets to see what kind of accuracy they can get on tweets. So far, not so good but prob my fault 11:04:36

Tweetalysis Twitter Messages for 2009-11-20

November 21st, 2009
  • @Yurihosting, I'm working on a Twitter app. Sentiment Analysis and other analytics. I have webhosting for now but will keep you in mind! 07:03:27

Tweetalysis Twitter Messages for 2009-11-19

November 20th, 2009
  • Writing a query to see what % of users have posted more than once about @Starbucks 19:34:27
  • Not exactly what I had in mind but I got distracted by the newest issue of Wired. http://ow.ly/DUg6 shows users with >10 @Starbucks posts 21:58:05
  • After a borderline hostile discussion with my new webhost it has come to my attention that background processes are commonly not allowed 21:59:11
  • The only solution, which is still technically against the ToS, is to leave a machine on 24/7 with an SSH client open, running the collect code 22:00:51
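
The repeat-poster queries mentioned in the first two messages above might look something like the sketch below. The schema is an assumption: a hypothetical `tweets` table with a `user_id` column, held in an in-memory SQLite database with made-up rows. The project's actual database and column names are not shown in these posts.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical tweets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (user_id TEXT, text TEXT)")
    rows = [("a", "t1"), ("a", "t2"),
            ("b", "t3"),
            ("c", "t4"), ("c", "t5"), ("c", "t6"),
            ("d", "t7")]
    conn.executemany("INSERT INTO tweets VALUES (?, ?)", rows)
    return conn


def percent_repeat_posters(conn):
    """Percent of users with more than one tweet (None if table is empty)."""
    row = conn.execute(
        """
        SELECT 100.0 * SUM(CASE WHEN n > 1 THEN 1 ELSE 0 END) / COUNT(*)
        FROM (SELECT user_id, COUNT(*) AS n FROM tweets GROUP BY user_id)
        """
    ).fetchone()
    return row[0]


def heavy_posters(conn, min_posts=10):
    """Users with more than min_posts tweets, most active first."""
    return conn.execute(
        """
        SELECT user_id, COUNT(*) AS n
        FROM tweets
        GROUP BY user_id
        HAVING COUNT(*) > ?
        ORDER BY n DESC
        """,
        (min_posts,),
    ).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    print(percent_repeat_posters(conn))
    print(heavy_posters(conn, min_posts=2))
```

The inner query counts tweets per user once, so the percentage and the heavy-poster list both come from the same grouping.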

Tweetalysis Twitter Messages for 2009-11-18

November 19th, 2009