Summary
Here you will find a summary of the work I have done so far…
- This is Phirehose, the PHP wrapper class for the Twitter Streaming API that I have been using. I did not write it, but it is an important part of this project.
- Here are my collection and processing PHP files. A live demo is not available, but they currently run and work.
- Here is a small example of one of the log files the collection script outputs every hour. This one is much larger.
- Here is the schema diagram of the database (click for a larger version).

- On the left of this page is the top10.php file I wrote. It is just a quick demonstration of the dynamic data I have. As long as the collection process runs, those numbers should continue to change.
- Other sample code I have written includes a simple user search, the top 25 Twitter clients used, and users with more than 10 messages.
- Here are links to WEKA and TagHelper, the data mining software I am using to classify text.
- Here is a training set and two test sets (one & two). I’ve had the best success so far by running the training set with the settings below, but I am far from having explored all of the possible combinations.

- That is where I am now. I’ve done everything I had hoped to accomplish for this project; the result is a lot uglier than I had imagined, but all of the key functions work. I hope to continue work on this project past the end of the semester and develop it into something usable, or integrate some key concepts into other projects I have in mind. My progress was logged on this blog; all archives should be available on the main page. My final paper will be added to this page when it is complete.









