Thursday, November 8, 2012

Hadoop + Cisco SocialMiner = ?

What I am trying to do is marry Big Data (Hadoop) with Cisco SocialMiner: collect the latest 1000 tweets about Cisco on Twitter and analyze which words people use most often.

One of the beauties of SocialMiner is that its API is very simple: a single HTTP request gets you all your tweets in XML format.  You don't need to learn the Twitter and Facebook APIs in order to collect the social contacts that you want.  This is how it works:


Several steps are involved:
1. Set up a campaign in SocialMiner to collect the Twitter stream with the keywords Cisco, #Cisco and @Cisco.
2. Write a little script on my Linux machine to get the XML file and extract the content of the tweets.
3. Copy the extracted text to Hadoop HDFS and run the word count program.
4. Use Apache Pig, an abstraction layer on top of Hadoop that lets you write SQL-like statements, to do the sorting.
5. Format the output as another XML file for graph plotting.  Done!
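
For anyone who wants to try the idea without a Hadoop cluster, steps 2 to 4 can be roughly sketched in a few lines of Python. This is only a small local stand-in, not the script or the Hadoop/Pig jobs I actually ran. The element name `description` is an assumption about where the tweet text sits in the feed XML, so adjust it to match what your SocialMiner feed returns. The function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def extract_tweets(xml_text, tag="description"):
    """Step 2: pull the tweet text out of the feed XML.

    `tag` is an assumed element name, not taken from the SocialMiner docs.
    """
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(tag) if el.text]


def word_counts(tweets):
    """Step 3: count words, the job the Hadoop word count program does at scale."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase and keep a leading # or @ so hashtags and mentions stay distinct.
        counts.update(re.findall(r"[#@]?\w+", tweet.lower()))
    return counts


def top_words(counts, n=10):
    """Step 4: sort by count (descending), then alphabetically, like the Pig sort."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Feeding the XML text from step 2 through `extract_tweets`, `word_counts` and `top_words` gives a list of (word, count) pairs ready to be formatted for plotting. Note that the tokenizer here strips punctuation, which a plain whitespace-splitting word count would not do, so the numbers can differ slightly.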


Looks like WebEx Meeting Server (on-premises WebEx) is the hottest topic now!
