Saturday, September 20, 2014

Mining the Social Web, by Matthew Russell, O'Reilly Media

"Mining the Social Web" is a book about how to access data from today's most popular social services through their public APIs, and how to analyze the retrieved data to gain insights from it.

The book uses the Python programming language to access and manipulate the data, and provides code snippets for common tasks in the text, as well as full IPython notebooks on GitHub. The book is written as documentation for the freely available IPython notebooks: it provides context and background for the code, and describes the algorithms used to mine the social data.

The author tries to be as concise as possible, although he does not succeed in the first chapter, where the first three sections are verbose and largely unnecessary, describing what Twitter is and why people use it as a microblogging platform. With that out of the way, the writing style improves as the book progresses, and becomes a mixture of code examples and step-by-step explanations.

The author follows the same formula throughout the book: for each of the popular social services examined, he starts with an overview of the API used to access the required data and how to configure the requisite authorization tokens. He then explains how to make requests, followed by a brief description of the important APIs and the data sets they return. Next he presents a couple of algorithms to mine the data and extract valuable statistics from it, describing the algorithms without assuming prior knowledge on the reader's part. Finally, he presents a cool visualization of the insights, using either Python libraries and packages or Google Earth APIs. The formula is quite useful, and gives the book consistency across chapters, which can be read independently and out of order.

The author starts with Twitter. He explains the structure of the Twitter API, how it uses OAuth, and how to connect to it using the Python "twitter" library. The chapter progresses with example Python notebooks that show how to retrieve trending topics, user timelines, and search results, and how to manipulate tweet contents and tweet locations to gather interesting statistics about them. The writing style is expository, showing the notebooks piecemeal and explaining them well.

In the next chapter, the author focuses on Facebook and the social graph API. The chapter starts with an exposition of the entities available through Facebook (timeline, likes, locations, etc.) and how to grant access tokens for each of these entities, and introduces the Facebook query language FQL. The author provides ample examples that analyze social graph connections, Facebook pages (Pepsi vs Coke examples), statistics on friends' likes, and friend graph cliques, using PrettyTables, histograms, and graph plots.

The author then tackles LinkedIn in a similar manner, but starts introducing the more interesting data mining techniques with a brief introduction to data clustering algorithms. The author talks about normalizing the data and using NLTK for language processing, and describes and uses a few clustering algorithms, such as greedy clustering, hierarchical clustering, and k-means, to cluster LinkedIn connections. The chapter ends with a cool visualization of where the talent is, using Google Earth.
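
For readers who have not seen k-means before, here is a minimal sketch of the idea in plain Python. It is my own illustration, not code from the book: the function name `k_means`, the fixed starting centroids, and the made-up 2-D points are all assumptions chosen to keep the example short and repeatable.

```python
import math


def k_means(points, centroids, iterations=10):
    """Cluster 2-D points around the given starting centroids (illustrative only)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up coordinates forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
```

Real implementations usually pick the starting centroids randomly and stop once the assignments no longer change; fixing both here simply keeps the sketch deterministic.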

The author then proceeds to Google+, and describes an information retrieval example to cluster documents, using it to introduce concepts such as TF-IDF, document similarity, and analyzing language bigrams.
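
To make TF-IDF and document similarity a little more concrete, here is a small sketch in plain Python. It is not the book's code: the helper names `tf_idf_vectors` and `cosine_similarity`, the whitespace tokenizer, the particular weighting (term frequency times the log of the inverse document frequency), and the three sample sentences are all my own assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative only)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample documents.
docs = [
    "mining twitter data with python",
    "mining facebook data with python",
    "espresso with steamed milk",
]
vectors = tf_idf_vectors(docs)
similar = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
```

The word "with" appears in every document, so its weight drops to zero. That is why the first and third documents end up with no similarity at all, even though they share a word.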

The next chapters are about understanding blog posts, with a brief interlude on how to crawl and scrape the web, and how to summarize documents, which comes in handy if you have no time to read the full content of web pages and you'd like to figure out the gist of the document.

The following chapter tackles mining user emails, including high-level statistics on who connects to whom, and how frequently they send emails to each other. The author uses the Enron data as an example, and introduces a toolkit to do the same with Gmail accounts.
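
The "who emails whom, and how often" statistic boils down to counting sender and recipient pairs. Here is a rough sketch of that idea, assuming each message has already been parsed into a dict with "from" and "to" keys; that shape, the function name, and the sample addresses are hypothetical, and this is neither the book's toolkit nor the Enron data.

```python
from collections import Counter


def count_sender_recipient_pairs(messages):
    """Count how many messages each sender sent to each recipient."""
    pairs = Counter()
    for message in messages:
        for recipient in message["to"]:
            pairs[(message["from"], recipient)] += 1
    return pairs


# Made-up messages standing in for a parsed mailbox.
messages = [
    {"from": "alice@example.com", "to": ["bob@example.com"]},
    {"from": "alice@example.com", "to": ["bob@example.com", "carol@example.com"]},
    {"from": "bob@example.com", "to": ["alice@example.com"]},
]
pair_counts = count_sender_recipient_pairs(messages)
top_pair, top_count = pair_counts.most_common(1)[0]
```

Sorting the counter with `most_common` gives the busiest connections first, which is a natural starting point for the kind of summary tables the chapter describes.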

The last couple of chapters deal with GitHub project analytics, and with microformats and RDF. The book ends with a cookbook of recipes, each of which lists the problem to be solved, offers a solution, and discusses the salient points of the solution to drive the point home. Most of the cookbook recipes are for Twitter, with a couple of cases for Facebook.

Overall I recommend the book. It is decently written, and contains a wealth of introductory material on how to access content from the popular social websites, and a cornucopia of algorithms that can be used to analyze the data.


Tuesday, September 2, 2014

Can you make me a Cortado please?

A couple of months ago I stumbled upon an infographic that depicts popular coffee drinks around the world, and thought I'd give some of these drinks a try. I started with the cortado: a drink that is popular in Spain, Portugal, and Colombia, and consists of one part espresso and one part steamed milk.

Almost every coffee shop that I went to had no idea how to make the drink, and it became a great conversation starter with the barista: describing where I stumbled upon the recipe, and what other coffees are popular in different regions of the world. I was pleasantly surprised when one barista at Peet's Coffee knew how to make the drink from his travels to Spain. He also suggested modifications that would make the drink more delicious, including using whole milk instead of 2%, adding another shot of espresso, and sweetening the drink with one pack of brown sugar. The final combination is my current favorite.

Migrating from the MacBook Air to the MacBook Pro

My MacBook Air started to show wear and tear after a couple of years of heavy use. Five of the keys on the keyboard broke and had to be replaced, despite my (honestly) light touch typing. The battery went from warning that it needed service, to retaining a charge for shorter and shorter periods of time, to not holding a charge at all.

For my next device I contemplated getting another MacBook Air with the highly enviable 12-hour battery life, or switching back to a 15-inch MacBook Pro and enjoying a more powerful machine, with a Retina display and a respectable 8 hours of battery life. The prospect of more screen real estate and more processor power was too enticing, so I ended up getting the MacBook Pro despite the weight difference.

Since I had accumulated 3+ years' worth of data and software on the Air, I did not want to repeat the process of reinstalling apps from scratch, and searched for an easy way to migrate the data to the new machine. Most of the online recommendations were to use the Migration Assistant during the initial install phase. I decided to give it a try. I ran through the install process, created an account on the new MacBook Pro, and started the Migration Assistant on the old Air and the new Pro. I chose migration over Wi-Fi since I did not have a FireWire cable handy, which turned out to be a mistake. Either due to a bug or an unreliable Wi-Fi connection, the Migration Assistant would crash on both computers when it tried to migrate data between the machines.

I did not give up on the Migration Assistant, and decided to try another method: migration using a Time Machine backup. I took a full backup of the MacBook Air, which took roughly 8 hours to finish, and used the Migration Assistant to transfer that to the new Pro. The migration initially failed since my user name already existed on the new machine, which was an easy fix: I renamed the user. After I started the migration again, however, the data copying was stuck for almost 11 hours with no progress. A quick search online revealed that this is a widespread problem, and not an isolated instance.

I decided to try my luck elsewhere. Although Apple does not recommend restoring a computer from a backup of another computer, I decided to give that a try, especially since some of the applications I have installed, including VPN and MacPorts, store their data in non-standard locations.

I booted the new MacBook Pro in recovery mode, and chose to restore from Time Machine. I left the machine running overnight (the full restore took about 6 hours to finish), and by the end I had a setup identical to the one I had on the MacBook Air. I had to re-enter the license information for my applications, and apart from a couple of minor glitches with the Kindle app and iCloud accounts, which were easy to fix, everything worked like a charm. I am very pleased with the result.

Hopefully I won't need to migrate computers anytime soon, but if I have to, now I know how to do it using Time Machine backups.

Monday, September 1, 2014

Fitbit bands

I am fairly disappointed in the quality of the Fitbit Flex bands. After a couple of months of moderate use, the bands developed deep cracks and finally broke. There are a lot of accounts online and on Facebook from Fitbit users who have experienced similar issues, so the problem is widespread and not isolated.

What makes matters worse is that after I ordered and received a replacement band from Fitbit, a rash developed on my skin. I did not experience a rash with the prior band color, and my guess is that the material is different from the original one I had.

Now I have two choices: either get another replacement band in the original color and hope that it does not cause a rash, or give up on the Fitbit completely and wait for some of the newer health tracking technologies around the corner. An iWatch perhaps?