Skip to main content

Posts

Showing posts from December, 2016

A hole in the wall

I am a big fan of good and delicious food, irrespective of where it is sold. That includes street vendors, and “holes in the wall,” which I have always associated with small nondescript places, with no signs on the venue, no place to sit, and a staff that exudes a slightly higher risk of contracting dysentery, typhoid, or other gastrointestinal diseases. That description might be a bit extreme, but I had some of the best meals in similar places, including the famous Hyderabadi Dum-Biryani in a place not so far from that description. So where did the phrase a “hole in the wall” come from? On another historical tour of Florence, our tour guide and language enthusiast pointed out some of the palaces where Italian nobility such as the Medici family lived long time ago. Invariably at the entrance there was a slit or a hole in the wall, and the tour guide told us the story that after the nobility hosted lavish dinner parties, instead of throwing the remaining food away, they would gi

A paper a day keeps the doctor away: Efficient Commit Protocols for the Tree of Processes Model of Distributed Transactions

The two-phase commit protocol is widely known in the database community, yet despite its notoriety, finding the paper that first described it proved pretty difficult. The closest find is the paper about commit protocols for distributed transactions , which describes the protocol in detail, and introduces two additional modifications. The paper references the original publications of the protocol by Gray, Lamport, and others, however I could not find these notes   online. The paper describes the 2PC protocol, both under normal operation, and when failures occur. For normal operation, when the user decides to commit a transaction, the coordinator initiates the first phase of the protocol--the prepare phase, by sending prepare messages to all its subordinates. Each subordinate that can commit the transaction, writes a prepare log record, and sends an ACK back to the coordinator. The subordinates that can't commit the transaction, write an abort log record, and   respond w

The wrong end of the stick

One of my favorite activities while traveling is to take a walking tour of the city I am visiting. The tour usually consists of a small group led by a tour guide, who invariably is a student of art or history studying abroad, or an expat humanities graduate who is living abroad and is augmenting their income by giving tours. The tours are always enjoyable, combining stories about the city and its history, architecture, and cultural spots with frequent stops to coffee and dessert shops. Sometimes you get a special tour guide, who in addition to being a history buff, is also a linguistics enthusiast. When that happens, you hear special stories about the historical origin of phrases: something I am very interested in. In Rome, I had such a tour guide, and the story stuck with me, although I could not verify its accuracy. I could find one website that has a similar reference to the story. It was hilarious and I remembered it to this day. It is the story of the origin of the phrase “

A paper a day keeps the doctor away: BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

The latest advances in Big Data systems have made storing and computing over large amounts of data more tractable than in the past. Users' expectations for how long a query should take to complete have not on the other hand   changed, and remain independent of the amount of data that needs to be processed. The expectation mismatch of query run time causes user frustration when iteratively exploring large data sets in search of an insight. How can we alleviate that frustration? BlinkDB offers users a way to balance result accuracy with query execution time : the users can either get quantifiably approximate answers very quickly, or they can elect to wait for a longer period of time to get more accurate results. BlinkDB accomplishes this tradeoff through the magic of dynamic sample selection, and an adaptive optimization framework. The authors start with an illustrative example of computing the average session time for all users in New York. If the table that stores users

A paper a day keeps the dr away: Dapper a Large-Scale Distributed Systems Tracing Infrastructure

Modern Internet scale applications are a challenge to monitor and diagnose. The applications are usually comprised of complex distributed systems that are built by multiple teams, sometimes using different languages and technologies. When one component fails or misbehaves, it becomes a nightmare to figure out what went wrong and where. Monitoring and tracing systems aim to make that problem a bit more tractable, and Dapper, a system by Google for large scale distributed systems tracing is one such system. The paper starts by setting the context for Dapper through the use of a real service: "universal search". In universal search, the user types in a query that gets federated to multiple search backends such as web search, image search, local search, video search, news search, as well as advertising systems to display ads. The results are then combined and presented back to the user. Thousands of machines could be involved in returning that result, and any poor p