Adventures in Tech

Posts

Randomized algorithms: Verifying matrix multiplication

Another great example of randomized algorithms in the introductory chapters of the book " Probability and Computing " is verifying matrix multiplication. Suppose we have 3 matrices of compatible dimensions $A, B$, and $C$, and we want to verify that \[A\cdot B = C\] For simplicity, let's assume that all the matrices are square, and of dimension $n \times n$. The straightforward way to do the verification is to explicitly multiple the matrices $A$ and $B$ together, an operation that is of the order of magnitude of $O(n^3)$. As an aside, we can do better than $O(n^3)$ for matrix multiplication for larger matrices. Wikipedia has an excellent writeup on Strassen's algorithm which accomplishes matrix multiplication in $O(n^{2.8074})$ through divide and conquer. First the matrices $A$, $B$, and $C$ are padded with zero columns and rows so that their dimensions are of the form $2^m \times 2^m$. The matrices are then divided into equally sized block matrices of the f...

Classic Algorithms: Page Rank and the importance of web pages

Think of a search query, and go to your favorite search engine and conduct the search. Of the millions of matching documents returned, it is most likely that the first page or two contain what you are looking for. Have you ever wondered how the search engines know how to present the most relevant results for us in the first couple of pages, when all the results returned match the query we were looking for? There are a lot of signals search engines use to determine the importance of the matched results: some are intrinsic to the page such as where in the page the match occurs, whether it is highlighted or not, and the importance of the web page; and some are extrinsic such as how many users clicked on the result for a similar query. The signals, their weights, and the formulas to rank the results are usually guarded secrets by search engines as they are the secret sauce for giving users the best results to their queries. One such signal is the importance of a web page. How would yo...

Randomized Algorithms: Polynomial equivalence

We are all accustomed to deterministic algorithms; we work with them every day, and feel comfortable in knowing that the results of running them are predictable, barring a coding error of course. The idea of randomized algorithms feels remote and uncomfortable, despite their usefulness and elegance. There are a couple of great examples in the introductory chapters of the book " Probability and Computing " that are an eye opener. One is verifying polynomial identities: how can you tell that two different representations of polynomials are the same? For example, if we have two polynomials $P(x)$ and $Q(x)$, both of degree $d$ described by the following formulas: \[ P(x) = \Sigma_{i=0}^{i=d} a_i x^i \\ Q(x) = \Pi_{i=1}^{i=d} (x-b_i) \] how can we determine that they are the same polynomial? Intuitively we first check that the degrees are the same, then we could try to transform one form into the other, either by multiplying out the terms for $Q(x)$, collecting like t...

Down memory lane: Approximate counting

At work we do a lot of counting. For some counts we need an accurate result, while for others we can get by with an approximate count. For these approximate counts, as long as the result is within the same order of magnitude of the true count, we're ok. There is a lot of literature on approximate counting techniques, and the blog post "Probabilistic Data Structures for Web Analytics and Data Mining" does a great job at explaining some of them in detail, with references to the original papers. One of the original approximate counting papers was the 1978 paper: Counting large numbers of events in small registers by Robert Morris. In the paper Morris explains the context for the problem, which might seem foreign today. Back in 1978 he was faced with the problem of counting a large number of different events, and because of memory restrictions he could only use 8-bit registers for each counter. Since he was not interested in exact counts, but with accurate enough c...

React conference in San Francisco 2014

Last week I attended the React conference in San Francisco . The conference is a two day event where speakers share their experiences on building reactive systems: ones that are resilient, elastic, responsive, and message driven. The reactive manifesto web page has more detailed information about reactive systems, and why they are useful. This year the conference was at Cobb's Comedy Club , a cozy venue for the roughly 300 people that attended the conference. Because of the tight space, power plugs were non existent, but the organizers were extremely thoughtful and provided every attendee with a rechargeable battery with iPhone and Android connectors. The sessions in the conference were great, but a couple stood out. The first was Netflix's presentation "Resilient by Design", where the speaker talked about how Netflix designs and deploys their services: from using microservices that do one thing and do it very well with well defined interfaces, to cloud servic...

Mining the Social Web, by Mathew Russell, O'Reilly Media

"Mining the social web" is a book about how to access social data from the most popular social services today by using the services' public APIs, and analyzing the retrieved data to gain insights about it. The book uses the Python programming language to access and manipulate the data, and provides code snippets of common tasks within the book, as well as full iPython notebooks on Github. The book is written as documentation for the freely available iPython notebooks, with the documentation providing context and background for the code, as well as describing the algorithms used to mine the social data. The author tries to be as concise as possible, although he did not succeed in the first chapter, where the first three section were verbose, and relatively unnecessary, describing what twitter is and why people use it as a microblogging platform. With that out of the way, the writing style improves as the book progresses, and is a mixture of code examples and step ...

Can you make me a Cortado please?

A couple of months ago I stumbled upon an info graphic that depicts popular coffee drinks around the world , and thought I'd give some of these drinks a try. I started with the cortado: a drink that is popular in Spain, Portugal, and Columbia, and consists of one part espresso, and one part steamed milk. Almost every coffee shop that I went to had no idea how to make the drink, and it became a great conversation starter with the coffee barista, describing where I stumbled upon the recipe, and what other coffees are popular in different regions of the world. I was pleasantly surprised when one barista at Peet's coffee knew how to make the drink from his travels to Spain. He also suggested modifications to the drink that would make it more delicious including using whole milk instead of 2%, adding another shot of espresso, and sweetening the drink with one pack of brown sugar. The final combination is my current favorite.