Saturday, December 27, 2014

Randomized algorithms: Verifying matrix multiplication

Another great example of randomized algorithms in the introductory chapters of the book "Probability and Computing" is verifying matrix multiplication. Suppose we have 3 matrices of compatible dimensions $A, B$, and $C$, and we want to verify that
\[A\cdot B = C\]

For simplicity, let's assume that all the matrices are square, and of dimension $n \times n$. The straightforward way to do the verification is to explicitly multiply the matrices $A$ and $B$ together and compare the result with $C$, an operation that takes on the order of $O(n^3)$ time.

As an aside, we can do better than $O(n^3)$ for matrix multiplication for larger matrices. Wikipedia has an excellent writeup on Strassen's algorithm, which accomplishes matrix multiplication in $O(n^{2.8074})$ through divide and conquer. First the matrices $A$, $B$, and $C$ are padded with zero rows and columns so that their dimensions are of the form $2^m \times 2^m$. The matrices are then divided into equally sized block matrices of the form:
\[ A = \left[ \begin{array}{cc} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{array} \right] \]
\[ B = \left[ \begin{array}{cc} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{array} \right] \]
\[ C = \left[ \begin{array}{cc} C_{1,1} & C_{1,2} \\  C_{2,1} & C_{2,2} \end{array} \right] \]
Then
\[ C_{i,j} = \Sigma_{k=1}^2 A_{i,k} B_{k,j} \]
The trick of Strassen's algorithm is to reduce the number of required block multiplications from eight to seven by defining the auxiliary matrices $M_i$ such that:
\[ \begin{eqnarray}
M_1 & = & (A_{1,1}+A_{2,2})(B_{1,1}+B_{2,2}) \\
M_2 & = & (A_{2,1}+A_{2,2}) B_{1,1} \\
M_3 & = & A_{1,1}(B_{1,2}-B_{2,2}) \\
M_4 & = & A_{2,2}(B_{2,1}-B_{1,1}) \\
M_5 & = & (A_{1,1}+A_{1,2})B_{2,2}\\
M_6 & = & (A_{2,1} - A_{1,1})(B_{1,1}+B_{1,2}) \\
M_7 & = & (A_{1,2} - A_{2,2})(B_{2,1}+B_{2,2})
\end{eqnarray}
\]

which require only seven multiplications between blocks, and then expressing the result in terms of the $M$ matrices as:
\[
\begin{eqnarray}
C_{1,1} & = & M_1 + M_4 - M_5 + M_7\\
C_{1,2} & = & M_3 + M_5 \\
C_{2,1} & = & M_2 + M_4 \\
C_{2,2} & = & M_1 - M_2 + M_3 + M_6
\end{eqnarray}
\]

The algorithm continues to divide the blocks in the same manner, applying the same scheme recursively to the smaller products, until the blocks contain a single element and the multiplication becomes straightforward.
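To make the recursion concrete, here is a minimal sketch in Python using NumPy. It assumes the inputs have already been padded to a power-of-two dimension; the function name strassen and the cutoff for switching to direct multiplication are illustrative choices, not part of the Wikipedia writeup.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices A and B (dimension a power of two)
    using Strassen's seven-multiplication recursion."""
    n = A.shape[0]
    if n <= cutoff:                      # small blocks: multiply directly
        return A @ B

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # The seven auxiliary block products M1..M7
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    # Reassemble the four blocks of C
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```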

There are faster algorithms for matrix multiplication, such as the Coppersmith–Winograd algorithm with complexity of $O(n^{2.375477})$ and yet faster ones with $O(n^{2.3728639})$, but what if these are not fast enough?

Randomized algorithms come to the rescue, in the form of Freivalds' algorithm. The idea is to choose a vector $r$ of dimension $n$, whose elements $r_i$ are picked independently and uniformly from $\{0,1\}$, and compute the quantities $ABr$ and $Cr$. Computing $ABr$ as $A(Br)$, two matrix-vector products, takes only $O(n^2)$ operations, as does computing $Cr$. If $ABr \ne Cr$ then $AB \ne C$ and the verification of matrix multiplication is done.

On the other hand, if $ABr = Cr$ then either $AB=C$ or we have a false positive. A false positive occurs when $(AB-C)r=0$ even though $AB-C \ne 0$. The book shows that the probability of this happening is at most $\frac{1}{2}$. The argument goes as follows.

Define the matrix $D=AB-C$. Since $D \ne 0$ in the false positive case, it must contain a non-zero element somewhere; assume this element is $d_{11}$. We can then write $Dr=0$ as the set of $n$ simultaneous equations:

\[ \Sigma_{j=1}^{n} d_{ij}r_j=0 \]

Solving the first of these equations for $r_1$ in the false positive case gives

\[ r_1 = -\frac{\Sigma_{j=2}^{n} d_{1j}r_j}{d_{11}} \]

Using the principle of deferred decisions we can bound the probability of the false positive. We start by selecting $r_i$ for $i \in \{2, \ldots, n\}$ randomly and independently from the set $\{0,1\}$, and we pause before selecting $r_1$ from the same set. Since $r_1$ can take one of two values, $0$ or $1$, and at most one of those values can equal the right-hand side of the prior equation, the probability of selecting an $r_1$ that gives us the false positive case is at most $\frac{1}{2}$.

Since a false positive probability of $\frac{1}{2}$ is not good enough, we can repeat the algorithm with a new, independently chosen random vector $r$. After $k$ repetitions of the algorithm the false positive probability drops to $\frac{1}{2^k}$, which we can make as small as we like with enough repetitions. The cost of the verification algorithm is on the order of $O(kn^2)$ multiplications, which for modest $k$ is better than the fastest known algorithm for explicit matrix multiplication.
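A minimal sketch of the verification in Python (with NumPy) follows; the function name verify_product, the default of $k=20$ trials, and the exact-equality comparison (appropriate for integer matrices) are illustrative assumptions rather than the book's presentation.

```python
import numpy as np

def verify_product(A, B, C, k=20, rng=None):
    """Return False if A @ B != C is detected; otherwise return True,
    which is wrong with probability at most 1/2**k."""
    rng = rng or np.random.default_rng()
    n = C.shape[0]
    for _ in range(k):
        r = rng.integers(0, 2, size=n)    # each r_i drawn uniformly from {0, 1}
        # A(Br) and Cr are three matrix-vector products, O(n^2) work in total
        if not np.array_equal(A @ (B @ r), C @ r):
            return False                  # witness found: AB != C for sure
    return True                           # probably AB == C
```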



Saturday, December 20, 2014

Classic Algorithms: Page Rank and the importance of web pages

Think of a search query, and go to your favorite search engine and conduct the search. Of the millions of matching documents returned, it is most likely that the first page or two contain what you are looking for. Have you ever wondered how the search engines know how to present the most relevant results for us in the first couple of pages, when all the results returned match the query we were looking for?

There are a lot of signals search engines use to determine the importance of the matched results: some are intrinsic to the page, such as where in the page the match occurs, whether it is highlighted or not, and the importance of the web page; and some are extrinsic, such as how many users clicked on the result for a similar query. The signals, their weights, and the formulas to rank the results are usually closely guarded secrets, as they are the search engines' secret sauce for giving users the best results for their queries.

One such signal is the importance of a web page. How would you compute that? In 1996, Larry Page and Sergey Brin came up with a way called PageRank. The idea they proposed is that in a connected graph structure such as the world wide web, the importance of a web page is related to the importance of other web pages that point to it. If we denote the importance of a web page, or its PageRank, by $r_i$, Page and Brin proposed that \[ r_i = \Sigma_{j \in N} \frac{r_j}{d_j}\] where $N$ is the set of pages that link to page $i$, $r_j$ is the PageRank of each of the pages that point to page $i$, and $d_j$ is the number of outgoing links on each such page $j$.

If we write these equations for every node on the web, we come up with a system of simultaneous linear equations that we can solve, and get the PageRank for each web page. We can also write the simultaneous equations in vector and matrix notations as:
 \[ \mathbf{r} = \mathbf{M r} \]
where $\mathbf{r}$ is the vector of PageRanks for all the web pages, and $\mathbf{M}$ is the matrix whose entry $M_{ij}$ is $\frac{1}{d_j}$ if page $j$ links to page $i$, and $0$ otherwise.
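As a concrete illustration, here is a small sketch in Python (NumPy) of building $\mathbf{M}$ from an adjacency list; the links dictionary mapping each page to the pages it links to is a hypothetical input format chosen for the example:

```python
import numpy as np

def build_M(pages, links):
    """M[i, j] = 1/d_j if page j links to page i, and 0 otherwise."""
    index = {page: i for i, page in enumerate(pages)}
    M = np.zeros((len(pages), len(pages)))
    for j, outgoing in links.items():
        d_j = len(outgoing)               # number of outbound links of page j
        for target in outgoing:
            M[index[target], index[j]] = 1.0 / d_j
    return M
```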

It turns out that by rewriting the PageRank problem in this form, we discover that it is an eigenvalue/eigenvector problem from linear algebra, with the PageRank $\mathbf{r}$ as the eigenvector corresponding to an eigenvalue of $1$. There are many ways to solve the eigenvalue problem, and the power method is one of them: we start with an initial value of the PageRank $\mathbf{r}^{(0)}$, and compute subsequent values through:
\[ \mathbf{r}^{(t+1)} = \mathbf{M} \mathbf{r}^{(t)}\]
until convergence occurs, i.e. the difference between the PageRanks in two subsequent iterations becomes very small:
 \[ \| \mathbf{r}^{(t+1)} - \mathbf{r}^{(t)} \|_n < \epsilon \]
where $\|\cdots\|_n$ is the $L_n$ norm, and $\epsilon > 0$. Typically the Euclidean or $L_2$ norm is used.
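A minimal power-iteration sketch in Python (NumPy), using the $L_2$ norm for the convergence test; the tolerance eps, the iteration cap, and the uniform starting vector are illustrative assumptions:

```python
import numpy as np

def power_iteration(M, eps=1e-9, max_iter=1000):
    """Iterate r <- M r until the L2 norm of the change drops below eps."""
    n = M.shape[0]
    r = np.full(n, 1.0 / n)              # start from a uniform PageRank vector
    for _ in range(max_iter):
        r_next = M @ r
        if np.linalg.norm(r_next - r, 2) < eps:
            return r_next
        r = r_next
    return r
```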

There is a wrinkle in the simultaneous equations or eigenvalue/eigenvector formulations of PageRank though: not every such problem is well posed enough to have a unique, meaningful solution. This happens, for example, when there is linear dependence between the equations or between the rows/columns of the matrix. An illustrative example would be a sink page, which has no outbound links to any other pages: its effect on the PageRanks of other pages would be zero, and the column of $\mathbf{M}$ associated with it would be all $0$s. Another example would be a page that links only to itself, where during the power iteration all the other pages' PageRanks leak away to $0$. How would you tackle such problems? Through a bit of creativity.

It turns out that if we model browsing the Internet as a random walk over its connected graph structure, where a random surfer moves from page $i$ to another page by following one of the outbound links of page $i$ chosen at random, we arrive at the same formulation

\[ \mathbf{r}=\mathbf{Mr} \]
as before.

Of course the random surfer would not stop when there are no outbound links on the page they are on; they would whimsically jump to any other page on the Internet. The model is then augmented to accommodate such behavior:

\[ \mathbf{r} = \beta \mathbf{Mr} + \frac{(1-\beta)}{N}\mathbf{1} \]

where $0<\beta<1$, $N$ is the total number of pages in the web graph, $\mathbf{1}$ is the all-ones vector, and the PageRanks in $\mathbf{r}$ are normalized to sum to one. The first term in the equation models the random surfer picking, with probability $\beta$, one of the outbound links at random to visit a subsequent page, and the second term models them jumping, with probability $1-\beta$, to any page on the Internet chosen uniformly at random.

We can rewrite the prior formulation as:

\[\mathbf{r} = \mathbf{M'r} \]

where $\mathbf{M'} = \beta \mathbf{M} + \frac{(1-\beta)}{N}\mathbf{J}$ and $\mathbf{J}$ is the all-ones matrix. This formulation is well posed and has a solution, since $\mathbf{M'}$ is a stochastic matrix. Using the power method on the last equation will yield the PageRanks $\mathbf{r}$ of all the web pages on the Internet.
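A sketch of the damped iteration under the same assumptions as the earlier snippet, with an illustrative $\beta$ of $0.85$; note that $\mathbf{M'}$ is applied implicitly so the dense all-ones matrix is never formed.

```python
import numpy as np

def pagerank(M, beta=0.85, eps=1e-9, max_iter=1000):
    """Power iteration on M' = beta*M + (1-beta)/N * J without forming J."""
    n = M.shape[0]
    r = np.full(n, 1.0 / n)
    teleport = (1.0 - beta) / n          # random-jump probability mass per page
    for _ in range(max_iter):
        r_next = beta * (M @ r) + teleport
        r_next /= r_next.sum()           # renormalize so the ranks sum to one
        if np.linalg.norm(r_next - r, 2) < eps:
            return r_next
        r = r_next
    return r
```

For example, pagerank(build_M(pages, links)) chains this sketch with the earlier one to compute ranks for a small hypothetical link graph.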

Of course that computation requires a lot of engineering magic, since there are billions of web pages on the Internet, and each have many outbound connections.


Thursday, December 11, 2014

Randomized Algorithms: Polynomial equivalence

We are all accustomed to deterministic algorithms; we work with them every day, and feel comfortable knowing that the results of running them are predictable, barring a coding error of course. The idea of randomized algorithms feels remote and uncomfortable, despite their usefulness and elegance. There are a couple of great examples in the introductory chapters of the book "Probability and Computing" that are real eye openers.

One is verifying polynomial identities: how can you tell that two different representations of polynomials are the same? For example, if we have two polynomials $P(x)$ and $Q(x)$, both of degree $d$ described by the following formulas:
\[
P(x) = \Sigma_{i=0}^{i=d} a_i x^i \\
Q(x) = \Pi_{i=1}^{i=d} (x-b_i)
\]

how can we determine that they are the same polynomial?

Intuitively we would first check that the degrees are the same, and then try to transform one form into the other: either by multiplying out the terms of $Q(x)$, collecting like terms, and reducing it to $P(x)$, or by finding the $d$ roots $r_i$ of $P(x)$ and expressing it as a product of $d$ terms of the form $(x-r_i)$ for comparison with $Q(x)$. The first approach is easier, requiring $O(d^2)$ multiplications, but can we do better?

The book presents a randomized algorithm to do the same. What if we pick a random integer $r$ from the range $[0, Nd]$, where $N$ is a natural number greater than zero? For example, if $N$ is $100$, we pick a random integer $r$ from the range $[0,100d]$, and evaluate $P(r)$ and $Q(r)$, which can be done in $O(d)$ time. What would the result tell us?

If $P(r) \ne Q(r)$ then the polynomials are not the same. If $P(r) = Q(r)$, then either the polynomials are equivalent, or they are not and $r$ happens to be a root of the equation $P(x)-Q(x)=0$. Since $P(x)-Q(x)$ is in that case a non-zero polynomial of degree at most $d$, it has at most $d$ roots among the $Nd+1$ candidate values, so the chance that we picked an $r$ satisfying the last equation is no more than $1/N$ ($1/100$ in the concrete example).

How do we minimize that chance? By repeating the evaluation, drawing another random value $r$ from the interval $[0,Nd]$ each time. The book works out the probability of producing a false result as the evaluations are repeated, with and without replacement, and in both cases it is at most $(1/N)^k$, where $k$ is the number of evaluations.
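Here is a sketch of the repeated-evaluation test in Python, drawing with replacement; the representation of $P$ as a coefficient list a and of $Q$ as a root list b, and the function name, are illustrative assumptions (integer inputs keep the arithmetic exact, so equality checks are safe).

```python
import math
import random

def probably_equal(a, b, N=100, k=10):
    """Test whether P(x) = sum(a[i] * x**i) and Q(x) = prod(x - b[i]) agree
    by evaluating both at k random integer points in [0, N*d]."""
    d = len(b)                                            # degree of the polynomials
    for _ in range(k):
        r = random.randint(0, N * d)
        p = sum(coef * r**i for i, coef in enumerate(a))  # evaluate P(r)
        q = math.prod(r - root for root in b)             # evaluate Q(r)
        if p != q:
            return False        # the polynomials are definitely different
    return True                 # equal, with failure probability at most (1/N)**k
```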

Isn't it quite elegant to find out that two polynomials are the same or not simply through repeated evaluations, and not through algebraic manipulations to transform either or both to a canonical form?

Monday, December 8, 2014

Down memory lane: Approximate counting

At work we do a lot of counting. For some counts we need an accurate result, while for others we can get by with an approximate count. For these approximate counts, as long as the result is within the same order of magnitude of the true count, we're ok.

There is a lot of literature on approximate counting techniques, and the blog post "Probabilistic Data Structures for Web Analytics and Data Mining" does a great job at explaining some of them in detail, with references to the original papers.

One of the original approximate counting papers was the 1978 paper "Counting large numbers of events in small registers" by Robert Morris. In the paper Morris explains the context for the problem, which might seem foreign today. Back in 1978 he was faced with the problem of counting a large number of different events, and because of memory restrictions he could only use 8-bit registers for the counters. Since he was not interested in exact counts, but in counts that were accurate enough, he came up with the technique of probabilistic counting.

The first solution he explored in the paper was to count every other event, based on the flip of a fair coin: each event increments the counter with probability $\frac{1}{2}$, and the count is estimated as twice the counter value. For a true value $n$, the expected value of the counter is $n/2$, with a standard deviation of $\sqrt{n/4}$. The relative error of such a solution is high for small values of $n$.

Morris then presents his second solution: a counter $v$ that stores (approximately) the logarithm of the number of events that have occurred, not the raw count of the events. He increments the counter value $v$ probabilistically, based on a flip of a coin. The stored value in the counter approximates $\log{(1+n)}$, and the probability of increasing the counter is $1/\Delta$, where $\Delta = \exp{(v+1)}-\exp{(v)}$ is the change in the implied count estimate if $v$ were incremented.
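A small sketch of such a counter in Python, following the formulation above (natural logarithms); the class name and the estimate $\hat{n} = e^v - 1$ are my reading of that formulation rather than the paper's exact presentation.

```python
import math
import random

class MorrisCounter:
    """Approximate counter that stores v ~ log(1 + n) instead of n itself."""

    def __init__(self):
        self.v = 0

    def increment(self):
        # Bump v with probability 1/delta, where delta = exp(v+1) - exp(v)
        # is the change in the implied estimate if v were incremented.
        delta = math.exp(self.v + 1) - math.exp(self.v)
        if random.random() < 1.0 / delta:
            self.v += 1

    def estimate(self):
        # Invert v ~ log(1 + n) to recover an approximate count.
        return math.exp(self.v) - 1
```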

The paper is an enjoyable short read, and it is amazing that the techniques introduced in the 70's are still applicable to technology problems today.