Friday, December 18, 2015

A paper a day keeps the doctor away: FIT A Distributed Database Performance Tradeoff

In distributed systems, the CAP theorem provides a framework for thinking about the consistency, availability, and partition tolerance guarantees a system can provide. In their paper "FIT, a distributed database performance tradeoff", Faleiro and Abadi present a similar framework for thinking about distributed database performance.

The authors start with some intuition about distributed transactions: ones that rely on data that sits in different nodes in a distributed system. For the distributed transaction to guarantee atomicity, coordination between the participating nodes is required. The coordination offers systems designers a tradeoff choice between throughput and strong isolation. Guaranteeing strong isolation impacts the system throughput, and increasing throughput would imply allowing transactions to execute concurrently in spite of the presence of conflicts.

The authors introduce another variable, fairness, that interplays with the tradeoff between strong isolation and throughput. The idea is that when the system is given license to selectively prioritize or delay transactions, it can improve throughput while still guaranteeing strong isolation. Instead of thinking about the tradeoff between strong isolation and throughput, the authors present the three-way tradeoff between fairness, isolation, and throughput, "FIT", and postulate that a system that forgoes one of them can guarantee the other two.

The authors provide some examples of fairness at play, such as "group commit" for in-memory databases, where the transaction cost is small, but the cost of writing the logs to durable storage is high and limits the throughput. In "group commit", the database accumulates log records from multiple transactions, and writes them to disk in one batch, working around the disk write bottleneck and increasing the system throughput at the cost of decreasing fairness, since the transactions can't commit until their buffered log records are flushed to disk.
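To make the group commit idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class name GroupCommitLog, the batch_size parameter, and the simulated "disk write" counter are all hypothetical and do not come from the paper.

```python
# Hypothetical illustration of group commit; none of these names come from the paper.
class GroupCommitLog:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []      # log records waiting to be flushed
        self.flushes = 0      # number of simulated disk writes
        self.committed = []   # transaction ids whose records are durable

    def append(self, txn_id, record):
        """Buffer a log record; the transaction is not committed yet."""
        self.buffer.append((txn_id, record))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write every buffered record in one simulated disk write."""
        if not self.buffer:
            return
        self.flushes += 1
        self.committed.extend(txn_id for txn_id, _ in self.buffer)
        self.buffer.clear()


log = GroupCommitLog(batch_size=4)
for txn_id in range(10):
    log.append(txn_id, f"update-{txn_id}")
log.flush()  # flush the partial final batch
```

Ten transactions end up costing three simulated disk writes instead of ten, which is the throughput gain. The fairness cost is visible too: transaction 0 cannot commit until transactions 1 through 3 have arrived and filled the batch.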

Another example the authors provide is "lazy evaluation", where transactions are deferred to ensure that data dependent transactions are executed together, to amortize the cost of bringing the affected data into the processor cache and main memory across the transactions, improving throughput but decreasing fairness.

The authors categorize systems according to the interplay between fairness, isolation, and throughput, and present three classes of systems, with practical examples of each class:
  • Ones that guarantee strong isolation and fairness at the expense of throughput
    • Spanner--Google's geo-scale distributed database

  • Ones that guarantee strong isolation and good throughput at the expense of fairness
    • G-Store--a key value store with support for multi-key transactions
    • Calvin--a database system designed to reduce the impact of coordination in distributed transactions through imposing a total order on the transactions

  • Ones that guarantee good throughput and fairness at the expense of strong isolation
    • Eventually consistent systems--Cassandra for example
    • RAMP systems--read atomic multi partition transactions

The authors close by pointing out that the FIT tradeoff interplay is also applicable to multi-core database systems such as Silo--a main memory database system designed to reduce contention on shared memory, and Doppel--a main memory database system that exploits commutativity to increase concurrency.

Friday, October 16, 2015

A paper a day keeps the doctor away: The 8 Requirements of Real-Time Stream Processing

In recent years there has been an explosion of data all around us. The data comes in from a variety of sources, such as financial real-time systems, cell phone networks, sensor networks--RFID and IoT, and GPS. Commensurate with this dramatic increase in data is a corresponding unquenchable thirst for analysis and insights. The natural question arises: how do we build systems that process and make sense of this vast amount of data, in as close to real-time as possible? What patterns of software and systems should we look at?

Michael Stonebraker of database fame et al. offer some advice on what to consider in their paper: "The 8 requirements of real-time stream processing" published a decade ago. In the paper, the authors list eight guiding principles that high-volume low-latency systems should follow to be able to process vast amounts of data in near real-time.

First, the systems have to keep the data moving, and do straight-through processing with minimal to no writes to disk to achieve the low latency desired. The authors compare passive (polling) systems with active (event-driven) systems and recommend the latter.

Second, the authors recommend supporting a high-level language--dubbed StreamSQL, with built-in extensible stream oriented primitives and operators to process the data instead of writing custom code in languages such as C++ and Java.

Third, the system has to handle stream imperfections such as delayed data, missing data, or out of order data, and have timeouts for potentially blocking data to ensure system liveness.
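
One common way to cope with out-of-order data without blocking forever is to hold events briefly and give up on stragglers after a bound. The Python sketch below is my own illustration of that idea, not the paper's method: the function name reorder and the allowed_lateness parameter are hypothetical, and the bound plays the role of the timeout.

```python
import heapq


def reorder(events, allowed_lateness):
    """Return (ordered, late) for a stream of (timestamp, value) pairs.

    An event is released once the newest timestamp seen is at least
    allowed_lateness ahead of it. Events that show up after that point
    are reported as late instead of stalling the stream.
    """
    pending = []                  # min-heap of (timestamp, arrival index, value)
    ordered, late = [], []
    max_seen = float("-inf")
    for index, (ts, value) in enumerate(events):
        if ts < max_seen - allowed_lateness:
            late.append((ts, value))      # too old: the wait for it has timed out
            continue
        heapq.heappush(pending, (ts, index, value))
        max_seen = max(max_seen, ts)
        watermark = max_seen - allowed_lateness
        while pending and pending[0][0] <= watermark:
            t, _, v = heapq.heappop(pending)
            ordered.append((t, v))
    while pending:                        # end of stream: release what is left
        t, _, v = heapq.heappop(pending)
        ordered.append((t, v))
    return ordered, late


stream = [(1, "a"), (3, "c"), (2, "b"), (7, "g"), (4, "d"), (8, "h")]
ordered, late = reorder(stream, allowed_lateness=2)
```

Here event "b" arrives out of order but within the bound, so it is slotted back into place, while "d" arrives after the stream has already moved past it and is set aside as late.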

Fourth, the system has to integrate stored and streaming data, to be able to reprocess data when necessary.

Fifth, the system has to generate predictable outcomes and repeatable results, such as when it needs to reprocess data for recovery, or handle duplicate data.

Sixth, the systems have to guarantee data safety and availability, with uninterrupted fail-over between primary and backup systems à la "Tandem-style" computing.

Seventh, the system has to partition and scale applications automatically, between cores and across machines to be able to seamlessly handle any increase in load.

Finally, the system has to be quick, process and respond instantaneously to streaming data, which requires careful planning and coding to minimize boundary crossing, and maximize the ratio of useful work to computation overhead.

The authors examine common architectures that fulfill parts of the requirements they listed above, including databases (DBMS), rule engines that are built on condition/action pairs, and stream processing engines. They present in tabular form where the systems excel, and where they don't. The table leans toward using stream processing engines instead of DBMS, which are not optimized for the task.

Despite being a decade old, the paper is still relevant, and referenced in the modern literature. Moreover, it is well written and a pleasure to read.

Friday, October 2, 2015

Why do you need to warm diesel engines

In the Pacific Northwest, a lot of people use their trucks as everyday commute vehicles, which makes sense, since the climate is wet, and the terrain hilly, and in wet and cold conditions people feel safer in their four wheel drive vehicles. Some of these trucks are the heavy duty ones, with big diesel engines, and lately I have noticed at work a couple idling sans driver for at least 5 minutes. It got me curious about why you would need to idle a diesel engine, especially since modern gasoline engines do not require idling before driving off and putting load on the engine. A web search helped piece together the answer to this puzzle.

Diesels operate differently than gasoline engines. Instead of relying on spark plugs to light up the air and fuel mixture inside of the engine cylinders, diesels rely on high compression ratios that cause the air and fuel mixture inside of the cylinders to ignite.  Because of the high compression ratios, diesel engines are typically bulkier and more sturdy than their gasoline counterparts. Moreover diesels typically operate with a higher thermal efficiency than gasoline engines, which means less heat is dissipated to heat the engine block, and the lubricating fluids.

Both factors mean that the engines need a bit more time before they operate at their sweet spot. This translates to longer idling time before the engine can sustain load. Modern diesel engines have technological advances that help minimize the idling time, such as engine block heaters, and higher idling RPMs. The EPA website contains some useful information about modern diesel engines.

As an aside, it turns out if you own a diesel truck, you have to be good at managing your time, since you cannot just turn on the engine in cold weather and get on with your life when you're running late; you have to wait for a little bit till the engine is ready.

Thursday, September 24, 2015

A week with Edge

I have been using Internet Explorer ever since I switched back to Windows, and have been satisfied with it. Apart from its end of life status, and a couple of annoying bugs when I have more than 10 tabs open, it has served me well. With the latest Windows 10 update, I wanted to try the next generation browser: Edge.

Going in, I knew that Edge is not a finished product, and that it has a long way until it competes with the other established browsers on the market. Nevertheless I decided to give it a try.

My first experiences with it were positive: it is lightweight and very fast, and when I have many tabs open it does not suffer from the same fate as IE does, where the browser hangs randomly and the abominable recover web page ribbon appears at the bottom of the screen.

I was also surprised when I did not end up using the cool new features such as the readability view and web notes as much as I thought. I liked the integration with Cortana through the context menu, which I can use to define terms, and context search within the page.

Because of Edge's maturity level, there are many annoyances from missing features. I miss bookmark syncing between devices, as well as open tab syncing, something I got used to using Safari on the Mac long ago. I also miss the support for extensions, although I am sure these will come some day.

Overall I like Edge, and I think with every release it will become better. I'll continue using it as my everyday browser; however, if I were not at Microsoft, I would probably have gone back to IE and waited to make Edge my everyday browser till it became a bit more mature.

Tuesday, September 22, 2015

Are oranges named after their color?

One of the great side effects of watching educational videos with my son is the wealth of seemingly innocent questions that arise afterwards, and the entertaining web searches to find the answer. One such question is about oranges: which came first, the color or the name of the fruit?

Turns out there are a lot of theories online, and the one that rings true is this Quora thread on the origins of the name. According to the Online Etymology Dictionary, the name of the fruit evolved through trade from the original Sanskrit name for the orange tree (naranga), to the Persian narang, to the Arabic naranj, to the Italian arancia, to the Latin orange, to the French orange, and finally to the current form circa 1300. The name of the color came after that.

Where would we be without the Internet and kids' questions?

Friday, September 11, 2015

Why are barns red?

I have always wondered why most barns are painted red. It is always a pleasing sight to see one while driving in the countryside, whether during the lush green days of summer or the yellow arid days of winter. There is of course a chance that farmers chose the color red for its aesthetic value, but I wondered if there was a more practical reason for the choice.

An online search produced a bevy of results with equally reasonable explanations. One of the sites argues that in the older days, one of the practical methods to seal the barn wood and protect it from the elements was to paint it with a mixture of linseed oil, with additions of milk and lime. The red color would come from adding either the blood of a recent slaughter or iron oxide--rust. As the paint dried, it would turn a dark red color. I buy the rust theory, since there is a lot of rust to be had everywhere, and the blood theory is a bit weird.

The Smithsonian magazine adds a physics spin to the answer, by explaining why rust, or iron oxide, is an abundant material in the universe, and that this abundance is the most likely reason farmers used it in the barn paint mixture. The article explains why iron is abundant through the evolution of stars, as they go from collapse to explosion, and the reactions that combine protons and neutrons into heavier materials as the cycles progress, finally stopping when the atomic mass becomes 56 (iron). I like the explanation, although it raises the question of why the reactions stop at 56. But that's a question for another search.

Monday, August 10, 2015

Weird Sleep/Wakeup problems in Windows 10 preview laptop

My Lenovo X1 Carbon laptop experienced some weird problems earlier in the Windows 10 preview cycle; it would crash after a couple of sleep/wakeup cycles, and reboot afterwards. I was surprised that I was the only one experiencing that problem internally, especially since all the preview flighting was going smoothly. But when the problem persisted after a couple of internal upgrades, I decided to dig deeper and figure out what was going on.

When Windows crashes, it writes a memory dump to the C:\Windows\memory.dmp file, and you can examine the contents of that file and figure out the reasons for the crash with the Windows debugger (WinDbg). WinDbg is available for download either separately or as part of the WDK. Once it is downloaded, the process is easy. First run WinDbg with elevated permissions (Run as Administrator), and open the memory dump file (CTRL+D). If the debugger complains about the symbols, try to fix them and reload through:
.symfix; .reload

You can then look for what caused the crash; the !analyze -v command is the usual starting point. In my case, it was a problem with ndis.sys where the network driver was timing out and causing Windows to crash. I am not sure how the network driver got corrupted during my flighting upgrades, but uninstalling the driver (Win key+X, open Device Manager, find the adapter, and uninstall) and reinstalling it solved the problem from that time onward.

Wednesday, July 15, 2015

IE11 and broken scrolling

When I first installed Windows 10 preview on my laptop, IE11 scrolling stopped working when using the touchpad to scroll. Interestingly, scrolling worked great in all other programs: Explorer, Outlook, OneNote, and many others. In IE11, unless I clicked on the tab title and avoided clicking on anything else in the tab area, scrolling did not work. It was quite an annoying behavior, but not a show stopper for trying out Windows 10. I ended up learning how to use the keyboard for scrolling through the web pages in lieu of touchpad goodness.

With preview updates, the problem did not get any better, so I searched on the web to see if the issue was widespread. It turns out that it was, and it was not restricted to Windows 10 preview either. There were a lot of solutions online that did not make much sense, like resetting IE11, going to the advanced tab and disabling smooth scrolling, and a slew of others. The one that made sense was a problem in the Synaptics driver, which for older style applications such as IE11 sends the wrong scroll messages to the scrollbars. The fix was obvious: upgrade the driver to the latest version. After upgrading the driver, touchpad scrolling worked again in IE11. I still use the keyboard shortcuts though.

Sunday, July 12, 2015

Limited Wi-Fi Internet connectivity

While using my preview build of Windows 10, sometimes I face the dreaded "Limited Wi-Fi Internet" connectivity issues, where the Wi-Fi adapter seems to be connected to the Wi-Fi router, but full Internet access is not possible. I often attributed these issues to quirks in the preview builds, and a simple computer restart--reminiscent of the older Windows releases--seemed to fix the issue.

But not yesterday, when multiple restarts did not ease the pain. Even deleting the Wi-Fi network and recreating it again did not help. The dreaded "Limited Wi-Fi" banner under the Wi-Fi network name continued to rear its ugly head.

Luckily I had an Ethernet cable handy, so I hard wired the laptop and checked online to see if others had faced a similar issue. There was a considerable number of people experiencing the issue, with various solutions. The one that made sense for me was a bad wireless driver install, which was easy to fix. Before you attempt to replicate the solution, make sure you are connected to the Internet via an Ethernet cable, since you'd need to download the latest drivers from the Internet.

First I deleted the Wi-Fi network definition, and did not create a new one. Then I went to the device manager--Win Key+X, selected the Device Manager, and selected the wireless adapter. For my computer, that was the Intel Dual Band wireless adapter.

I right-clicked on the adapter, and selected uninstall, and in the dialog boxes chose to remove the driver from the computer. This gave me a clean slate to reinstall the driver from the Internet.

After the uninstall was successful, from the Device Manager Action menu, I selected "Scan for hardware changes", which popped up the Intel Dual Band wireless adapter again. I then right-clicked on the adapter, selected update the driver, and selected update from the Internet. After the download was complete, I recreated my Wi-Fi network, and things worked again like a charm.

I am not sure why the wireless driver got corrupted in the first place, but it is good to know that the "Limited Wi-Fi Internet Connectivity" issue can be easily fixed.

Friday, July 10, 2015

Thunderstorms and Lightning

The other day I heard a great educational segment on thunderstorms and lightning on NPR. The segment highlighted that since thunderstorms and lightning strikes are not very common on the west coast, a lot of the older buildings and houses are not equipped to handle them as well as their counterparts on the east coast.

And because lightning strikes are relatively rare here, when one occurs it becomes news around the area. The segment mentioned that the most famous one was when lightning struck a tree in an arboretum and caused the tree to explode. The lightning passed through the core of the tree, and generated a lot of energy that heated up the moisture within the bark, and turned it into steam. The steam expanded and turned the tree into projectile shards that flew 30 yards away from the tree and got embedded in the soil. It must have been scary to witness such an event.

The segment ended by offering some practical advice on what to do if you're caught in a thunderstorm outside. Best to be in a car, since the car body will protect you, and if you are nowhere near one, then seek the lowest area you can find and crouch close to the ground. I don't want to ever use these tips.

Monday, July 6, 2015

The Lumia 640XL

Over the long weekend I got a Nokia Lumia 640XL phone. I decided to graduate to the new ridiculously large screen size phone after sticking with the more manageable screen sizes of the iPhone 5 and its predecessors. I would have stayed within the iOS/Android eco-systems, but I wanted to give a Windows phone a try, and see why the platform has not been successful in the past.

The phone is nice and relatively inexpensive ($240 without a contract), with a ridiculous screen size, great graphics and battery life. The screen size is a blessing when reading emails, Kindle books, and surfing the Internet, and I believe my usage has increased accordingly. The screen is sharp, and the sound quality of calls is great.

With heavy email and web browsing the battery lasted 2 days. The phone comes with crippled memory though (8GB, which used to be good, but after years of using iOS phones, it is not enough). Luckily the phone is expandable through MicroSD cards, and a 128 GB MicroSD would set you back around $70 from Amazon.

Windows phones have some usability idiosyncrasies compared with their iOS counterparts, and I am not sure if these are because of patents, or design choices. One is killing applications in the app center, where instead of swiping up as in iOS, you swipe down, and the other is the excessive reliance on the back button instead of swiping left to go back, except in Internet Explorer. I also found that loading web sites in IE takes longer than in Safari or Chrome.

And despite the minimal set of apps that I use, I was surprised to see that a few of them were not available for Windows. For the platform to become successful, the Windows store has to attract a whole lot more developers than it has so far, and perhaps that's the plan in Windows 10. For now, since I rely a lot on my phone for work, I'll stick with the official 8.2 builds instead of trying a preview one until I hear what other people's experiences are.

Monday, June 29, 2015

Font rendering on Windows

There are a lot of articles on the Internet comparing font rendering philosophies between Windows and Mac, including Damien's,  Jeff Atwood's, and Joel Spolsky's. The articles come with a vibrant set of comments that advocate one rendering philosophy over the other based on aesthetics, readability, and eye comfort.

So far I have been oblivious to the difference, since I have been using the Mac exclusively for the last 15 years. But with my recent switch to Windows the rendering difference popped up, and it was not the font aesthetics since the font rendering on both platforms looked good to me.

Rather I noticed that I can read on Windows for a long period of time without my eyes getting tired. On the Mac I needed AntiRSI or Timeout to help me take work breaks every half hour to alleviate eye soreness. After I switched to Windows I have not had a need to search for their equivalents.

Sunday, June 28, 2015

Windows 10 "Cannot update system reserved partition"

I've been using an earlier version of Windows 10 preview for a couple of weeks, and have been pleased with it. However when I tried to upgrade to a new drop, I was greeted with a cryptic message: "Cannot update the system reserved partition".

A little bit of research internally and on the Web revealed that the message appears when the system partition is full. To see the details of your drive, partitions, volumes and all, the command "diskpart" is your friend.

First list the volumes on the disk you're interested in:

diskpart> list volume

Some of the volumes might not have a drive letter associated with them. To explore one of them, first select the volume by its number:

diskpart> select volume=N

Then assign it a drive letter:

diskpart> assign letter=E

Now you can look around the drive and figure out how to create some free space for the install, and all will be well.
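
Once the volume has a letter, a quick way to see how full it is from a script is Python's shutil.disk_usage. This is just a small sketch under my own assumptions: the helper name free_space_report and the VOLUME variable are hypothetical, and the path defaults to the current directory so the snippet runs anywhere.

```python
import shutil


def free_space_report(path):
    """Return (total_bytes, free_bytes, percent_free) for the volume holding path."""
    usage = shutil.disk_usage(path)
    percent_free = 100.0 * usage.free / usage.total if usage.total else 0.0
    return usage.total, usage.free, percent_free


# Assumption: replace "." with the letter you assigned above, e.g. "E:\\".
VOLUME = "."
total, free, percent_free = free_space_report(VOLUME)
print(f"{VOLUME}: {free / 2**20:.1f} MiB free of {total / 2**20:.1f} MiB "
      f"({percent_free:.1f}% free)")
```

If the percentage is close to zero, that is the partition the upgrade is complaining about.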

Saturday, June 27, 2015

Back to Windows

I joined Microsoft a couple of weeks ago, and after 15 years of only using Macs and OS X for my work and personal computers, I traded both for their Windows equivalents.

For my laptop, I got a Lenovo X1 Carbon that is growing on me. It reminds me of my MacBook Air: it is very light, with a great design, and an excellent battery life; with a ton of applications running, I usually get 6 or 7 hours out of it. And the icing on the cake is the touch screen, and the great tactile feel ThinkPad keyboards are well known for.

The laptop had Windows 8.1 on it, but since I have access to any Windows build, I decided to image it with a pre-release build of Windows 10. After a couple of weeks of usage, I can safely say I like Windows 10. Windows has definitely come a long way since the last time I used it 15 years ago.

For one, hibernation and wakeup are much faster now, and file operations, which I remember were slow and infuriating, are now acceptable. I also like Cortana, and the new Edge browser. But what I am enjoying the most is the first-class platform support for all drivers and applications I can think of. Connecting my old and aging printer was no longer a painful process, and using the newest Logitech wireless headphones was a breeze.

There are a couple of things I miss about my Mac, but they are both minor. One is the native Unix environment, but Cygwin and Cygwin-X are a good substitute. Running Linux under Hyper-V is also a good alternative. The second is swapping the Caps Lock and Control keys, which makes using Emacs a lot easier. On OS X it is an easy task through the keyboard settings, while on Windows I recall it requires convoluted editing of a registry key.


Monday, June 15, 2015

Adventures restoring the MacBook Air from a Time Machine backup

Taking backups with Time Machine on Mac OS X is a breeze: you plug in the backup drive, and wait for the magic to happen. Restoring the backup to a misbehaving laptop, though, appears to be a different story. I had to go through multiple iterations before I finally got the data back on the laptop. Since my backup setup is not atypical, with the exception of an encrypted drive and backups, I was surprised it took that many attempts to successfully restore the data.

Initially I tried restoring the backups by booting the Mac in recovery mode, and using the restore from Time Machine option. The restore started, but after roughly 12 hours it silently failed.

For my second attempt I decided to install Yosemite from scratch and use the user migration assistant to recover my data. After progressing for a long time, the restore silently failed as well.

My third attempt was a bit more drastic: I wiped out the drive, and attempted to restore the backup from Time Machine. That too failed after progressing for roughly 12 hours.

For my final attempt I decided to wipe out the drive, reformat it to a different file system--a case-sensitive, journaled, unencrypted one--install Yosemite from scratch, and use the user migration assistant to recover the data. For some reason that worked, and after the migration was complete, I turned on FileVault to encrypt the drive, and everything was back to normal again.

Sunday, May 10, 2015

My very old Samsung ML-1210 printer

I have an old Samsung ML-1210 printer that is more than 10 years old. Despite its age, the printer is still in excellent condition: it prints high quality PDF documents reliably, albeit slowly, and satisfies my needs. But because of its age, it is hard to find a working native driver for Mac OS X, as Apple stopped supporting the printer in Lion, and Samsung stopped supporting the drivers after 2005.

I could of course retire the printer, and buy a new one that is faster, and cheaper, but I tried to find another route where I continue using my old trusty printer. Luckily, one exists through CUPS on Mac OS X, by using ghostscript, foomatic-rip, and samsung-gdi. The packages on the page are for Mac OS X Lion, but I found them to work for later versions as well. The only inconvenience of using these packages is that you have to re-install them after every Mac OS X upgrade.

With the latest Yosemite upgrade,  the packages broke because of sandboxing,  where foomatic-rip fails to find its configuration, and printing fails. Luckily the Apple community discussion forums contain a thread with a bash script that fixes the problem, giving my printer an extended life, and making printing possible again.

By contrast, the printer configuration on my wife's Windows laptop was a breeze; the Samsung driver from 2013 worked like a charm, and printing was flawless.

Perhaps it is time to get a new printer, or switch back to Windows.

Tuesday, April 14, 2015

The corporate athlete

I stumbled upon an old article from the Harvard Business Review about the making of a corporate athlete. The title lured me in; it is always flattering to compare corporate leaders to professional athletes that are admired by the masses.

The thesis of the article is that performance demands on today's corporate leaders rival those on professional athletes, and while the latter get all the training and recovery and support they can before and after competing, corporate leaders do not. In fact leaders are required to perform under stress 24/7 year round with no time to recover or unwind.

The article argues that unwinding and recovery are crucial for peak performance.  The article lists four dimensions of capacity that the leader needs to worry about: physical, mental, emotional, and spiritual capacities. By strengthening each of these capacities, the corporate leader can draw on the separate strengths, and manage to handle work pressure, and performance demands.

The article lists some ways to strengthen these areas, citing coaching examples from the authors' experience. Some of the article's suggestions are: eating healthy meals, exercising, keeping consistent sleep and wake-up schedules, limiting tasks to 90--120 minutes, weight training three times a week, and continuous learning.

Sounds like reasonable advice to me.

Friday, April 3, 2015

Bauman rare books

This year's modern marketing experience conference took place in Las Vegas at the Venetian hotel. The hotel, like other Vegas hotels, contains a lot of fancy stores with prestigious brands. In between the sessions, the conference attendees would walk around the stores perusing the merchandise and enjoying the luxury of the stores.

One of the more eclectic stores in the hotel that attracted my attention was the Bauman Rare Books store. I had never seen a similar store before, so I decided to go in and check it out. The store specializes in rare first edition books and ones that are signed by the author. I was surprised to see first edition books by Charles Dickens and John Steinbeck, as well as Winston Churchill and other famous authors.

The store appeals to rare book collectors, and the prices definitely reflect that. Despite that, the staff were very friendly toward non-collectors. They were very engaging and knowledgeable about each book's history and lineage. They explained that a book's high price is largely based on the condition of the dust cover, and whether the author signed it or not. I wonder if I should try to treat my physical books better now, and get them signed by the author for future prosperity.

Thursday, April 2, 2015

Information asymmetry and how it is changing the world of selling

Daniel Pink's keynote at the modern marketing experience conference was great. In the keynote, he talked about how the world of selling is changing due to changes in information asymmetry--what information is available and to whom.

In the older days, the salesperson knew a lot more about the product than the consumer, and when they interacted, the salesperson had to convey a lot of information about the product to the consumer in a very short period of time. This time crunch led to the perception that sales people were pushy, fast talkers, and sometimes sleazy. The perceptions were not helped by an experiment where people were asked what was the first word that came to their mind when they recalled an experience with a salesperson. The top 25 words were not flattering.

In the new world, where information is at everyone's fingertips--thanks in large part to the Internet and search engines--the information asymmetry has shifted the other way, tipping more toward the consumers, who often end up knowing more about the product that they would like to purchase. Pink argues that this shift in information asymmetry necessitates a corresponding shift in the selling strategy. The new strategy has to shift from the older days' mantra, "Always Be Closing", to one suitable for modern times.

Pink proposes keeping the acronym ABC intact, but giving the letters new meaning: "Attunement, Buoyancy, and Clarity." In attunement, the salesperson needs to put the customer's needs first, and take their perspective when thinking about products or services that will help them. In the face of rejection, the salesperson has to remain buoyant and optimistic. And for clarity, the salesperson has to explain those products or services clearly.

Wouldn't we all as customers prefer that new style of selling?