Random notes mostly on Machine Learning

Exploiting Multiple Machines for Embarrassingly Parallel Applications

While working on my machine learning project I needed to run some computation-heavy calculations several times, each time with slightly different inputs. These calculations were CPU and memory bound, so spawning them all at once would only increase the overall running time because of the extra context switches. Yet running 4 of them at a time (the number of cores in my CPU; actually 3, since other applications need CPU, too) should speed things up.

Fortunately, I have an old laptop with 2 cores as well as access to a somewhat more modern machine with 4 cores. That makes 10 cores spread across 3 machines (all of them have some version of GNU/Linux installed). The question was how to exploit such a treasure.

On Sorting Complexity

It's well known that the lower bound for the sorting problem (in the general, comparison-based case) is $\Omega(n \log n)$. The proof I was taught is somewhat involved and is based on paths in "decision" trees. Recently I discovered an information-theoretic approach to (or reformulation of) that proof.
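As a rough sketch of the counting argument usually meant by "information-theoretic" here (my reading, not necessarily the exact formulation used in the post): a comparison sort has to tell apart all $n!$ possible input orderings, and each comparison yields at most one bit of information, so at least $\log_2 n!$ comparisons are needed in the worst case. Keeping only the larger half of the terms gives the bound:

$$\log_2 n! = \sum_{k=1}^{n} \log_2 k \ge \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k \ge \frac{n}{2} \log_2 \frac{n}{2} = \Omega(n \log n)$$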

Namespaced Methods in JavaScript

Once upon a time I was asked (well, actually the question wasn't only for me, but for the whole habrahabr community) whether it is possible to implement namespaced methods in JavaScript for built-in types, like:

5..rubish.times(function() { // this function will be called 5 times
  console.log("Hi there!");
});

"some string".hask.map(function(c) { return c.hask.code(); });
// equivalent to
"some string".split('').map(function(c) { return c.charCodeAt(); });

"another string".algo.lcp("annotation"); 
// returns longest common prefix of two strings

As you can see at the link, it's possible using ECMAScript 5 features. And that's how:
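A minimal sketch of one way this could be done, assuming the namespace is an accessor property installed with `Object.defineProperty` (an ES5 feature). The helper name `defineNamespace` and the method bodies are hypothetical and only illustrate the idea; they are not necessarily the linked solution.

```javascript
// Hypothetical helper: installs a getter called `name` on `proto`.
// Each access builds a fresh object whose methods are bound to the receiver.
function defineNamespace(proto, name, methods) {
  Object.defineProperty(proto, name, {
    configurable: true,
    get: function () {
      var self = this; // the value the namespace was accessed on
      var ns = {};
      Object.keys(methods).forEach(function (key) {
        ns[key] = function () {
          return methods[key].apply(self, arguments);
        };
      });
      return ns;
    }
  });
}

// Illustrative `algo` namespace for strings.
defineNamespace(String.prototype, 'algo', {
  // Longest common prefix of the receiver and `other`.
  lcp: function (other) {
    var a = String(this), b = String(other), i = 0;
    while (i < a.length && i < b.length && a[i] === b[i]) i++;
    return a.slice(0, i);
  }
});

// Illustrative `rubish` namespace for numbers.
defineNamespace(Number.prototype, 'rubish', {
  // Calls `fn` once per integer in [0, receiver).
  times: function (fn) {
    var n = Number(this);
    for (var i = 0; i < n; i++) fn(i);
  }
});
```

With these definitions, `"another string".algo.lcp("annotation")` evaluates to `"an"`. The getter rebuilds the namespace object on every access, which keeps the sketch short at the cost of some allocation; caching would be a reasonable refinement.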

Crazy Expression Parsing

Suppose we have an expression like (5+5 * (x^x-5 | y && 3)) and we'd like to get some computer-understandable representation of that expression, like:

ADD Token[5] (MUL Token[5] (AND (BIT_OR (XOR Token[x] (SUB Token[x] Token[5])) Token[y]) Token[3]))

In case you don't know how to do that or are looking for a solution right now, you should know that I'm not going to present a correct solution. This post is just a joke. You should use either the shunting-yard algorithm or a recursive descent parser.

So if you're ready for madness... Let's go!

Memoization Using C++11

Recently I read an article, Efficient Memoization using Partial Function Application. The author explains function memoization using partial application. While reading the article, I thought, "Hmmm, can I come up with a more general solution?" As suggested in the comments, one can use variadic templates to achieve it. So here is my version.

Resizing Policy of std::vector

Some time ago, when Facebook open-sourced their Folly library, I was reading the docs and found something interesting. In the section "Memory Handling" they state:

In fact it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory
I didn't get it the first time. Recently I recalled that article and decided to work through it. After reading and googling for a while I finally understood the idea, and I'd like to say a few words about it.
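The core of the claim can be sketched in one line, under some idealizing assumptions of mine: the vector starts with capacity $c$ (a symbol I'm introducing), every reallocation doubles it, and the allocator places the blocks next to each other so that freed ones merge into a single hole. The $k$-th reallocation then asks for $c\,2^k$ elements, while all earlier blocks together, even counting the one still in use, add up to less than that:

$$\sum_{i=0}^{k-1} c\,2^i = c\,(2^k - 1) < c\,2^k$$

So the hole left behind by the previous allocations is always too small for the new block, no matter how many times the vector has grown.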