
Sunday, December 5, 2010

Blind men with jetpacks and altimeters...

I've been running MrBayes practically non-stop since I cleaned the last of the viruses off my PC.  For those who don't know, MrBayes is a phylogenetics program that uses Markov chain Monte Carlo (MCMC) methods, with several chains running in parallel, to find the most probable phylogenies for a given data set.

Most of you probably just went: "Markov...  Monte...  what?"

I don't blame you, so this is my explanation of what happens.

Imagine all of the possible phylogenies--the ways a set of gene sequences might fit together--are actually a piece of land with mountains on it.  The height of the ground at any spot is how probable THAT phylogeny is, given the sequences.  The higher the mountain, the more likely it's the right phylogeny.

So, the simple thing is to just look out over the landscape and pick out the highest peak, right?

One problem: it's pitch black, and you can't see your hand in front of your face.
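For the curious, here's roughly what the "height" is in Bayesian terms.  Write τ for a tree and D for the aligned sequences (symbols chosen here for illustration; this also ignores branch lengths and model parameters, which MrBayes samples too).  The height is the posterior probability from Bayes' theorem, and the ratio of two heights follows from it:

```latex
P(\tau \mid D) = \frac{P(D \mid \tau)\,P(\tau)}{\sum_{\tau'} P(D \mid \tau')\,P(\tau')}
\qquad
\frac{P(\tau_1 \mid D)}{P(\tau_2 \mid D)} = \frac{P(D \mid \tau_1)\,P(\tau_1)}{P(D \mid \tau_2)\,P(\tau_2)}
```

The denominator on the left sums over every possible tree, and the number of trees explodes as you add taxa, so you can't compute the absolute height of anything.  That's the pitch-black part.  The ratio on the right doesn't need that sum at all, because it cancels.  "Higher or lower than where I just was" is all MCMC needs.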

So, how do you find that peak?

Well, you recruit yourself some people--blind men--and give them altimeters, radios, and jetpacks.  Their job is to go out, push the button on their altimeter, and get a height.  Then they take a step in a random direction and check their altitude again.  If they went up, there's a small chance they hit the gas and bounce away on their jetpack.  If they went downhill, that chance is larger.
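The jetpack rule in the story is looser than what MCMC actually does.  In the textbook Metropolis rule (for symmetric proposals), an uphill step is always taken, and a downhill step is taken only with probability equal to new height / old height; otherwise the walker stays put.  Here's a minimal Python sketch, assuming a made-up ring of ten hills (HEIGHTS) and one-spot left/right proposals; none of these names come from MrBayes:

```python
import math
import random

# Hypothetical toy landscape: ten spots arranged in a ring, peak at spot 7.
HEIGHTS = [1, 2, 3, 5, 8, 13, 21, 34, 21, 13]


def log_height(i):
    # Work in logs, as real programs do, to avoid underflow with tiny probabilities.
    return math.log(HEIGHTS[i])


def propose(i, rng):
    # Step one spot left or right around the ring (a symmetric proposal).
    return (i + rng.choice((-1, 1))) % len(HEIGHTS)


def metropolis_step(position, log_height, propose, rng):
    """One blind-man step under the textbook Metropolis rule."""
    candidate = propose(position, rng)
    log_ratio = log_height(candidate) - log_height(position)
    # Uphill (log_ratio >= 0): always move. Downhill: move with prob = height ratio.
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return candidate
    return position  # rejected: stay put and try again next step


rng = random.Random(1)
pos = 0
visits = [0] * len(HEIGHTS)
for _ in range(100_000):
    pos = metropolis_step(pos, log_height, propose, rng)
    visits[pos] += 1

print(visits.index(max(visits)))  # the most-visited spot
```

Run long enough, each spot's share of the visits tracks its share of the total height, so the peak gets the most visits but the shoulders aren't ignored.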

So, simple solution: you send out your four-man team, they stumble around in the dark the way blind men do, and they radio in their location and altitude after every step.  Except that would be a huge number of positions.

Instead, you have them call in only every thousand steps.  I mean, hey!  You jump more when you go downhill and a whole lot less when you're uphill, so--ideally--these guys will be wandering around the hilltops in no time and staying there.
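Calling in only every thousand steps is what MCMC programs call the sampling frequency: the chain still takes every step, but only every Nth position gets written down.  MrBayes exposes this as the `samplefreq` option of its `mcmc` command.  A minimal Python sketch on a made-up ring of ten hills (HEIGHTS and run_chain are hypothetical names, not MrBayes internals):

```python
import random

# Hypothetical toy landscape: ten spots in a ring, peak at spot 7.
HEIGHTS = [1, 2, 3, 5, 8, 13, 21, 34, 21, 13]


def run_chain(n_steps, sample_freq, seed=0):
    """Walk n_steps with the Metropolis rule, but only 'radio in'
    (record step number, position, altitude) every sample_freq steps."""
    rng = random.Random(seed)
    pos = 0
    samples = []
    for step in range(1, n_steps + 1):
        cand = (pos + rng.choice((-1, 1))) % len(HEIGHTS)
        # Uphill ratios are >= 1, so those moves are always accepted.
        if rng.random() < HEIGHTS[cand] / HEIGHTS[pos]:
            pos = cand
        if step % sample_freq == 0:
            samples.append((step, pos, HEIGHTS[pos]))
    return samples


reports = run_chain(1_000_000, 1000)
print(len(reports))  # 1000 reports instead of 1,000,000
```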

Only, how do you know THIS hilltop is the highest when you can't see?

Easy: you do two things.

First, every 1,000 steps, the team members compare how high they are, and the lowest swaps places with the highest.  As far as the ex-lowest guy is concerned, his next step now starts from higher up.  The ex-highest guy, however, is now lower, so he's much more likely to jump.  Sort of mixes things up, see?
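The real mechanism behind this swap is Metropolis-coupled MCMC (MC³), which MrBayes uses: one "cold" chain walks the true landscape, while the other chains are "heated," meaning they walk a flattened version of it where jumping around is easier.  Every so often two chains propose to trade positions, and the trade is accepted with a probability that keeps each chain sampling its own landscape correctly; it isn't literally lowest-trades-with-highest.  A sketch under assumptions: the incremental heating formula and `lam=0.1` are illustrative choices, and `heat`/`try_swap` are hypothetical names:

```python
import math
import random


def heat(i, lam=0.1):
    # Chain 0 is the "cold" chain (beta = 1); higher-numbered chains are hotter/flatter.
    return 1.0 / (1.0 + lam * i)


def try_swap(states, log_height, rng, lam=0.1):
    """Propose trading positions between two randomly chosen chains.

    Chain i samples height ** heat(i). The swap is accepted with the
    Metropolis-coupled MCMC probability, so each chain keeps sampling
    its own (heated) landscape correctly."""
    j, k = rng.sample(range(len(states)), 2)
    bj, bk = heat(j, lam), heat(k, lam)
    hj, hk = log_height(states[j]), log_height(states[k])
    log_r = (bj - bk) * (hk - hj)
    if log_r >= 0 or rng.random() < math.exp(log_r):
        states[j], states[k] = states[k], states[j]
        return True
    return False


# Hypothetical toy landscape and four chains (one cold, three heated).
HEIGHTS = [1, 2, 3, 5, 8, 13, 21, 34, 21, 13]
chains = [0, 1, 2, 7]  # cold chain (index 0) is stuck low; the hottest sits on the peak
rng = random.Random(0)
# Every colder chain is lower than every hotter one here, so any proposed swap is accepted.
accepted = try_swap(chains, lambda s: math.log(HEIGHTS[s]), rng)
print(accepted, chains)
```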

Second, you bring two teams.  Both teams wander around on their own, totally oblivious to each other.  What you can do, however, is track and compare the heights they report.  If Team A keeps reporting very different altitudes from Team B, you know that one or both of them just hasn't found the tallest peaks yet.  If that happens, you send them more fuel and some hot chow, and tell them to keep trucking.

If instead, after a long, long time, both teams are reporting almost exactly the same heights...  you've got a good chance they actually found the highest peak, because the top of the tallest mountain is the one place neither team can climb above.  If there were someplace higher, then given enough time and steps, even a blind man would more than likely find Everest and climb it.
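Comparing the two teams is a convergence check.  One common statistic for it, and one MrBayes reports for its parameters, is the potential scale reduction factor (PSRF) of Gelman and Rubin: it compares how much the runs disagree with each other to how much each run wanders on its own, and it approaches 1 as the runs agree.  (MrBayes also reports the average standard deviation of split frequencies between runs, which compares the trees rather than the heights.)  A minimal sketch of the basic formula; the altitude lists are made up:

```python
import math
import statistics


def psrf(*runs):
    """Gelman-Rubin potential scale reduction factor for equal-length runs.

    Compares between-run spread (b) with average within-run spread (w).
    Values near 1 suggest the runs are sampling the same landscape."""
    m, n = len(runs), len(runs[0])
    means = [statistics.fmean(r) for r in runs]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = statistics.fmean(statistics.variance(r) for r in runs)
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


# Made-up altitude reports from the teams.
everest_a = [12, 13, 12, 13, 12]
everest_b = [13, 12, 13, 12, 12]
k2_team = [8, 9, 8, 9, 8]
print(round(psrf(everest_a, everest_b), 2))  # teams agree: close to 1
print(round(psrf(k2_team, everest_a), 2))    # one team stuck on K2: well above 1
```

With only a handful of reports the value can dip slightly below 1; real runs use hundreds of samples.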

So, when you're done, what do you have?

You have several hundred or more locations and altitudes.  The first chunk, usually about 25%, is going to be crappy: reports from all over the place, but probably not that far up the mountain.  This is called the "burn-in," when your teams are still getting their bearings.  So you just rip those out and ignore them.  The rest, the last 75% or so, are what you're after.  Those are the combinations of trees and other factors that "make the most sense" together and are most probable.
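Throwing out the burn-in is just slicing off the front of the list.  A minimal sketch using the 25% fraction described above (`discard_burnin` is a hypothetical name, and the reports are stand-ins):

```python
def discard_burnin(samples, burnin_frac=0.25):
    """Drop the first burnin_frac of the samples: the teams' 'getting their bearings' phase."""
    cut = int(len(samples) * burnin_frac)
    return samples[cut:]


reports = list(range(1000))  # stand-in for 1,000 radio reports, in order
kept = discard_burnin(reports)
print(len(kept))  # 750
```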

But with that many (say, 750 on a 1,000,000-step run), what can you do?  Why can't you just pick "the ONE" and be done with it?

First, there's no guarantee your teams found the highest of the high.  Maybe they found K2 instead of Everest.  Maybe they just weren't that lucky, or Everest went on a diet and got really, really skinny and hard to find.  So if you pick only one, you have no idea how much of the terrain you're actually looking at.

Instead, with 750 reports, you can see where your teams spent most of their time.  If there is, in fact, an Everest around and it's significantly higher than K2, you'll probably find reports of one or more of your blind bouncers roaming around its shoulders.  And if all you find is one single major peak, with all of your junior jetmen spending ridiculously large amounts of time and effort crawling around on top of it, you can argue with a straight face that you found the best and only one... or damn close to it.
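"Where the team spent most of their time" translates directly into numbers: the share of post-burn-in samples that land on a given tree estimates that tree's posterior probability.  A minimal sketch with made-up four-taxon trees written as Newick strings (`summarize` is a hypothetical name); MrBayes' own `sumt` command does a more thorough version of this, building a consensus tree from split frequencies:

```python
from collections import Counter


def summarize(samples):
    """Each distinct tree's share of the samples estimates its posterior probability."""
    counts = Counter(samples)
    total = len(samples)
    return [(tree, n / total) for tree, n in counts.most_common()]


# Made-up post-burn-in tree samples (750 of them).
kept = ["((A,B),(C,D));"] * 600 + ["((A,C),(B,D));"] * 120 + ["((A,D),(B,C));"] * 30
for tree, share in summarize(kept):
    print(f"{tree}  {share:.2f}")
```

Here one tree holds 80% of the samples, the "one major peak" situation; a flatter spread would mean the teams were roaming several comparable peaks.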

So... that's why MrBayes makes me think of blind men with jetpacks and altimeters...
