Wednesday 29 May 2013

The World through Geoguessr

Some time before the Last War, I went on a field trip to gather population genetic data in the Republic of Cabo Verde with two American colleagues. This ex-Portuguese colony is an archipelago 500 km west of Senegal off the coast of Africa.  In the evenings two of us would play patience - the one everyone plays.  It passed the time and saved us from talking to each other.  Apparently you can play this in some casinos by paying $10 for a shuffled pack of cards and winning $1 for each card that you can pile on the aces.  We had a discussion about what the odds were on the casino game - which are rather more difficult to calculate than the probabilities of the roulette wheel delivering red (18/37 giving the house an edge of just under 3%).  You can't leave that sort of question unanswered, so when I got home I played the game 100 times on the trot and recorded my score.  In those 100 games, I got the whole pack 'out' for a payola of $52 only 3 times and the other 97 games paid out in dribs and drabs, so I finished up losing about $120 on my investment of $1,000 in play-money. I'd calculated the odds, to my own satisfaction, by simulation; and a perfectly good and widely employed scientific method it is too.  The take-home is: stick to roulette at casinos, you get better odds and, especially if you wear a tuxedo, you look more like James Bond and less like Johnny No-Pals playing cards with yourself in a corner.
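
The arithmetic behind that comparison can be sketched in a few lines of Python. This is only a back-of-envelope check, assuming a single-zero wheel and an even-money payout on red (which is what 18/37 implies), and taking the "about $120" loss at face value; the variable names are mine.

```python
from fractions import Fraction

# Roulette: even-money bet on red, single-zero wheel (assumption implied by 18/37).
p_red = Fraction(18, 37)
# Expected loss per $1 staked: lose $1 with probability (1 - p), win $1 with probability p.
roulette_edge = (1 - p_red) - p_red

# Klondike: 100 games at $10 a pack, roughly $120 down overall (figures from the post).
klondike_stake = 100 * 10
klondike_loss = 120  # approximate, as reported
klondike_edge = Fraction(klondike_loss, klondike_stake)

print(f"roulette edge ~ {float(roulette_edge):.2%}")
print(f"klondike edge ~ {float(klondike_edge):.2%} (from one run of 100 games)")
```

A single run of 100 games is a small sample, so the Klondike figure is an estimate rather than the true edge.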

Geoguessr is, like Klondike solitaire, a way of passing the time. Try it before reading on? A friend of mine had one hit early on at 240m from target (6500 points!!) and got a bit hooked.  After his 4th report of his latest score, I told him "STOP now. That way madness lies, you'll have to stop or the wind will change when you're geoguessring and you'll find yourself sitting on a rock in the Atacama Desert wearing a sack."  But there is a mort of educational value in the pastime.  It really helps to be able to read Cyrillic characters, for example, so when you see a road sign saying Владивосток, you know it's not near Athlone.  You get a higher score if the distance is small between the picture displayed and your guess about its location.  So the two are inversely related.  The largest distance possible on the planet is half the circumference of the earth - 20,000km.   I don't know what score you get for that but my worst punt was 16,600km adrift and clocked a pathetic 45 points so maybe 20Mm is rated as 0.  Playing a few games with Dau.II was good fun and our best single guess was 160m (that's 0.16km) from target and netted (whoohoo) 6466 points.  Being a scientist obliges you to try to turn footling anecdotes like that into DATA.  Accordingly, like my sad-sack card-turning with Klondike after the Cabo Verde trip, I sat down and recorded the results from several games:
Us population geneticists are quite interested in the distribution of data and this graph shows an interestingly discontinuous pattern.  The points at the bottom right are all bad luck when presented with a paved road cutting through an uninhabited red laterite landscape and coin-tossing Australia rather than South Africa or vice-versa.  The ones at the top left are when you are beside the Lincoln Memorial in Washington DC, or outside the Walla Walla Vinyard Inn in the other Washington.  The middle bunch represent guesses in the 100 - 1000km range.  You know the country from the road or shop signs, you can guesstimate the latitude and some other facts from the vegetation.  You know that if the picture is pancake flat you're either in Ohio or near the Oder River.  Any mention of silver and you can bet real silver that you're in Nevada or Colorado.  Flags (North Americans are all mad about the ould flags) are, of course, a dead giveaway.  So that's quite a lot of information about the political and physical geography of our planet, that can narrow the sensible range.
Notice the other weird thing about the distribution of points (apart from the patchiness due to the nature of the game)?  It's not linear.  You get way more points than you deserve for getting up close and personal with the target.  The lads at Geoguessr must have a formula to assign scores, and the curve above looks logarithmic.  So let's transform the data by taking the natural log of the score:
That certainly makes the relationship look a lot more linear, although there is still a distinct fillip to your score when you're less than 5 or 10 km from target.  I knew I needed more data and particularly data in the range 2,000-12,000km, so I went back to the game.  I had to make some absurd guesses - mid-Atlantic or halfway between ZA and Oz - to get some points in the desired range:
Which when the score is log-transformed shows:
The wonky double bulge in this line indicates clearly that the algorithm/formula is not straight logarithmic with a bonus for really-close. It must be more complex. And I'm sure the answer has been posted somewhere. A final picture shows a zoom into the top left corner of the last picture.
What does that all amount to?  My advice, if you care about really high scores, is to work hard to get your distances down from 100km to 10km or 1km, because that has more impact than the much larger change from 1000km to 100km.
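
For anyone who wants to repeat the log-transformation on their own games, here is a minimal sketch. It uses only the two exact (distance, score) pairs quoted in the text above, since the plotted data are not reproduced here; the function and variable names are mine. Two points always lie on a straight line, so this shows the mechanics of the transform and cannot by itself test linearity.

```python
import math

# The only exact (distance in km, score) pairs quoted in the post.
quoted = [(0.16, 6466), (16600.0, 45)]

def log_transform(pairs):
    """Return (distance, ln(score)) for each (distance, score) pair."""
    return [(d, math.log(s)) for d, s in pairs]

transformed = log_transform(quoted)

# Slope of ln(score) against distance between the two quoted points.
(d1, y1), (d2, y2) = transformed
slope_per_km = (y2 - y1) / (d2 - d1)

for d, y in transformed:
    print(f"{d:>10.2f} km  ln(score) = {y:.3f}")
print(f"slope = {slope_per_km:.3e} per km")
```

With more points, a straight line on this plot would suggest the score falls off exponentially with distance, and any bulge shows up as a departure from that line.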

Now I'm back off to Кыргыз Республикасы.


  1. Moral of the story: if it looks flat, dry, hot, red and harsh, it must be Australia. Africa is far more interesting.

    I should never have picked the former.


  2. I am doing a report on this topic and am wondering why you chose to put distance on the x-axis and points on the y-axis???

    1. Because distance is the independent variable (over which geoguessr has no control) and score is the dependent variable (the algorithm of which I am here attempting to deduce). There is a convention that the independent variable goes on X and the dependent on Y. But I don't feel very strongly about this; it could be the other way around - same story.

  3. I took 233 data points from geoguessr games with distances ranging from less than 1 m up to 14000 km. I found that it follows a nice exponential decrease. In fact the expression: points = 5000*exp(-6.707329E-5*d) gave the score to the nearest integer. Adding 0.5 and then taking the greatest integer not exceeding the result gave the exact score in every case. The Excel form looks like this: points = INT(5000*exp(-6.707329E-5*d)+.5)

    1. That's a super-weird exponent. You have to wonder whether the designers are counting in Octal. Me, I use the RAND function a lot in Excel to generate 'real' datasets for my students to plot a noisy straight line.
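
For anyone who would rather not open Excel, the formula reported in comment 3 above can be sketched in Python. This is the commenter's fitted expression, not anything published by Geoguessr. The comment does not state the unit of d; kilometres is my assumption, based on the quoted range of distances, and the function name is mine.

```python
import math

DECAY = 6.707329e-5   # constant quoted in comment 3; unit assumed to be per km
MAX_POINTS = 5000     # score at zero distance, per the same comment

def predicted_points(d_km: float) -> int:
    """Score predicted by the commenter's fit: round-half-up of an exponential decay."""
    # Excel's INT(x + 0.5) is floor(x + 0.5) for the positive values used here.
    return math.floor(MAX_POINTS * math.exp(-DECAY * d_km) + 0.5)

for d in (0, 1, 10, 100, 1000, 10000):
    print(f"{d:>6} km -> {predicted_points(d)} points")
```

Note that this fit tops out at 5000 points, whereas the scores in the post above go as high as 6466, so the comment presumably describes a later version of the scoring than the one plotted in the post.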