Monday, March 14, 2011

Wanted: A mystery science theater for stats and programming books.

While looking for good introductory R books for non-programmers (know any? Books, I mean, not non-programmers), I ran across this wonderful review of an apparently ludicrous book.

Statistical Analysis with R: Beginner’s Guide by John M. Quick

Summary: If you can get past the strange underlying story, then this gives a good introduction to R to someone with no programming experience.

[...]

[B]efore describing the data analysis section of the book, I should explain the underlying story used throughout the book. The introductory chapter gives a bit of ancient Chinese history, and states that you, the reader, have been chosen to succeed the famous military leader Zhuge Liang and need to learn how to use R to analyse his data and plan the future of the military campaign.

The rest of the book takes on this theme, both in the data analysis (comparing the Shu and Wei armies, and predicting battle outcomes using regression) and the general phrasing (headings like “Have a go hero!” and emphasis that if you fail the Chinese kingdom will collapse).

Wow.

Friday, March 11, 2011

Scalable models for education

via my good friend Steve Stay:

The White House has a new initiative out: Winning the Education Future - The Role of ARPA-ED (16 pages)
To address the under-investment in learning technology R&D, the President’s FY2012 budget proposes to invest $90 million to create an Advanced Research Projects Agency for Education (ARPA-ED). ARPA-ED will fund projects performed by industry, universities, or other innovative organizations, selected based on their potential to create a dramatic breakthrough in learning and teaching.
ARPA-ED is trying to catalyze development of:
* Digital tutors as effective as personal tutors.
* Courses that improve the more students use them.
* Educational software as compelling as the best videogame.

The intuition here is really interesting -- building technologies for education that scale. Our ed system is based on a 150-year-old chalk-and-talk delivery system. One teacher, ~20 students, one lecture at a time, interspersed with homework, projects, and some small group discussion. I'm stylizing here, but you know I'm not too far off.

The major bottleneck in this system is effective teachers. Roughly speaking, the number of students getting a good classroom education is equal to the number of good teachers times twenty. Given the tremendous difficulty of recruiting, training, and retaining teachers, this is a serious constraint.

Therefore, I am a huge fan of the idea of deploying better education via scalable technology. But I don't think we've seen this done right yet. Some also-rans:
  • iTunes U and similar platforms strike me as a partial first step -- maybe a good replacement for lectures. Suppose I'm a middling public speaker with reasonably good subject knowledge in an area. Why should I lecture when my students can hear from a top-of-the-field virtuoso? But education is a lot more than content delivery. iTunes can't give you feedback, answer questions, or hold office hours.

  • In the long run, private online universities may be able to step up to provide scalable education. But so far I've been pretty unimpressed. For instance, University of Phoenix seems to be running a pump-and-dump model based on government subsidies through student loans. (Maybe I'm out of line on this claim, but that's my impression.)

I think we're still waiting for the real thing in scalable education. What possibilities am I missing?

Wednesday, March 9, 2011

Three cheers for crowdsourcing / open source

Google is offering $10k in prizes for visualizations of the U.S. Federal budget.

R has overtaken Matlab and SAS in popularity

HP is now including a Linux-based OS with every PC it ships.

Information visualization and the Battle of the Atlantic

I've been reading Churchill's 6-volume history of WWII. Fascinating reading if you're into this kind of stuff. Some scattered thoughts on history, war, and information:

WWII was arguably the first war fought through information as much as weaponry. One of Neal Stephenson's characters in Cryptonomicon has a great monologue on this point. He claims that Nazi Germany typifies the values of Ares (you know, the Greek god of war), and the U.S./U.K. typify the values of Athena. In this telling, WWII Germany had an advantage in guns and regimentation, but the proto-hacker cryptographers of Bletchley Park, etc. ran rings around them with information. I recommend the monologue, but not the whole book.

This comes out in Churchill's narrative. Exhibit A is a set of FlowingData-esque maps of merchant ships sunk by U-boats throughout the war.

A little background: in the middle part of the war (once France had been defeated, but before Russia and the U.S. had entered) the "Battle of the Atlantic" was probably the single most important "front" in the war. As long as England was connected to her colonies by convoys of merchant ships, she could continue to fight. If bombing and U-boat action could constrict this flow of trade sufficiently, the little island would have no chance.

Exhibit A: (scanned on the cheap with my pocket digital camera)

Charts like these make it clear that Churchill was interacting with wartime data on a day-to-day basis, and that this flow of information was crucial to the war effort. Churchill likes to attribute success to the bulldog-like grit and willpower of the British people, but it's clear from his narrative that the flow of information was at least as important. In war, grit doesn't matter much without gunpowder.

In addition to maps, Churchill gives statistics and monthly trends for various gains and losses in shipping. They remind me of post-game trend plots in StarCraft II. The general tension between military and economy is the same. They also remind me of the dashboards that are all the rage in business process management these days. Seventy years ago, you had to be a superpower at war to devote these kinds of resources to information gathering. Now, any reasonable-sized business has them. Heck, even this blog is hooked up to sitemeter. Map of the world, populated with little dots? Check.

Tuesday, March 8, 2011

AI for rock, paper, scissors

A really simple and clever demonstration of another place AI is surprisingly good at beating people: a rock-paper-scissors game on NYTimes. Don't laugh. The computer analyzes your past play and looks for weaknesses to exploit. I played it 100 times and lost 23 to 35, with 42 ties. (I said don't laugh!)

Nifty application. Nice design. The "see what the computer is thinking" feature is a great way of explaining how the AI works. Much better than saying it's a 4th-order Markov process with backing off.
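For the curious, here is roughly what "a 4th-order Markov process with backing off" could look like in Python. This is only a sketch of the general idea, not the NYTimes code: the class name BackoffPredictor, the max_order parameter, and the tie-breaking rule are all my own assumptions. The predictor counts which move followed each run of up to four previous moves, looks up the longest run it has seen before, and backs off to shorter runs when it has no data.

```python
import random
from collections import Counter, defaultdict

MOVES = ("rock", "paper", "scissors")
# BEATS[x] is the move that defeats x
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


class BackoffPredictor:
    """Hypothetical sketch: guess the opponent's next move from recent moves."""

    def __init__(self, max_order=4):
        self.max_order = max_order
        # counts[context] tallies which move followed that tuple of moves
        self.counts = defaultdict(Counter)
        self.history = []

    def _context(self, order):
        # The last `order` moves as a tuple; order 0 gives the empty context
        return tuple(self.history[len(self.history) - order:])

    def predict(self):
        # Try the longest context first, then back off to shorter ones
        longest = min(self.max_order, len(self.history))
        for order in range(longest, -1, -1):
            seen = self.counts.get(self._context(order))
            if seen:
                return seen.most_common(1)[0][0]
        return random.choice(MOVES)  # no data at all yet

    def play(self):
        # Play whatever beats the predicted move
        return BEATS[self.predict()]

    def update(self, move):
        # Record the opponent's move under every context length available
        longest = min(self.max_order, len(self.history))
        for order in range(longest + 1):
            self.counts[self._context(order)][move] += 1
        self.history.append(move)
```

To use it, call play() to get the computer's move, then update() with whatever the human actually threw. Against a player who mechanically cycles rock, paper, scissors, it locks on after a handful of rounds; real people are less regular, but apparently not by enough.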

The small print on the side of the screen is revealing (emphasis mine):
Note: A truly random game of rock-paper-scissors would result in a statistical tie with each player winning, tying and losing one-third of the time. However, people are not truly random and thus can be studied and analyzed. While this computer won't win all rounds, over time it can exploit a person's tendencies and patterns to gain an advantage over its opponent.
This is nice evidence that people are surprisingly bad at being unpredictable.

PS. I ran the numbers. The odds of me doing so poorly against the computer in an even game (one third wins, one third losses, one third ties) are about one in 60, a p-value of roughly 0.016. Not impossible, but pretty unlikely. So the smart money is on the computer.
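If you want to check a figure like this yourself, one way to do it is below. A caveat: this is one reasonable reading of "doing so poorly", and it is an assumption on my part about how to set up the test. It treats each of the 100 games as an independent win with probability 1/3 and asks how often you'd end up with 23 wins or fewer. The helper name binom_cdf is made up for this sketch.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


games = 100  # total rounds played
wins = 23    # rounds won against the computer

# Assumption: under a fair game each round is a win with probability 1/3
p_value = binom_cdf(wins, games, 1 / 3)

print(f"P(at most {wins} wins in {games} games) = {p_value:.4f}")
print(f"That is about one in {1 / p_value:.0f}")
```

This is a one-sided test on wins alone. It ignores how the other 77 games split between losses and ties, so a different setup (say, looking only at the 58 decisive games) would give a different number.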