theether.org: September 2004 Archives

September 20, 2004

LA

New LA pictures posted! While in LA, I highly recommend the seven-course prix fixe menu at Hayakawa! :D Courses 1, 2, 3, and 6, which include such delicacies as ankimo (monkfish liver) with caviar and halibut fin sushi, are shown in the pictures.

Posted by bnc at 04:25 AM

September 11, 2004

Engagement

Engagement pictures from Pacific Grove posted!

Posted by bnc at 09:24 PM

September 05, 2004

Metreon


Escaping the heat at $60/hour at the Metreon.

Posted by bnc at 06:22 PM

amazon.com hack

Here's a useful recipe for automatically obtaining the entire text of many books on amazon.com:

     look up desired book
     click on search inside
     search until last useful page p (e.g., before the index)
     text = ocr(page p)
     str = least probable n-gram(text)
     while pages left
         url = search on str (full-text search - YEAH!)
         get pages p, p+1, p+2
         p = p + 2
         text = ocr(page p)
         str = least probable n-gram(text)

What's needed is a good, free OCR program and a decent text data set to help compute least probable n-grams in written works. Anyone know of a good OCR program for Linux? I've tried a few and they all seem to do really badly. Isn't this a standard machine learning 101 programming assignment? :P
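For the "least probable n-gram" step, here is a rough Python sketch. It is only one way to do it, and everything in it is an assumption of mine: the names `tokenize` and `least_probable_ngram` are made up, the "text data set" is just a word-count table, and an n-gram is scored by summing add-one-smoothed unigram log-probabilities (treating words as independent, which a real n-gram model would not do). OCR is not covered at all.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def least_probable_ngram(text, corpus_counts, n=3):
    """Return the n-word window of `text` with the lowest score.

    The score is the sum of add-one-smoothed unigram log-probabilities,
    estimated from `corpus_counts` (a word -> count mapping). This is a
    simplifying assumption, not a true n-gram language model.
    """
    words = tokenize(text)
    if len(words) < n:
        return " ".join(words)

    total = sum(corpus_counts.values())
    vocab = len(corpus_counts) + 1  # one extra bucket for unseen words

    def logp(word):
        return math.log((corpus_counts.get(word, 0) + 1) / (total + vocab))

    # Slide a window of n words over the page and keep the rarest one.
    windows = (words[i:i + n] for i in range(len(words) - n + 1))
    best = min(windows, key=lambda ws: sum(logp(w) for w in ws))
    return " ".join(best)


# Toy reference corpus; a real one would be a large body of written text.
corpus = Counter(tokenize("the cat sat on the mat and the dog sat on the rug"))
page = "the cat sat on the entropy of the channel"
print(least_probable_ngram(page, corpus, n=2))  # prints "entropy of"
```

The idea is that a rare phrase should match few pages in a full-text search, so the hit is likely to be the page it came from. With noisy OCR output, a misread word would look "rare" too, so some filtering against a dictionary would probably be needed first.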

PS On a completely unrelated note, "Elements of Information Theory" is now selling for 60% off ($37.80, down from $94.50) for whatever reason! Given the number of bugs on Amazon lately, I wouldn't be surprised if this is unintentional (I couldn't find anything on an imminent 2nd edition, for example). :D (Note: Apparently a 2nd edition is due to be released in May 2005.)

Posted by bnc at 01:12 AM | Comments (3)