[mdlug] [OT] Yahoo web server detects my java bot HTTP request

R KANNAN rk111810 at gmail.com
Sun Feb 5 09:17:46 EST 2012


Hello,

I have a Java program that makes HTTP requests to finance.yahoo.com, parses
the returned page for the net asset value, and writes it to a text file in a
format that Quicken can read. So basically it is a crude way of getting
stock prices.

It has been working fine since 2004, with a few tweaks to the search strings
whenever Yahoo changes its web pages.

It stopped working on Wed (2/1), and I thought it was just a matter of
changing the search string to fix it.  But on debugging the code, I found
that the web server was not returning most of the page when requested from
the Java code. It was returning HTML comments like

<!--> robot0 <-->

in web page segments where the price was supposed to be, whereas the page
source from the browser looks fine.
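
A check along these lines could make the program stop with an error instead
of parsing a stripped page. This is only a sketch: the class and method names
are made up for illustration, and it assumes the "robot0" text seen in the
comment above is what marks a stripped response (the server may use other
markers as well).

```java
public class RobotMarkerCheck {
    // Assumption: "robot0" is the marker text, taken from the one
    // placeholder comment observed in the stripped response.
    static final String MARKER = "robot0";

    // Returns true if the page body contains the placeholder marker,
    // meaning the price data was probably withheld.
    static boolean containsRobotMarker(String html) {
        return html != null && html.contains(MARKER);
    }
}
```

The parser could call containsRobotMarker() on the downloaded page before
searching for the price and skip writing the Quicken file when it returns
true.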

I switched to money.cnn.com which doesn't detect my bot and it seems to
work for now.

I am just curious how the web server detected that the request came from a
bot (perhaps it was looking for browser properties: Firefox/IE/Opera,
version, etc.) and how those properties can be included in my HTTP requests.
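
For illustration, here is a minimal sketch of how browser-like headers could
be attached to a request with java.net.HttpURLConnection. It assumes the
server looks at the User-Agent header, which is only a guess. The class name,
the header values, and the URL in the usage note are placeholders, not
anything taken from the original program.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class BrowserLikeRequest {
    // Example value only: a User-Agent string in the format Firefox uses.
    static final String USER_AGENT =
        "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0";

    // Prepares a connection with browser-like headers.
    // Nothing is sent over the network until the caller reads from it.
    static HttpURLConnection open(String address) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Replace Java's default User-Agent ("Java/<version>").
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Accept", "text/html");
        return conn;
    }
}
```

Calling BrowserLikeRequest.open("http://finance.yahoo.com/") and then reading
from getInputStream() on the result would send the request with these headers.
Whether that is enough to get the full page back is not known; the server may
check other things too.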

Thanks


