When you go to CNN, for example, the page is updated every few minutes. I would like to automate saving the page periodically throughout the day so I can parse the text and aggregate the data. If I can do it from the command line, I can automate it.

Thanks for any suggestions.

2007-02-05 10:13:56 · 1 answer · asked by the_pharaoh109 4 in Computers & Internet Other - Computers

1 answer

You could build a script and use telnet to get the data. For example, open a command prompt window and enter the following:

telnet www.yahoo.com 80
GET /


and what will be displayed next is the HTML code for the Yahoo! home page. If you can get the home page that way, then the GET line can name a specific page you want to capture. For example, to capture the page on Yahoo! Finance that shows the dollar-to-yen conversion, you would do the following:

telnet finance.yahoo.com 80
GET /s=USDJPY=X


and voilà! you've got the page coming back to you. You can redirect that into a file and parse out what you need. Wrap the whole thing in a timed loop and you're golden.
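A minimal sketch of that timed loop in shell, assuming curl is installed (wget -q -O works the same way); the URL and interval here are just examples:

```shell
#!/bin/sh
# Save a copy of the page every 10 minutes, one timestamped file per fetch.
url="http://www.cnn.com/"    # page to capture (example)
interval=600                 # seconds between fetches (10 minutes)
while true; do
  stamp=$(date +%Y%m%d-%H%M%S)    # e.g. 20070205-101356
  curl -s "$url" -o "page-$stamp.html"
  sleep "$interval"
done
```

Each pass leaves a file like page-20070205-101356.html behind, which a later parsing step can chew through at its leisure.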

Oh, and how did I know that /s=USDJPY=X would get the USD-to-yen conversion? It's the portion of the URL after the host name.
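Typing the request into telnet by hand is hard to automate; the same raw request can be scripted with netcat (assuming nc is available), reusing the host and path from the example above:

```shell
# The request line, exactly as you'd type it into telnet.
request='GET /s=USDJPY=X HTTP/1.0'
# printf supplies the CRLF line endings and the blank line HTTP expects;
# uncomment to send it against the live server and capture the response:
# printf '%s\r\nHost: finance.yahoo.com\r\n\r\n' "$request" | nc finance.yahoo.com 80 > quote.html
echo "would send: $request"
```

From there the file goes through the same redirect-and-parse step as the telnet version.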

Note that this doesn't work for all pages; some pages are constructed so that you have to use a POST (and construct a web form), and some use ports other than 80.
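For the POST case, the form fields travel in the request body, and the request must declare the body's exact length; a sketch of building such a request by hand (the /search path and the field name are hypothetical -- inspect the page's form to find the real ones):

```shell
body="s=USDJPY=X"    # url-encoded form data (hypothetical field)
len=${#body}         # Content-Length must match the body exactly
request=$(printf 'POST /search HTTP/1.0\r\nHost: finance.yahoo.com\r\nContent-Type: application/x-www-form-urlencoded\r\nContent-Length: %s\r\n\r\n%s' "$len" "$body")
# pipe "$request" to nc as before, or let curl build it for you:
#   curl -s -d "$body" http://finance.yahoo.com/search -o result.html
```

In practice curl -d does all of this bookkeeping for you; the raw form just shows what goes over the wire.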

Good luck!

2007-02-06 18:52:29 · answer #1 · answered by BigRez 6 · 1 0
