
I would like to make a web crawler that will save URLs in a text file, and that avoids pages a site's robots.txt file disallows.

I do have PHP-enabled web space.

I would like a web crawler that starts with a list of sites, finds the URLs on those sites, then goes to each URL it found, collects the URLs it finds there, and so on...
All of them put into a text file.
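
(For illustration, the loop being described might look something like this in PHP. This is a rough sketch only: it assumes allow_url_fopen is enabled on the host, and the seed URL, the output file name urls.txt, the page limit, and the simplified robots.txt parsing are all placeholders to adapt.)

<?php
// Minimal breadth-first crawler sketch: seed list, robots.txt check,
// visited URLs appended to a text file. Names here are illustrative.

// Return true if $path is allowed by the site's robots.txt rules
// for "User-agent: *" (simplified: prefix match, no wildcards).
function allowedByRobots($scheme, $host, $path) {
    static $cache = array();            // per-host Disallow prefixes
    if (!isset($cache[$host])) {
        $rules = array();
        $txt = @file_get_contents("$scheme://$host/robots.txt");
        if ($txt !== false) {
            $applies = false;
            foreach (preg_split('/\r?\n/', $txt) as $line) {
                $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
                if (preg_match('/^User-agent:\s*(.+)$/i', $line, $m)) {
                    $applies = (trim($m[1]) === '*');
                } elseif ($applies && preg_match('/^Disallow:\s*(\S*)/i', $line, $m)) {
                    if ($m[1] !== '') $rules[] = $m[1];  // empty Disallow = allow all
                }
            }
        }
        $cache[$host] = $rules;
    }
    foreach ($cache[$host] as $prefix) {
        if (strpos($path, $prefix) === 0) return false;
    }
    return true;
}

$queue = array('http://example.com/');  // seed list (placeholder)
$seen  = array();
$out   = fopen('urls.txt', 'a');        // the text file of URLs
$limit = 100;                           // stop after this many pages

while ($queue && count($seen) < $limit) {
    $url = array_shift($queue);
    if (isset($seen[$url])) continue;
    $seen[$url] = true;

    $p = parse_url($url);
    if (!isset($p['scheme']) || !isset($p['host'])) continue;
    $path = isset($p['path']) ? $p['path'] : '/';
    if (!allowedByRobots($p['scheme'], $p['host'], $path)) continue;

    $html = @file_get_contents($url);
    if ($html === false) continue;
    fwrite($out, $url . "\n");

    // Extract absolute http(s) links; relative-link resolution omitted.
    if (preg_match_all('/href=["\'](https?:\/\/[^"\'#\s]+)/i', $html, $m)) {
        foreach ($m[1] as $link) {
            if (!isset($seen[$link])) $queue[] = $link;
        }
    }
}
fclose($out);
?>

A real crawler would also resolve relative links, rate-limit its requests, and honor Crawl-delay; the sketch above skips all of that.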

2006-07-17 22:40:08 · 2 answers · asked by TS2 3 in Computers & Internet > Internet

2 answers

Web site crawling is a heavy task. It is better to rely on a dedicated program for that purpose. I recommend ht://Dig: http://www.htdig.org/

If you want to use it from PHP, you can use this class, which simplifies configuring, crawling, and searching sites with ht://Dig: http://www.phpclasses.org/htdiginterface
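
(If you would rather call ht://Dig's own indexer directly, something like the following might work from PHP. This is a rough sketch assuming ht://Dig is installed on the host and that its htdig binary accepts the usual -i "initial dig" and -c "config file" options; the config path below is a placeholder.)

<?php
// Rough sketch: run the ht://Dig indexer from PHP via exec().
$config = '/etc/htdig/htdig.conf';   // placeholder; point at your ht://Dig config
$output = array();
$status = 0;
// -i rebuilds the index from scratch; -c selects the config file.
exec('htdig -i -c ' . escapeshellarg($config), $output, $status);
if ($status !== 0) {
    echo "htdig failed:\n" . implode("\n", $output);
}
?>

Once the index is built, queries go through ht://Dig's htsearch CGI (or through the PHP class linked above).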

2006-07-18 08:32:57 · answer #1 · answered by Manuel Lemos 3 · 0 0

Become very experienced in PHP, C++, Python, or some other advanced language.

2006-07-18 07:11:28 · answer #2 · answered by mathiasmj2003 2 · 0 0
