
Hi
I'm trying to extract partial data from an HTML page. I saved this as text.html and opened it in Internet Explorer, but it doesn't work:

"""
Packages: text.html
"""
""" Rough functions for extracting data from HTML documents.
Glyn Webster 1999-04-27
"""
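import re

_mode = re.IGNORECASE | re.DOTALL
# NOTE: the tag parts of these three patterns were stripped from the post.
# The versions below are a best-guess reconstruction, not the original text.
_body = re.compile(r'<body[^>]*>(.*)</body>', _mode)
_title = re.compile(r'<title[^>]*>(.*)</title>', _mode)
_meta = re.compile(r'<meta\s+name="([^"]*)"\s+content="([^"]*)"[^>]*>', _mode)

def body(html):
    " Returns the body of an HTML document. "
    m = _body.search(html)
    if m:
        return m.group(1)
    else:
        # If there are no <body> tags then the whole thing is the body:
        return html

def title(html):
    " Returns the title of an HTML document. "
    m = _title.search(html)
    if m:
        return m.group(1)
    else:
        return ""

def meta(html):
    """ Returns a dictionary of <meta> tag data that maps NAME onto
    CONTENT. """
    tags = {}
    pos = 0
    while 1:
        # Search again from the end of the previous match.
        m = _meta.search(html, pos)
        if m:
            tags[m.group(1)] = m.group(2)
            pos = m.end()
        else:
            break
    return tags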

Kind regards

Tovia

2007-08-25 07:41:21 · 2 answers · asked by jam 5 in Computers & Internet Programming & Design

2 answers

I don't think you can always extract data successfully from a web page, because not all web pages allow you to open their page source. Also, some web pages are not written by hand; they are generated with web editing software.

2007-08-25 07:46:50 · answer #1 · answered by Ajay-webmaster 2 · 0 0

I'm not quite sure what you're trying to do here. This is a collection of utilities that you would use in a Python script, not an HTML page. It's not even a complete script - there is no main function.

To use this properly, you would need to write a Python script that loads the web page, uses the utilities above to extract the various parts of the page, then does something with the result.
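As a rough illustration of that workflow, here is a minimal sketch in Python 3. It is not the original author's code: the three regular expressions are assumed stand-ins (the tag parts of the patterns in the question did not survive), the sample page is made up, and the file name `extract.py` is hypothetical.

```python
import re

# Assumed patterns: the tag parts of the regexes in the question are missing,
# so these are hypothetical stand-ins, not the original author's patterns.
_MODE = re.IGNORECASE | re.DOTALL
_TITLE = re.compile(r'<title[^>]*>(.*?)</title>', _MODE)
_BODY = re.compile(r'<body[^>]*>(.*)</body>', _MODE)
_META = re.compile(r'<meta\s+name="([^"]*)"\s+content="([^"]*)"[^>]*>', _MODE)


def title(html):
    """Return the text of the <title> element, or '' if there is none."""
    m = _TITLE.search(html)
    return m.group(1) if m else ""


def body(html):
    """Return the contents of <body>, or the whole document if there is none."""
    m = _BODY.search(html)
    return m.group(1) if m else html


def meta(html):
    """Return a dict mapping each <meta> NAME onto its CONTENT."""
    return {m.group(1): m.group(2) for m in _META.finditer(html)}


def main():
    # A hard-coded sample page keeps the sketch self-contained. For a real
    # page, read a saved file with open(path).read(), or fetch one with
    # urllib.request.urlopen(url).read().decode().
    page = (
        '<html><head><title>Sample page</title>'
        '<meta name="author" content="Tovia"></head>'
        '<body><p>Hello</p></body></html>'
    )
    print("Title:", title(page))
    print("Body:", body(page))
    print("Meta:", meta(page))


if __name__ == "__main__":
    main()
```

Saved as a `.py` file (for example `extract.py`) and run with `python extract.py`, this prints the extracted parts. The file is run by the Python interpreter rather than opened in a browser. Regular expressions like these only cope with simple, well-formed markup, so a real script would more likely rely on an HTML parser such as the standard library's `html.parser`.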

Perhaps you could provide a bit more detail about what you're trying to do, then we can help further.

2007-08-28 11:56:11 · answer #2 · answered by Daniel R 6 · 0 0
