I would like to strip the HTML tags from a string using Linux commands, keeping the text between them, e.g. getting Simon from <name>Simon</name>.

Does anyone know how to do this easily with awk, grep, sed or the like?

2007-11-23 02:27:45 · 2 answers · asked by Simon Dodd in Computers & Internet ➔ Software

2 answers

The easiest way would be to use html2text.
I have it as a pipeable command on my Linux (Debian) system. It might also be called htm2txt or some variant.

Or here is a Python script:
http://www.aaronsw.com/2002/html2text/

Here is a Perl script:
http://www.greenend.org.uk/rjk/2000/10/html2text.html

Here is one in C:
http://www.mbayer.de/html2text/

Although I agree that awk and sed should be able to do it, and I prefer them too, for some reason I didn't find a ready-made script in either language.
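For the simple case, a single sed substitution can do it. This is a minimal sketch, not a full HTML parser: it assumes every tag opens and closes on the same line, and it does nothing about comments, <script> bodies or entities such as &amp;. The function name strip_tags is just an illustrative name.

```shell
# Delete anything that looks like a tag: a '<', then any run of
# characters other than '>', then the closing '>'.
# The 'g' flag removes every tag on the line, not just the first one.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

echo '<name>Simon</name>' | strip_tags
```

A literal < in the text itself (e.g. "a < b") can confuse this, which is one reason the dedicated html2text tools above are more robust.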

2007-11-23 03:13:48 · answer #1 · answered by Gandalf Parker

I suppose you could use grep and pipe the results to cut.

grep \<name\> xml.list | cut -d\> -f2 >outlist1

The backslashes escape the < and > so the shell does not treat them as redirections.
Cut splits each line on the > delimiter, so field 2 is Simon</name, which is written to outlist1.
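To see what the first pass actually leaves behind, here it is run on a small sample file. The contents of xml.list below are made up for illustration, and the commands run in a scratch directory so the files do not clutter the current one.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: one element per line.
printf '<name>Simon</name>\n<age>30</age>\n' > xml.list

# First pass: keep lines containing <name>, split on '>' and take field 2.
grep \<name\> xml.list | cut -d\> -f2 > outlist1

# Field 2 still carries the start of the closing tag: "Simon</name".
cat outlist1
```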

You then repeat the procedure (a second pass) to get rid of the trailing </name, this time cutting on < and keeping field 1, which leaves Simon in outlist2:
grep \<\/name outlist1 | cut -d\< -f1 >outlist2
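Since the question mentions awk, the two passes can also be collapsed into one awk command by splitting on both < and >. A sketch assuming, like the grep/cut version, that each <name>...</name> element sits on its own line; the sample data is made up.

```shell
cd "$(mktemp -d)" || exit 1
printf '<name>Simon</name>\n<age>30</age>\n' > xml.list

# Split each line on either '<' or '>'. For "<name>Simon</name>" the fields are:
#   $1 = ""  $2 = "name"  $3 = "Simon"  $4 = "/name"  $5 = ""
# so the text between the tags is field 3.
awk -F'[<>]' '$2 == "name" { print $3 }' xml.list > outlist2

cat outlist2
```

Matching on $2 == "name" rather than grepping for the string means a line like <age>30</age> is skipped without a separate grep step.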

2007-11-23 11:10:25 · answer #2 · answered by ray_diator