Bash scripting 101: Chapter 6: can you html (putting some things together)





ever wanted to make a txt file into a html one...kinda like FAST???

this script is a very very simple html'izer

#!/bin/bash

# make sure <TITLE></TITLE> are in capitals in the header
template_head="template.head" #html template header
template_foot="template.foot" #html template footer

#grab date in a nice pretty format yyyy-mm-dd and time hh:mm:ss
DATE_OUT=`date +%Y-%m-%d`
TIME_OUT=`date +%H:%M:%S`
filein="convert.txt" #file to put with P tags in between header and footer
fileout="output.html" #file to write the full new html page file to

#read the title
echo -e "title of the page will be?: "
read -e TITLE
#read the template header, search it for title tags, input title
# and dump output into file defined in fileout variable
cat $template_head | sed s/"<TITLE><\/TITLE>"/"<TITLE>$TITLE<\/TITLE>"/ >> $fileout

#search for line ends put in a paragraph tag, no need to add a newline, there already is one
cat $filein | sed s/'$'/'<P>'/g >> $fileout
#dump template footer into the html file
# while we are at it look for our custom tag "date" and format it using "tt"
cat $template_foot |sed s/"<CURRENT-DATE><\/CURRENT-DATE>"/"<TT>$DATE_OUT $TIME_OUT<\/TT>"/g >> $fileout
exit 0

  1. #!/bin/bash
    i absolutely positively refuse to explain this one by now!!!

  2. template_head="template.head" #html template header
    set template_head variable so it is the filename template.head

    use an inline comment (#) to explain what it is to the user

  3. template_foot="template.foot" #html template footer

    set template_foot variable to filename (why don't you guess?)

  4. DATE_OUT=`date +%Y-%m-%d`
    grab the date formatted as we want (do a "man date")

  5. TIME_OUT=`date +%H:%M:%S`
    ditto for time

    we grab time and date here so they will remain the same, if we embed the date command the seconds might differ

  6. filein="convert.txt" #file to put with P tags in between header and footer
    set the file we read text from

  7. fileout="output.html" #file to write the full new html page file to
    set the file we put the finished "html product" into

  8. echo -e "title of the page will be?: "
    ask what the title should be

  9. read -e TITLE
    grab the title and put it into variable TITLE

  10. cat $template_head | sed s/"<TITLE><\/TITLE>"/"<TITLE>$TITLE<\/TITLE>"/ >> $fileout
    read file $template_head (which in this script is "template.head")

    pipe it through sed

    s/"<TITLE><\/TITLE>"/"<TITLE>$TITLE<\/TITLE>"/

    the three frontslashes without a backslash in front of them (right after the s, between the two quoted parts, and at the very end) are the ones that belong to sed. notice that there are more? those also have a backslash in front of them (as you can see) ..this is because if we DON'T do this, sed will think those frontslashes "/" are part of its command line, and they aren't, they are part of html, so..we backslash escape em

    so "sed s/xx/yy/g" in other words s for substitute (search and replace kinda (do a "man" on "sed")) and g for global (otherwise sed will only take the first "hit" it finds on each line). this particular line leaves the g off, since we only expect one title in the header

    ie. dump template.head, look through it for <TITLE></TITLE> (open and close tag right next to each other with nothing in between) and put our TITLE variable in between them if we find them, plop it all onto the end of output.html (fileout variable refers to that file)

  11. cat $filein | sed s/'$'/'<P>'/g >> $fileout
    dump the contents of filein (convert.txt), pipe it through sed, look for '$'

    WHOAA whaddayamean dollar??? ahhh '$' is "sed speak" (regex) for end of line (EOL)

    put in a html P tag along with the EOL and dump it to output.html

  12. cat $template_foot |sed s/"<CURRENT-DATE><\/CURRENT-DATE>"/"<TT>$DATE_OUT $TIME_OUT<\/TT>"/g >> $fileout

    dump the footer, pipe it through sed, search for our "custom tag" <CURRENT-DATE></CURRENT-DATE>, replace it with TT tag formatted date and time, and dump it all to output.html

    the sed line might need some pointing out, so here it is once more. the parts belonging to sed are the s at the start, the three frontslashes without a backslash in front of them, and the g at the end (not the backslash escaped html stuff)

    s/"<CURRENT-DATE><\/CURRENT-DATE>"/"<TT>$DATE_OUT $TIME_OUT<\/TT>"/g

    so the s/ / /g parts belong to sed, the rest tells sed what to find and what to put in its place, with the backslashes keeping sed from interpreting the html frontslashes

  13. exit 0
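
one thing the walkthrough above doesn't cover: whatever you type as the title lands inside the sed command too, so a title containing / or & would trip sed up the same way the html frontslashes would. below is a minimal sketch of one way to deal with that, not part of the original script. the function names escape_for_sed and fill_title are made up for this example, and it assumes the title is a single line:

```shell
#!/bin/bash

# escape_for_sed (hypothetical helper): put a backslash in front of every
# \ / and & in the argument, the characters sed treats as special in the
# replacement half of s/xx/yy/ when / is the delimiter
escape_for_sed() {
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/\//\\\//g' -e 's/&/\\\&/g'
}

# fill_title (hypothetical helper): read a template on stdin and put the
# escaped title between the TITLE tags, like step 10 does
fill_title() {
    local safe_title
    safe_title=$(escape_for_sed "$1")
    sed "s/<TITLE><\/TITLE>/<TITLE>$safe_title<\/TITLE>/"
}

echo '<TITLE></TITLE>' | fill_title 'cats/dogs & more'
# prints: <TITLE>cats/dogs & more</TITLE>
```

the backslash has to be escaped first, otherwise the backslashes added for / and & would get doubled up too.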

now if you were to add it all up, enter "my wild page" when asked about title and have these:

template.head:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<TITLE></TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="EditPlus">
<META NAME="Author" CONTENT="">
<META NAME="Keywords" CONTENT="">
<META NAME="Description" CONTENT="">
</HEAD>

<BODY BACKGROUND="" TEXT="" LINK="" VLINK="" ALINK="" BGCOLOR="" BGPROPERTIES="">

template.foot:

the current date is:
<CURRENT-DATE></CURRENT-DATE>
<P>
</BODY>
</HTML>

convert.txt:

so what do i do here?

and why should i?

wow strange huh?

output.html will become something like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<TITLE>my wild page</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="EditPlus">
<META NAME="Author" CONTENT="">
<META NAME="Keywords" CONTENT="">
<META NAME="Description" CONTENT="">
</HEAD>

<BODY BACKGROUND=""  TEXT="" LINK="" VLINK="" ALINK="" BGCOLOR="" BGPROPERTIES="">

so what do i do here?<P>
<P>
and why should i?<P>
<P>
wow strange huh?<P>
the current date is:
<TT>2003-07-29 14:14:51</TT>
<P>
</BODY>
</HTML>

so, is this a clever way to do it?

umm YUPS and NOPES *lol* (heh did i make ya type it in?...did i did i?)

actually there is learning in entering it too, one lesson being that hardcoded filenames CAN be good

if you for instance wanted this to be for users to convert only one file in their home directory to html, then it might be good

on the other hand if you want the full power of it, then it's NOT good

it does one file at a time, it forces you to empty and re-edit convert.txt between files and to move output.html to avoid it being added to
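
the "being added to" part comes from the >> the script uses to write its output. a quick sketch of the difference between >> and >, run in a throwaway folder from mktemp so it can't touch a real output.html (the demo_dir variable and the echoed text are just for this example):

```shell
#!/bin/bash

# work in a temporary folder so nothing real gets touched
demo_dir=$(mktemp -d)
fileout="$demo_dir/output.html"

echo "first run"  >> "$fileout"
echo "second run" >> "$fileout"   # >> adds to the end, so both lines are in the file now
wc -l < "$fileout"                # counts 2 lines

echo "fresh start" > "$fileout"   # > empties the file first, only this line is left
wc -l < "$fileout"                # counts 1 line
```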

mmmKAY so what WOULD be powerful?

aha..sometimes one can be too clever...the above script is such an example

#!/bin/bash

# make sure <TITLE></TITLE> are in capitals in the header
template_head="template.head" #html template header
template_foot="template.foot" #html template footer

#grab date in a nice pretty format yyyy-mm-dd and time hh:mm:ss
DATE_OUT=`date +%Y-%m-%d`
TIME_OUT=`date +%H:%M:%S`

#read the title
# we comment out echo to make sure it doesn't echo to the html file, this costs us the prompt, but affords us power
#echo -e "title of the page will be?: "
read -e TITLE

#read the template header, search it for title tags, input title
# and dump output to stdout (the screen, unless it gets redirected)
cat $template_head | sed s/"<TITLE><\/TITLE>"/"<TITLE>$TITLE<\/TITLE>"/

#search for line ends put in a paragraph tag, no need to add a newline, there already is one
# there is no filein variable anymore: cat with no filename reads whatever is left on stdin
cat | sed s/'$'/'<P>'/g
#dump template footer after the converted text
# while we are at it look for our custom tag "date" and format it using "tt"
cat $template_foot |sed s/'<CURRENT-DATE><\/CURRENT-DATE>'/"<TT>$DATE_OUT $TIME_OUT<\/TT>"/g
exit 0

HEY! dude..i'm SO on to you..you CHEAT you just cut down the other script!!!

umm yeah..i did..good you noticed, now i don't have to explain its workings ;-)

what i will explain is the commandline behaviour of it instead

say we call this new script "htmlmaker2"

./htmlmaker2 < convert.txt
would convert convert.txt and output it to the screen

bu bu bu bu but!!! whatta bout the "read -e TITLE" bit!!!

aha!!!..first line gets read into that..so put the title in the first line and yer covered
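
to see that behaviour on its own, here is a small sketch. the function name split_title_and_body is made up for the demo, and it uses read -r instead of the script's read -e (an assumption on my part, so that backslashes in the title are left alone). read takes the first line, and whatever is left on stdin goes to the next command that reads it, here sed:

```shell
#!/bin/bash

# split_title_and_body (hypothetical name): the first line of stdin becomes
# the title, every remaining line gets a <P> stuck on its end
split_title_and_body() {
    local TITLE
    read -r TITLE             # eats exactly one line from stdin
    echo "title: $TITLE"
    sed 's/$/<P>/'            # sed sees only what read left behind
}

printf 'my wild page\nso what do i do here?\nand why should i?\n' | split_title_and_body
# prints:
# title: my wild page
# so what do i do here?<P>
# and why should i?<P>
```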

./htmlmaker2 < convert.txt >> output.html
would use convert.txt for content and put it in a file called output.html

but it's so much simpler to type "htmlmaker" and have it do it all!!!

sure..for ONE file..now do a directory of 50 txt files?...OUCH huh?

remember we said early on most of this stuff works on commandlines too?..i didn't say that? well i thought i did..and now i did!

for FILENAME in `ls *.txt`;do ./htmlmaker2 < $FILENAME >> $FILENAME.html;done
will convert all files ending in .txt to html (in the folder you are in), using the first line as title, date stamp them AND save them as .txt.html, ie add .html to the existing filename

ie a folder containing blabla.txt, bloblo.txt and hottatotta.txt will after running this also contain:

blabla.txt.html, bloblo.txt.html and hottatotta.txt.html with html'ized versions

yes this is a very simple html'izer

can we make it "work prettier"..sure can!

if done like below it will put html copies of all txt files in a folder, filename.txt becoming filename.html

for I in `ls *.txt`;do ./htmlmaker2 < $I >> `echo $I | sed s/'\.txt'//g`.html ;done
notice, we are using a for/do/done on the commandline and actually using 2 commands to be executed before the main line is done

first the ls *.txt is done, then it's fed through "for", which feeds the full filename into htmlmaker2, and then executes a small "sub program" / "extra command" again and filters the .txt out of the filename it's fed and uses that with a .html for output, rinse and repeat *smiles*
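
a sketch of another way to write that loop, as an alternative to the one-liner above and not a replacement for it. it assumes bash, lets the shell expand *.txt itself instead of going through ls (so filenames with spaces survive), uses ${txt%.txt} to chop the .txt off the name, and writes with > so a second run overwrites the old html file instead of adding to it. convert_all is a made-up name, and the converter to run is passed in as the first argument:

```shell
#!/bin/bash

# convert_all (hypothetical wrapper, not from the original lesson)
# $1 is the command to run on each file, it must read stdin and write stdout
convert_all() {
    local converter=$1 txt
    for txt in *.txt; do
        # with no .txt files around the pattern stays unexpanded, so skip it
        [ -e "$txt" ] || continue
        # ${txt%.txt} is the filename with the trailing .txt removed
        "$converter" < "$txt" > "${txt%.txt}.html"
    done
}

# usage, with the script from this lesson in the current folder:
#   convert_all ./htmlmaker2
```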

i'm leaving it as an exercise for the student to put this INTO the script if they want to

now i'll end this lesson before someone smacks me and go think about what i'll do in lesson seven







w0nderer