PHP – Convert Plain HTML into XHTML

The following PHP function converts invalid or sloppy HTML into XHTML:

The function tries to turn the given string into an XHTML-compatible string,
which means the returned string can be embedded in an XML document.

For example:

$string = "<tag><I>phpmoot<br></tag> blabla" 
return = "<tag><i>phpmoot<br /></i></tag> blabla" 

As you can see, the following changes are made:
- "<I>" becomes "<i>", because all tag names are converted to lowercase
- "<br>" becomes "<br />" to make it XHTML-conformant (the same is done for "img")
- The closing "</tag>" also closes "<i>", because <tag> surrounds it
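The three changes listed above can be illustrated with a much smaller, regex-based sketch. This is not the function from this post: the name sketchBalanceTags() and its internals are hypothetical, it assumes PHP 7.1 or later, and it does not cope with a ">" inside an attribute value. It only shows the idea of lowercasing tag names, self-closing "br" and "img", and closing open elements with a stack.

```php
<?php
// Hypothetical, simplified sketch; NOT the autoAdjustToXHTML() function from this post.
function sketchBalanceTags(string $html): string
{
    $void  = ['br', 'img'];   // elements that are written as "<name />"
    $stack = [];              // names of the elements that are currently open
    $out   = '';

    // Split the input into tags and text, keeping the tags as tokens.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        if (!preg_match('/^<(\/?)([A-Za-z][A-Za-z0-9]*)([^>]*?)(\/?)>$/', $token, $m)) {
            $out .= $token;   // plain text, comments, anything this sketch does not parse
            continue;
        }
        [, $closing, $name, $attrs, $selfClosed] = $m;
        $name = strtolower($name);            // tag names become lowercase

        if ($closing === '/') {
            // Ignore a closing tag that has no matching open element.
            if (!in_array($name, $stack, true)) continue;
            // Close every open element up to and including the named one.
            while ($stack) {
                $open = array_pop($stack);
                $out .= "</$open>";
                if ($open === $name) break;
            }
            continue;
        }

        if ($selfClosed === '/' || in_array($name, $void, true)) {
            $out .= "<$name" . rtrim($attrs) . " />";   // never pushed on the stack
            continue;
        }

        $stack[] = $name;
        $out .= "<$name" . rtrim($attrs) . ">";
    }

    // Close whatever is still open, innermost element first.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo sketchBalanceTags('<tag><I>phpmoot<br></tag> blabla'), "\n";
// prints: <tag><i>phpmoot<br /></i></tag> blabla
```

The stack is what makes the third change work: when "</tag>" arrives, "i" is still on top of the stack, so it is closed first.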

Code Example:

function autoAdjustToXHTML($string) { 
    while ($string!="") { 
        if ($pos===false) { 
            // no "<" found, return full string and exit while 

        // copy anything up to the "<" into return string and reduce string 

        // examine tag-name 
        $i=0; $c=""; 
        for (;$i<strlen($string);$i++) { 
            if (strpos(";<",$c)!==false) continue;    // some chars we ignore 
            if ($c==" "|| $c==">") break; 

        $tagName=strtolower($tagName);    // convert uppercase  

        // is there a closing tag? 
        if (substr($tagName,0,1)=="/") { 
            // search for the closing ">" and ignore all before 
            if ($pos===false) $pos=strlen($string); 
            // close as many tags up to the given tag 
            while ($stackPoint>0) { 
                if ($stackElement==$tagName) break;    // nothing more to do in this while 
            continue;    // this element-Processing is finished so far, continue at 'while ($string!="") {' 

        // if we are here, we are within a tag (opening tag) 
        // Push tag on stack 

        // add tag to returnString 
        $returnString.="<$tagName "; 
        // search up to ">" 
        $inApo=false;    // inside a quoted attribute value (" or ') 
        for (;$i<strlen($string);$i++) { 
            if ($fakeApo && strpos(" \t",$c)!==false) { 
            if (!$inApo && !$fakeApo && $c=='=' && strpos(" \t'\"",substr($string,$i+1,1))===false) { 
            if ($c==$apoChar && $inApo) { $inApo=false; $returnString.=$c." "; $c=""; } 
            if ($inApo && $c=="&" && strpos(substr($string,$i).";",";")>5) $c="&amp;"; 
            else if (($c=="'" || $c=='"') && !$inApo) { $inApo=true; $apoChar=$c; } 
            if ($c==">") break; 

        // new $string is the rest 
        // check if this has a "/>" at the end 

        // some elements must have a "/" at the end --> Fake it if needed 
        if (($tagName=="br" || $tagName=="img") && !$endSlash) {  
        // check if it is a comment (<!-- -->); treat it like a tag with an end slash 
        if (substr($tagName,0,3)=="!--") { 


            if (substr($returnString,-2)!="--") 
               $returnString.=" --"; // make sure the comment ends with "-->" 


        // again, do we have an end slash? (or a faked one?) 
        if ($endSlash) $stackPoint--;    // just remove element from stack 
        // and now add the closing ">" 

    // ok, we are almost finished, just clean up the elementStack (from last to first) 
    while ($stackPoint>0) { 
    return $returnString; 
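
The attribute loop in the listing quotes unquoted attribute values ("fake" apostrophes) and turns a bare "&" into "&amp;". The same two steps can be sketched in isolation with regular expressions. The helper name sketchNormalizeAttributes() is hypothetical and not part of the function above, and the sketch assumes it only receives the attribute part of a tag, without the tag name or the closing ">".

```php
<?php
// Hypothetical helper; a regex-based sketch, not the loop used in the listing above.
function sketchNormalizeAttributes(string $attrs): string
{
    // Match name="value", name='value' or name=value (unquoted).
    $pattern = '/([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*'
             . '(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/';
    preg_match_all($pattern, $attrs, $matches, PREG_SET_ORDER);

    $out = [];
    foreach ($matches as $m) {
        // Only one of the three value groups takes part in each match.
        $value = ($m[2] ?? '') . ($m[3] ?? '') . ($m[4] ?? '');
        // Escape "&" unless it already starts an entity such as &amp; or &#38;
        $value = preg_replace('/&(?!#?[A-Za-z0-9]+;)/', '&amp;', $value);
        // Always emit double quotes, so escape any double quote in the value.
        $value = str_replace('"', '&quot;', $value);
        $out[] = $m[1] . '="' . $value . '"';
    }
    return implode(' ', $out);
}

echo sketchNormalizeAttributes("href=a.php?x=1&y=2 title='Q&amp;A'"), "\n";
// prints: href="a.php?x=1&amp;y=2" title="Q&amp;A"
```

Attributes without a value (for example "checked") are dropped by this sketch; XHTML would need them written as checked="checked".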
