Hi! While using the excerpt tag with truncate chars turned on, I've noticed that HTML entities (like &quot; and similar) stored in the DB get cut in half. There is a solution, but it requires modifying the Couch source code. Maybe KK will decide to put it in the next update if he finds it useful. In admin/functions.php there is the function excerpt:
// Truncates a string to the given length
function excerpt( $str_utf, $count, $trail='' ){
    if( @preg_match('/^.{1}/us', $str_utf, $matches) == 1 ){ // quick check for UTF-8 well-formedness
        if( function_exists('mb_strlen') && function_exists('mb_substr') ){
            $strlen = mb_strlen( $str_utf, 'UTF-8' );
            if( $count < $strlen ){
                $substr = mb_substr( $str_utf, 0, $count, 'UTF-8' ) . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
        else{
            $strlen = strlen( utf8_decode($str_utf) );
            if( $count < $strlen ){
                $pattern = '#^(.{'.$count.'})#us';
                @preg_match( $pattern, $str_utf, $matches );
                $substr = $matches[1] . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
    }
    else{
        $strlen = strlen( $str_utf );
        if( $count < $strlen ){
            $substr = substr( $str_utf, 0, $count ) . $trail;
        }
        else{
            $substr = $str_utf;
        }
    }

    return $substr;

}
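
To make the problem concrete, here is a minimal sketch of what character truncation does to an encoded entity. The sample string and the count of 11 are made up for illustration, and the sketch assumes the mbstring extension is available (the branch the function above prefers).

```php
<?php
// Illustrative input: text as it might be stored, with the quotes entity-encoded.
$stored = 'He said &quot;hi&quot; to me';

// Character truncation counts every character of "&quot;" separately,
// so a cut-off of 11 lands in the middle of the first entity.
$cut = mb_substr( $stored, 0, 11, 'UTF-8' );

echo $cut, "\n"; // He said &qu
```

The browser then shows the leftover "&qu" as literal text instead of a quote.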


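All admin-side input is stored entity-encoded in the DB, so excerpt counts each character of an entity separately and can cut one in half. We can decode the entities before truncating (line 3 below):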

// Truncates a string to the given length
function excerpt( $str_utf, $count, $trail='' ){
    $str_utf = html_entity_decode($str_utf); // ADD THIS TO DECODE ENTITIES
    if( @preg_match('/^.{1}/us', $str_utf, $matches) == 1 ){ // quick check for UTF-8 well-formedness
        if( function_exists('mb_strlen') && function_exists('mb_substr') ){
            $strlen = mb_strlen( $str_utf, 'UTF-8' );
            if( $count < $strlen ){
                $substr = mb_substr( $str_utf, 0, $count, 'UTF-8' ) . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
        else{
            $strlen = strlen( utf8_decode($str_utf) );
            if( $count < $strlen ){
                $pattern = '#^(.{'.$count.'})#us';
                @preg_match( $pattern, $str_utf, $matches );
                $substr = $matches[1] . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
    }
    else{
        $strlen = strlen( $str_utf );
        if( $count < $strlen ){
            $substr = substr( $str_utf, 0, $count ) . $trail;
        }
        else{
            $substr = $str_utf;
        }
    }

    return $substr;

}
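
As a rough check of the decode-first idea, the sketch below decodes a made-up sample string and then truncates it. The explicit ENT_QUOTES and 'UTF-8' arguments are my assumption and are not in the code above; the defaults of html_entity_decode differ between PHP versions, so spelling them out keeps the result predictable. mbstring is assumed to be available.

```php
<?php
// Illustrative input: text as it might be stored, with the quotes entity-encoded.
$stored = 'He said &quot;hi&quot; to me';

// Decode first, so each &quot; becomes a single " character.
$decoded = html_entity_decode( $stored, ENT_QUOTES, 'UTF-8' );

// The same cut-off of 11 characters no longer splits an entity.
$cut = mb_substr( $decoded, 0, 11, 'UTF-8' );

echo $cut, "\n"; // He said "hi
```

The entity is intact, but the result now contains a raw quote character rather than &quot;.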


The problem is gone now, but decoding alters the output of the tag: entities that were stored encoded now come out as raw characters. So we can encode the characters back to HTML entities after the substring step is done, by wrapping every output of the function in htmlentities():

// Truncates a string to the given length
function excerpt( $str_utf, $count, $trail='' ){
    $str_utf = html_entity_decode($str_utf); // ADD THIS TO DECODE ENTITIES
    if( @preg_match('/^.{1}/us', $str_utf, $matches) == 1 ){ // quick check for UTF-8 well-formedness
        if( function_exists('mb_strlen') && function_exists('mb_substr') ){
            $strlen = mb_strlen( $str_utf, 'UTF-8' );
            if( $count < $strlen ){
                $substr = htmlentities(mb_substr( $str_utf, 0, $count, 'UTF-8' )) . $trail; // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
            else{
                $substr = htmlentities($str_utf); // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
        }
        else{
            $strlen = strlen( utf8_decode($str_utf) );
            if( $count < $strlen ){
                $pattern = '#^(.{'.$count.'})#us';
                @preg_match( $pattern, $str_utf, $matches );
                $substr = htmlentities($matches[1]) . $trail; // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
            else{
                $substr = htmlentities($str_utf); // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
        }
    }
    else{
        $strlen = strlen( $str_utf );
        if( $count < $strlen ){
            $substr = htmlentities(substr( $str_utf, 0, $count )) . $trail; // ADD THIS TO ENCODE CHARS TO ENTITIES
        }
        else{
            $substr = htmlentities($str_utf); // ADD THIS TO ENCODE CHARS TO ENTITIES
        }
    }

    return $substr;

}
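
The decode, truncate, re-encode sequence can also be tried in isolation. The helper below is hypothetical (excerpt_entity_safe is not a Couch function) and only covers the mbstring branch. The explicit ENT_QUOTES and 'UTF-8' arguments are again my assumption, chosen so that decoding and encoding mirror each other.

```php
<?php
// Hypothetical helper, not part of Couch: decode, truncate, then re-encode.
function excerpt_entity_safe( $str, $count, $trail = '' ){
    // Turn stored entities back into single characters before counting.
    $decoded = html_entity_decode( $str, ENT_QUOTES, 'UTF-8' );

    // Short enough already: re-encode the whole string, no trail.
    if( $count >= mb_strlen( $decoded, 'UTF-8' ) ){
        return htmlentities( $decoded, ENT_QUOTES, 'UTF-8' );
    }

    // Cut on real characters, then re-encode only the kept part.
    $cut = mb_substr( $decoded, 0, $count, 'UTF-8' );
    return htmlentities( $cut, ENT_QUOTES, 'UTF-8' ) . $trail;
}

echo excerpt_entity_safe( 'He said &quot;hi&quot; to me', 11, '...' ), "\n";
// He said &quot;hi...
```

One thing to keep in mind: htmlentities encodes every character that has a named entity, so text stored with raw accented letters would come back with those letters encoded. htmlspecialchars would be a narrower alternative if only the markup characters should be re-encoded.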

I don't know which way is better for CouchCMS. Maybe KK can give us more information. Thank you!
