Hi! While using the excerpt tag with character truncation turned on, I've noticed that HTML entities (such as &quot; and similar) stored in the DB get cut in half. There is a solution, but the Couch source code needs to be modified. Maybe KK will decide to put it in the next update if he finds it useful. In admin/functions.php there is the function excerpt:
// Truncates a string to the given length
function excerpt( $str_utf, $count, $trail='' ){
    if( @preg_match('/^.{1}/us', $str_utf, $matches) == 1 ){ // quick check for UTF-8 well-formedness
        if( function_exists('mb_strlen') && function_exists('mb_substr') ){
            $strlen = mb_strlen( $str_utf, 'UTF-8' );
            if( $count < $strlen ){
                $substr = mb_substr( $str_utf, 0, $count, 'UTF-8' ) . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
        else{
            $strlen = strlen( utf8_decode($str_utf) );
            if( $count < $strlen ){
                $pattern = '#^(.{'.$count.'})#us';
                @preg_match( $pattern, $str_utf, $matches );
                $substr = $matches[1] . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
    }
    else{
        $strlen = strlen( $str_utf );
        if( $count < $strlen ){
            $substr = substr( $str_utf, 0, $count ) . $trail;
        }
        else{
            $substr = $str_utf;
        }
    }
    return $substr;
}
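To make the failure concrete, here is a minimal sketch of what the truncation step does when the stored text contains an entity. The sample string and the count of 12 are made up for illustration; the call mirrors the mb_substr branch of excerpt, with a substr fallback.

```php
<?php
// Hypothetical stored value: the quotes are saved as &quot; entities.
$stored = 'He said &quot;hello&quot; to everyone';

// Truncate to 12 characters the same way excerpt() does.
if( function_exists('mb_substr') ){
    $cut = mb_substr( $stored, 0, 12, 'UTF-8' );
}
else{
    $cut = substr( $stored, 0, 12 ); // the sample is ASCII only, so this is equivalent
}

echo $cut, "\n"; // He said &quo
```

The count is applied to the encoded text, so the cut lands inside &quot; and the page ends up showing the broken fragment "&quo".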
All admin-side input is stored in the DB entity-encoded, which is why the problem occurs. We can decode the entities before truncating (line 3):
// Truncates a string to the given length
function excerpt( $str_utf, $count, $trail='' ){
    $str_utf = html_entity_decode($str_utf); // ADD THIS TO DECODE ENTITIES
    if( @preg_match('/^.{1}/us', $str_utf, $matches) == 1 ){ // quick check for UTF-8 well-formedness
        if( function_exists('mb_strlen') && function_exists('mb_substr') ){
            $strlen = mb_strlen( $str_utf, 'UTF-8' );
            if( $count < $strlen ){
                $substr = mb_substr( $str_utf, 0, $count, 'UTF-8' ) . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
        else{
            $strlen = strlen( utf8_decode($str_utf) );
            if( $count < $strlen ){
                $pattern = '#^(.{'.$count.'})#us';
                @preg_match( $pattern, $str_utf, $matches );
                $substr = $matches[1] . $trail;
            }
            else{
                $substr = $str_utf;
            }
        }
    }
    else{
        $strlen = strlen( $str_utf );
        if( $count < $strlen ){
            $substr = substr( $str_utf, 0, $count ) . $trail;
        }
        else{
            $substr = $str_utf;
        }
    }
    return $substr;
}
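A small sketch of how the output changes with the decode-first version. The sample string and the count of 17 are made up. I also pass ENT_QUOTES and 'UTF-8' explicitly here, which is my assumption and not what the modified function does: it relies on PHP's defaults, and those differ between PHP versions.

```php
<?php
// Hypothetical stored value containing encoded markup and quotes.
$stored = 'Use &lt;b&gt; for &quot;bold&quot; text';

// Decode first, as the added line does (flags and charset are explicit here by assumption).
$decoded = html_entity_decode( $stored, ENT_QUOTES, 'UTF-8' );

// Then truncate to 17 characters.
if( function_exists('mb_substr') ){
    $cut = mb_substr( $decoded, 0, 17, 'UTF-8' );
}
else{
    $cut = substr( $decoded, 0, 17 ); // the sample is ASCII only, so this is equivalent
}

echo $decoded, "\n"; // Use <b> for "bold" text
echo $cut, "\n";     // Use <b> for "bold
```

No entity is cut any more, but the excerpt now carries a raw <b> and raw quotes, so the page receives different characters than the ones that were stored.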
The problem is gone now, but this alters the output of the tag, because the excerpt is returned with its entities decoded. So we can encode the characters back to HTML entities after the substring step is done (wrap every output of the function in htmlentities):
// Truncates a string to the given length
function excerpt( $str_utf, $count, $trail='' ){
    $str_utf = html_entity_decode($str_utf); // ADD THIS TO DECODE ENTITIES
    if( @preg_match('/^.{1}/us', $str_utf, $matches) == 1 ){ // quick check for UTF-8 well-formedness
        if( function_exists('mb_strlen') && function_exists('mb_substr') ){
            $strlen = mb_strlen( $str_utf, 'UTF-8' );
            if( $count < $strlen ){
                $substr = htmlentities(mb_substr( $str_utf, 0, $count, 'UTF-8' )) . $trail; // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
            else{
                $substr = htmlentities($str_utf); // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
        }
        else{
            $strlen = strlen( utf8_decode($str_utf) );
            if( $count < $strlen ){
                $pattern = '#^(.{'.$count.'})#us';
                @preg_match( $pattern, $str_utf, $matches );
                $substr = htmlentities($matches[1]) . $trail; // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
            else{
                $substr = htmlentities($str_utf); // ADD THIS TO ENCODE CHARS TO ENTITIES
            }
        }
    }
    else{
        $strlen = strlen( $str_utf );
        if( $count < $strlen ){
            $substr = htmlentities(substr( $str_utf, 0, $count )) . $trail; // ADD THIS TO ENCODE CHARS TO ENTITIES
        }
        else{
            $substr = htmlentities($str_utf); // ADD THIS TO ENCODE CHARS TO ENTITIES
        }
    }
    return $substr;
}
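The decode, truncate, re-encode idea can be condensed into a small standalone sketch. The function name excerpt_roundtrip_sketch is hypothetical and not part of Couch; it assumes well-formed UTF-8 input and the mbstring extension, and it passes ENT_QUOTES and 'UTF-8' explicitly instead of relying on PHP's defaults.

```php
<?php
// Hypothetical helper, not part of Couch: decode, truncate, re-encode.
// Assumes well-formed UTF-8 input and that the mbstring extension is loaded.
function excerpt_roundtrip_sketch( $str, $count, $trail = '' ){
    // 1. Turn stored entities back into single characters.
    $decoded = html_entity_decode( $str, ENT_QUOTES, 'UTF-8' );

    // 2. Truncate on real characters, so no entity can be split.
    if( $count < mb_strlen( $decoded, 'UTF-8' ) ){
        $decoded = mb_substr( $decoded, 0, $count, 'UTF-8' );
        // 3. Re-encode only the part that is kept, then append the trail.
        return htmlentities( $decoded, ENT_QUOTES, 'UTF-8' ) . $trail;
    }

    // Short input: nothing to cut, just re-encode.
    return htmlentities( $decoded, ENT_QUOTES, 'UTF-8' );
}

$out = excerpt_roundtrip_sketch( 'He said &quot;hello&quot; to everyone', 12, '...' );
echo $out, "\n"; // He said &quot;hel...
```

Two side effects are worth knowing. The count now applies to decoded characters, so the same count gives a longer visible excerpt than before. Also, htmlentities encodes every character that has a named entity, so text that was stored as a raw character (for example é) comes back as an entity and the output is not always identical to the stored text.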
I don't know which way is better for CouchCMS. Maybe KK can give us more information. Thank you!