Hi! I'm in the process of building a Couch-based cron image-compression utility.

The first task is to read the image directory contents and add some info about the images to the DB, which will help me fetch and mark compressed images for the compression utility. I'm using the db_persist tag to add each new image record to the DB.

After each image record is created, I output PHP's memory_get_usage() and can see that it rises with every record.
I wonder if there are any variables left unset or something, as I really don't want the script to fail when it scans a folder with several hundred images.

I'm not very strong at PHP at the moment, and after looking at the source code of the persist tag I've found nothing suspicious. Only the return $html at the end bothers me. Why does the db_persist tag need to return HTML, or is that not the case? I hope KK or someone more experienced than me can lead me to a solution. Thank you!
I just tested the record-removal functionality (for when an image gets deleted from a folder) and can see that db_delete causes the same huge memory consumption :(
It's a pity you didn't bother to help us with some code to investigate.

I have cooked up the following code, which could potentially reproduce your experience. However, it does not show a memory leak. I have tried various iteration counts - 3 to 100 - and the result is always the same.
Anyone can also try the following, more Couchified code:

Code: Select all
<cms:call 'byte2units' size='123456' />
@andreyzagoruy, any operation that is done in a loop and processes possibly hundreds of Couch 'pages' is likely to hit the memory and/or the execution limits.

It is precisely for such cases that we came up with the 'staggered' processing approach - please see viewtopic.php?f=5&t=8803 for one particular use-case.

That solution can be modified to cater to any other kind of operation (like the ones you are undertaking).
Process the items in manageable chunks or batches (say, only 100 pages per run), and save the last milestone in the database for the next run to pick up from.
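In plain PHP the idea looks roughly like this (just a sketch - the file-based milestone store, batch size, and the get_all_items()/process_item() helpers are placeholders for your actual logic):

Code: Select all
<?php
// Sketch of staggered processing: handle at most $batch_size items per
// run and persist the last position so the next run continues from there.
// get_all_items() and process_item() are hypothetical placeholders.
$batch_size     = 100;
$milestone_file = 'milestone.txt';

$offset = is_file($milestone_file) ? (int)file_get_contents($milestone_file) : 0;
$items  = get_all_items();                       // full list of work items
$chunk  = array_slice($items, $offset, $batch_size);

foreach ($chunk as $item) {
    process_item($item);                         // the actual per-item work
}

// Advance the milestone; wrap around once everything has been processed.
$offset += count($chunk);
if ($offset >= count($items)) {
    $offset = 0;
}
file_put_contents($milestone_file, $offset);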

Hope this helps.
Hi, trendoman and KK! I haven't provided any code because ANY loop with DB operations needs more and more memory. As for trendoman's observation: memory consumption rises only a little with each iteration, but try looping through several hundred pages and you will reach ~70 MB or so. Also, you are using memory_get_peak_usage() - it will always show the same result (the peak usage of the particular script). It's memory_get_usage() that shows the memory rising.
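To illustrate the difference (a quick plain-PHP sketch):

Code: Select all
<?php
// memory_get_usage() reports what is allocated right now, so it can drop
// after a free; memory_get_peak_usage() only reports the high-water mark.
$big = str_repeat('x', 5 * 1024 * 1024);   // allocate a ~5 MB string
echo memory_get_usage() . "\n";            // high while the string is live
unset($big);
echo memory_get_usage() . "\n";            // drops back down
echo memory_get_peak_usage() . "\n";       // still shows the ~5 MB peak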

KK, thank you for the advice. I could use the 'staggered' processing approach, but it would cause some problems. This will be a cron job - so no JS with a page reload to kick off the next batch. And cron on most shared hosts is not intended for very frequent runs. My main concern is why memory consumption rises with each iteration. I always thought PHP has a garbage collector, and these DB operations don't set any variables, so where does the memory go?

I tried Googling, and the only problem I found is that the reference-counting collector can't deal with circular object references - which is not good, because I can't fix such problems myself at the moment without help. I just don't get why these functions don't free up the memory they use, as I've gone through the source code and can see that you unset variables when they are no longer needed. Couch works very fast - processing 1000+ files takes only a few seconds - so it would be great to deal with them in one batch to capture the current folder state before the compression utility kicks in.
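For reference, a minimal plain-PHP sketch of that circular-reference situation (gc_disable() just keeps the automatic collector from kicking in mid-demo):

Code: Select all
<?php
// Two objects referencing each other keep their refcounts above zero even
// after both variables leave scope, so only the cycle collector frees them.
gc_disable();                               // suppress automatic collection for the demo

function make_cycle() {
    $a = new stdClass();
    $b = new stdClass();
    $a->peer = $b;
    $b->peer = $a;
}   // $a and $b go out of scope here, but the pair keeps itself alive

for ($i = 0; $i < 50000; $i++) {
    make_cycle();
}
echo "Before collection: " . memory_get_usage() . "\n";
$freed = gc_collect_cycles();               // run the cycle collector manually
echo "Collected $freed cycles, now: " . memory_get_usage() . "\n";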

Some sample code (it is not finished at the moment - just for testing purposes):
Code: Select all
<cms:php>
    global $CTX;

    // Recursively filter directory entries down to image files only
    class ExtensionsFilterIterator extends RecursiveFilterIterator {

      public static $ALLOWED_EXTENSIONS = array(
          'jpeg',
          'jpg',
          'png'
      );

      public function accept() {
          // Skip hidden entries (names starting with a dot)
          if (substr($this->current()->getFilename(), 0, 1) == '.') {
            return FALSE;
          }
          // Always descend into subdirectories
          if ($this->current()->isDir()) {
            return TRUE;
          }
          // Keep only files with an allowed image extension
          return in_array(
              strtolower($this->current()->getExtension()),
              self::$ALLOWED_EXTENSIONS,
              true
          );
      }

    }

    // Walk the uploads folder and index each image by the MD5 of its real path
    $number_of_files = 0;
    $files = array();
    $path = K_SITE_DIR . 'admin/uploads/image/test/';
    $entities = new RecursiveIteratorIterator(new ExtensionsFilterIterator(new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS)));
    foreach ($entities as $entity) {
      if ($entity->isFile()) {
        $number_of_files++;
        $files[md5($entity->getRealPath())] = array(
          'filename' => $entity->getFilename(),
          'path' => $entity->getRealPath(),
          'size' => $entity->getSize()
        );
      }
    }

    // Expose the array to Couch tags and release the local copy
    $CTX->set('files', $files, 'global');
    unset($files);
  </cms:php>
  <h1>Array of files</h1>
  <cms:each files as='file' >
    <cms:show key />:<br/>
    <cms:show file.filename /><br/>
    <cms:show file.path /><br/>
    <cms:show file.size />
    <cms:if "<cms:page_exists "<cms:show key />" masterpage='scheduled/images/images.php' />" >
      <cms:if "<cms:get_custom_field 'is_compressed' masterpage='scheduled/images/images.php' />" >
        <p style="color: orange;">Checking if file still compressed</p>
        <cms:else />
          <p style="color: red;">File not compressed</p>
      </cms:if>
      <cms:else />
        <cms:php>echo "Used memory: " . memory_get_usage();</cms:php>
        <cms:db_persist
          _masterpage='scheduled/images/images.php'
          _mode='create'
          _invalidate_cache='0'
          _autotitle='0'
          k_page_title="<cms:show file.filename />"
          k_page_name="<cms:show key />"
          filename="<cms:show file.filename />"
          path="<cms:show file.path />"
          original_size="<cms:show file.size />"
          compressed_size="<cms:show file.size />"
        />
    </cms:if>
    <hr/>
  </cms:each>
  <cms:pages masterpage="scheduled/images/images.php" skip_custom_fields='1' >
    <cms:if "<cms:not "<cms:arr_key_exists "<cms:show k_page_name />" in=files />" />" >
      <cms:db_delete masterpage="scheduled/images/images.php" page_id="<cms:show k_page_id />" /> 
    </cms:if>
  </cms:pages>
Hi, I couldn't use your code, because RecursiveFilterIterator is not there...
I did change the limit in my first sample from '3' to '1000' and also used memory_get_usage(). I didn't have any memory issue - my used memory stalled at about 27 MB which, I suspect, is related to the size of the SQL query result (fetching info for 1000 pages).
andreyzagoruy wrote: ANY loop with DB operations needs more and more memory

I believe this statement is incorrect. Could you maybe run my sample with your masterpage? I hope you see the same result.

Apart from our PHP battle, I would suggest removing the cms:get_custom_field tag from your loop and, perhaps, replacing it with an SQL query, because that would definitely improve memory usage and speed.

Edit: Also, get_custom_field has a parameter missing - since your template must be clonable, a page or id is expected... Anyway, better to use cms:query there.
trendoman wrote: Hi, I couldn't use your code, because RecursiveFilterIterator is not there...
I did change the limit in my first sample from '3' to '1000' and also used memory_get_usage(). I didn't have any memory issue - my used memory stalled at about 27 MB...

Hi! RecursiveFilterIterator is right at the beginning, after global $CTX (it is a PHP built-in class; I only extend it):
Code: Select all
class ExtensionsFilterIterator extends RecursiveFilterIterator {

      public static $ALLOWED_EXTENSIONS = array(
          'jpeg',
          'jpg',
          'png'
      );

      public function accept() {
          if (substr($this->current()->getFilename(), 0, 1) == '.') {
            return FALSE;
          }
          if ($this->current()->isDir()) {
            return TRUE;
          }
          return in_array(
              strtolower($this->current()->getExtension()),
              self::$ALLOWED_EXTENSIONS,
              true
          );
      }

    }


It is possible that your DB records hold less information or something. Also, now that you have tested with more pages, you can see that it consumes 27 MB. I don't get why that would be due to SQL queries, as I don't save any page information - consumption should stay almost the same across iterations, since all the variables are overwritten on each following iteration.
andreyzagoruy wrote: Also, now that you have tested with more pages, you can see that it consumes 27 MB.
That is actually my fault - I forgot to add skip_custom_fields='1' to the cms:pages loop. With it, consumption is a constant 6 MB for all 1000 pages.
Hi, everyone! I've found a way to manage the memory consumption. If I understood correctly, when variables are unset or go unused, they can keep occupying memory. PHP's garbage collector is supposed to free that memory for us, BUT it starts its cleaning cycle not by observing current memory consumption - it only runs once an internal buffer of possible garbage roots fills up with a certain number of items (10,000 by default). So your script can be terminated for exceeding the memory limit before the collector ever runs. Luckily, you can monitor consumed memory and start the garbage collector yourself with gc_collect_cycles(). Here's my current code sample:
Code: Select all
<cms:php>
    global $CTX;
    class ExtensionsFilterIterator extends RecursiveFilterIterator {

      public static $ALLOWED_EXTENSIONS = array(
          'jpeg',
          'jpg',
          'png'
      );

      public function accept() {
          if (substr($this->current()->getFilename(), 0, 1) == '.') {
            return FALSE;
          }
          if ($this->current()->isDir()) {
            return TRUE;
          }
          return in_array(
              strtolower($this->current()->getExtension()),
              self::$ALLOWED_EXTENSIONS,
              true
          );
      }

    }
    $number_of_files = 0;
    $files = array();
    $path = K_SITE_DIR . 'admin/uploads/image/';
    $entities = new RecursiveIteratorIterator(new ExtensionsFilterIterator(new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS)));
    foreach($entities as $entity ) {
      if ($entity->isFile()) {
        $number_of_files++;
        $files[md5($entity->getRealPath())] = array(
          'filename' => $entity->getFilename(),
          'path' => $entity->getRealPath(),
          'size' => $entity->getSize()
        );
      }
    }
    $CTX->set('files', $files, 'global');
    unset($files);
  </cms:php>
  <cms:each files as='file' >
    <cms:if "<cms:page_exists "<cms:show key />" masterpage='scheduled/images/images.php' />" >
      <cms:if "<cms:get_custom_field "is_compressed" masterpage="scheduled/images/images.php" page="<cms:show key />" />" >
        <cms:set db_compressed_size="<cms:get_custom_field 'compressed_size' masterpage='scheduled/images/images.php' page="<cms:show key />" />" />
        <cms:if db_compressed_size ne file.size >
          <cms:db_persist
            _masterpage="scheduled/images/images.php"
            _page_id="<cms:pages masterpage="scheduled/images/images.php" page_name="<cms:show key />" ids_only='1' limit='1' ></cms:pages>"
            _mode='edit'
            original_size=file.size
            compressed_size=''
            is_compressed='0'
          />
        </cms:if>
        <cms:else />
          <cms:set db_original_size="<cms:get_custom_field 'original_size' masterpage='scheduled/images/images.php' page="<cms:show key />" />" />
          <cms:if db_original_size ne file.size >
            <cms:db_persist
              _masterpage="scheduled/images/images.php"
              _page_id="<cms:pages masterpage="scheduled/images/images.php" page_name="<cms:show key />" ids_only='1' limit='1' ></cms:pages>"
              _mode='edit'
              original_size=file.size
            />
          </cms:if>
      </cms:if>
      <cms:else />
        <cms:db_persist
          _masterpage='scheduled/images/images.php'
          _mode='create'
          _invalidate_cache='0'
          _autotitle='0'
          k_page_title="<cms:show file.filename />"
          k_page_name="<cms:show key />"
          path="<cms:show file.path />"
          original_size="<cms:show file.size />"
        />
    </cms:if>
    Memory used: <cms:php>echo memory_get_usage();</cms:php><br/>
    <cms:php>
      if (memory_get_usage() > 20000000) {
        gc_collect_cycles();
        echo "Memory cleanup";
      }
    </cms:php>
  </cms:each>
  <cms:pages masterpage="scheduled/images/images.php" skip_custom_fields='1' limit='10000' >
    <cms:if "<cms:not "<cms:arr_key_exists "<cms:show k_page_name />" in=files />" />" >
      <cms:db_delete masterpage="scheduled/images/images.php" page_id="<cms:show k_page_id />" />
    </cms:if>
    Memory used: <cms:php>echo memory_get_usage();</cms:php><br/>
    <cms:php>
      if (memory_get_usage() > 20000000) {
        gc_collect_cycles();
        echo "Memory cleanup";
      }
    </cms:php>
  </cms:pages>
  <cms:php>echo "Peak memory: " . memory_get_peak_usage();</cms:php>


And here's the output when I feed the script several thousand images:
Code: Select all
Memory used: 5943744
Memory used: 6059288
Memory used: 6174832
Memory used: 6290376
....
....
....
....
....
Memory used: 19994400
Memory used: 20110104
Memory cleanup <--
Memory used: 6192888
Memory used: 6310368
Memory used: 6427240
Memory used: 6544008
Memory used: 6660808


As you can see, it now flushes unused memory when I need it to. As a matter of fact, I've tested this script on a folder with ~6000 images on a host with a 32 MB memory limit, and it created 6000 pages (more than 100k queries in total!) in about 50 seconds on PHP 5.6 without any further optimization.
Now I want to deal with the time limit. I'm able to get the current script execution time, no problem, but whatever I try to gracefully exit the script, it never commits the changes to the DB. Is there a way to commit the processed data exactly when I want ($DB->commit(1) does nothing :( )? Also, PHP has a built-in mechanism - a registered shutdown function - that gets executed even when the script hits the server-set time limit, so if we could commit the changes to the DB there, we'd have another way of getting such routines done :)
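A minimal sketch of that shutdown-function idea (assuming the hook is register_shutdown_function(); save_progress() stands in for whatever commit/persist call actually works here):

Code: Select all
<?php
// PHP runs registered shutdown callbacks even when the script is killed by
// max_execution_time, so pending work can be flushed there.
// save_progress() is a hypothetical placeholder for the real commit logic.
register_shutdown_function(function () {
    $err = error_get_last();
    // A timeout surfaces as a fatal error mentioning "Maximum execution time"
    if ($err !== null && strpos($err['message'], 'Maximum execution time') !== false) {
        save_progress();   // hypothetical: persist progress for the next run
    }
});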