Hi,
I have a template that is executed once a day by Cron. This page is very simple, but it is enough for the test:

Code: Select all
<?php require_once('panel/cms.php'); ?>

<cms:template hidden='1' />

   <cms:pages masterpage='marketing_prices_products.php' limit='1' paginate='1' order='desc' >

      <cms:log "Product №: <cms:show k_current_record />, ID <cms:show k_page_id />, <cms:show k_page_title />" file='test.log' />

      <cms:if k_paginate_link_next >
         <cms:show k_paginate_link_next />

         <cms:php>
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, "<cms:show k_paginate_link_next />" );
            curl_setopt($ch, CURLOPT_USERAGENT, 'CouchCMS <cms:show k_cms_version />');
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            $output = curl_exec($ch);
            curl_close($ch);
            echo $output;
         </cms:php>

      </cms:if>

   </cms:pages>

<?php COUCH::invoke(); ?>


The task is for the Pages tag to load all pages one by one in the usual way.
Interestingly, all this works up to 30 pages.
The remaining pages (31 and over) are not being loaded. There seems to be a problem between Pages & Curl.

I know that if I do this in a browser there will be no problem, but that is not an option, because these are more complex queries that need to be run automatically by Cron jobs.

My question is: can we use the Pages tag with the pagination feature when it is run by the server rather than in a browser?

I did a lot of tests without success. I do not know if this helps, but I noticed that if I load the page in a browser (which is incorrect, given the PHP code is still there), an error message is displayed:

Warning: mysqli_error() expects parameter 1 to be mysqli, boolean given in /home/evropest/domains/evropest/panel/includes/mysql2i/mysql2i.class.php on line 139

Thanks
It's not Couch's problem; it's the way you access the pages. Google for the "exceeded 30 redirects" error.
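If it really is a redirect cap in the cURL layer, these are the PHP cURL options that control it. A minimal stand-alone sketch (the URL is only a placeholder; whether this applies depends on how your paginated links respond):

Code: Select all
<?php
// Stand-alone test, assuming the failure comes from followed redirects.
// Replace the placeholder URL with one of your own paginated links.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://example.com/template.php');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow Location: headers
curl_setopt($ch, CURLOPT_MAXREDIRS, 100);         // explicit cap on how many redirects may be followed
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
if ($output === false) {
    echo 'cURL error: ' . curl_error($ch) . PHP_EOL; // reports the redirect error if the cap is hit
}
curl_close($ch);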
I am not saying it is a Couch problem.
My question is whether you can use the Pages tag with pagination without a browser.
Perhaps Curl will not help, I do not know...
orbital wrote: can you use the Pages tag with paginate without a browser.

Of course you can, Ivo. The mere fact that the first 30 requests go well proves it.

orbital wrote: There seems to be a problem between Pages & Curl.

I would like to exclude Pages from the equation. They always work fine.
Perhaps that's right; it would be great if someone could share their experience on this topic.
One way out of this situation is to do a lot of cron tasks, each task causing the Pages tag to start from a different point - for example:
curl https://site.com/marketing_prices_spider/?key=11710
curl https://site.com/marketing_prices_spide ... 10&page=31
curl https://site.com/marketing_prices_spide ... 10&page=61

... ... ...

This solves the problem, but I know this is the wrong way.

There are other working options, but they are heavy and complex. I would be happy if the solution could stay within the principle of the Pages tag.
I think the failure of the script may not be in the 'pages' tag itself. When anyone (browser or curl) requests a page, CouchCMS executes tags one by one; in your sample it goes like this:

- cms:pages makes a request to the database and keeps the result in memory, then builds HTML from what is enclosed within the pages tag, outputting editable fields etc., keeping the db connection open in case other tags need it.
- while building that HTML, Couch next meets the cms:php tag, which makes a request to the next page, and it all becomes a pyramid of Matryoshkas :) So those open connections stack on top of each other until memory goes bust, the database refuses to work, PHP stalls, session files become unreadable, or Apache times out because no data is being transferred... who knows what else.

If you open the core code for the cms:abort tag, you will find a line where the db connection is forcibly closed. Study it and use it in your PHP script. Maybe this information helps you build a working version.
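A very rough sketch of that hint, for illustration only. It assumes the Couch DB object exposes its connection handle as $DB->db_link (verify the property name against your couch/db.php and the abort tag in couch/tags.php), and note that any Couch tag that still needs the database after this point will fail, so do it only when the template has nothing left to render:

Code: Select all
<cms:php>
    global $DB;
    // ASSUMPTION: the connection handle is $DB->db_link, the one the core abort tag closes.
    // Check your Couch version before relying on this.
    if ( isset($DB->db_link) && $DB->db_link ) {
        @mysqli_close( $DB->db_link );   // free the connection before the long nested request
    }

    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_URL, "<cms:show k_paginate_link_next />" );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    echo curl_exec( $ch );
    curl_close( $ch );
</cms:php>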
I would like to add to my previous post and comment on this:

orbital wrote: One way out of this situation is to do a lot of cron tasks, each task causing the Pages tag to start from a different point - for example:
This solves the problem, but I know this is the wrong way.


Maybe it's the only right way. If the number of paginated pages is large, then eventually the connection to the server will be cut off due to a timeout (no HTML passed back).

So what you will need to do is devise a system which runs your curl every x minutes, loading a single URL (entrypoint). On that URL, dynamically request 5-10 consecutive pages (with limit='30' that would be 150-300 cloned pages processed) and then shut it down. Write the last page number into some file/database. The next time curl accesses the entrypoint, read the last page number and continue with the next set of pages.

So it will be like a 'paginated' approach with curl, without timeout errors.
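For anyone wanting to wire this up, a minimal sketch of the cron-side entrypoint in plain PHP. The file name, the state-file path and the ?page= query parameter are assumptions (the parameter follows the example URLs above), so adjust them to the real paginated URLs of your template:

Code: Select all
<?php
// entrypoint.php - called by cron via curl every few minutes.
// Reads the last processed pagination page from a small state file,
// requests the next batch of paginated URLs, then records where it stopped.

$state_file = __DIR__ . '/last_page.txt';                                  // hypothetical state file
$base_url   = 'https://example.com/marketing_prices_products.php?page=';  // hypothetical paginated URL
$batch_size = 5;                                                           // 5 paginated pages per cron run

$last = is_file($state_file) ? (int)file_get_contents($state_file) : 0;

for ($i = 1; $i <= $batch_size; $i++) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $base_url . ($last + $i));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);    // keep each request well under the server timeout
    $ok = curl_exec($ch);
    curl_close($ch);
    if ($ok === false) break;                  // stop the batch on the first failure
}

// remember the last page that was requested; reset this file to 0
// once the final paginated page has been reached
file_put_contents($state_file, $last + $i - 1);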
I finally had some time to try my hand at this interesting task. I was able to fire up a cron via the console on localhost, and it initiated the process of requesting paginated links. I am terminating it now, after it has successfully gone through 400 pages without any errors in the logs.

I used this code (I modified yours a bit):
Code: Select all
<cms:if "<cms:is_ajax />">

    <cms:call 'sleep' timeout='2' />
    <cms:pages limit='10' paginate='1'>

        <cms:log "Product №: <cms:show k_current_record />, ID <cms:show k_page_id />, <cms:show k_page_title />" file='test.log.txt' />

        <cms:if k_paginated_bottom>
            <cms:if k_paginate_link_next >
                <cms:call 'remote_url' url=k_paginate_link_next fake_ajax='1'/>
            </cms:if>
        </cms:if>

    </cms:pages>


<cms:else />
    <!-- First request goes here  -->
    <cms:call 'remote_url' url=k_page_link fake_ajax='1'/>
</cms:if>
Anton, thanks for the help! I will carefully consider your suggestions.
I think you have a sense of commitment to the various tasks people tackle with Couch. I will make a donation for you; I see the name is different from yours - I hope this is not a problem?

Last night, I implemented a temporary solution that works well now.
Now I have a Cron job that loads my script file every 3 minutes, without specifying a different starting point.
The Pages tag displays only one page on each load.
Each page already has a special region with a loaded/unloaded status.
After processing any page, db_persist changes this status to loaded; this way, with every request from Cron, the Pages tag displays one page that is not yet loaded.

Finally, it is necessary to return all pages to the unloaded status.
Right now I do this with a second Cron task (only once a day); it launches another Pages tag that traverses all pages and changes their status.
But I think this second Cron is superfluous, because its role can be performed when k_paginate_link_next is empty - that is, when the Pages tag has already loaded all pages - and then their status can be reset.
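For reference, a minimal sketch of that status-flag idea in Couch tags. The editable name 'load_status' and its values are hypothetical, so map them to your actual region:

Code: Select all
<cms:pages masterpage='marketing_prices_products.php' limit='1' custom_field='load_status=unloaded'>

    <!-- process this single not-yet-loaded page (logging, remote queries, etc.) -->
    <cms:log "Processing ID <cms:show k_page_id />" file='test.log' />

    <!-- ASSUMPTION: 'load_status' is an editable region holding 'loaded' / 'unloaded' -->
    <cms:db_persist
        _masterpage=k_template_name
        _mode='edit'
        _page_id=k_page_id
        load_status='loaded'
    />

</cms:pages>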

Finally, let me make a little suggestion ...

I constantly see sites where I need to make queries and exchange data with other sites.
It would be very good if Couch had some tags that help with such queries.
Perhaps KK should think about it - and he will read this :)
orbital wrote: Anton, thanks for the help! I will carefully consider your suggestions.
I think you have a sense of commitment to the various tasks people tackle with Couch. I will make a donation for you; I see the name is different from yours - I hope this is not a problem?


Thank you very much, Ivo. Your support means a lot to me and helps make CouchCMS better, because I play with the CMS and often send small tweaks and fixes to @KK and the GitHub repo. Everybody benefits from it in the long run.

I have PM'd you the 'remote_url' function (in case you don't already have the latest version).

orbital wrote: Each page already has a special region with a loaded/unloaded status. .. Finally, it is necessary to return all pages to the unloaded status.


This surprised me, because it's the easiest and most reliable way to keep track of status :)
You can modify your approach to avoid a second "clearing" cron run. Let me explain the idea: do not use a radio editable for the status, but a text/datetime editable, and save the current date there as "Y-m-d" (if you don't care about 'publish_date' you may use that instead). This way you need only one cron job, which looks for pages whose date is earlier than today. If all pages are processed daily and all fit within a single day, it will be a reliable method. Use custom_field="processed_date < <cms:date format='Y-m-d' />".
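A sketch of that date-based variant, again with a hypothetical 'processed_date' editable (verify the custom_field comparison syntax against your Couch version):

Code: Select all
<cms:pages masterpage='marketing_prices_products.php' limit='1'
    custom_field="processed_date < <cms:date format='Y-m-d' />">

    <!-- process the page, then stamp today's date so it is skipped until tomorrow -->
    <cms:db_persist
        _masterpage=k_template_name
        _mode='edit'
        _page_id=k_page_id
        processed_date="<cms:date format='Y-m-d' />"
    />

</cms:pages>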

orbital wrote: Finally, let me make a little suggestion ...

I agree; maybe some form of API will come to CouchCMS. Coincidentally, @genxcoders also expressed the need for a similar feature recently. It all comes down to making a reliable method of authenticating such requests from third parties. They say token-based authentication is a secure one, so it comes down to people like you (and other donors) to help Couch get those features sooner rather than later.