9 Replies Latest reply on Jun 21, 2012 11:19 PM by Chetanya Jain

    Extract html content from jcr node

    Shelly Goel Level 1

      I have a requirement to export the html view source of the CQ pages to the file system. Is there any way I can get the html from the jcr node?

      Something like node.getProperty("Some property here").getString(), which would return something like

       

      <div><a href="/abs.html" title="title"/></div>

       

      Thanks in advance.

        • 1. Re: Extract html content from jcr node
          hypnotec Adobe Employee

          the "HTML view" of a CQ page is not stored anywhere. it is generated on each request: the URI is resolved to a resource, and the components/scripts denoted by the resource type(s) of the page render the output.

           

          you will thus have to request the HTML via HTTP and store it on the file system. if you are working outside the CQ system, use any of the many available web spider tools. if you'd like to do it programmatically from within CQ, you can use the

           

          com.day.cq.retriever.RetrieverService (OSGi service)

          => retriever.retrieve(String uri, String baseUri, RetrieverStorage storage);

           

          the retriever storage is a simple class you can implement according to the RetrieverStorage interface, allowing you to store the content retrieved.

           

          consult the javadoc for more details.
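
          for the "outside CQ" route, a minimal sketch using only the JDK might look like the following. it does not use RetrieverService. the class name PageExporter and the method export are made up for illustration, and it assumes the page URL is readable without login (e.g. on a publish instance); an author instance would normally need authentication added.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of CQ): fetches the rendered output of a URL
// and stores it as a file.
final class PageExporter {

    // Requests pageUrl, writes the response body to target, and returns
    // the number of bytes written. Missing parent directories are created.
    static long export(URL pageUrl, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // openStream() issues a plain GET for http/https URLs.
        try (InputStream in = pageUrl.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

          usage would be along the lines of PageExporter.export(new URL("http://localhost:4503/abs.html"), Paths.get("/tmp/export/abs.html")), where the host and port are placeholders for your own instance.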

           

          dom.

          • 2. Re: Extract html content from jcr node
            justin_at_adobe Adobe Employee

            You can also use the static replication agent for this. This would be appropriate if the exporting to the file system should be done on page activation.

             

            Justin

            • 3. Re: Extract html content from jcr node
              Shelly Goel Level 1

              Hi Justin,

               

              Yes I need to export it on page activation. Can you please elaborate more on static replication agent or provide some code snippet?

               

              Thanks

              Shelly

              • 4. Re: Extract html content from jcr node
                hypnotec Adobe Employee

                quoting from http://dev.day.com/docs/en/cq/current/deploying/configuring_cq/replication.html

                 

                This is an "Agent that stores a static representation of a node into the filesystem.".

                For example with the default settings, content pages and dam assets are stored under /tmp, either as HTML or the appropriate asset format. See the Settings and Rules tabs for the configuration.

                This was requested so that when the page is requested directly from the application server the content can be seen. This is a specialized agent and (probably) will not be required for most instances.

                 

                you can use the replication service with the static agent:

                 

                com.day.cq.replication.Replicator replicator = sling.getService(com.day.cq.replication.Replicator.class); // in a JSP

                 

                OR

                 

                @Reference

                com.day.cq.replication.Replicator replicator = null; // in an OSGi component

                 

                THEN

                 

                com.day.cq.replication.ReplicationOptions opts = new com.day.cq.replication.ReplicationOptions();
                // filter by replication agent: only the agent with ID "static" is used
                opts.setFilter(new com.day.cq.replication.AgentFilter() {
                    public boolean isIncluded(com.day.cq.replication.Agent agent) {
                        // the ID of the agent is the node name, e.g. "static" for /etc/replication/agents/static
                        return "static".equals(agent.getId());
                    }
                });

                // session is a javax.jcr.Session, path is the page path to activate
                replicator.replicate(session, com.day.cq.replication.ReplicationActionType.ACTIVATE, path, opts); // activate

                 

                HTH

                dom.

                • 5. Re: Extract html content from jcr node
                  justin_at_adobe Adobe Employee

                  While what Dom wrote above is true, what I was really suggesting is to simply enable a static replication agent and then use the normal Activate button(s). With CQ 5.4, there was a sample static replication agent configured out-of-the-box, albeit in a disabled state. You just need to configure and enable it.

                  • 6. Re: Extract html content from jcr node
                    Shelly Goel Level 1

                    I tried both the options but I am getting an error:

                     

                    06.02.2012 09:47:03 - ERROR - static : Target is not a directory: /tmp

                     

                    I tried changing it, but that did not help. Any idea what the value of the Directory parameter on the Rules tab should be?

                    • 7. Re: Extract html content from jcr node
                      hypnotec Adobe Employee

                      the specified directory is checked via File#isDirectory(): "true if and only if the file denoted by this abstract pathname exists and is a directory; false otherwise"

                       

                      thus your directory doesn't exist (or the CQ process user lacks permission) or it is not a directory. are you operating on windows or a *nix variant? on windows it would be C:\Temp IIRC.
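
                      to narrow down which of these cases applies, a small standalone check along these lines can be run as the same OS user as the CQ process. this is only a sketch: TargetDirCheck and diagnose are hypothetical names, not the agent's actual code, and the agent itself is only known (from the above) to call File#isDirectory().

```java
import java.io.File;

// Hypothetical diagnostic helper: reports why a configured target
// directory might be rejected.
final class TargetDirCheck {

    // Returns "missing", "not a directory", "not writable", or "ok".
    static String diagnose(File dir) {
        if (!dir.exists()) {
            return "missing";
        }
        if (!dir.isDirectory()) {
            return "not a directory";
        }
        if (!dir.canWrite()) {
            return "not writable";
        }
        return "ok";
    }
}
```

                      for example, TargetDirCheck.diagnose(new File("/tmp")) on *nix, or the same call with the directory you entered on the Rules tab.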

                       

                      dom.

                      • 8. Re: Extract html content from jcr node
                        Shelly Goel Level 1

                        Thanks Justin & Hypnotec.

                        Really appreciate your timely help.

                        • 9. Re: Extract html content from jcr node
                          Chetanya Jain

                          Hi,

                           

                          The static replication exports in a specific manner like /content/<app>/abc. Can this be changed to start from /<app>/abc or /abc or any other custom folder?

                           

                          Please help.

                           

                          Thanks,

                          Chetanya