Hi,
I would like to find and remove the Gravit metadata elements, including their base64 payload (<![CDATA[...==]]>), to optimise my SVG files. These are the elements:
<gravitDesigner:gravitElementRef xmlns:gravitDesigner="ns.gravit.io" xlink:href="#...."/>
<gravitDesigner:gravitGraphicSource xmlns:gravitDesigner="ns.gravit.io" id="..." version="1">
<![CDATA[...==]]>
</gravitDesigner:gravitGraphicSource>
I think XSLT (Extensible Stylesheet Language Transformations) might be the tool you need.
It's been a while since I last used XSLT, but essentially what you would do is to make a template that will match the CDATA node and then output an empty node. Something like this (not guaranteed to work, I haven't run it myself):
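<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:gravitDesigner="ns.gravit.io" version="1.0">
<xsl:output method="xml" indent="yes"/>
<!-- Identity template: copies everything no other template matches; without it only text would be output -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- CDATA is a plain text node in XPath; the gravitDesigner namespace URI above must match the one declared in the SVG -->
<xsl:template match="gravitDesigner:gravitGraphicSource/text()">
<xsl:comment>Data removed</xsl:comment>
</xsl:template>
</xsl:stylesheet>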
There's a famous answer on StackOverflow saying you can't, in the general sense, parse XML with a regex: stackoverflow.com/questions/1732348/regex-match-o…
That said, this case may be restricted enough that a regex works most of the time. But is there a reason not to just parse the XML? Is parsing causing a measurable performance problem?
Peter Scheler
JS enthusiast
Try this:
/<gravitDesigner(?:[^<]*?\/>|(?:.|\s)*?<\/gravitDesigner.*?>)/gm
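To make the suggestion above concrete, here is a minimal sketch of how that regex could be applied with String.prototype.replace. The function name stripGravit and the sample SVG string are made up for illustration (the base64 payload is a placeholder), and I have only checked it against this small sample, not against real Gravit exports.

```javascript
// Regex exactly as posted above. The first alternative handles the
// self-closing <gravitDesigner:... /> form, the second handles an
// opening tag up to its matching </gravitDesigner:...> closing tag.
const GRAVIT_RE = /<gravitDesigner(?:[^<]*?\/>|(?:.|\s)*?<\/gravitDesigner.*?>)/gm;

// Hypothetical helper: returns the SVG source with every match removed.
function stripGravit(svg) {
  return svg.replace(GRAVIT_RE, "");
}

// Hypothetical sample input modelled on the elements in the question.
const sample = [
  '<svg xmlns="http://www.w3.org/2000/svg">',
  '<gravitDesigner:gravitElementRef xmlns:gravitDesigner="ns.gravit.io" xlink:href="#a"/>',
  '<rect width="10" height="10"/>',
  '<gravitDesigner:gravitGraphicSource xmlns:gravitDesigner="ns.gravit.io" id="b" version="1">',
  '<![CDATA[QUJD==]]>',
  '</gravitDesigner:gravitGraphicSource>',
  '</svg>',
].join("\n");

const cleaned = stripGravit(sample);
console.log(cleaned);
```

Note that this leaves empty lines where the elements used to be, and it would also fire on the text "<gravitDesigner" inside a comment or attribute value, which is the usual risk of using a regex instead of a parser.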