<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Do You Save HTML in Your Relational Database?</title>
	<atom:link href="http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/feed/" rel="self" type="application/rss+xml" />
	<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/</link>
	<description>Technology, anything goes</description>
	<lastBuildDate>Fri, 04 Dec 2009 04:36:16 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Tom</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-2387</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Fri, 20 Jul 2007 12:18:19 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-2387</guid>
		<description>&lt;p&gt;It would indeed be a substantial advantage to have a less formatting-heavy method of storing and retrieving weblog text data.&lt;/p&gt;

&lt;p&gt;Perhaps an application tool that could be used in conjunction with web site development, rather than as an external server service.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>It would indeed be a substantial advantage to have a less formatting-heavy method of storing and retrieving weblog text data.</p>

<p>Perhaps an application tool that could be used in conjunction with web site development, rather than as an external server service.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Andy Haveland-Robinson</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-1925</link>
		<dc:creator>Andy Haveland-Robinson</dc:creator>
		<pubDate>Fri, 18 May 2007 05:08:46 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-1925</guid>
		<description>&lt;p&gt;Hi, this is also a problem I&#039;ve struggled with... especially trying to hide html and formatting from mere humans that want to update the CMS sites.&lt;/p&gt;

&lt;p&gt;As the tags are extracted and deleted from the text, the offsets of subsequent tags change. This can be solved by working backwards when extracting tags and forwards when reinserting them. Not entirely trivial, but a nice algorithm can be found.&lt;/p&gt;

&lt;p&gt;However, editing the plain text independently afterwards would be impossible, especially if the text were modified in several places!
It is questionable to have to rely on fragile resynchronization - if there were a corruption in the database, then the presentation could turn to goo!&lt;/p&gt;

&lt;p&gt;So, for work in progress, the tags and text would have to be reintegrated for editing... which means that you might as well have stored the raw html in the first place!
If you need to export the pure text, then a simpler tag stripping routine will do on demand...&lt;/p&gt;

&lt;p&gt;However, your idea has merit if you want to do SQL searching and don&#039;t have enough space to store a parallel text-only field.&lt;/p&gt;

&lt;p&gt;I think the bottom line for speed and effectiveness would be just to store text only + formatted text in separate fields. This may double storage requirements and be less elegant, but response time should be quicker.&lt;/p&gt;

&lt;p&gt;Regards,
Andy.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Hi, this is also a problem I&#8217;ve struggled with&#8230; especially trying to hide html and formatting from mere humans that want to update the CMS sites.</p>

<p>As the tags are extracted and deleted from the text, the offsets of subsequent tags change. This can be solved by working backwards when extracting tags and forwards when reinserting them. Not entirely trivial, but a nice algorithm can be found.</p>
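
<p>A minimal Python sketch of the backwards/forwards idea, assuming tags can be found with a naive regular expression (no <code>&gt;</code> inside attribute values). The function names <code>extract_tags</code> and <code>reinsert_tags</code> are hypothetical. Each tag is stored with its offset in the original markup: deleting from the last tag backwards keeps every earlier offset valid, and reinserting in ascending order restores the original prefix before each next tag goes in.</p>

<pre>
```python
import re

# Naive tag pattern: assumes no '>' appears inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")


def extract_tags(markup):
    """Return (plain_text, tags), where tags is a list of (offset, tag).

    Offsets are positions in the ORIGINAL markup. Deleting tags from the
    last one backwards means no deletion shifts a tag we have yet to delete.
    """
    tags = [(m.start(), m.group()) for m in TAG_RE.finditer(markup)]
    text = markup
    for offset, tag in reversed(tags):
        text = text[:offset] + text[offset + len(tag):]
    return text, tags


def reinsert_tags(text, tags):
    """Rebuild the markup by inserting tags forwards in offset order.

    When a tag goes back in, every earlier tag is already in place, so the
    string up to its stored offset matches the original markup again.
    """
    markup = text
    for offset, tag in sorted(tags):  # stored rows may come back unordered
        markup = markup[:offset] + tag + markup[offset:]
    return markup
```
</pre>

<p>The stored offsets only stay valid while the plain text is left untouched; editing the text independently would invalidate them.</p>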

<p>However, editing the plain text independently afterwards would be impossible, especially if the text were modified in several places!
It is questionable to have to rely on fragile resynchronization &#8211; if there were a corruption in the database, then the presentation could turn to goo!</p>

<p>So, for work in progress, the tags and text would have to be reintegrated for editing&#8230; which means that you might as well have stored the raw html in the first place!
If you need to export the pure text, then a simpler tag stripping routine will do on demand&#8230;</p>

<p>However, your idea has merit if you want to do SQL searching and don&#8217;t have enough space to store a parallel text-only field.</p>

<p>I think the bottom line for speed and effectiveness would be just to store text only + formatted text in separate fields. This may double storage requirements and be less elegant, but response time should be quicker.</p>

<p>Regards,
Andy.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: rob</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-158</link>
		<dc:creator>rob</dc:creator>
		<pubDate>Fri, 09 Feb 2007 17:45:02 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-158</guid>
		<description>&lt;p&gt;Hi Ian,&lt;/p&gt;

&lt;p&gt;No need to be sorry.&lt;/p&gt;

&lt;p&gt;The thing about this is that I am not talking about a full HTML document. I am talking about a small part of the page (for lack of a better term). I am not convinced that transforming HTML -&gt; XML/XHTML -&gt; XML+XSLT -&gt; HTML is really that great a solution. That is a lot of CPU work just to display formatting. And again, I am not talking about structured data per se, simply formatted data. I know that solution is the standard one, but it seems like a bit of overkill when dealing with very simple formatting on small &quot;pods&quot; of information.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Hi Ian,</p>

<p>No need to be sorry.</p>

<p>The thing about this is that I am not talking about a full HTML document. I am talking about a small part of the page (for lack of a better term). I am not convinced that transforming HTML -> XML/XHTML -> XML+XSLT -> HTML is really that great a solution. That is a lot of CPU work just to display formatting. And again, I am not talking about structured data per se, simply formatted data. I know that solution is the standard one, but it seems like a bit of overkill when dealing with very simple formatting on small &#8220;pods&#8221; of information.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Welsh</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-155</link>
		<dc:creator>Ian Welsh</dc:creator>
		<pubDate>Fri, 09 Feb 2007 16:57:25 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-155</guid>
		<description>&lt;p&gt;I would argue that saving HTML (or rather XHTML) data within the database is actually a better solution, especially if you leverage the database for searching.&lt;/p&gt;

&lt;p&gt;If you are using a text area to capture data it is likely that this data is in some way structured (e.g. paragraphs, word emphasis, lists, etc.) and this structure is what HTML is designed to represent, so why strip the data (text) of its structural meaning?&lt;/p&gt;

&lt;p&gt;If it is to &#039;repurpose&#039; the data for non-HTML display, why not use XSLT to transform the data into the required format: for example, simply replacing P tags with \n (newline), or more complex XML transformations such as to PDF or Microsoft&#039;s new XML document format.&lt;/p&gt;

&lt;p&gt;If it is to enable easier full-text searching, then it may be possible to leverage the database to index the structured data for you. Of course, not all databases are equal, but we use SQL Server 2005, and it allows full-text indexes of XML data. If this is not possible, then perhaps Peter&#039;s suggestion of an additional non-HTML data column in the database might suffice.&lt;/p&gt;

&lt;p&gt;It seems to me that you lose more by stripping the data of its structural meaning than you gain. Sorry.&lt;/p&gt;

&lt;p&gt;Regards&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I would argue that saving HTML (or rather XHTML) data within the database is actually a better solution, especially if you leverage the database for searching.</p>

<p>If you are using a text area to capture data it is likely that this data is in some way structured (e.g. paragraphs, word emphasis, lists, etc.) and this structure is what HTML is designed to represent, so why strip the data (text) of its structural meaning?</p>

<p>If it is to &#8216;repurpose&#8217; the data for non-HTML display, why not use XSLT to transform the data into the required format: for example, simply replacing P tags with \n (newline), or more complex XML transformations such as to PDF or Microsoft&#8217;s new XML document format.</p>

<p>If it is to enable easier full-text searching, then it may be possible to leverage the database to index the structured data for you. Of course, not all databases are equal, but we use SQL Server 2005, and it allows full-text indexes of XML data. If this is not possible, then perhaps Peter&#8217;s suggestion of an additional non-HTML data column in the database might suffice.</p>

<p>It seems to me that you lose more by stripping the data of its structural meaning than you gain. Sorry.</p>

<p>Regards</p>]]></content:encoded>
	</item>
	<item>
		<title>By: rob</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-154</link>
		<dc:creator>rob</dc:creator>
		<pubDate>Fri, 09 Feb 2007 16:42:33 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-154</guid>
		<description>&lt;p&gt;&quot;but one that strips the meaning out of the data you are trying to store&quot;&lt;/p&gt;

&lt;p&gt;But, I mean, for stuff where there isn&#039;t a meaning per se. Think of this as the text in one tag of an XML document - or in a CDATA section. If someone adds bold or italic or underline to the display of content, there really isn&#039;t meaning to that, just emphasis or style. While I&#039;d like to think people will, say, italicize or tag something as &quot;book name&quot;, odds are they won&#039;t.&lt;/p&gt;

&lt;p&gt;&quot;have you thought about structures like paragraphs and unordered lists?&quot;&lt;/p&gt;

&lt;p&gt;Not yet, but I think those kinds of things are more data-centric - this thought is not completely fleshed out yet, mind you - I think that might need to be handled differently... not sure yet. But it could work the same way as the bold/italic example. I&#039;d like to try to find something before I go and build it :)&lt;/p&gt;

&lt;p&gt;&quot;I don’t see anything wrong with storing data in the database. I see the html markup as data. It’s no different than storing an image or other binary data in the database.&quot;&lt;/p&gt;

&lt;p&gt;That&#039;s true, but I don&#039;t often search through image data to find matching results, nor has anyone ever asked me to give stats on the stuff inside of an image data field. I do see your point, but I don&#039;t think the comparison between binary data and HTML-formatted data completely holds.&lt;/p&gt;

&lt;p&gt;&quot;I’m not sure how easy to implement something like that would be, plus I wonder how worth it it all is?&quot;&lt;/p&gt;

&lt;p&gt;If done correctly, in the end, it should boil down to two functions - &quot;text[] styleSheetFromText(text)&quot; and &quot;text applyStyleSheet(text,sheet)&quot;. The details of those might get a bit hairy. As to whether it&#039;s worth it, that depends on how clean you want the data in the end, and whether you think it&#039;s a good idea - but I wonder too :)&lt;/p&gt;

&lt;p&gt;&quot;I did once read about a text format that stored formatting info in the white space at the end of lines - intended for newsgroup use, iirc.&quot;&lt;/p&gt;

&lt;p&gt;Hmm... sounds pretty close. Do you know what it was called?&lt;/p&gt;

&lt;p&gt;&quot;Page.HTML with markup and Page.Text without, stripping all tags from one (an easier proposition)&quot;&lt;/p&gt;

&lt;p&gt;Thanks Peter, that&#039;s not a bad idea. It essentially doubles the storage, but will mostly get the job done. If I can&#039;t figure out a nice way to separate the two, I&#039;ll fall back on that.&lt;/p&gt;

&lt;p&gt;(My apologies for the delays in showing your comments, I have to moderate the comments because ever since I moved to WordPress the spam comments have quadrupled.)&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>&#8220;but one that strips the meaning out of the data you are trying to store&#8221;</p>

<p>But, I mean, for stuff where there isn&#8217;t a meaning per se. Think of this as the text in one tag of an XML document &#8211; or in a CDATA section. If someone adds bold or italic or underline to the display of content, there really isn&#8217;t meaning to that, just emphasis or style. While I&#8217;d like to think people will, say, italicize or tag something as &#8220;book name&#8221;, odds are they won&#8217;t.</p>

<p>&#8220;have you thought about structures like paragraphs and unordered lists?&#8221;</p>

<p>Not yet, but I think those kinds of things are more data-centric &#8211; this thought is not completely fleshed out yet, mind you &#8211; I think that might need to be handled differently&#8230; not sure yet. But it could work the same way as the bold/italic example. I&#8217;d like to try to find something before I go and build it :)</p>

<p>&#8220;I don’t see anything wrong with storing data in the database. I see the html markup as data. It’s no different than storing an image or other binary data in the database.&#8221;</p>

<p>That&#8217;s true, but I don&#8217;t often search through image data to find matching results, nor has anyone ever asked me to give stats on the stuff inside of an image data field. I do see your point, but I don&#8217;t think the comparison between binary data and HTML-formatted data completely holds.</p>

<p>&#8220;I’m not sure how easy to implement something like that would be, plus I wonder how worth it it all is?&#8221;</p>

<p>If done correctly, in the end, it should boil down to two functions &#8211; &#8220;text[] styleSheetFromText(text)&#8221; and &#8220;text applyStyleSheet(text,sheet)&#8221;. The details of those might get a bit hairy. As to whether it&#8217;s worth it, that depends on how clean you want the data in the end, and whether you think it&#8217;s a good idea &#8211; but I wonder too :)</p>
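
<p>One possible shape for this pair, sketched in Python under some assumptions: <code>style_sheet_from_text</code> and <code>apply_style_sheet</code> are hypothetical snake_case stand-ins for the two functions above, the sheet is a JSON string of <code>[offset, tag]</code> pairs so it can live in an ordinary text column, and tags are matched with a naive regular expression. Here the offsets count characters of plain text rather than of the original markup, so tags are put back from the last entry to the first; that way tags sharing an offset keep their original order.</p>

<pre>
```python
import json
import re

# Naive tag pattern: assumes no '>' appears inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")


def style_sheet_from_text(markup):
    """Return [plain_text, sheet]; sheet is JSON: [[plain_offset, tag], ...]."""
    pieces = []     # runs of plain text between tags
    sheet = []      # tags keyed by their position in the plain text
    plain_len = 0   # characters of plain text emitted so far
    last = 0        # end of the previous tag in the markup
    for m in TAG_RE.finditer(markup):
        run = markup[last:m.start()]
        pieces.append(run)
        plain_len += len(run)
        sheet.append([plain_len, m.group()])
        last = m.end()
    pieces.append(markup[last:])
    return ["".join(pieces), json.dumps(sheet)]


def apply_style_sheet(text, sheet):
    """Reinsert the sheet's tags into plain text.

    Entries are processed from last to first: inserting at a larger offset
    never moves a smaller one, and each tag lands in front of any tag already
    inserted at the same offset, which preserves the original tag order.
    """
    markup = text
    for offset, tag in reversed(json.loads(sheet)):
        markup = markup[:offset] + tag + markup[offset:]
    return markup
```
</pre>

<p>As with any offset-based scheme, the sheet only matches the plain text it was built from; nothing here re-aligns it after the text is edited.</p>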

<p>&#8220;I did once read about a text format that stored formatting info in the white space at the end of lines &#8211; intended for newsgroup use, iirc.&#8221;</p>

<p>Hmm&#8230; sounds pretty close. Do you know what it was called?</p>

<p>&#8220;Page.HTML with markup and Page.Text without, stripping all tags from one (an easier proposition)&#8221;</p>

<p>Thanks Peter, that&#8217;s not a bad idea. It essentially doubles the storage, but will mostly get the job done. If I can&#8217;t figure out a nice way to separate the two, I&#8217;ll fall back on that.</p>

<p>(My apologies for the delays in showing your comments, I have to moderate the comments because ever since I moved to WordPress the spam comments have quadrupled.)</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Dan G. Switzer, II</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-153</link>
		<dc:creator>Dan G. Switzer, II</dc:creator>
		<pubDate>Fri, 09 Feb 2007 16:38:50 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-153</guid>
		<description>&lt;p&gt;I faced a similar quandary last year. However, I needed to provide rich text (with embedded images). I ended up going with the XStandard plug-in--which not only produces strict XHTML, but also allows drag-n-drop/copy-n-paste uploading of images.&lt;/p&gt;

&lt;p&gt;Since the markup was valid XHTML, I could then parse it into a rendered plain-text view. This turned out to be more cumbersome than I had hoped--just because there aren&#039;t any good open source Java solutions for converting XHTML to formatted plain text, so I had to write the parser myself.&lt;/p&gt;

&lt;p&gt;However, this allows us to store both plain-text and XHTML versions of the user-created content. Obviously, the plain-text version doesn&#039;t include images, but it allows us to send documents to mobile devices without having to worry about whether they support HTML.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I faced a similar quandary last year. However, I needed to provide rich text (with embedded images). I ended up going with the XStandard plug-in&#8211;which not only produces strict XHTML, but also allows drag-n-drop/copy-n-paste uploading of images.</p>

<p>Since the markup was valid XHTML, I could then parse it into a rendered plain-text view. This turned out to be more cumbersome than I had hoped&#8211;just because there aren&#8217;t any good open source Java solutions for converting XHTML to formatted plain text, so I had to write the parser myself.</p>
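
<p>The parser described here was written in Java; purely as an illustration, this is a much smaller Python sketch of the same idea using the standard library&#8217;s <code>xml.etree.ElementTree</code>, which only works because the stored markup is well-formed XHTML. It assumes the fragment has a single root element and no insignificant whitespace between tags. The set of block-level tags, the asterisk bullets, and the name <code>xhtml_to_text</code> are arbitrary, hypothetical choices.</p>

<pre>
```python
import re
import xml.etree.ElementTree as ET

# Elements rendered as separate paragraphs (an arbitrary, incomplete list).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "ul", "ol", "blockquote"}


def _local_name(tag):
    """Drop a '{namespace}' prefix, e.g. from xmlns-qualified XHTML."""
    return tag.rsplit("}", 1)[-1]


def _render(elem, out):
    name = _local_name(elem.tag)
    if name in BLOCK_TAGS:
        out.append("\n\n")
    elif name == "li":
        out.append("\n* ")
    elif name == "br":
        out.append("\n")
    if elem.text:
        out.append(elem.text)
    for child in elem:
        _render(child, out)
        if child.tail:  # text that follows the child's closing tag
            out.append(child.tail)


def xhtml_to_text(fragment):
    """Render a single-rooted, well-formed XHTML fragment as plain text."""
    out = []
    _render(ET.fromstring(fragment), out)
    # Nested blocks can stack up blank lines; keep at most one between blocks.
    return re.sub(r"\n{3,}", "\n\n", "".join(out)).strip()
```
</pre>

<p>Images are simply dropped, which matches the trade-off described in the next paragraph.</p>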

<p>However, this allows us to store both plain-text and XHTML versions of the user-created content. Obviously, the plain-text version doesn&#8217;t include images, but it allows us to send documents to mobile devices without having to worry about whether they support HTML.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Bell</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-152</link>
		<dc:creator>Peter Bell</dc:creator>
		<pubDate>Fri, 09 Feb 2007 15:34:20 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-152</guid>
		<description>&lt;p&gt;No idea, but I thought I&#039;d post a comment in case someone else did! My first thought was that this simply wasn&#039;t worth the effort, but the more I think about it the more I realize it could be very cool. You might have to constrain the markup supported somewhat to make this a manageable project, but if you come across anything, let me know.&lt;/p&gt;

&lt;p&gt;On the other hand, given that disk space is cheap and programmers are not, one solution would just be to have Page.HTML with markup and Page.Text without, stripping all tags from one (an easier proposition) and using the text version for some things and the HTML for others. It wouldn&#039;t have the full utility of having access to markup semantics to translate between formats (but of course that wouldn&#039;t be of use until you&#039;d written tools to go from the language &lt;em&gt;to&lt;/em&gt; PDF or whatever).&lt;/p&gt;

&lt;p&gt;Definitely an interesting concept. Let us know how you get on!&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>No idea, but I thought I&#8217;d post a comment in case someone else did! My first thought was that this simply wasn&#8217;t worth the effort, but the more I think about it the more I realize it could be very cool. You might have to constrain the markup supported somewhat to make this a manageable project, but if you come across anything, let me know.</p>

<p>On the other hand, given that disk space is cheap and programmers are not, one solution would just be to have Page.HTML with markup and Page.Text without, stripping all tags from one (an easier proposition) and using the text version for some things and the HTML for others. It wouldn&#8217;t have the full utility of having access to markup semantics to translate between formats (but of course that wouldn&#8217;t be of use until you&#8217;d written tools to go from the language <em>to</em> PDF or whatever).</p>
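
<p>A minimal sketch of the two-column approach, assuming SQLite via Python&#8217;s built-in <code>sqlite3</code> module. The table and column names (<code>page</code>, <code>html_body</code>, <code>plain_text</code>) are hypothetical stand-ins for Page.HTML and Page.Text, and tag stripping uses the standard library&#8217;s <code>html.parser.HTMLParser</code>, which also decodes entities such as <code>&amp;amp;</code>. Searching the plain-text column means a phrase split by a formatting tag still matches, and tag names never do.</p>

<pre>
```python
import sqlite3
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps character data only; tags are dropped and entities decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page ("
        " id INTEGER PRIMARY KEY, html_body TEXT NOT NULL, plain_text TEXT NOT NULL)"
    )


def save_page(conn, page_id, markup):
    """Store the markup and a derived plain-text copy side by side."""
    conn.execute(
        "INSERT OR REPLACE INTO page (id, html_body, plain_text) VALUES (?, ?, ?)",
        (page_id, markup, strip_tags(markup)),
    )


def find_pages(conn, phrase):
    """Return ids of pages whose plain text contains phrase.

    Note: % and _ inside phrase are not escaped, so they act as wildcards.
    """
    rows = conn.execute(
        "SELECT id FROM page WHERE plain_text LIKE ? ORDER BY id",
        ("%" + phrase + "%",),
    )
    return [row[0] for row in rows]
```
</pre>

<p>The stripping is deliberately crude: text from adjacent block elements runs together with no separator between them.</p>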

<p>Definitely an interesting concept. Let us know how you get on!</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Chiverton</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-151</link>
		<dc:creator>Tom Chiverton</dc:creator>
		<pubDate>Fri, 09 Feb 2007 14:58:28 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-151</guid>
		<description>&lt;p&gt;I did once read about a text format that stored formatting info in the white space at the end of lines - intended for newsgroup use, iirc.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I did once read about a text format that stored formatting info in the white space at the end of lines &#8211; intended for newsgroup use, iirc.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Rick</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-150</link>
		<dc:creator>Rick</dc:creator>
		<pubDate>Fri, 09 Feb 2007 13:28:25 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-150</guid>
		<description>&lt;p&gt;Hi, interesting idea. I&#039;m not sure how easy to implement something like that would be, plus I wonder how worth it it all is?&lt;/p&gt;

&lt;p&gt;It is an interesting area, though, with regard to separating presentation from structure and how that works when using a database. It&#039;s one thing to work with static HTML pages using an XHTML file and a separate CSS file, but when you start drawing in dynamic content from a database, it adds a different angle to that separation.&lt;/p&gt;

&lt;p&gt;Personally, I don&#039;t have a problem storing the XHTML in the database. I would have thought that good, semantically structured XHTML mark-up is the only thing (ideally!) that would be getting added to the plain text. Therefore it should be no problem simply to strip that mark-up from the content drawn from the database, leaving you with the raw text?&lt;/p&gt;

&lt;p&gt;Perhaps the new WYSIWYG editor WYMeditor (http://www.wymeditor.org/en/) could be useful for keeping people using good mark-up?&lt;/p&gt;

&lt;p&gt;Peter Krantz wrote an update to his &#039;Evaluation of WYSIWYG editors&#039; post which is interesting reading. http://www.standards-schmandards.com/2007/wysiwyg-editor-test-2/&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Hi, interesting idea. I&#8217;m not sure how easy to implement something like that would be, plus I wonder how worth it it all is?</p>

<p>It is an interesting area, though, with regard to separating presentation from structure and how that works when using a database. It&#8217;s one thing to work with static HTML pages using an XHTML file and a separate CSS file, but when you start drawing in dynamic content from a database, it adds a different angle to that separation.</p>

<p>Personally, I don&#8217;t have a problem storing the XHTML in the database. I would have thought that good, semantically structured XHTML mark-up is the only thing (ideally!) that would be getting added to the plain text. Therefore it should be no problem simply to strip that mark-up from the content drawn from the database, leaving you with the raw text?</p>

<p>Perhaps the new WYSIWYG editor WYMeditor (<a href="http://www.wymeditor.org/en/" rel="nofollow">http://www.wymeditor.org/en/</a>) could be useful for keeping people using good mark-up?</p>

<p>Peter Krantz wrote an update to his &#8216;Evaluation of WYSIWYG editors&#8217; post which is interesting reading. <a href="http://www.standards-schmandards.com/2007/wysiwyg-editor-test-2/" rel="nofollow">http://www.standards-schmandards.com/2007/wysiwyg-editor-test-2/</a></p>]]></content:encoded>
	</item>
	<item>
		<title>By: Doug Hughes</title>
		<link>http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/comment-page-1/#comment-149</link>
		<dc:creator>Doug Hughes</dc:creator>
		<pubDate>Fri, 09 Feb 2007 13:25:29 +0000</pubDate>
		<guid isPermaLink="false">http://robrohan.com/2007/02/09/do-you-save-html-in-your-relational-database/#comment-149</guid>
		<description>&lt;p&gt;I don&#039;t see anything wrong with storing data in the database.  I see the html markup as data.  It&#039;s no different than storing an image or other binary data in the database.  One of the major purposes of a database is to relate information, which is what happens when entryId 1 is related to the HTML for that entry.&lt;/p&gt;

&lt;p&gt;Anyhow, it seems like the effort to do what you&#039;re suggesting is disproportionate to the rewards. Why would you actually use this? About the only reason I can come up with is if you wanted to put the text into a format that&#039;s not HTML, e.g. Flash or something similar.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I don&#8217;t see anything wrong with storing data in the database.  I see the html markup as data.  It&#8217;s no different than storing an image or other binary data in the database.  One of the major purposes of a database is to relate information, which is what happens when entryId 1 is related to the HTML for that entry.</p>

<p>Anyhow, it seems like the effort to do what you&#8217;re suggesting is disproportionate to the rewards. Why would you actually use this? About the only reason I can come up with is if you wanted to put the text into a format that&#8217;s not HTML, e.g. Flash or something similar.</p>]]></content:encoded>
	</item>
</channel>
</rss>
