<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:clearspace="http://www.jivesoftware.com/xmlns/jive/rss" version="2.0">
  <channel>
    <title>Adobe Community: Message List - How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
    <link>https://forums.adobe.com/community/design_development/pdf_language_and_specifications?view=discussions</link>
    <description>Most recent forum messages</description>
    <language>en</language>
    <pubDate>Mon, 02 Jun 2014 16:20:08 GMT</pubDate>
    <generator>Jive Engage 7.0.0.1  (http://jivesoftware.com/products/)</generator>
    <dc:date>2014-06-02T16:20:08Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6427852?tstart=0#6427852</link>
      <description>&lt;!-- [DocumentBodyStart:689d6828-3062-4096-9f7a-74de441e1dae] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;I'll back up TSN - this isn't a PDF format question, but more about what is present in the file that is being used by search but not by "save as RTF". The only way to know is to examine the file. If you can post it, we can look.&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:689d6828-3062-4096-9f7a-74de441e1dae] --&gt;</description>
      <pubDate>Mon, 02 Jun 2014 16:20:08 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6427852?tstart=0#6427852</guid>
      <dc:date>2014-06-02T16:20:08Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6424911?tstart=0#6424911</link>
      <description>&lt;!-- [DocumentBodyStart:0435197e-268a-4492-853e-9c8a61ddffd6] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;I do understand the intricacies of PDF structure. And I can tell you it's baffling. Hence my suggestion of a deeper investigation. Unless you can share the file publicly.&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:0435197e-268a-4492-853e-9c8a61ddffd6] --&gt;</description>
      <pubDate>Sun, 01 Jun 2014 07:54:16 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6424911?tstart=0#6424911</guid>
      <dc:date>2014-06-01T07:54:16Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6424785?tstart=0#6424785</link>
      <description>&lt;!-- [DocumentBodyStart:ce9f7d15-7da3-4efa-96ff-546c59d78f0d] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Thank you for your suggestion, TSN.&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;At the moment I'm more curious to know, in technical terms, why a PDF search finds a word as if it were correctly spelled, whereas when the file is converted to RTF that same word comes back misspelled, making it invisible to an RTF search. Once that is understood, we can take steps to fix it.&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Does anyone out there understand the intricacies of PDF structure?&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;W/&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:ce9f7d15-7da3-4efa-96ff-546c59d78f0d] --&gt;</description>
      <pubDate>Sun, 01 Jun 2014 03:40:26 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6424785?tstart=0#6424785</guid>
      <dc:date>2014-06-01T03:40:26Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6423913?tstart=0#6423913</link>
      <description>&lt;!-- [DocumentBodyStart:9302ffe4-98bc-4fe9-9ff8-3e811888b2c0] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;I can't fault your reasoning. I suggest a closer examination by selecting text and doing a copy/paste. &lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:9302ffe4-98bc-4fe9-9ff8-3e811888b2c0] --&gt;</description>
      <pubDate>Sat, 31 May 2014 15:33:50 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6423913?tstart=0#6423913</guid>
      <dc:date>2014-05-31T15:33:50Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>3</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6423958?tstart=0#6423958</link>
      <description>&lt;!-- [DocumentBodyStart:b1484150-b78b-4f7b-80ce-8d1a40b24789] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;Not sure what happened there, but I did not attach anything; I replied to the e-mail as per instructions. Anyway, thank you for your prompt response.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;Here's the thing:&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;I have downloaded a historical document (1936) that is in PDF format. There were no restrictions, and it is searchable. &lt;span style="font-size: 10pt;"&gt;No OCR was done on my end.&lt;br/&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;As an experiment, a "Find" was done for a keyword, and it returned 10 results. There were no overlooked keywords. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;The document was converted into RTF, and a search for the same keyword was done. It returned only 7 results. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;A spellcheck showed that the remaining 3 were spelled incorrectly and therefore could not be recognised. &lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;For some technical reason, the PDF search recognised all 10 words even though 3 of them, according to the rtf equivalent, were spelled incorrectly. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;My question is: why does that happen? &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;One would think that if 10 words were recognised in a PDF, they would all be spelled correctly in the rtf equivalent. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;How is it that the rtf equivalent returns 3 misspelled words (and of course does not recognise them) when the PDF is blind to their misspellings?&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;I'm hoping that someone who understands how the PDF format is structured will be able to explain why this strange behaviour occurs.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;Wayne&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size: 10pt;"&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:b1484150-b78b-4f7b-80ce-8d1a40b24789] --&gt;</description>
      <pubDate>Sat, 31 May 2014 15:25:38 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6423958?tstart=0#6423958</guid>
      <dc:date>2014-05-31T15:25:38Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>4</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6423904?tstart=0#6423904</link>
      <description>&lt;!-- [DocumentBodyStart:6e264ed9-92bd-478a-9f86-087d336157bc] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Your reply was empty. (You cannot attach files)&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:6e264ed9-92bd-478a-9f86-087d336157bc] --&gt;</description>
      <pubDate>Sat, 31 May 2014 14:45:54 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6423904?tstart=0#6423904</guid>
      <dc:date>2014-05-31T14:45:54Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>5</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6423873?tstart=0#6423873</link>
      <description>&lt;!-- [DocumentBodyStart:fd2e985f-578b-4c36-ae9b-3bcf58bd80e8] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:fd2e985f-578b-4c36-ae9b-3bcf58bd80e8] --&gt;</description>
      <pubDate>Sat, 31 May 2014 14:22:13 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6423873?tstart=0#6423873</guid>
      <dc:date>2014-05-31T14:22:13Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>6</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6423485?tstart=0#6423485</link>
      <description>&lt;!-- [DocumentBodyStart:1f1de815-6b98-4a47-b44f-613ef166b93e] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;I'm not sure what you are saying. OCR is unreliable, and if you don't correct it then the text will be wrong. This seems simple and unavoidable; what do you suggest instead?&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:1f1de815-6b98-4a47-b44f-613ef166b93e] --&gt;</description>
      <pubDate>Sat, 31 May 2014 08:53:52 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6423485?tstart=0#6423485</guid>
      <dc:date>2014-05-31T08:53:52Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>7</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?</title>
      <link>https://forums.adobe.com/message/6422361?tstart=0#6422361</link>
      <description>&lt;!-- [DocumentBodyStart:dea6d259-f3f2-45fe-9d0c-94e7d159d132] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;How is it that a searchable PDF text returns found words misspelled when the text is converted to an rtf file?&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I can't count the hours spent correcting PDF files that appear fine but are simply unreadable when turned into editable or searchable text. It would seem to me that once that problem is understood, a solution may be found for at least half of the misspelled words in post-OCR corrections. This is a real hindrance when it comes to research ... no one has unlimited time to correct files after OCR to ensure that searches do not overlook important but misspelled words.&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;WRBulmer&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:dea6d259-f3f2-45fe-9d0c-94e7d159d132] --&gt;</description>
      <pubDate>Fri, 30 May 2014 19:26:06 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6422361?tstart=0#6422361</guid>
      <dc:date>2014-05-30T19:26:06Z</dc:date>
      <clearspace:dateToText>5 months 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>8</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
  </channel>
</rss>

