
PDF text extraction returns garbage

Hi Team,

I am looking for some help with reading a PDF file. I have a file written in a mixture of Urdu and Arabic; it was originally composed in InPage and then converted to PDF.
We are trying to display the file in an Android app, but the text comes out as garbage once it is read in code. Please see the attached images.

Looking forward to your help. We are doing this in Java on Android.
 

Attachments

  • pdffile.PNG
  • output.jpg
I can't read Urdu or Arabic, so I can't judge the text in your posted images, but offhand: when a document exported from an application like InPage to PDF comes out as gibberish, the cause is often a font error or mismatch. A PDF is supposed to be essentially self-contained: it carries all the graphics and text needed to make it look like a duplicate of the original document. But there are many types of fonts (bitmap, TrueType, OpenType, etc.), and a lot of them are proprietary, which adds more complications. Some fonts don't work out well when they are embedded in something like a PDF. Have you tried a different font, even just temporarily as a test? I realize that switching fonts will likely affect the overall look of the document, but a quick test can eliminate the chosen font(s) as part of the problem.

InPage
https://en.wikipedia.org/wiki/InPage
PDF
https://en.wikipedia.org/wiki/PDF
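
One way to test the font-mismatch idea from code is to look at the characters the extractor actually returns. If the PDF's font maps glyphs to the wrong code points, the extracted string will typically contain few or no Arabic-script letters, even though the page looks fine in a viewer. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes you already have the extracted text as a Java `String`. As far as I know, `Character.UnicodeScript` needs API level 24 or later on Android.

```java
// Hypothetical diagnostic helper (not from any PDF library): measures how much
// of an extracted string is made of Arabic-script letters. Urdu is written in
// the Arabic script, so correctly extracted Urdu/Arabic text should score high.
final class ScriptCheck {

    /** Returns the fraction (0.0 to 1.0) of letters that belong to the Arabic script. */
    static double arabicLetterRatio(String text) {
        int letters = 0;
        int arabic = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);       // step over surrogate pairs correctly
            if (!Character.isLetter(cp)) {
                continue;                       // skip digits, spaces, punctuation
            }
            letters++;
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC) {
                arabic++;
            }
        }
        // No letters at all: report 0.0 rather than dividing by zero.
        return letters == 0 ? 0.0 : (double) arabic / letters;
    }
}
```

A ratio near zero for a page that is visibly Urdu or Arabic would point to the PDF's font encoding rather than to the display code in the app.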
 
It seems the garbage appears when you load the PDF content in your app on an AVD. How about opening that PDF in the MS Office app first, to confirm that the PDF file itself is correct?
If MS Office displays it correctly, the defect is likely in the part of your app that loads the PDF content.
 

Hi James, I opened the PDF in MS Office and it also shows garbage. I then copied the text from the original InPage file, used a website to convert it to Unicode, pasted it into Word, and saved that as a PDF. This time I am able to read the text, but each word comes back broken into pieces of 1, 2, or 3 letters, so it is still unreadable. A picture is attached.
 

Attachments

  • outcome.jpeg
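
One possible contributor to the broken-up words, and this is only a guess from the description, is that the PDF stores each letter as a positional glyph from the Arabic Presentation Forms blocks (for example U+FEB3, the initial form of seen) instead of the base letter (U+0633). Text in that form often fails to join when it is displayed or searched. A minimal sketch, assuming the extracted text is already a Java `String`: compatibility normalization (NFKC) with the standard `java.text.Normalizer` folds those presentation forms back to base letters. The class and method names here are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper: folds Arabic presentation-form characters (isolated,
// initial, medial, final glyph variants and ligatures) back to their base
// letters so the text renderer can shape and join them itself.
final class ArabicFormsFolder {

    /** Applies Unicode NFKC normalization to the extracted text. */
    static String foldPresentationForms(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }
}
```

This does not fix letters that come back in visual (reversed) order or with extra spaces inserted between them; if the output still looks wrong after normalization, those would be the next things to check.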

The problem could be with InPage itself. I don't know this software, but it may have issues with Urdu and Arabic, i.e. non-Latin characters.

You could try MS Office instead. It can export documents to PDF and definitely supports non-Latin scripts such as Arabic, Chinese, and Mongolian.
 