I have PDFs that contain only images.
Open one in Notepad, delete a few lines, then save and close it again, effectively breaking the PDF file.
You can still open it, but all the pages will be blank, with maybe a scribble at the bottom somewhere.
Now, how do you check whether a PDF file is broken like that? (It can STILL be opened by Adobe, with the message "Insufficient data for an image", and you can still scroll through the broken file.)
(Oh, and you do not have the original to compare it against with a checksum.)
What I've done so far, with no success I might add:
- Checked whether the pages are blank (by checking the amount of data on each page). Fun fact: it's exactly the same as in the original, so that failed.
- Tried to open the PDF in a window. It still opened; broken, but it opened, so I couldn't catch anything.
- Read the pages into a file stream, but that only worked for text, and I have images. (I'll put the code for this one at the bottom, since it's the only one that partly worked, but like I said, only for PDFs with words in them.)
- Checked the headers, but the fault isn't in the headers (most of the time).
I've used every free library I could find:
- iTextSharp
- AcroPDF
- SautinSoft.PdfFocus
And hell if I know what else!
How would you check for a broken PDF like that?
Am I checking the right way? Is there a different way to do it?
I'm so desperate about this that it doesn't even have to be in C# (well, preferably). I even tried a whole new thing, Ghostscript, but I was no good at it, so that failed too.
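Since the language doesn't matter, here is a rough Python sketch of the kind of low-level check I have in mind. I have not run it against my real files. It assumes the damage shows up as a mismatch between a stream's declared /Length and the bytes actually present before the endstream keyword. The name find_damaged_streams is made up for illustration. It only judges streams with a direct integer /Length; indirect lengths (such as /Length 12 0 R), encrypted files, and image data that has the right size but wrong content are not covered.

```python
import re

# The "stream" keyword that opens a stream body: end of dictionary, keyword, EOL.
_STREAM_START = re.compile(rb">>\s*stream(\r\n|\n)")
# A direct integer /Length. The lookahead rejects indirect references
# such as "/Length 12 0 R" (and partial matches of a longer number).
_DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)(?!\d|\s+\d+\s+R)")


def find_damaged_streams(pdf_bytes):
    """Return offsets of streams whose declared /Length does not end
    right before an 'endstream' keyword (body truncated or altered)."""
    damaged = []
    for match in _STREAM_START.finditer(pdf_bytes):
        # Heuristic: the stream dictionary starts after the nearest "obj".
        obj_pos = pdf_bytes.rfind(b"obj", 0, match.start())
        header = pdf_bytes[max(obj_pos, 0):match.start()]
        length_match = _DIRECT_LENGTH.search(header)
        if length_match is None:
            continue  # indirect or missing /Length: this sketch cannot judge it
        body_end = match.end() + int(length_match.group(1))
        # Allow an optional end-of-line between the data and "endstream".
        tail = pdf_bytes[body_end:body_end + 11]
        if not tail.lstrip(b"\r\n").startswith(b"endstream"):
            damaged.append(match.start())
    return damaged
```

The idea would be to read the whole file as bytes (for example open(path, "rb").read()) and treat any non-empty result as "probably broken". Because it is a plain byte scan, it could in principle be fooled by binary data that happens to contain the same keywords.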
Here is the code that worked for the text but didn't work for the images, since the strings came up empty. Go figure.
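using System;
using System.IO;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Returns false if iTextSharp throws while extracting text from any page.
// Note: image-only pages yield empty strings without throwing.
public bool ReadPdfFile(FileInfo f, string sourceDir)
{
    lbl_CurrentFile.Text = f.FullName;
    Application.DoEvents();
    PdfReader pdfReader = null;
    try
    {
        pdfReader = new PdfReader(f.FullName);
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            // Round-trip through the default encoding; the result is not used further.
            currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
        }
    }
    catch (Exception)
    {
        return false;
    }
    finally
    {
        // Close the reader even when extraction fails.
        if (pdfReader != null) pdfReader.Close();
    }
    return true;
}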
Any help would be awesome.
Thanks