There's no way to do this using Adobe Acrobat. Files that are visually identical can be structured very differently. If your only concern is duplicate visual content, my recommendation is to render the PDF pages to images, compare those, and then delete the PDFs whose images match.
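For what it's worth, the render-and-compare idea could be sketched in Python roughly like this. It assumes the third-party PyMuPDF library (imported as fitz), which has nothing to do with Acrobat, and the names render_fingerprint and group_by_fingerprint are made up for illustration. Treat it as a sketch only: two files count as duplicates here only if every page renders to exactly the same pixels at the chosen resolution.

```python
import hashlib
from collections import defaultdict


def render_fingerprint(pdf_path, dpi=72):
    """Hash the rendered pixels of every page (assumes PyMuPDF is installed)."""
    import fitz  # PyMuPDF, third-party: pip install pymupdf

    digest = hashlib.sha256()
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)
            # Mix in the page dimensions so differently sized pages never collide.
            digest.update(f"{pix.width}x{pix.height}".encode())
            digest.update(pix.samples)
    return digest.hexdigest()


def group_by_fingerprint(paths, fingerprint=render_fingerprint):
    """Return lists of paths that share the same fingerprint (likely duplicates)."""
    groups = defaultdict(list)
    for path in paths:
        groups[fingerprint(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Passing a list of PDF paths to group_by_fingerprint returns only the groups with more than one member, so everything but the first file in each group is a candidate for deletion. It is probably wise to check a few groups by eye before deleting anything.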
I forgot to say that the files were downloaded from a server. They are simple 2-page documents.
As a matter of fact, I might download the very same document 10x in a row, but each time the hash is different.
1. Within Acrobat->View->Compare documents: Acrobat tells me the documents are the same.
2. A third-party tool that compares contents tells me the files are the same.
Imagine one is downloading the acrobat-xi-pro-accessibility-best-practice-guide.pdf twice.
In that case the hashes are definitely the same.
However, when the document is generated 'on the fly', the hashes are different, i.e. it is the way the document is produced that makes them different.
The reason for posting here is that I am unaware of any Acrobat tools that compare documents on contents only
(number of lines, number of characters, or size and number of lines) and ignore hashes,
other than comparing two documents at a time within Acrobat, which, with many documents, is quite a workload.
I know it does not exist, but it would be nice if Acrobat could do a kind of 'batch-compare files' in a folder in the same way as document compare.
Hm... I see I am not the only one:
Can we have a batch process for comparing pdfs
I found your post very confusing until I realised you are using the word "hash" in a completely new way, different from anyone else. It is especially confusing as PDF files do use hashes, for signatures. See: Hash function - Wikipedia
I am very sorry for the confusion.
What I meant is that the hashes calculated from the files are different.
(using tools like 'HashMyFiles' from NirSoft,
or 'MD5 & SHA-1 Checksum Utility', from MD5 & SHA Checksum Utility | Raymond's WordPress)
Again, sorry for the confusion.
Below is what I meant...
Yes, these hashes are different, but they are not used by the compare tool. Making the same file twice will almost always give a different hash, because each generated PDF normally contains a unique string that differs from all other files (the "document ID") as well as the date and time it was made.
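To illustrate the point, here is a small Python sketch. The make_tail helper builds a hypothetical, heavily simplified fragment of a PDF (not a valid file), and normalized_sha256 is an invented name. It shows that two fragments differing only in document ID and creation date hash differently, and that blanking those two fields makes the hashes agree. Real PDFs often keep this data inside compressed streams or XMP metadata, so this byte-level trick is an assumption-laden demonstration, not a reliable duplicate detector.

```python
import hashlib
import re


def make_tail(doc_id: bytes, created: bytes) -> bytes:
    """Build a simplified, hypothetical PDF fragment with an ID and a date."""
    return (
        b"1 0 obj\n<< /CreationDate (D:" + created + b") >>\nendobj\n"
        b"trailer\n<< /Info 1 0 R /ID [<" + doc_id + b"> <" + doc_id + b">] >>\n"
    )


# Patterns for the two fields that change on every generation.
ID_RE = re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]")
DATE_RE = re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)")


def normalized_sha256(data: bytes) -> str:
    """Hash the bytes after blanking the document ID and date entries."""
    data = ID_RE.sub(b"/ID []", data)
    data = DATE_RE.sub(rb"/\1 ()", data)
    return hashlib.sha256(data).hexdigest()


first = make_tail(b"A1B2", b"20240101120000Z")
second = make_tail(b"C3D4", b"20240101120005Z")

# The raw hashes differ although the "content" is the same.
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
# After blanking the ID and the date, the hashes match.
print(normalized_sha256(first) == normalized_sha256(second))
```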
I believe it is the method of downloading causing different checksums.
As said, if the file is not generated 'on the fly', then the checksums are the same.
I know that checksums are not considered within Acrobat View->Compare Documents.
This method is perfect for comparing two documents, but not for comparing a lot of probable duplicates.
Ideally, this method (Compare Documents) could be applied to an entire folder and Acrobat would show which files are duplicates.
Anyway, this is not possible right now and there is no alternative but to go through the folder(s), sort by size, and check and delete each duplicate file.
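The sort-by-size step, at least, could be scripted outside Acrobat. Below is a minimal Python sketch using only the standard library; size_groups is a made-up name, and it assumes that duplicates generated on the fly end up with exactly the same file size, which may not always hold. It only lists candidates and deletes nothing, so each group would still need checking with Compare Documents or a similar tool.

```python
import os
from collections import defaultdict


def size_groups(folder):
    """Return groups of PDF paths in `folder` that have identical file sizes."""
    by_size = defaultdict(list)
    for entry in os.scandir(folder):
        # Only look at regular files with a .pdf extension (not recursive).
        if entry.is_file() and entry.name.lower().endswith(".pdf"):
            by_size[entry.stat().st_size].append(entry.path)
    # Keep only sizes shared by two or more files: these are the candidates.
    return [sorted(paths) for paths in by_size.values() if len(paths) > 1]
```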