Accessing a PDF file through VBA

New Here, Jul 20, 2017

I was reading a forum thread, https://forums.adobe.com/thread/604177, and started experimenting with it.
But I think the function didn't load at all. I'm not sure what the reason might be;
I reckon it is something simple, probably related to the library.
Can someone help point out why the following code failed to compile?
(The code appeared not to have run when executed from the Immediate window, as none of the breakpoints were triggered.)

The libraries I have loaded include

  • Acrobat Distiller
  • Adobe Acrobat 10.0 Type Library
  • Acrobat Scan 1.0 Type Library

The computer on which this code is executed has Acrobat Professional installed.

Public Function GetPDF() '(FilePath As String) As Object
    Dim origPdf As Acrobat.AcroPDDoc
    Dim path1 As String
    MsgBox ("Start")

    path1 = Application.ActiveWorkbook.Path
    path1 = path1 & "/31700100"

    Set origPdf = CreateObject("AcroExch.PDDoc")

    If origPdf.Open(path1) Then
        MsgBox ("weee")
    End If

    origPdf.Close
    Set origPdf = Nothing
End Function
TOPICS
Acrobat SDK and JavaScript

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

2 correct answers: the Community Expert replies dated Jul 20, 2017 and Feb 02, 2023, both of which appear in full further down the thread.

Community Expert, Jul 20, 2017

You need a few more lines. Take a look here for a working example: Adobe Acrobat and VBA - An Introduction - KHKonsulting LLC

Add these lines to the beginning of your program and see if that fixes it:

Dim AcroApp As Acrobat.CAcroApp

Set AcroApp = CreateObject("AcroExch.App")

And, all you need is a reference to the Adobe Acrobat 10.0 Type Library
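
A minimal sketch of the suggestion above, assuming a reference to the Adobe Acrobat 10.0 Type Library is set and the code runs in Excel. The procedure name `OpenPdfExample` and the file name `example.pdf` are placeholders, not taken from the original post.

```vba
Option Explicit

' Hypothetical example: opens a PDF stored next to the active workbook
' and reports its page count.
Public Sub OpenPdfExample()
    Dim AcroApp As Acrobat.CAcroApp
    Dim origPdf As Acrobat.CAcroPDDoc
    Dim pdfPath As String

    ' Create the application object first, as suggested above.
    Set AcroApp = CreateObject("AcroExch.App")
    Set origPdf = CreateObject("AcroExch.PDDoc")

    ' "example.pdf" is a placeholder file name.
    pdfPath = Application.ActiveWorkbook.Path & "\example.pdf"

    If origPdf.Open(pdfPath) Then
        MsgBox "Opened, pages: " & origPdf.GetNumPages()
        origPdf.Close
    Else
        MsgBox "Could not open " & pdfPath
    End If

    Set origPdf = Nothing
    Set AcroApp = Nothing
End Sub
```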

New Here, Jul 20, 2017

Ah! Thanks Karl. That is helpful. I got it to open just fine with the following.

My second question is then: what kind of object is the PDDoc? Is it like a pointer, or is it an actual object that contains all the data within the file? If I pass it around as an object, are there limitations on shallow and deep passes? (Say, I want the function to return an AcroPDDoc object and do other things with it.)

Thanks!

Public Function GetPDF(FilePath As String) As Object
    Dim AcroApp As Acrobat.AcroApp
    Dim OriPdf As Acrobat.AcroPDDoc

    Set AcroApp = CreateObject("AcroExch.App")
    Set OriPdf = CreateObject("AcroExch.PDDoc")

    If OriPdf.Open(FilePath) Then
        MsgBox ("weee")
    End If

    ' Object references must be assigned with Set.
    ' Note: the document is closed below, so the caller receives a reference to a closed PDDoc.
    Set GetPDF = OriPdf

    OriPdf.Close
    AcroApp.Exit
    Set OriPdf = Nothing
    Set AcroApp = Nothing
End Function
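
On the question of what gets passed around: in VBA an object variable holds a reference to the COM object, not a copy of it, so a function can hand back an open document with `Set` and leave closing to the caller. A minimal sketch under that assumption; `OpenPDDoc` and `ShowPageCount` are hypothetical names, not part of the Acrobat API.

```vba
Option Explicit

' Hypothetical helper: returns an open PDDoc reference,
' or Nothing if the file cannot be opened.
Public Function OpenPDDoc(FilePath As String) As Object
    Dim pdDoc As Object
    Set pdDoc = CreateObject("AcroExch.PDDoc")

    If pdDoc.Open(FilePath) Then
        Set OpenPDDoc = pdDoc      ' hand the reference to the caller; do not close here
    Else
        Set OpenPDDoc = Nothing
    End If
End Function

' Hypothetical caller: uses the document, then closes it.
Public Sub ShowPageCount(FilePath As String)
    Dim doc As Object
    Set doc = OpenPDDoc(FilePath)

    If doc Is Nothing Then
        MsgBox "Could not open " & FilePath
        Exit Sub
    End If

    MsgBox "Pages: " & doc.GetNumPages()
    doc.Close
    Set doc = Nothing
End Sub
```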

Adobe Employee, Jul 21, 2017

Also, make sure you download the SDK and read the documentation. Snippets on the web are not documentation…

Community Beginner, Mar 22, 2023

After running the code for the second time, the application took a long time to load and the error pointed to the CreateObject("AcroExch.App") line. The error was: Cannot create ActiveX component. Please advise.

Community Expert, Mar 23, 2023

That error is usually reported because the reference to the "Acrobat.tlb" file is missing.   

 

https://opensource.adobe.com/dc-acrobat-sdk-docs/library/interapp/IAC_DevApp_OLE_Support.html#enviro...

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Community Beginner, Mar 24, 2023

Thank you very much 🙂

 

New Here, Jun 23, 2023

Thanks for your great article.  I am hung up on this line of code from your post:

 

    If Part1Document.InsertPages(numPages - 1, Part2Document, _
            0, Part2Document.GetNumPages(), True) = False Then
        MsgBox "Cannot insert pages"
    End If

I only want to copy the first two pages of the Part 2 Document, but every syntax variation I try gets me all the pages from Part 2 copied to Part 1.

Any ideas?

LEGEND, Jun 23, 2023

You say "syntax variation". Does that mean you don't have the documentation? It's here: https://opensource.adobe.com/dc-acrobat-sdk-docs/library/interapp/IAC_API_OLE_Objects.html#insertpag...
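
For reference, the IAC documentation lists the `InsertPages` parameters as: the page to insert after, the source document, the first source page, the number of pages to insert, and a bookmarks flag. A minimal sketch of copying only the first two pages, assuming `Part1Document` and `Part2Document` are already-open `AcroPDDoc` objects as in the quoted snippet; the procedure name `InsertFirstTwoPages` is hypothetical.

```vba
Option Explicit

' Hypothetical helper: appends the first two pages of Part2Document
' to the end of Part1Document.
Public Sub InsertFirstTwoPages(Part1Document As Object, Part2Document As Object)
    Dim numPages As Long
    Dim pagesToCopy As Long

    numPages = Part1Document.GetNumPages()

    ' Copy at most two pages; fewer if the source document is shorter.
    pagesToCopy = 2
    If Part2Document.GetNumPages() < pagesToCopy Then
        pagesToCopy = Part2Document.GetNumPages()
    End If

    ' Arguments: insert after this page index, source document,
    ' first source page (0-based), number of pages, copy bookmarks.
    If Part1Document.InsertPages(numPages - 1, Part2Document, _
            0, pagesToCopy, True) = False Then
        MsgBox "Cannot insert pages"
    End If
End Sub
```

The fourth argument is a page count rather than an end page, which is why passing `Part2Document.GetNumPages()` copies the whole document. The inserted pages exist only in memory until the target document is saved.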

Community Beginner, Feb 01, 2023

Hi, can you help me with this question? I am stuck, any help is highly appreciated, thanks

I am getting the error:

Run-time error '-2147023170 (800706be)':

Automation error

The remote procedure call failed. 

on the line: For i = 0 To pdf_doc.GetNumPages - 1

This is my code:

Option Explicit
Public Const pdf_file As String = "C:\Users\... 10\2022 PY PAR.pdf"
Sub rad_from_pdf()
    Dim aApp As Acrobat.AcroApp
    Dim av_doc As CAcroAVDoc
    Dim pdf_doc As CAcroPDDoc
    Dim sel_text As CAcroPDTextSelect
    Dim i As Long, j As Long
    Dim pagenumber, pageContent, content
    Set aApp = CreateObject("AcroExch.App")
    Set av_doc = CreateObject("AcroExch.AVDoc")

    If av_doc.Open(pdf_file, vbNull) <> True Then Exit Sub
    While av_doc Is Nothing
        Set av_doc = aApp.GetActiveDoc
    Wend
    Set pdf_doc = av_doc.GetPDDoc
    For i = 0 To pdf_doc.GetNumPages - 1
        Set pagenumber = pdf_doc.AcquirePage(i)
        Set pageContent = CreateObject("AcroExch.HiliteList")

        On Error Resume Next
        If pageContent.Add(0, 9000) <> True Then Exit Sub

        Set sel_text = pagenumber.CreatePageHilite(pageContent)
        On Error GoTo 0

        For j = 0 To sel_text.GetNumText - 1
            Debug.Print sel_text.GetText(j)
        Next
    Next
    av_doc.Close False
    aApp.Exit
    Set sel_text = Nothing
    Set pagenumber = Nothing
    Set pdf_doc = Nothing
    Set av_doc = Nothing
    Set aApp = Nothing
End Sub

 

LEGEND, Feb 02, 2023

Are you trying to read all the text from a PDF? The JavaScript getPageNthWord method is now considered the best way to do that.

Community Beginner, Feb 02, 2023

Thanks for your answer and help!

Yes, I need to get the text from a PDF and paste it into Excel to manipulate it with the VBA macro. The old VBA, created for someone else, was given to me to troubleshoot and make more efficient because it had become very slow. On my machine I am not even able to run it: I get the automation error when reaching the code below:

If Not AC_PGTxt Is Nothing Then
    With AC_PGTxt
        For j = 0 To .GetNumText - 1
            T_Str = T_Str & .GetText(j)
        Next j
    End With
End If

 

Community Expert, Feb 02, 2023

I think you will find it very helpful to write an Acrobat folder-level JavaScript function for extracting the page text, and then call this function from the VBA script.

This will help in two big ways:

1) Since you are using a JavaScript function to acquire the text, it is more efficient to develop the script in the native environment, where it is easy to debug and maintain.

2) The interface between VBA and Acrobat JavaScript is inefficient. It's slower than running the JS in its native environment, and there is a fundamental incompatibility between complex types. Putting all the JS code in a single location will save you a lot of headaches.

 

  

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Community Beginner, Feb 02, 2023

Thanks a lot, Thom Parker

Community Beginner, Feb 20, 2023

Thanks again Thom,

Would you mind providing a sample of the code needed, both the JavaScript and how to call it from Excel VBA?

This is the intended output in Excel:

Text In Page - 1
 
wqeqqwe
ewqeqw
qweqweqw
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
wqeqweq
 
Text In Page - 2
 
ythythy
kiukiu
kuikiu
kiukiu
kuikui
kuikui
ioloi
iloi
loilio

 

Community Expert, Feb 20, 2023

You'll find everything you need in the IAC reference:

Here's the page on using the IAC OLE interface, which is the Windows version.  There are examples on this page for C# and VBA.

You'll find an example of what you need about halfway down under the title "Using the JSObject".

 

https://opensource.adobe.com/dc-acrobat-sdk-docs/library/interapp/IAC_DevApp_OLE_Support.html#about-...

 

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
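
A minimal sketch of the JSObject approach referred to above, assuming Acrobat is installed and the PDF opens without a password. It uses the IAC `GetJSObject` method together with the JavaScript `getPageNumWords` and `getPageNthWord` methods; the procedure name `PrintPdfWords` is hypothetical, and late binding (`As Object`) is used so no type library reference is required.

```vba
Option Explicit

' Hypothetical example: prints the words of every page to the Immediate window.
' pdfPath is supplied by the caller.
Public Sub PrintPdfWords(pdfPath As String)
    Dim pdDoc As Object
    Dim jso As Object
    Dim pageIndex As Long
    Dim wordIndex As Long

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(pdfPath) = False Then
        MsgBox "Could not open " & pdfPath
        Exit Sub
    End If

    ' GetJSObject exposes the document's JavaScript Doc object to VBA.
    Set jso = pdDoc.GetJSObject

    For pageIndex = 0 To pdDoc.GetNumPages() - 1
        Debug.Print "Text In Page - " & (pageIndex + 1)
        For wordIndex = 0 To jso.getPageNumWords(pageIndex) - 1
            Debug.Print jso.getPageNthWord(pageIndex, wordIndex)
        Next wordIndex
    Next pageIndex

    pdDoc.Close
    Set jso = Nothing
    Set pdDoc = Nothing
End Sub
```

This prints one word per line. Every call through the JSObject crosses the VBA/JavaScript boundary, so for large documents a folder-level JavaScript function that returns a whole page of text in one call is likely to be faster.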

Community Beginner, Feb 20, 2023

Thank you Thom, much appreciated

Community Beginner, Feb 21, 2023

Hi Thom P,

I am studying the documentation provided. Thanks

Is there a way I can extract the text per line? Is there a method such as GetLine or something like that?

The JSO.getPageNthWord method does not recognize "-", spaces, line feeds, carriage returns, or anything else that would mark the end of a line.

Thank you

LEGEND, Feb 21, 2023

PDF files do not contain CR or LF. Often they don't even contain spaces. Each character is like a graphic, with its own position on the page. Acrobat uses guesswork to divide the text into words, based on the distance between characters. You can do your own guesswork to divide it into lines; getPageNthWordQuads is the tool you need. In the general case this is pretty hard: there may be main text, subscript, superscript, non-aligned columns and more. You might be lucky with your use case, but it's all guesswork.
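
A rough sketch of the line-guessing idea, assuming the words of a page and one vertical coordinate per word (for example, taken from each word's quads) have already been collected into two arrays with the same bounds, in reading order. How those coordinates are obtained is outside this sketch. The function name `GroupWordsIntoLines` and the tolerance value are hypothetical, and the approach only suits simple single-column layouts.

```vba
Option Explicit

' Hypothetical helper: joins words into lines, starting a new line whenever
' the vertical position changes by more than the given tolerance.
' words(i) is the text of word i and tops(i) its vertical coordinate.
Public Function GroupWordsIntoLines(words As Variant, tops As Variant, _
        tolerance As Double) As Collection
    Dim result As New Collection
    Dim currentLine As String
    Dim i As Long

    ' An empty input array yields an empty collection.
    If UBound(words) < LBound(words) Then
        Set GroupWordsIntoLines = result
        Exit Function
    End If

    currentLine = words(LBound(words))
    For i = LBound(words) + 1 To UBound(words)
        If Abs(tops(i) - tops(i - 1)) > tolerance Then
            result.Add currentLine          ' vertical jump: close the current line
            currentLine = words(i)
        Else
            currentLine = currentLine & " " & words(i)
        End If
    Next i
    result.Add currentLine

    Set GroupWordsIntoLines = result
End Function

' Usage with made-up coordinates: three words on one line, one on the next.
Public Sub GroupWordsDemo()
    Dim textLines As Collection
    Set textLines = GroupWordsIntoLines( _
        Array("Text", "In", "Page", "wqeqqwe"), Array(700, 700, 700, 680), 2)
    Debug.Print textLines(1)   ' Text In Page
    Debug.Print textLines(2)   ' wqeqqwe
End Sub
```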

Community Beginner, Feb 21, 2023

Thank you Thom!

Community Expert, Feb 21, 2023

To make sure special and white-space characters (if they exist!) are returned by getPageNthWord, specify the bStrip parameter as false (it defaults to true).
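
A short sketch of passing `bStrip` as false from VBA, assuming `pdDoc` is an already-open `AcroExch.PDDoc` object. The function name `GetPageTextUnstripped` is hypothetical. Because unstripped words keep whatever punctuation and white space Acrobat attaches to them, they are concatenated directly.

```vba
Option Explicit

' Hypothetical helper: returns the text of one page (0-based index) with
' punctuation and white space kept, by passing bStrip = False.
Public Function GetPageTextUnstripped(pdDoc As Object, pageIndex As Long) As String
    Dim jso As Object
    Dim wordCount As Long
    Dim i As Long
    Dim pageText As String

    Set jso = pdDoc.GetJSObject
    wordCount = jso.getPageNumWords(pageIndex)

    For i = 0 To wordCount - 1
        ' Third argument is bStrip; False keeps trailing punctuation and spaces.
        pageText = pageText & jso.getPageNthWord(pageIndex, i, False)
    Next i

    GetPageTextUnstripped = pageText
End Function
```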

Community Beginner, Feb 21, 2023

Got it, testing now, thank you Thom👍
