Organizations often need to convert large batches of PDF documents to another format, such as Word. While there are a few options (e.g., the Adobe API, third-party apps, or hiring a developer), most of these prove to be rather expensive, especially as your count gets into the 50k range. This simple method will cut costs, and it’s a breeze to put together. Let’s get automating!
I’m going to be pretty generic in my explanation here, because there are a ton of variations. First, get your data source, which in our case is folders. I have the customer tag certain folders with keywords and indicators; for this job it’s “$Convert” as a suffix tag. Once we have those, I filter out some of the retrieved records by looking for the keyword “Attachments”. We don’t want the items in there to be converted, so we can leave that folder out altogether. Next we get the individual files that match *PDF. We remove the $Convert tag from the folder name so we have a clean name to hang our hat on. We then create a new folder in another directory if it doesn’t already exist.
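The folder logic above can be sketched in plain Python. This is only a minimal illustration, not the exact workflow used here: the function names are made up for the example, and it assumes the output folder takes the tagged folder's name with the “$Convert” suffix stripped and that each PDF becomes a .docx of the same name.

```python
from pathlib import Path

TAG = "$Convert"              # suffix tag the customer adds to folder names
SKIP_KEYWORD = "attachments"  # anything under a path containing this is skipped


def find_tagged_folders(root: Path) -> list[Path]:
    """Return folders ending in the tag, excluding anything under 'Attachments'."""
    found = []
    for path in root.rglob("*"):
        if not path.is_dir() or not path.name.endswith(TAG):
            continue
        # Only look at the part of the path below the root when filtering.
        if SKIP_KEYWORD in str(path.relative_to(root)).lower():
            continue
        found.append(path)
    return sorted(found)


def list_pdfs(folder: Path) -> list[Path]:
    """Return the PDF files directly inside a folder (case-insensitive match)."""
    return sorted(
        f for f in folder.iterdir() if f.is_file() and f.suffix.lower() == ".pdf"
    )


def ensure_output_folder(folder: Path, output_root: Path) -> Path:
    """Create (if missing) an output folder named after the folder minus the tag."""
    clean_name = folder.name.removesuffix(TAG).strip()
    target = output_root / clean_name
    target.mkdir(parents=True, exist_ok=True)
    return target


def plan_conversions(root: Path, output_root: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF to convert with the (assumed) .docx path it should become."""
    jobs = []
    for folder in find_tagged_folders(root):
        target = ensure_output_folder(folder, output_root)
        for pdf in list_pdfs(folder):
            jobs.append((pdf, target / (pdf.stem + ".docx")))
    return jobs
```

Each pair returned by `plan_conversions` is one unit of work to hand to the GUI automation step.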
- Adobe Acrobat Pro is required.
Now this is very much a game of GUI image recognition. Launch the application by opening the file by name. Instead of clicking, I like to rely on hotkeys; I find them a little more reliable. The hotkeys save the document out in the target format and then open that new document. In my case this is a Word .docx, but again, focus on the journey and not the destination. When all is done, we can kill the processes (the safest bet).
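For the final cleanup step, here is one way the process kill could look on Windows. It is a sketch under assumptions: the image names `Acrobat.exe` and `WINWORD.EXE` are my guess at what is running on a typical install, so check Task Manager for the real names on your machine. The GUI hotkey part is left to whatever automation tool you use.

```python
import subprocess

# Assumed process image names; verify these on your own machine.
DEFAULT_IMAGES = ("Acrobat.exe", "WINWORD.EXE")


def build_kill_commands(image_names=DEFAULT_IMAGES):
    """Build one Windows 'taskkill' command per process image name."""
    # /F forces termination, /IM selects processes by image name.
    return [["taskkill", "/F", "/IM", name] for name in image_names]


def kill_processes(image_names=DEFAULT_IMAGES):
    """Run the kill commands (Windows only); a process that is not running is not an error."""
    for command in build_kill_commands(image_names):
        subprocess.run(command, check=False, capture_output=True)
```

Killing by image name closes every instance of the application, so this suits a dedicated conversion machine rather than a desktop someone is working on.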
Once you have the new document, feel free to run some cleanup.
I also use a mail relay to send an email after each folder is complete. You could also pump some data into Power BI to get some good metrics on the conversion.
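A per-folder notification through a relay might look like the following. Treat it as a sketch: the subject line, the counts in the body, and the function names are illustrative, and it assumes an unauthenticated internal relay listening on port 25.

```python
import smtplib
from email.message import EmailMessage


def build_folder_report(folder_name, converted, failed, sender, recipient):
    """Build a short plain-text summary email for one finished folder."""
    message = EmailMessage()
    message["Subject"] = f"PDF conversion finished: {folder_name}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"Folder: {folder_name}\n"
        f"Converted: {converted}\n"
        f"Failed: {failed}\n"
    )
    return message


def send_via_relay(message, relay_host, relay_port=25):
    """Send the message through an SMTP relay (assumes no authentication is needed)."""
    with smtplib.SMTP(relay_host, relay_port) as smtp:
        smtp.send_message(message)
```

Keeping the message building separate from the sending makes it easy to log the same counts somewhere else, for example to a file that Power BI reads.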
“A hero is someone who voluntarily walks into the unknown.”
-Tom Hanks