[For Deviantart] I Made A Python Script To Download Embedded Images From HTML Files That Were Downloaded By Gallery-DL #6939
CivilizedCiv started this conversation in Show and tell
Hi,
I posted earlier this week asking if anyone knew how to download images embedded in the html scraped by gallery-dl. After searching this GitHub further, I don't believe it is possible yet, based on some comments I read from the repository owner.
With gallery-dl, I was able to download from the Deviantart creator I wanted, and it worked well. However, some of their posts were literature that included embedded images, and gallery-dl was not able to fetch those images through its API calls.
So I decided to try creating my own solution. I wrote the script in Python 3.12; it uses some imported Python libraries/modules, and I ran it on Windows 10.
This tool is meant to be used after you have run gallery-dl on what you want.
The script searches for .html and .htm files in the directory/folder it is placed in and in every directory/folder below it. So, place it in a directory above where your gallery-dl downloads are. I placed mine in the deviantart folder, which for me is located under the gallery-dl folder.
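The directory search described above can be sketched with pathlib. This is a minimal illustration, not the code from the attached script; the function name `find_html_files` is hypothetical, and skipping files that already start with "local_" is my own assumption (so that the generated copies are not processed again).

```python
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm"}


def find_html_files(root: Path) -> list[Path]:
    """Return every .html/.htm file in root and all folders below it."""
    found = []
    for path in root.rglob("*"):
        # Compare extensions case-insensitively so .HTML and .Htm match too.
        if path.is_file() and path.suffix.lower() in HTML_SUFFIXES:
            # Assumption: ignore copies this process already generated.
            if not path.name.startswith("local_"):
                found.append(path)
    return sorted(found)
```

Called as `find_html_files(Path(__file__).resolve().parent)` from inside a script, this covers the folder the script sits in plus every level below it.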
It searches the .html/.htm files for any 'img' tags. If an image tag is found, the script tries a few different approaches to determine the image's name and then downloads the image under that name.
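Here is a rough sketch of the "find img tags, pick a name, download" step using only the standard library. The naming rule (last segment of the URL path, query string dropped, Windows-invalid characters replaced) is a hypothetical stand-in for the author's "different actions", which handle Deviantart-specific URLs that are not reproduced here. The browser-like User-Agent header is also an assumption.

```python
import re
import urllib.request
from html.parser import HTMLParser
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags with a missing or empty src
                self.sources.append(src)


def find_img_sources(html_text: str) -> list[str]:
    collector = ImgSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


def guess_image_name(url: str, index: int) -> str:
    """Hypothetical naming rule: last URL path segment, query string dropped."""
    name = unquote(PurePosixPath(urlsplit(url).path).name)
    # Replace characters that Windows does not allow in file names.
    name = re.sub(r'[<>:"/\\|?*]', "_", name)
    # Fall back to a numbered name when the URL path has no file name at all.
    return name or f"image_{index:03d}"


def download_image(url: str, destination: Path, timeout: float = 30.0) -> None:
    # Assumption: a browser-like User-Agent, since some hosts reject urllib's default.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        destination.write_bytes(response.read())
```

The numbered fallback has no extension; relative or protocol-relative src values (for example ones starting with `//`) would also need resolving with `urllib.parse.urljoin` before downloading.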
The script will also duplicate any .html/.htm file found to contain an 'img' tag, adding 'local_' to the front of the copy's file name.
For example:
Original html: HowToCookEggs.html
Duplicate html: local_HowToCookEggs.html
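Making the 'local_' duplicate amounts to a copy within the same folder. A minimal sketch, with `make_local_copy` as a hypothetical name:

```python
import shutil
from pathlib import Path


def make_local_copy(html_path: Path) -> Path:
    """Copy e.g. HowToCookEggs.html to local_HowToCookEggs.html in the same folder."""
    local_path = html_path.with_name("local_" + html_path.name)
    # copyfile copies the contents only; the copy gets a fresh modification time.
    shutil.copyfile(html_path, local_path)
    return local_path
```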
The downloaded images are stored in newly created folders named after the duplicated html file.
For example:
FILES STRUCTURE:
The 'local_' versions of the .html/.htm files also have their src attributes replaced, so that they now use the images downloaded by this script.
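The folder naming and src replacement could look roughly like this. It is a sketch under assumptions, not the attached script: the folder is assumed to be the local_ file's name without its extension, sitting beside it, and the hypothetical `rewrite_img_sources` only touches src values found in a mapping from original URL to new relative path. Values are compared after `html.unescape`, because the raw HTML can contain entities such as `&amp;` in image URLs.

```python
import html
import re
from pathlib import Path
from urllib.parse import quote

# Matches the src attribute of an <img> tag; the lookbehind keeps data-src from matching.
IMG_SRC_RE = re.compile(
    r"""(<img\b[^>]*?(?<![\w-])src\s*=\s*)(["'])(.*?)\2""",
    re.IGNORECASE | re.DOTALL,
)


def image_folder_for(local_html: Path) -> Path:
    """Assumed layout: local_HowToCookEggs.html -> local_HowToCookEggs/ beside it."""
    return local_html.with_suffix("")


def local_src(local_html: Path, image_name: str) -> str:
    """Relative URL for a downloaded image, as written into the local_ file."""
    # quote() percent-encodes spaces and similar characters but keeps "/".
    return quote(f"{image_folder_for(local_html).name}/{image_name}")


def rewrite_img_sources(html_text: str, mapping: dict[str, str]) -> str:
    """Point each <img src> listed in mapping at its new, local location."""

    def replace(match: re.Match) -> str:
        prefix, quote_char, raw_value = match.groups()
        new_value = mapping.get(html.unescape(raw_value))
        if new_value is None:
            return match.group(0)  # leave images that were not downloaded untouched
        return f"{prefix}{quote_char}{html.escape(new_value)}{quote_char}"

    return IMG_SRC_RE.sub(replace, html_text)
```

A regex is used here instead of re-serializing a parsed tree so that the rest of the gallery-dl html stays byte-for-byte unchanged; the trade-off is that unusual markup (for example a literal `>` inside an earlier attribute) can be missed.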
The script will also skip processing an html file (downloading its images and everything else) if a local_ version of it already exists. One case where this could be annoying, which I only just thought of: say a deviantart html story has one image, and you run my script and get your local version. If the deviantart poster later swaps that image for another one or adds more images to the post, and you download the updated post with gallery-dl, my script will not notice that the post was updated, and your local version will still be based on the first version of the post. To fix this, you could modify my code to account for this case. You could also delete the corresponding local_ html file for the post so that it gets processed again. Sorry that I missed this.
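One possible modification for the update case, which is my own assumption and not part of the author's script, is to reprocess a post whenever the original html file is newer than its local_ copy. Whether gallery-dl actually rewrites an already-downloaded html file (and so changes its modification time) depends on your gallery-dl settings, such as its `skip` option, so treat this as a sketch to verify against your own setup.

```python
from pathlib import Path


def needs_processing(html_path: Path) -> bool:
    """Hypothetical check: process if no local_ copy exists or it is out of date."""
    local_path = html_path.with_name("local_" + html_path.name)
    if not local_path.exists():
        return True  # never processed before
    # Assumed staleness rule: the original was modified after the local_ copy was made.
    return html_path.stat().st_mtime > local_path.stat().st_mtime
```

If the check is unreliable with your configuration, deleting the local_ file by hand, as suggested above, remains the simplest fix.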
Why did I do all this?
In the past I enjoyed content from a Deviantart creator and later learned that their account was shut down, either by their own decision or forced upon them. That is a major reason why I went looking for a tool like gallery-dl. Sadly, gallery-dl isn't an infallible magical tool that can do everything exactly the way I want (though it can do nearly everything I want from it). This led me to write this simple Python script to cover my remaining needs.
I have included the script I wrote and my config file below.
Important things to consider when running this script:
Run the script with its output redirected to a text file, for example:
LocalizeDeviantartHtmlPostsWithImages.py > LDHPWI_Output.txt
The script will then print its output into the .txt file instead of the console; for me, this made it easier to debug and to check the quality of the output.
THE MOST IMPORTANT THINGS:
I will likely offer little or no support to anyone asking. I threw this script together in a short amount of time and spent most of it getting the script to work with the specific Deviantart posts I was using as of February 5th, 2025. The deviantart posts you want to use this on may have html or image URLs that I did not address, so the script may not get everything you want, and you may have to edit it to fit your use case. Preparing the script for your computer and running it is a less user-friendly experience than using gallery-dl. If you are unfamiliar with Python, this may not be easy for you, so be aware. The script may also break as deviantart and gallery-dl get updated in the future.
THE MOST IMPORTANT RECOMMENDATION OF ALL:
Because this script is meant to be used on html files downloaded through gallery-dl, I recommend duplicating your entire gallery-dl deviantart content folder to a location outside the folder tree my script searches. I found that this made it easy to progressively fix the script when necessary, since I could easily restore any altered user post collections.
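The backup recommendation can be done by hand in Explorer, or with a small helper like the hypothetical `back_up_folder` below. This is a sketch, not something from the original post: it copies the whole folder into a new time-stamped directory and refuses a backup location inside the folder being copied, because the script would otherwise find and alter the backup too.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_folder(source: Path, backup_root: Path) -> Path:
    """Copy source into a new time-stamped folder under backup_root."""
    if backup_root.resolve().is_relative_to(source.resolve()):
        raise ValueError("backup_root must be outside the folder being backed up")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = backup_root / f"{source.name}-backup-{stamp}"
    # copytree creates missing parent folders and fails if destination already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(source, destination)
    return destination
```

`Path.is_relative_to` needs Python 3.9 or newer, which the Python 3.12 mentioned above satisfies.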
I attached the script and the config file in .txt format. If you use these files rather than your own, change their extensions to .py and .conf respectively.
My Python Script (Named: LocalizeDeviantartHtmlPostsWithImages.py):
LocalizeDeviantartHtmlPostsWithImages.txt
My Config file (Named: gallery-dl.conf):
gallery-dl.txt