PodHoarder is a set of Python scripts to bulk-download podcasts.
It can also do a couple of other cool things:
- Generate a new RSS feed pointing to your hoarded files so that you can transparently listen to the mirrored podcast in your player of choice.
- Run optional, on-demand post-processing commands so that you could, for example, get `sox` to do automatic volume control, or speed up the audio. These can be global or per-podcast.
- It's tested to run in Termux, so you could run it on your phone (requires root now).
## Installation and initial setup
You'll need:

- The defusedxml library (`pip install defusedxml --user`)

Optional, if you want to re-host the podcast:

- A web server
- PHP-CGI set up, if you want on-demand post-processing
- `sox`, probably (`sudo apt install sox libsox-fmt-mp3` on Debian)
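The defusedxml requirement exists because podcast feeds are untrusted XML, and defusedxml's `ElementTree` is a hardened drop-in replacement for the standard library's. As a rough sketch of the kind of feed parsing involved (this is illustrative, not PodHoarder's actual code; the stdlib module is used here so the snippet runs anywhere):

```python
# Illustrative sketch of RSS feed parsing, not PodHoarder's actual code.
# In real use, prefer:  from defusedxml import ElementTree as ET
from xml.etree import ElementTree as ET

FEED = """<rss version="2.0"><channel>
  <title>Example Cast</title>
  <item>
    <title>Episode 1</title>
    <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/>
  </item>
</channel></rss>"""

root = ET.fromstring(FEED)
channel = root.find("channel")
# Collect (episode title, audio URL) pairs from each <item>'s <enclosure>.
episodes = [
    (item.findtext("title"), item.find("enclosure").get("url"))
    for item in channel.iter("item")
]
print(episodes)  # [('Episode 1', 'https://example.com/ep1.mp3')]
```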
Get the latest version of PodHoarder and extract it to a directory of your choice. Run `init_setup.py` and it'll ask you to enter some paths:
- `cache_dir` is where the podcast files will be downloaded to. A separate sub-directory will be created for each podcast. Separate RSS files for each podcast will also be generated into this directory.
- `feed_cache` is a master XML file where PodHoarder stores all your hoarded channel and episode metadata.
- `www_user_group` is the user name and group name of your web server, usually `www-data:www-data`. This is necessary only if you're re-hosting the cache folder; it'll be used to set permissions for the post-processing helper script.
- `www_prefix`, needed only if re-hosting, is the URL that corresponds to `cache_dir` on the filesystem. For example, if your files are in `/var/www/podcasts` on the filesystem, and `/var/www` is accessible at `https://example.com/`, you would set your `www_prefix` to `https://example.com/podcasts/`. The trailing slash is important. This way, when PodHoarder re-generates the RSS feeds, it can map local file paths under `cache_dir` to public URLs under `www_prefix`.
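The mapping described above is essentially a prefix swap. A minimal sketch (not PodHoarder's actual implementation; the paths and URL are just the examples from this section):

```python
# Sketch of the cache_dir -> www_prefix substitution described above.
# Values are the examples from this README, not defaults.
cache_dir = "/var/www/podcasts/"
www_prefix = "https://example.com/podcasts/"  # trailing slash matters

def local_path_to_url(path: str) -> str:
    """Map a downloaded episode's filesystem path to its public URL."""
    if not path.startswith(cache_dir):
        raise ValueError(f"{path!r} is outside the cache directory")
    return www_prefix + path[len(cache_dir):]

print(local_path_to_url("/var/www/podcasts/showname/ep1.mp3"))
# https://example.com/podcasts/showname/ep1.mp3
```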
After setup:

- Run `add.py` with podcast RSS feeds as arguments (multiple are okay).
- Run `ui.py` to interactively add, remove, or configure podcast feeds. You can also change global PodHoarder settings here.
- Run `sync.py` to download episodes and generate feeds. It shows a progress indicator when run interactively.
Anytime you change settings, you should probably re-run `regenerate_feeds.py` for good measure.
Post-processing is done on-demand and uses the `ph_redir.php` script to maintain a cache of processed files. It uses shell scripts, so you can write your own pipelines and share them between podcasts. Post-processing might take a while the first time it is requested for a file, so if a file is not ready yet, PodHoarder will reply with an HTTP 503 message. You can set the wait time for this with the `postprocess_async_time` option (see the config options).
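The decision `ph_redir.php` makes on each request can be sketched roughly as follows. This is a Python stand-in purely for illustration; the function name, the lock-file convention, and the return values are assumptions, not the script's real API:

```python
import os
import time

# Rough stand-in for the per-request decision ph_redir.php makes.
# The lock-file convention and names here are illustrative assumptions.
def respond(processed_path: str, lock_path: str, async_time: float) -> str:
    if os.path.exists(processed_path):
        return "200: serve processed file"        # cache hit
    if not os.path.exists(lock_path):
        return "start post-processing"            # first request for this file
    # A lock exists: processing is underway. Wait up to async_time seconds.
    deadline = time.monotonic() + async_time
    while time.monotonic() < deadline:
        if os.path.exists(processed_path):
            return "200: serve processed file"
        time.sleep(0.05)
    return "503: still processing, retry later"   # postprocess_async_time elapsed
```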
To enable a post-process for a podcast, open `feed_cache` and, under the `channel` you want, add the tag for the post-process script to run. The script itself should be under the `libph/post` directory, and it takes arguments in a fixed form: the input filename will be a full path to an episode file, with an appropriate extension. Check the included `agc` postprocess for a template on what to do, and remember to delete the lock file when you're done.
If you want to poke into the config file at `~/.podhoarder.xml`, here are all the config options:
- `feed_cache`: XML file with channel and episode information
- `cache_dir`: directory where episode audio files are downloaded
- `www_prefix`: external prefix that replaces `cache_dir` when generating feeds for the server
- `www_user_group`: user:group that the web server runs as
- `verbosity`: `debug`, `info`, or `errors`; default `info`
- `stack_traces`: whether to re-raise exceptions (for debugging)
- `overwrite_existing`: global setting; if true, podcast media files are always re-downloaded, not just new ones
- `redownload_existing`: if false, a file found with the same filename is assumed to be a completed download
- `chunk_size`: transfer chunk size; default 1024 * 1024 bytes
- `log_file`: log filename; the file mode is "w" by default
- `postprocess_script_dir`: directory where postprocess commands are stored; defaults to `working_dir/libph/post`
- `postprocess_niceness`: Unix `nice` level with which post-processing scripts are invoked; default 15
- `postprocess_async_time`: how long to wait before sending HTTP 503 when post-processing
- `title_postfix`: if present, this text is added to regenerated feed titles (for example: " (hoarded)")
- `postprocess_cache_size`: max size, in bytes, for the post-processing cache (default -1 = ignore)
- `retry_failed_downloads`: default false; if true, a failed episode download is retried every time `sync.py` is run
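For orientation, a hypothetical `~/.podhoarder.xml` using a few of these options might look like the fragment below. The flat one-element-per-option layout and the root element name are assumptions for illustration only; the file that `init_setup.py` generates is authoritative.

```xml
<!-- Hypothetical layout; check your init_setup.py-generated file for the real schema. -->
<config>
  <cache_dir>/var/www/podcasts/</cache_dir>
  <feed_cache>/home/user/podhoarder-feed.xml</feed_cache>
  <www_prefix>https://example.com/podcasts/</www_prefix>
  <www_user_group>www-data:www-data</www_user_group>
  <verbosity>info</verbosity>
  <postprocess_niceness>15</postprocess_niceness>
  <title_postfix> (hoarded)</title_postfix>
</config>
```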
## Known Issues and Troubleshooting

### Python crashes with 'UnicodeEncodeError'
This happens when run interactively: your terminal locale is probably set to ASCII or a derivative, and a feed that you're trying to display on screen contains a Unicode character. Set your codeset to UTF-8 to fix this (for example, export a UTF-8 locale such as `LANG=C.UTF-8`, or set `PYTHONIOENCODING=utf-8`).
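To check what your terminal will actually do before changing anything, a quick stdlib diagnostic (not part of PodHoarder):

```python
import locale
import sys

def terminal_encodings():
    """Report the encodings interactive Python output will use.

    Anything other than a UTF-8 variant here means non-ASCII feed
    titles can raise UnicodeEncodeError when printed.
    """
    return sys.stdout.encoding, locale.getpreferredencoding()

print(terminal_encodings())
```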