Developer Reference¶

Class and function definitions for the main modules. All descriptions are generated automatically from source code docstrings.

lep_downloader.lep¶

LEP module for general logic and classes.

class lep_downloader.lep.Lep(session=None, log=None)¶

Bases: object

Represent base class for LEP’s general attributes and methods.

Parameters:

session (requests.Session, optional) – Global session for descendants.
log (LepLog, optional) – Log object where to output messages.

cls_session¶

Class session. Default is taken from the module variable PROD_SES.

Type:: requests.Session

cls_lep_log¶

Class log object where messages are output. Default is LepLog(), which outputs to stdout only.

Type:: LepLog

json_body¶

Content of JSON database file.

Type:: str

classmethod extract_only_valid_episodes(json_body, json_url=None)¶

Return list of valid (not None) LepEpisode objects.

Parameters:

json_body (str) – Content of JSON database file.
json_url (str, optional) – JSON URL, only for printing it to output.

Returns:

List of LepEpisode objects. It’s empty if there are no valid objects at all.

Return type:

LepEpisodeList

classmethod get_db_episodes(json_url, session=None)¶

Get valid episodes from JSON.

Parameters:

json_url (str) – URL to JSON database file.
session (requests.Session, optional) – Session object to send request. Default is Lep.cls_session.

Return type:

LepEpisodeList

Raises:

DataBaseUnavailableError – if JSON is unavailable

classmethod get_web_document(page_url, session=None)¶

Get text content of web document (HTML, JSON, etc.).

Parameters:

page_url (str) – URL for getting text response.
session (requests.Session, optional) – Session object to send request. Default is Lep.cls_session.

Returns:

A tuple (resp.text, final_location, is_url_ok) where

resp.text (str) is text content of URL response
final_location (str) is location after all redirections
is_url_ok (bool) is flag of URL status

Return type:

Tuple[str, str, bool]

class lep_downloader.lep.LepEpisode(episode=0, date=datetime.datetime(2000, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), url='', post_title='', post_type='', parsed_at='', index=0, files=None, admin_note='', updated_at='', html_title='')¶

Bases: object

LEP episode class.

Parameters:

episode (int) – Episode number.
date (str | datetime) – Post datetime. It will be converted to aware datetime object (with timezone). If None, defaults to datetime equaling “2000-01-01T00:00:00+00:00”.
url (str) – Final location of web post URL.
post_title (str) – Post title extracted from link text (unsafe).
post_type (str) – Post type (“AUDIO”, “TEXT”, etc.).
files (dict | None) – Dictionary with files for episode. Each key is a file category (“audios”, “audiotrack”, “page_pdf”, etc). If None, defaults to empty dict.
parsed_at (str) – Parsing datetime in UTC timezone, with microseconds.
index (int) – Parsing index, concatenation of date from URL and increment (for several posts in a day).
admin_note (str) – Note for the administrator; also stores the error message for a bad response during parsing.
updated_at (str) – Datetime in UTC when episode was updated (usually manually by admin).
html_title (str) – Page title extracted from HTML tag <title>. Important: Not stored in JSON database.

property date: Any¶

Episode datetime (with timezone).

To be accurate, posting datetime on the website.
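The conversion described for the date parameter can be sketched roughly as follows. This is a minimal illustration, not the package's code: the helper name to_aware_datetime is hypothetical, and treating naive values as UTC is an assumption.

```python
from datetime import datetime, timezone

# Default taken from the constructor signature documented above.
DEFAULT_DATE = datetime(2000, 1, 1, tzinfo=timezone.utc)


def to_aware_datetime(value):
    """Hypothetical helper: normalize str | datetime | None to an aware datetime."""
    if value is None:
        return DEFAULT_DATE
    if isinstance(value, str):
        # ISO 8601 strings such as "2016-03-01T12:30:00+01:00" are parsed here.
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # Assumption: naive values are interpreted as UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value


print(to_aware_datetime("2016-03-01T12:30:00+01:00").isoformat())
print(to_aware_datetime(None).isoformat())
```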

property post_title: str¶

Post title converted to be safe for Windows path (filename).

Conversion via replace_unsafe_chars().

property short_date: str¶

Episode short date.

It’s the same as posting date in the episode URL, just formatted as “YYYY-MM-DD”.

class lep_downloader.lep.LepEpisodeList(iterable=(), /)¶

Bases: List[Any]

Represent list of LepEpisode objects.

default_start_date¶

Min date. It’s equal to “1999-01-01T00:01:00+00:00”

Type:: datetime

default_end_date¶

Max date. It’s equal to “2999-12-31T23:55:00+00:00”

Type:: datetime

desc_sort_by_date_and_index()¶

Sort LepEpisodeList by post datetime.

Returns:: New sorted LepEpisodeList.
Return type:: LepEpisodeList

Notes

Sort is descending (the latest date comes first). Sorting uses two attributes: “date” and “index”.
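A descending sort on two attributes might look like the sketch below. The Episode dataclass is a stand-in for LepEpisode with only the two relevant attributes, and the index values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Episode:
    """Stand-in for LepEpisode holding only the sort attributes."""
    date: datetime
    index: int


def desc_sort(episodes):
    # Returns a new list; the input list is left unchanged.
    return sorted(episodes, key=lambda ep: (ep.date, ep.index), reverse=True)


day = datetime(2016, 3, 1, tzinfo=timezone.utc)
older = datetime(2016, 2, 1, tzinfo=timezone.utc)
eps = [Episode(older, 2016020101), Episode(day, 2016030101), Episode(day, 2016030102)]
print([ep.index for ep in desc_sort(eps)])
```

Episodes posted on the same date are ordered by index, so the later post of the day comes first.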

filter_by_date(start=None, end=None)¶

Filter list by episode date.

Parameters:

start (datetime, optional) – Episode date (left bound). If start is None, defaults to min date LepEpisodeList.default_start_date.
end (datetime, optional) – Episode date (right bound). If end is None, defaults to max date LepEpisodeList.default_end_date.

Returns:

New filtered LepEpisodeList.

Return type:

LepEpisodeList

Notes

If end < start, they are swapped.
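The default bounds and the swap rule can be sketched as below. Plain datetime objects stand in for episodes, and inclusive bounds are an assumption; the documentation does not say whether the bounds are inclusive.

```python
from datetime import datetime, timezone

# Default bounds as documented for LepEpisodeList.
DEFAULT_START = datetime(1999, 1, 1, 0, 1, tzinfo=timezone.utc)
DEFAULT_END = datetime(2999, 12, 31, 23, 55, tzinfo=timezone.utc)


def filter_by_date(dates, start=None, end=None):
    """Sketch: keep the dates that fall between start and end."""
    if start is None:
        start = DEFAULT_START
    if end is None:
        end = DEFAULT_END
    if end < start:
        # Bounds given in the wrong order are swapped, as the note says.
        start, end = end, start
    # Assumption: both bounds are inclusive.
    return [d for d in dates if start <= d <= end]


d2010 = datetime(2010, 1, 1, tzinfo=timezone.utc)
d2016 = datetime(2016, 3, 1, tzinfo=timezone.utc)
d2020 = datetime(2020, 6, 1, tzinfo=timezone.utc)
late = datetime(2019, 1, 1, tzinfo=timezone.utc)
early = datetime(2015, 1, 1, tzinfo=timezone.utc)
print(filter_by_date([d2010, d2016, d2020], start=late, end=early))
```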

filter_by_number(start, end)¶

Filter list by episode number.

Parameters:

start (int) – Episode number (left bound)
end (int) – Episode number (right bound)

Returns:

New filtered LepEpisodeList.

Return type:

LepEpisodeList

Notes

If end < start, they are swapped.

filter_by_type(type)¶

Filter list by episode type.

Parameters:: type (str) – Episode type (“AUDIO”, “TEXT”, etc)
Returns:: New filtered LepEpisodeList.
Return type:: LepEpisodeList

class lep_downloader.lep.LepJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)¶

Bases: JSONEncoder

Custom JSONEncoder for LepEpisode objects.

default(obj)¶

Override ‘default’ method for encoding JSON objects.

Parameters:

obj (Any) – Object for encoding.

Returns:

If the object is a LepEpisode, returns a dict. Otherwise, a TypeError exception is raised.

Return type:

Any
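The pattern of overriding default() can be illustrated with the standard json module. In this sketch the Episode dataclass and the EpisodeEncoder class are stand-ins, not the package's own classes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Episode:
    """Stand-in for LepEpisode with two fields only."""
    episode: int = 0
    post_title: str = ""


class EpisodeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Episode):
            # Known type: hand back something json can serialize.
            return asdict(obj)
        # Unknown type: the base implementation raises TypeError.
        return super().default(obj)


print(json.dumps([Episode(333, "More Misheard Lyrics")], cls=EpisodeEncoder))
```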

class lep_downloader.lep.LepLog(debug=False, logfile='_lep_debug_.log')¶

Bases: object

Represent LepLog object.

Parameters:

debug (bool) – Debug mode flag. Defaults to False.
logfile (str) – Name of log file. Defaults to config.DEBUG_FILENAME = “_lep_debug_.log”.

debug¶

Debug mode flag (True / False).

Type:: bool

logfile¶

Name of log file.

Type:: str

lep_log¶

Custom loguru.logger object, which is returned from init_lep_log().

Type:: loguru.logger

msg(msg, *, skip_file=False, one_line=True, msg_lvl='PRINT', wait_input=False, **kwargs)¶

Output message to console or log file.

Parameters:

msg (str) – Message to output. Supports loguru color markups.
skip_file (bool) – Flag to skip writing to logfile (even in Debug mode). Defaults to False.
one_line (bool) – Flag to replace each newline character with its Unicode symbol, i.e. ⏎. Defaults to True.
msg_lvl (str) – Message level. Defaults to “PRINT”.
wait_input (bool) – Flag to stay on line after printing message to console. Defaults to False.
kwargs (Any) – Arbitrary keyword arguments.

Return type:

None

Notes

If Debug mode is False and message level is “PRINT”, the method outputs to console only. Otherwise, it also duplicates all console messages to the log file (with level PRINT). Records (messages) for other log levels also go into the file (if skip_file is not True).

lep_downloader.lep.as_lep_episode_obj(dct)¶

Specialize JSON objects decoding.

Parameters:: dct (dict) – Dictionary object from JSON (including nested dictionaries).
Returns:: LepEpisode object or None.
Return type:: Any

Notes

If the dictionary is empty or has an “audios” key, it’s returned “as-is”. Returns None if a TypeError was raised.
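The decoding rules in the notes above map naturally onto the object_hook parameter of json.loads. The sketch below uses a stand-in Episode dataclass and a hypothetical as_episode function; the real constructor takes more fields.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Stand-in for LepEpisode with three fields only."""
    episode: int = 0
    url: str = ""
    files: dict = field(default_factory=dict)


def as_episode(dct):
    # Empty dicts and nested "files" dicts (those with an "audios" key) pass through.
    if not dct or "audios" in dct:
        return dct
    try:
        return Episode(**dct)
    except TypeError:
        # Unexpected keys make the constructor fail; signal an invalid object.
        return None


raw = '[{"episode": 1, "url": "u", "files": {"audios": [["a.mp3"]]}}, {"unknown": 1}]'
decoded = json.loads(raw, object_hook=as_episode)
valid = [ep for ep in decoded if ep is not None]
print(valid)
```

Dropping the None entries afterwards corresponds to what extract_only_valid_episodes() is documented to return.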

lep_downloader.lep.init_lep_log(debug=False, logfile='_lep_debug_.log')¶

Create custom logger object.

Parameters:

debug (bool) – Debug log or not. Defaults to False.
logfile (str) – Name of the logfile. Defaults to config.DEBUG_FILENAME = “_lep_debug_.log”

Returns:

Custom loguru.logger object

Return type:

Any

lep_downloader.lep.logfile_formatter(record)¶

Return formatter string for log file sink.

Parameters:

record (Any) – Loguru’s record dict.

Returns:

Format string for the log file:

{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | "{message}" + LF

Here LF is the newline character.

Return type:

str

Note

2022-02-25 07:20:48.909 | PRINT    | Running script...⏎
2022-02-25 07:20:48.917 | PRINT    | Starting parsing...

lep_downloader.lep.replace_unsafe_chars(filename)¶

Replace most common invalid path characters with ‘_’.

Parameters:: filename (str) – Filename (should be a string representing the final path component) without the drive and root.
Returns:: Safe name for writing file on Windows OS (and others).
Return type:: str

Example

>>> import lep_downloader.lep
>>> unsafe = "What/ will: be* replaced?.mp3"
>>> lep_downloader.lep.replace_unsafe_chars(unsafe)
'What_ will_ be_ replaced_.mp3'
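A replacement of this kind can be sketched with a single regular expression. The character set below (the characters Windows forbids in file names) is an assumption; the package may replace a different set.

```python
import re

# Assumed set of unsafe characters: < > : " / \ | ? *
UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*]')


def replace_unsafe(filename: str) -> str:
    """Sketch: replace every unsafe character with an underscore."""
    return UNSAFE_CHARS.sub("_", filename)


print(replace_unsafe("What/ will: be* replaced?.mp3"))
```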

lep_downloader.lep.stdout_formatter(record)¶

Return formatter string for console sink.

Parameters:

record (Any) – Loguru’s record dict.

Returns:

Format string for stdout log: "{message}" + end

Return type:

str

Notes

The ending character of a log message is controlled by storing it in the ‘extra’ dict and changing it later via bind(). Default is the newline character.

lep_downloader.downloader¶

LEP module for downloading logic.

class lep_downloader.downloader.ATrack(ep_id=0, name='', ext='.mp3', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='', part_no=0)¶

Bases: LepFile

Represent audio track object (to episode video or part of it).

Parameters:

ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to “.mp3”.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.
part_no (int) – Part number. Defaults to 0.

Notes

Filename depends on part number.

If part_no = 0, composed as f"[{short_date}] # {name}" + " _aTrack_" + ext
If part_no > 0, composed as f"[{short_date}] # {name}" + " [Part NN]" + " _aTrack_" + ext

For other attributes, see LepFile.

ext: str = '.mp3'¶: Extension for audio track file.

part_no: int = 0¶: Part number.

class lep_downloader.downloader.Audio(ep_id=0, name='', ext='.mp3', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='', part_no=0)¶

Bases: LepFile

Represent audio object to episode (or part of it).

Parameters:

ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to “.mp3”.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.
part_no (int) – Part number. Defaults to 0.

Notes

Filename depends on part number.

If part_no = 0, composed as f"[{short_date}] # {name}" + ext
If part_no > 0, composed as f"[{short_date}] # {name}" + " [Part NN]" + ext

For other attributes, see LepFile.
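The naming rule for audio files can be written out as a small function. This is a sketch: compose_audio_filename is a hypothetical name, and reading “NN” as a zero-padded two-digit part number is an assumption.

```python
def compose_audio_filename(short_date, name, ext=".mp3", part_no=0):
    """Sketch of the documented naming rule for audio files."""
    base = f"[{short_date}] # {name}"
    if part_no > 0:
        # Assumption: "NN" stands for a zero-padded two-digit part number.
        base += f" [Part {part_no:02d}]"
    return base + ext


print(compose_audio_filename("2016-03-01", "333. More Misheard Lyrics"))
print(compose_audio_filename("2016-03-01", "333. More Misheard Lyrics", part_no=2))
```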

ext: str = '.mp3'¶: Extension for audio file.

part_no: int = 0¶: Part number.

class lep_downloader.downloader.LepDL(json_url='https://hotenov.com/d/lep/v3-lep-db.min.json', session=None, log=None)¶

Bases: Lep

Represent downloader object.

Parameters:

json_url (str) – URL to JSON database
session (requests.Session) – Requests session object. If None defaults to global session lep.PROD_SES.
log (LepLog) – Log instance where to output messages.

db_episodes: LepEpisodeList¶: List of episodes in JSON database.

db_urls: Dict[str, str]¶: Dictionary “URL - post title”.

detach_existed_files(save_dir, files=None)¶

Separate ‘existed’ files from ‘non_existed’ ones.

Parameters:

save_dir (Path) – Folder for saving files.
files (LepFileList, optional) – List of files. If None, defaults to self ‘files’ attribute.

Return type:

None

download_files(save_dir)¶

Download files from ‘non_existed’ attribute list.

For reliability: if the primary link is not available, the method tries the other two links (if they are present).

Parameters:: save_dir (Path) – Path to folder where to save files.
Return type:: None
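The fallback order can be sketched without any network access by injecting the download step. Both download_with_fallback and the fetch callback are hypothetical; the real method works with LepFile objects and a requests session.

```python
from typing import Callable, Iterable, Optional


def download_with_fallback(urls: Iterable[str], fetch: Callable[[str], bool]) -> Optional[str]:
    """Try each non-empty URL in order; return the first that succeeds."""
    for url in urls:
        if not url:
            # Empty string means this link is absent for the file.
            continue
        if fetch(url):
            return url
    return None


def fake_fetch(url: str) -> bool:
    # Stand-in for a real download: only the mirror "works" here.
    return "mirror" in url


links = ("https://example.com/primary.mp3", "https://mirror.example.com/file.mp3", "")
print(download_with_fallback(links, fake_fetch))
```

A file for which every link fails would presumably end up in the ‘not_found’ list.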

downloaded: LepFileList¶: List of downloaded files.

existed: LepFileList¶: List of existing files on disc.

files: LepFileList¶: List of all files (gathered for downloading).

get_remote_episodes()¶

Get database episodes from remote JSON database.

After retrieving episodes, also extract all URLs and their titles and store them in the ‘db_urls’ attribute.

Return type:: None

json_url: str¶: URL to JSON database.

non_existed: LepFileList¶: List of non-existing files on disc.

not_found: LepFileList¶: List of unavailable files.

populate_default_url()¶

Fill in the secondary download URL (if it is empty) with the default value.

Iterates over the ‘files’ attribute list. The default value is composed as: config.DOWNLOADS_BASE_URL + url-encoded filename.

Return type:: None
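Composing the default URL might look like the sketch below. The base URL is a placeholder, since the actual value of config.DOWNLOADS_BASE_URL is not given here, and using urllib.parse.quote for the encoding is an assumption.

```python
from urllib.parse import quote

# Hypothetical value; the real one lives in config.DOWNLOADS_BASE_URL.
DOWNLOADS_BASE_URL = "https://example.com/downloads/"


def default_secondary_url(filename: str, secondary_url: str = "") -> str:
    """Keep an existing secondary URL; otherwise build the default one."""
    if secondary_url:
        return secondary_url
    return DOWNLOADS_BASE_URL + quote(filename)


print(default_secondary_url("[2016-03-01] # 333.mp3"))
```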

class lep_downloader.downloader.LepFile(ep_id=0, name='', ext='', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='')¶

Bases: object

Represent base class for LEP file object.

Parameters:

ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to empty str.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.

ep_id: int = 0¶: Episode index.

ext: str = ''¶: File extension.

filename: str = ''¶: File name + extension.

name: str = ''¶: File name (without extension).

primary_url: str = ''¶: Primary URL to download file.

secondary_url: str = ''¶: Secondary URL to download file.

short_date: str = ''¶: Episode date (format “YYYY-MM-DD”).

tertiary_url: str = ''¶: Tertiary URL to download file.

class lep_downloader.downloader.LepFileList(iterable=(), /)¶

Bases: List[Any]

Represent list of LepFile objects.

filter_by_type(*file_types)¶

Filter list by file type(s).

Parameters:: file_types (Any) – Variable length argument list of file types (Audio, PagePDF, ATrack, and others).
Returns:: New filtered LepFileList.
Return type:: LepFileList
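Filtering by one or more classes can be sketched with isinstance. The classes below are empty stand-ins for the real ones, and matching via isinstance (rather than exact type comparison) is an assumption.

```python
class LepFile:
    pass


class Audio(LepFile):
    pass


class PagePDF(LepFile):
    pass


class ATrack(LepFile):
    pass


class FileList(list):
    """Stand-in for LepFileList."""

    def filter_by_type(self, *file_types):
        # isinstance accepts a tuple of classes, so varargs can be passed directly.
        return FileList(f for f in self if isinstance(f, file_types))


files = FileList([Audio(), PagePDF(), ATrack(), Audio()])
print(len(files.filter_by_type(Audio)))
print(len(files.filter_by_type(Audio, PagePDF)))
```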

class lep_downloader.downloader.PagePDF(ep_id=0, name='', ext='.pdf', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='')¶

Bases: LepFile

Represent PDF file of episode page.

Parameters:

ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to “.pdf”.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.

Notes

Filename is composed after the other attributes are initialized, as: f"[{short_date}] # {name}" + ext

For other attributes, see LepFile.

ext: str = '.pdf'¶: Extension for PDF file.

lep_downloader.downloader.URL_ENCODED_CHARS_PATTERN = re.compile('%[0-9A-Z]{2}')¶

Pattern for matching %-encoded Unicode characters.

Type:: re.Pattern

lep_downloader.downloader.append_each_audio_to_container_list(ep_id, name, short_date, audios, file_class)¶

Relate links for each audio file with episode.

And put audio as ‘Audio’ or ‘ATrack’ object to container list of LepFile objects.

Parameters:

ep_id (int) – Episode number.
name (str) – File name (without extension).
short_date (str) – Date (format “YYYY-MM-DD”).
audios (list[list[str]]) – List of lists of URLs for each audio part.
file_class (Audio | ATrack) – LepFile subclass (audio type).

Return type:

None

lep_downloader.downloader.append_page_pdf_file_to_container_list(ep_id, name, short_date, page_pdf)¶

Relate links for page PDF file with episode.

And put it as ‘PagePDF’ object to container list of LepFile objects.

Parameters:

ep_id (int) – Episode number.
name (str) – File name (without extension).
short_date (str) – Date (format “YYYY-MM-DD”).
page_pdf (list[str]) – List of URLs for page PDF file.

Return type:

None

lep_downloader.downloader.crawl_list(links)¶

Crawl list of links and return tuple of three links.

For absent URL empty string is assigned.

Parameters:: links (list[str]) – List of URLs (for one file).
Returns:: A tuple of three strings (URLs).
Return type:: Tuple[str, str, str]
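Padding a list of links to exactly three values can be sketched as follows. Ignoring any links beyond the third is an assumption; the documentation does not describe that case.

```python
from typing import List, Tuple


def crawl_list(links: List[str]) -> Tuple[str, str, str]:
    """Sketch: return (primary, secondary, tertiary), using "" for absent URLs."""
    # Pad with empty strings, then keep the first three entries.
    padded = list(links[:3]) + [""] * 3
    primary, secondary, tertiary = padded[:3]
    return primary, secondary, tertiary


print(crawl_list(["https://example.com/a.mp3"]))
```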

lep_downloader.downloader.detect_existing_files(save_dir, files)¶

Separate list for existing and non-existing files.

The method scans all files in the directory and composes a list of files filtered by extension: mp3, pdf, mp4. Then it separates the ‘files’ list into two: existed files and non-existed files (iterating over the filtered files in the directory, not all of them).

Parameters:

save_dir (Path) – Path to destination folder.
files (LepFileList) – List of LepFile objects.

Returns:

A tuple with two lists: existed, non_existed.

Return type:

Tuple[LepFileList, LepFileList]
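The scan-then-split approach can be sketched with pathlib. Here plain filename strings stand in for LepFile objects, and the function name split_existing is hypothetical.

```python
import tempfile
from pathlib import Path

# Extensions named in the description above.
EXTENSIONS = {".mp3", ".pdf", ".mp4"}


def split_existing(save_dir: Path, filenames):
    """Sketch: split filenames into (existed, non_existed) for a folder."""
    # Scan the folder once and keep only files with the relevant extensions.
    on_disk = {
        p.name
        for p in save_dir.iterdir()
        if p.is_file() and p.suffix.lower() in EXTENSIONS
    }
    existed = [name for name in filenames if name in on_disk]
    non_existed = [name for name in filenames if name not in on_disk]
    return existed, non_existed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.mp3").touch()
    (folder / "notes.txt").touch()
    print(split_existing(folder, ["a.mp3", "b.pdf"]))
```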

lep_downloader.downloader.download_and_write_file(url, session, save_dir, filename, log)¶

Download a file by URL and save it.

Parameters:

url (str) – URL to file.
session (requests.Session) – Session to send request.
save_dir (Path) – Folder where to save file.
filename (str) – Filename (with extension).
log (LepLog) – Log object where to print messages.

Returns:

Status operation. True for success, False otherwise.

Return type:

bool

lep_downloader.downloader.extract_urls_from_episode_list(episodes)¶

Extract page URL and its title for each episode object in list.

Parameters:: episodes (LepEpisodeList) – List of episodes.
Returns:: Dictionary “URL - post title”.
Return type:: dict[str, str]

lep_downloader.downloader.files_box = []¶

Module level container list of LepFile objects.

Type:: LepFileList

lep_downloader.downloader.gather_all_files(lep_episodes)¶

Skim list of episodes and collect all files.

Parameters:: lep_episodes (LepEpisodeList) – List of LepEpisode objects.
Returns:: Module’s container list files_box.
Return type:: LepFileList

lep_downloader.downloader.url_encoded_chars_to_lower_case(url)¶

Change %-escaped chars in string to lower case.

Parameters:: url (str) – URL with uppercase %-escaped characters.
Returns:: URL with lowercase %-escaped characters.
Return type:: str

Example

>>> import lep_downloader.downloader
>>> url = "https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics-%E2%99%AC/"
>>> lep_downloader.downloader.url_encoded_chars_to_lower_case(url)
'https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics-%e2%99%ac/'
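One way to get this behaviour is re.sub with a replacement function, reusing the pattern documented for URL_ENCODED_CHARS_PATTERN. This is a sketch; the function name to_lower_escapes is hypothetical.

```python
import re

# Same pattern as documented for URL_ENCODED_CHARS_PATTERN.
URL_ENCODED_CHARS_PATTERN = re.compile("%[0-9A-Z]{2}")


def to_lower_escapes(url: str) -> str:
    """Lowercase each %XX escape and leave the rest of the URL untouched."""
    return URL_ENCODED_CHARS_PATTERN.sub(lambda m: m.group(0).lower(), url)


print(to_lower_escapes("https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics-%E2%99%AC/"))
```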

lep_downloader.parser¶

LEP module for parsing logic.

class lep_downloader.parser.Archive(url='https://teacherluke.co.uk/archive-of-episodes-1-149/', session=None, mode='fetch', with_html=False, html_path=None, log=None)¶

Bases: Lep

Represent archive page object.

Parameters:

url (str) – URL to LEP Archive page. Defaults to config.ARCHIVE_URL.
session (requests.Session) – Session to send requests. If None, defaults to super’s (global) session from lep.PROD_SES.
mode (str) – Parsing mode (“raw” | “fetch” | “pull”). Defaults to “fetch”.
with_html (bool) – Flag to save HTML file for parsed web page. Defaults to False.
html_path (str, optional) – Path to folder where HTML files will be saved. If None, it will be later replaced with config.PATH_TO_HTML_FILES.
log (LepLog, optional) – Log instance. If None, global (super’s) value LepLog() will be set (output to console only).

collected_links: Dict[str, str]¶: Valid episodes links on archive page.

deleted_links: Set[str]¶: Deleted (invalid) links.

do_parsing_actions(json_url, json_name='')¶

Do parsing job.

Parameters:

json_url (str) – URL to remote JSON database.
json_name (str) – Name for JSON local file (with parsing results).

Return type:

None

Raises:

NoEpisodesInDataBaseError – If JSON database has no episodes at all.

episodes: LepEpisodeList¶: List of archive episodes.

html_path: str | None¶: Path to folder for saving HTMLs.

mode: str¶: Parsing mode.

parse_each_episode(urls)¶

Parse each episode in dictionary of URLs.

Parameters:: urls (Dict[str, str]) – Dictionary of differing URLs (or all URLs in case of “raw” mode).
Return type:: None

parser: ArchiveParser¶: Parser instance.

take_updates(db_urls, archive_urls=None, mode='fetch')¶

Take differing URLs between database and archive page.

Difference is determined according to parsing mode: “fetch” or “pull”.

Parameters:

db_urls (Dict[str, str]) – URLs dictionary of database.
archive_urls (Dict[str, str], optional) – URLs dictionary of archive. If None, takes attribute dictionary ‘collected_links’.
mode (str) – Parsing mode. Defaults to “fetch”.

Returns:

Difference dictionary or None (for “fetch” mode when database contains more episodes than archive).

Return type:

Any

url: str¶: URL to LEP Archive page.

used_indexes: Set[int]¶: Set of indexes.

with_html: bool¶: Flag to save HTML files.

write_text_to_html(text, file_stem, path=None, ext='.html')¶

Write text to HTML file.

Parameters:

text (str) – Text (HTML content) to be written to file.
file_stem (str) – Name of the file (without extension).
path (str, optional) – Folder path where HTML files will be saved. If None, defaults to config.PATH_TO_HTML_FILES (the nested folder ./data_dump in the app folder).
ext (str) – Extension for HTML file. Defaults to “.html”.

Return type:

None

class lep_downloader.parser.ArchiveParser(archive_obj, url, session=None, log=None)¶

Bases: LepParser

Parser for archive page.

Parameters:

archive_obj (Archive) – Archive instance whose container attributes are used to store and read data.
url (str) – URL for parsing.
session (requests.Session) – Requests session object. If None, get default global session.
log (LepLog) – Log instance of LepLog class where to output message.

collect_links()¶

Parse all links matching episode URL and their texts.

Repeated links are ignored. In the unlikely case that an archive page consists entirely of repeated links, the method silently skips them (as if there were no episodes at all).

Raises:: NoEpisodeLinksError – If there are no episode links on archive page.
Return type:: None

do_post_parsing()¶

Remove irrelevant links and substitute short links.

Return type:: None

do_pre_parsing()¶

Substitute link with ‘.ukm’ misspelled TLD in HTML content.

Return type:: None

remove_irrelevant_links()¶

Delete known irrelevant links from dictionary.

First, irrelevant links are saved into the ‘deleted_links’ attribute before they are deleted from the dictionary. Then the dictionary is rebuilt, ignoring irrelevant links.

Return type:: None

substitute_short_links()¶

Paste final URL location instead of short links.

Return type:: None

class lep_downloader.parser.EpisodeParser(archive_obj, page_url, session=None, post_title='', log=None)¶

Bases: LepParser

Parser for episode page.

Parameters:

archive_obj (Archive) – Archive instance.
page_url (str) – Target page URL.
session (requests.Session, optional) – Parsing session. Defaults to None. If None, takes global session from lep.PROD_SES.
post_title (str) – Link text for this episode.
log (LepLog, optional) – Log instance to output parsing messages. Defaults to None.

collect_links()¶

Parse link(s) to episode audio(s).

Also parse datetime of episode publishing.

Return type:: None

do_post_parsing()¶

Post parsing actions for EpisodeParser.

No actions - just pass.

Return type:: None

do_pre_parsing()¶

Parse episode date, number, HTML title and generate index.

Raises:

NotEpisodeURLError – If URL does not contain date.
LepEpisodeNotFoundError – If URL is not available.

Return type:

None

episode¶: Episode instance.

used_indexes¶: Used indexes from archive instance.

class lep_downloader.parser.LepParser(archive_obj, url, session=None, log=None)¶

Bases: Lep

Base class for LEP parsers.

Parameters:

archive_obj (Archive) – Archive instance.
url (str) – Target page URL.
session (requests.Session) – Parsing session. Defaults to None. If None, takes global session from lep.PROD_SES.
log (LepLog, optional) – Log instance to output parsing messages. Defaults to None.

archive¶: Archive instance.

collect_links()¶

Parse all links by parser own rules.

Raises:: NotImplementedError – This method must be implemented.
Return type:: None

content: str¶: Page content.

do_post_parsing()¶

Finalize and process parsing results.

Raises:: NotImplementedError – This method must be implemented.
Return type:: None

do_pre_parsing()¶

Prepare for parsing.

It might be: extracting data from URL, clearing / replacement tags, etc.

Raises:: NotImplementedError – This method must be implemented.
Return type:: None

final_location: str¶: Final location of target URL (after any redirects).

get_url()¶

Retrieve target web page.

The method’s results are saved in the attributes:

content
final_location
is_url_ok

Return type:: None

is_url_ok: bool¶: URL getting status.

parse_dom_for_article_container()¶

Parse DOM for HTML’s <article> tag only.

This is common step for parsers.

Raises:: NotEpisodeURLError – If target page has no HTML <article> tag.
Return type:: None

parse_url()¶

Perform parsing steps.

Return type:: None

soup: BeautifulSoup¶: Parsed HTML as BeautifulSoup object.

url¶: Target page URL.

lep_downloader.parser.convert_date_from_url(url)¶

Extract date from URL and then convert it to ‘datetime’ object.

Parameters:: url (str) – URL to episode.
Returns:: Naive datetime.
Return type:: datetime

lep_downloader.parser.extract_date_from_url(url)¶

Parse date from URL.

Parameters:: url (str) – URL to episode.
Returns:: Date in YYYY/MM/DD format. If date is not found, returns empty string.
Return type:: str
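Extracting the date and converting it to a naive datetime can be sketched as below. The regular expression is an assumption about how episode URLs embed the date, and both function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed shape of the date inside an episode URL: /YYYY/MM/DD/
DATE_IN_URL = re.compile(r"/(\d{4}/\d{2}/\d{2})/")


def extract_date(url: str) -> str:
    """Return "YYYY/MM/DD" from the URL, or an empty string if absent."""
    match = DATE_IN_URL.search(url)
    return match.group(1) if match else ""


def convert_date(url: str) -> datetime:
    """Return a naive datetime; raises ValueError when the URL has no date."""
    return datetime.strptime(extract_date(url), "%Y/%m/%d")


url = "https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics/"
print(extract_date(url))
print(convert_date(url))
```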

lep_downloader.parser.generate_post_index(post_url, indexes)¶

Generate index number for post from URL.

Parameters:

post_url (str) – URL to episode.
indexes (Set[int]) – Already used indexes.

Returns:

Index number. If URL is not valid, returns 0.

Return type:

int
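Based on the description of the index attribute (date from URL concatenated with an increment), index generation might look like the sketch below. The two-digit increment starting at 01 is an assumption, as is leaving the set of used indexes unmodified.

```python
import re
from typing import Set


def generate_index(post_url: str, used: Set[int]) -> int:
    """Sketch: build YYYYMMDD + NN, picking the first NN not in use."""
    match = re.search(r"/(\d{4})/(\d{2})/(\d{2})/", post_url)
    if not match:
        # Not a valid episode URL.
        return 0
    date_part = "".join(match.groups())
    increment = 1
    # Assumption: two-digit increment, starting at 01.
    index = int(f"{date_part}{increment:02d}")
    while index in used:
        increment += 1
        index = int(f"{date_part}{increment:02d}")
    return index


url = "https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics/"
print(generate_index(url, set()))
print(generate_index(url, {2016030101}))
```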

lep_downloader.parser.has_tag_a_appropriate_audio(tag_a)¶

Check link text for “download” audio purpose.

Keywords are identified in advance and encoded in a regex.

Parameters:: tag_a (Tag) – Tag object (<a>).
Returns:: True for appropriate link, False otherwise.
Return type:: bool

lep_downloader.parser.is_tag_a_repeated(tag_a)¶

Check link to episode for repetition.

Repetitions are identified in advance and encoded in a regex.

Parameters:: tag_a (Tag) – Tag object (<a>).
Returns:: True for repeated link, False otherwise.
Return type:: bool

lep_downloader.parser.parse_episode_number(post_title)¶

Parse episode number from post title.

Parameters:: post_title (str) – Post title (link text).
Returns:: Episode number. If number is not found, returns 0.
Return type:: int
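Parsing the number could be sketched with a regular expression. The assumed title shape, a leading number followed by a period such as “333. More Misheard Lyrics”, is an illustration only; the real rules may cover more formats.

```python
import re

# Assumed title shape: leading digits followed by a period.
EPISODE_NUMBER = re.compile(r"\s*(\d+)\s*\.")


def parse_number(post_title: str) -> int:
    """Return the leading episode number, or 0 if the title has none."""
    match = EPISODE_NUMBER.match(post_title)
    return int(match.group(1)) if match else 0


print(parse_number("333. More Misheard Lyrics"))
print(parse_number("Announcement"))
```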

lep_downloader.parser.parse_post_audio(soup)¶

Find links to audio(s) on episode page.

Parameters:: soup (BeautifulSoup) – Parsed HTML document of episode page.
Returns:: List of lists (for multi-part episodes) with links to each audio (or part).
Return type:: List[List[str]]

lep_downloader.parser.parse_post_publish_datetime(soup)¶

Extract value from HTML’s <time> tag.

Parameters:: soup (BeautifulSoup) – Parsed HTML document.
Returns:: Post datetime. If the <time> tag is not found, returns the default value 1999-01-01T01:01:01+02:00.
Return type:: str

lep_downloader.parser.write_parsed_episodes_to_json(lep_objects, json_path='')¶

Serialize list of episodes to JSON file.

Parameters:

lep_objects (LepEpisodeList) – List of LepEpisode objects.
json_path (str) – Path to JSON file. Defaults to empty string.

Return type:

None

lep_downloader.exceptions¶

Module for LEP custom exceptions.

exception lep_downloader.exceptions.DataBaseUnavailableError(message='')¶

Bases: LepExceptionError

Raised when JSON database file is not available.

Parameters:: message (str) – Explanation of the error. Default is empty string.
Return type:: None

message: str¶: Explanation of the error.

exception lep_downloader.exceptions.LepEpisodeNotFoundError(episode, message='')¶

Bases: LepExceptionError

Raised when given episode URL is not available.

First argument serves to pass partially filled episode instance, in order to add it as ‘bad’ episode.

Parameters:

episode (LepEpisode) – Episode instance.
message (str) – Explanation of the error. Default is empty string.

Return type:

None

bad_episode: LepEpisode¶: Episode instance.

message: str¶: Explanation of the error.

exception lep_downloader.exceptions.LepExceptionError¶

Bases: Exception

Base class for exceptions in ‘lep_downloader’ package.
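Because every package exception derives from this base class, callers can catch the whole family with one except clause. The sketch below uses minimal stand-in classes that mirror the documented names, plus a hypothetical load_db function.

```python
class LepExceptionError(Exception):
    """Stand-in for the package's base exception."""


class DataBaseUnavailableError(LepExceptionError):
    """Stand-in mirroring the documented 'message' attribute."""

    def __init__(self, message: str = "") -> None:
        super().__init__(message)
        self.message = message


def load_db(available: bool):
    # Hypothetical loader used only to trigger the error.
    if not available:
        raise DataBaseUnavailableError("JSON is unavailable")
    return []


try:
    load_db(False)
except LepExceptionError as err:
    # Catching the base class also catches every subclass.
    print(type(err).__name__, err.message)
```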

exception lep_downloader.exceptions.NoEpisodeLinksError(url='', message='')¶

Bases: LepExceptionError

Raised when there are no valid episode links on the page.

Parameters:

url (str) – URL which has no episode links. Default is empty string.
message (str) – Explanation of the error. Default is empty string.

Return type:

None

message: str¶: Explanation of the error.

url: str¶: URL which has no episode links.

exception lep_downloader.exceptions.NoEpisodesInDataBaseError(message='')¶

Bases: LepExceptionError

Raised when the JSON database has no valid episodes.

Parameters:: message (str) – Explanation of the error. Default is empty string.
Return type:: None

message: str¶: Explanation of the error.

exception lep_downloader.exceptions.NotEpisodeURLError(url='', message='')¶

Bases: LepExceptionError

Raised when given URL is not episode / archive URL.

Parameters:

url (str) – URL which has no <article> tag. Default is empty string.
message (str) – Explanation of the error. Default is empty string.

Return type:

None

message: str¶: Explanation of the error.

url: str¶: URL which has no <article> tag.