Developer Reference¶
Class and function definitions for the main modules. All descriptions are generated automatically from source code docstrings.
lep_downloader.lep¶
LEP module for general logic and classes.
- class lep_downloader.lep.Lep(session=None, log=None)¶
Bases:
object
Represent base class for LEP’s general attributes and methods.
- Parameters:
session (requests.Session, optional) – Global session for descendants.
log (LepLog, optional) – Log object where to output messages.
- cls_session¶
Class session. Default is taken from module variable PROD_SES.
- Type:
requests.Session
- cls_lep_log¶
Class log object to which messages are output. Default is LepLog(), which writes to stdout only.
- Type:
LepLog
- json_body¶
Content of JSON database file.
- Type:
str
- classmethod extract_only_valid_episodes(json_body, json_url=None)¶
Return list of valid (not None) LepEpisode objects.
- Parameters:
json_body (str) – Content of JSON database file.
json_url (str, optional) – JSON URL, only for printing it to output.
- Returns:
List of LepEpisode objects. It’s empty if there are no valid objects at all.
- Return type:
- classmethod get_db_episodes(json_url, session=None)¶
Get valid episodes from JSON.
- Parameters:
json_url (str) – URL to JSON database file.
session (requests.Session, optional) – Session object to send request. Default is Lep.cls_session.
- Return type:
- Raises:
DataBaseUnavailableError – if JSON is unavailable
- classmethod get_web_document(page_url, session=None)¶
Get text content of web document (HTML, JSON, etc.).
- Parameters:
page_url (str) – URL for getting text response.
session (requests.Session, optional) – Session object to send request. Default is Lep.cls_session.
- Returns:
- A tuple (resp.text, final_location, is_url_ok) where
resp.text (str) is text content of URL response
final_location (str) is location after all redirections
is_url_ok (bool) is flag of URL status
- Return type:
Tuple[str, str, bool]
- class lep_downloader.lep.LepEpisode(episode=0, date=datetime.datetime(2000, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), url='', post_title='', post_type='', parsed_at='', index=0, files=None, admin_note='', updated_at='', html_title='')¶
Bases:
object
LEP episode class.
- Parameters:
episode (int) – Episode number.
date (str | datetime) – Post datetime. It will be converted to aware datetime object (with timezone). If None, defaults to datetime equaling “2000-01-01T00:00:00+00:00”.
url (str) – Final location of web post URL.
post_title (str) – Post title extracted from link text (unsafe).
post_type (str) – Post type (“AUDIO”, “TEXT”, etc.).
files (dict | None) – Dictionary with files for episode. Each key of it is a file category (“audios”, “audiotrack”, “page_pdf”, etc). If None defaults to empty dict.
parsed_at (str) – Parsing datetime in UTC timezone, with microseconds.
index (int) – Parsing index, concatenation of date from URL and increment (for several posts in a day).
admin_note (str) – Note for the administrator; also stores the error message for a bad response during parsing.
updated_at (str) – Datetime in UTC when episode was updated (usually manually by admin).
html_title (str) – Page title extracted from HTML tag <title>. Important: Not stored in JSON database.
- property date: Any¶
Episode datetime (with timezone).
To be accurate, posting datetime on the website.
- property post_title: str¶
Post title converted to be safe for Windows path (filename).
Conversion via replace_unsafe_chars().
- property short_date: str¶
Episode short date.
It’s the same as posting date in the episode URL, just formatted as “YYYY-MM-DD”.
- class lep_downloader.lep.LepEpisodeList(iterable=(), /)¶
Bases:
List[Any]
Represent list of LepEpisode objects.
- default_start_date¶
Min date. It’s equal to “1999-01-01T00:01:00+00:00”
- Type:
datetime
- default_end_date¶
Max date. It’s equal to “2999-12-31T23:55:00+00:00”
- Type:
datetime
- desc_sort_by_date_and_index()¶
Sort LepEpisodeList by post datetime.
- Returns:
New sorted LepEpisodeList.
- Return type:
LepEpisodeList
Notes
Sort is descending (the latest date comes first). It uses two attributes: “date” and “index”.
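The two-key descending sort described in the notes could look like the minimal sketch below. The `Episode` dataclass, the `desc_sort` function, and the sample index values are hypothetical stand-ins for illustration, not the package's own code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Episode:
    """Hypothetical stand-in for LepEpisode with only the two sort keys."""

    date: datetime
    index: int


def desc_sort(episodes):
    """Return a new list sorted by (date, index), newest first."""
    return sorted(episodes, key=lambda ep: (ep.date, ep.index), reverse=True)


day = datetime(2016, 3, 1, tzinfo=timezone.utc)
episodes = [
    Episode(day, 2016030101),  # sample index values are made up
    Episode(datetime(2017, 1, 1, tzinfo=timezone.utc), 2017010101),
    Episode(day, 2016030102),
]
ordered = desc_sort(episodes)
```

When two episodes share the same date, the second key decides the order, so the higher index comes first.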
- filter_by_date(start=None, end=None)¶
Filter list by episode date.
- Parameters:
start (datetime, optional) – Episode date (left bound). If start is None, defaults to min date LepEpisodeList.default_start_date.
end (datetime, optional) – Episode date (right bound). If end is None, defaults to max date LepEpisodeList.default_end_date.
- Returns:
New filtered LepEpisodeList.
- Return type:
LepEpisodeList
Notes
If end < start, they are swapped.
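The default bounds and the swap rule can be sketched as follows. This is an illustration under assumptions, not the library's implementation: it filters plain datetime objects rather than episodes, the function name is reused only for readability, and treating both bounds as inclusive is a guess.

```python
from datetime import datetime, timezone

# Default bounds quoted in the attribute descriptions above.
DEFAULT_START = datetime(1999, 1, 1, 0, 1, tzinfo=timezone.utc)
DEFAULT_END = datetime(2999, 12, 31, 23, 55, tzinfo=timezone.utc)


def filter_by_date(dates, start=None, end=None):
    """Keep dates within [start, end]; inclusive bounds are an assumption."""
    start = DEFAULT_START if start is None else start
    end = DEFAULT_END if end is None else end
    if end < start:
        start, end = end, start  # swapped bounds are tolerated
    return [d for d in dates if start <= d <= end]


dates = [datetime(y, 6, 1, tzinfo=timezone.utc) for y in (2009, 2015, 2021)]
later = datetime(2020, 1, 1, tzinfo=timezone.utc)
earlier = datetime(2010, 1, 1, tzinfo=timezone.utc)
selected = filter_by_date(dates, start=later, end=earlier)
```

The same swap logic would apply to filter_by_number, with integers in place of datetimes.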
- filter_by_number(start, end)¶
Filter list by episode number.
- Parameters:
start (int) – Episode number (left bound)
end (int) – Episode number (right bound)
- Returns:
New filtered LepEpisodeList.
- Return type:
LepEpisodeList
Notes
If end < start, they are swapped.
- filter_by_type(type)¶
Filter list by episode type.
- Parameters:
type (str) – Episode type (“AUDIO”, “TEXT”, etc)
- Returns:
New filtered LepEpisodeList.
- Return type:
LepEpisodeList
- class lep_downloader.lep.LepJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)¶
Bases:
JSONEncoder
Custom JSONEncoder for LepEpisode objects.
- default(obj)¶
Override ‘default’ method for encoding JSON objects.
- Parameters:
obj (Any) – Object for encoding.
- Returns:
If object is a LepEpisode, returns dict. Otherwise, a TypeError exception is raised.
- Return type:
Any
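Overriding `default` on a `json.JSONEncoder` subclass generally follows the pattern below. The `Episode` class and `EpisodeEncoder` are hypothetical stand-ins, and returning `__dict__` is an assumption; the real encoder may choose which fields to emit.

```python
import json


class Episode:
    """Hypothetical stand-in for LepEpisode."""

    def __init__(self, episode=0, url=""):
        self.episode = episode
        self.url = url


class EpisodeEncoder(json.JSONEncoder):
    """Encode Episode objects as dicts; reject everything else."""

    def default(self, obj):
        if isinstance(obj, Episode):
            return obj.__dict__
        # The base implementation raises TypeError for unknown types.
        return super().default(obj)


encoded = json.dumps(Episode(333, "https://example.com/333"), cls=EpisodeEncoder)
```

Passing the class via `cls=` lets `json.dumps` call `default` only for objects it cannot serialize on its own.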
- class lep_downloader.lep.LepLog(debug=False, logfile='_lep_debug_.log')¶
Bases:
object
Represent LepLog object.
- Parameters:
debug (bool) – Debug mode flag. Defaults to False.
logfile (str) – Name of log file. Defaults to config.DEBUG_FILENAME = “_lep_debug_.log”.
- debug¶
Debug mode flag (True / False).
- Type:
bool
- logfile¶
Name of log file.
- Type:
str
- lep_log¶
Custom loguru.logger object, which is returned from init_lep_log().
- Type:
loguru.logger
- msg(msg, *, skip_file=False, one_line=True, msg_lvl='PRINT', wait_input=False, **kwargs)¶
Output message to console or log file.
- Parameters:
msg (str) – Message to output. Supports loguru color markups.
skip_file (bool) – Flag to skip writing to logfile (even in Debug mode). Defaults to False.
one_line (bool) – Flag to replace new line character with Unicode char of it, i.e. ⏎. Defaults to True.
msg_lvl (str) – Message level. Defaults to “PRINT”.
wait_input (bool) – Flag to stay on line after printing message to console. Defaults to False.
kwargs (Any) – Arbitrary keyword arguments.
- Return type:
None
Notes
If Debug mode is False and message level is “PRINT”, method outputs to console only. Otherwise, it duplicates all console messages to log file too (with level PRINT). Records (messages) for other log levels also go to the file (unless skip_file is True).
- lep_downloader.lep.as_lep_episode_obj(dct)¶
Specialize JSON objects decoding.
- Parameters:
dct (dict) – Dictionary object from JSON (including nested dictionaries).
- Returns:
LepEpisode object or None.
- Return type:
Any
Notes
If dictionary is empty or has “audios” key it’s returned “as-is”. Returns None if TypeError was raised.
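The decoding rules in the notes map naturally onto the `object_hook` parameter of `json.loads`. The sketch below assumes a simplified `Episode` class with three fields; `as_episode` is a hypothetical name, not the function documented above.

```python
import json


class Episode:
    """Hypothetical stand-in for LepEpisode with a few of its fields."""

    def __init__(self, episode=0, url="", files=None):
        self.episode = episode
        self.url = url
        self.files = files or {}


def as_episode(dct):
    """Decode a JSON object into an Episode where possible."""
    if not dct or "audios" in dct:
        return dct  # empty or nested "files" dictionaries pass through as-is
    try:
        return Episode(**dct)
    except TypeError:
        return None  # unexpected keys make the constructor fail


text = '[{"episode": 1, "url": "u", "files": {"audios": [["a.mp3"]]}}, {"bad": 1}]'
decoded = json.loads(text, object_hook=as_episode)
```

Because `json.loads` calls the hook on nested dictionaries first, the inner "files" dictionary reaches the constructor unchanged.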
- lep_downloader.lep.init_lep_log(debug=False, logfile='_lep_debug_.log')¶
Create custom logger object.
- Parameters:
debug (bool) – Debug log or not. Defaults to False.
logfile (str) – Name of the logfile. Defaults to config.DEBUG_FILENAME = “_lep_debug_.log”.
- Returns:
Custom loguru.logger object
- Return type:
Any
- lep_downloader.lep.logfile_formatter(record)¶
Return formatter string for log file sink.
- Parameters:
record (Any) – Loguru’s record dict.
- Returns:
Format string for log file:
{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | "{message}" + LF
LF is the newline character here.
- Return type:
str
Note
2022-02-25 07:20:48.909 | PRINT | Running script...⏎
2022-02-25 07:20:48.917 | PRINT | Starting parsing...
- lep_downloader.lep.replace_unsafe_chars(filename)¶
Replace most common invalid path characters with ‘_’.
- Parameters:
filename (str) – Filename (should be a string representing the final path component) without the drive and root.
- Returns:
Safe name for writing file on Windows OS (and others).
- Return type:
str
Example
>>> import lep_downloader.lep
>>> unsafe = "What/ will: be* replaced?.mp3"
>>> lep_downloader.lep.replace_unsafe_chars(unsafe)
'What_ will_ be_ replaced_.mp3'
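A replacement of this kind can be sketched with a single regular expression. The character set below (the characters Windows forbids in file names) and the name `sketch_replace_unsafe_chars` are assumptions; the library may cover a different set.

```python
import re

# Assumed set of characters that are invalid in Windows file names.
UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*]')


def sketch_replace_unsafe_chars(filename):
    """Replace each unsafe character with an underscore."""
    return UNSAFE_CHARS.sub("_", filename)


safe = sketch_replace_unsafe_chars("What/ will: be* replaced?.mp3")
```

For the input from the example above, this sketch produces the same result as the documented function.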
- lep_downloader.lep.stdout_formatter(record)¶
Return formatter string for console sink.
- Parameters:
record (Any) – Loguru’s record dict.
- Returns:
Format string for stdout log:
"{message}" + end
- Return type:
str
Notes
The ending character of a log message is controlled by storing it in the ‘extra’ dict and changing it later via bind(). Default is the newline character.
lep_downloader.downloader¶
LEP module for downloading logic.
- class lep_downloader.downloader.ATrack(ep_id=0, name='', ext='.mp3', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='', part_no=0)¶
Bases:
LepFile
Represent audio track object (to episode video or part of it).
- Parameters:
ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to “.mp3”.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.
part_no (int) – Part number. Defaults to 0.
Notes
Filename depends on part number.
- If part_no = 0, composed as
f"[{short_date}] # {name}" + " _aTrack_" + ext
- If part_no > 0, composed as
f"[{short_date}] # {name}" + " [Part NN]" + " _aTrack_" + ext
For other attributes, see LepFile.
- ext: str = '.mp3'¶
Extension for audio track file.
- part_no: int = 0¶
Part number.
- class lep_downloader.downloader.Audio(ep_id=0, name='', ext='.mp3', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='', part_no=0)¶
Bases:
LepFile
Represent audio object to episode (or part of it).
- Parameters:
ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to “.mp3”.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.
part_no (int) – Part number. Defaults to 0.
Notes
Filename depends on part number.
- If part_no = 0, composed as
f"[{short_date}] # {name}" + ext
- If part_no > 0, composed as
f"[{short_date}] # {name}" + " [Part NN]" + ext
For other attributes, see LepFile.
- ext: str = '.mp3'¶
Extension for audio file.
- part_no: int = 0¶
Part number.
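The filename rule from the Audio notes can be illustrated with a small helper. `compose_audio_filename` is a hypothetical function, and rendering "NN" as a two-digit, zero-padded number is an assumption.

```python
def compose_audio_filename(short_date, name, ext=".mp3", part_no=0):
    """Build a file name following the pattern quoted in the notes."""
    stem = f"[{short_date}] # {name}"
    if part_no > 0:
        # Two-digit zero padding for "NN" is an assumption.
        stem += f" [Part {part_no:02d}]"
    return stem + ext


single = compose_audio_filename("2016-03-01", "333. More Misheard Lyrics")
second = compose_audio_filename("2016-03-01", "333. More Misheard Lyrics", part_no=2)
```

An ATrack name would add " _aTrack_" before the extension in the same way.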
- class lep_downloader.downloader.LepDL(json_url='https://hotenov.com/d/lep/v3-lep-db.min.json', session=None, log=None)¶
Bases:
Lep
Represent downloader object.
- Parameters:
json_url (str) – URL to JSON database.
session (requests.Session) – Requests session object. If None, defaults to global session lep.PROD_SES.
log (LepLog) – Log instance to which messages are output.
- db_episodes: LepEpisodeList¶
List of episodes in JSON database.
- db_urls: Dict[str, str]¶
Dictionary “URL - post title”.
- detach_existed_files(save_dir, files=None)¶
Separate ‘existed’ files from ‘non_existed’ ones.
- Parameters:
save_dir (Path) – Folder for saving files.
files (LepFileList, optional) – List of files. If None, defaults to self ‘files’ attribute.
- Return type:
None
- download_files(save_dir)¶
Download files from ‘non_existed’ attribute list.
For reliability: if the primary link is not available, the method tries the other two links (if they are present).
- Parameters:
save_dir (Path) – Path to folder where to save files.
- Return type:
None
- downloaded: LepFileList¶
List of downloaded files.
- existed: LepFileList¶
List of existing files on disc.
- files: LepFileList¶
List of all files (gathered for downloading).
- get_remote_episodes()¶
Get database episodes from remote JSON database.
After retrieving episodes, also extract all URLs and their titles and store them in ‘db_urls’ attribute.
- Return type:
None
- json_url: str¶
URL to JSON database.
- non_existed: LepFileList¶
List of non-existing files on disc.
- not_found: LepFileList¶
List of unavailable files.
- populate_default_url()¶
Fill in secondary download url (if it is empty) with default value.
Iterate over ‘files’ attribute list. Default value is composed as: config.DOWNLOADS_BASE_URL + url-encoded filename.
- Return type:
None
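Composing a default URL from a base and a URL-encoded filename might look like the sketch below. `BASE_URL` is a placeholder for config.DOWNLOADS_BASE_URL (its real value is not shown here), `default_secondary_url` is a hypothetical helper, and using `urllib.parse.quote` for the encoding is an assumption.

```python
from urllib.parse import quote

# Hypothetical base URL; the real value lives in config.DOWNLOADS_BASE_URL.
BASE_URL = "https://example.com/downloads/"


def default_secondary_url(filename, secondary_url=""):
    """Return the existing secondary URL or a default built from the name."""
    if secondary_url:
        return secondary_url
    return BASE_URL + quote(filename)


url = default_secondary_url("[2016-03-01] # 333. More.mp3")
```

Encoding matters here because the file names contain brackets, spaces, and "#", none of which may appear raw in a URL path.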
- class lep_downloader.downloader.LepFile(ep_id=0, name='', ext='', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='')¶
Bases:
object
Represent base class for LEP file object.
- Parameters:
ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to empty str.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.
- ep_id: int = 0¶
Episode index.
- ext: str = ''¶
File extension.
- filename: str = ''¶
File name + extension.
- name: str = ''¶
File name (without extension).
- primary_url: str = ''¶
Primary URL to download file.
- secondary_url: str = ''¶
Secondary URL to download file.
- short_date: str = ''¶
Episode date (format “YYYY-MM-DD”).
- tertiary_url: str = ''¶
Tertiary URL to download file.
- class lep_downloader.downloader.LepFileList(iterable=(), /)¶
Bases:
List[Any]
Represent list of LepFile objects.
- filter_by_type(*file_types)¶
Filter list by file type(s).
- Parameters:
file_types (Any) – Variable length argument list of file types (Audio, PagePDF, ATrack, and others).
- Returns:
New filtered LepFileList.
- Return type:
LepFileList
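Filtering a list subclass by a variable number of classes can be sketched as below. All class names are hypothetical stand-ins, and matching with `isinstance` (rather than an exact type comparison) is an assumption about how the real method behaves.

```python
class FileBase:
    """Hypothetical stand-in for LepFile."""


class AudioFile(FileBase):
    """Hypothetical stand-in for Audio."""


class PdfFile(FileBase):
    """Hypothetical stand-in for PagePDF."""


class FileList(list):
    """List subclass that can filter its items by class."""

    def filter_by_type(self, *file_types):
        # isinstance() accepts a tuple of classes, so *file_types fits as-is.
        return FileList(f for f in self if isinstance(f, file_types))


files = FileList([AudioFile(), PdfFile(), AudioFile()])
audios = files.filter_by_type(AudioFile)
both = files.filter_by_type(AudioFile, PdfFile)
```

Returning a new `FileList` keeps the original list untouched and lets calls be chained.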
- class lep_downloader.downloader.PagePDF(ep_id=0, name='', ext='.pdf', short_date='', filename='', primary_url='', secondary_url='', tertiary_url='')¶
Bases:
LepFile
Represent PDF file of episode page.
- Parameters:
ep_id (int) – Episode index. Defaults to 0.
name (str) – File name (without extension). Defaults to empty str.
ext (str) – File extension. Defaults to “.pdf”.
short_date (str) – Episode date (format “YYYY-MM-DD”). Defaults to empty str.
filename (str) – File name + extension. Defaults to empty str.
primary_url (str) – Primary URL to download file. Defaults to empty str.
secondary_url (str) – Secondary URL to download file. Defaults to empty str.
tertiary_url (str) – Tertiary URL to download file. Defaults to empty str.
Notes
Filename is composed after initialization of the other attributes as:
f"[{short_date}] # {name}" + ext
For other attributes, see LepFile.
- ext: str = '.pdf'¶
Extension for PDF file.
- lep_downloader.downloader.URL_ENCODED_CHARS_PATTERN = re.compile('%[0-9A-Z]{2}')¶
Pattern for matching %-encoded Unicode characters.
- Type:
re.Pattern
- lep_downloader.downloader.append_each_audio_to_container_list(ep_id, name, short_date, audios, file_class)¶
Relate links for each audio file with episode.
And put audio as ‘Audio’ or ‘ATrack’ object to container list of LepFile objects.
- lep_downloader.downloader.append_page_pdf_file_to_container_list(ep_id, name, short_date, page_pdf)¶
Relate links for page PDF file with episode.
And put it as ‘PagePDF’ object to container list of LepFile objects.
- Parameters:
ep_id (int) – Episode number.
name (str) – File name (without extension).
short_date (str) – Date (format “YYYY-MM-DD”).
page_pdf (list[str]) – List of URLs for page PDF file.
- Return type:
None
- lep_downloader.downloader.crawl_list(links)¶
Crawl list of links and return tuple of three links.
For absent URL empty string is assigned.
- Parameters:
links (list[str]) – List of URLs (for one file).
- Returns:
A tuple of three strings (URLs).
- Return type:
Tuple[str, str, str]
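Padding a short list of links to exactly three values can be sketched as follows. `crawl_links` is a hypothetical name, and dropping any links beyond the third is an assumption of this sketch.

```python
def crawl_links(links):
    """Return exactly three URLs, padding with empty strings."""
    # Ignoring links beyond the third is an assumption of this sketch.
    padded = list(links[:3]) + [""] * 3
    primary, secondary, tertiary = padded[:3]
    return primary, secondary, tertiary


one_link = crawl_links(["https://example.com/a.mp3"])
no_links = crawl_links([])
```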
- lep_downloader.downloader.detect_existing_files(save_dir, files)¶
Separate list for existing and non-existing files.
Method scans all files in the directory and composes a list of files filtered by extension: mp3, pdf, mp4. Then it separates the ‘files’ list into two: existed files and non-existed files (iterating over the filtered files in the directory, not all).
- Parameters:
save_dir (Path) – Path to destination folder.
files (LepFileList) – List of LepFile objects.
- Returns:
A tuple with two lists: existed, non_existed.
- Return type:
Tuple[LepFileList, LepFileList]
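The scan-then-split approach can be illustrated with `pathlib`. This sketch works on plain file names instead of LepFile objects, and `split_by_existence` is a hypothetical helper, so it shows the idea only.

```python
import tempfile
from pathlib import Path

# Extensions named in the description above.
MEDIA_SUFFIXES = {".mp3", ".pdf", ".mp4"}


def split_by_existence(save_dir, filenames):
    """Split file names into (existed, non_existed) for one directory."""
    on_disk = {
        p.name
        for p in Path(save_dir).iterdir()
        if p.is_file() and p.suffix in MEDIA_SUFFIXES
    }
    existed = [name for name in filenames if name in on_disk]
    non_existed = [name for name in filenames if name not in on_disk]
    return existed, non_existed


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.mp3").write_bytes(b"")
    (Path(tmp) / "notes.txt").write_bytes(b"")
    existed, non_existed = split_by_existence(tmp, ["a.mp3", "b.pdf"])
```

Building the set of names once keeps each membership check cheap, even for a folder with many files.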
- lep_downloader.downloader.download_and_write_file(url, session, save_dir, filename, log)¶
Download a file by URL and save it.
- Parameters:
url (str) – URL to file.
session (requests.Session) – Session to send request.
save_dir (Path) – Folder where to save file.
filename (str) – Filename (with extension).
log (LepLog) – Log object where to print messages.
- Returns:
Operation status. True for success, False otherwise.
- Return type:
bool
- lep_downloader.downloader.extract_urls_from_episode_list(episodes)¶
Extract page URL and its title for each episode object in list.
- Parameters:
episodes (LepEpisodeList) – List of episodes.
- Returns:
Dictionary “URL - post title”.
- Return type:
dict[str, str]
- lep_downloader.downloader.files_box = []¶
Module level container list of LepFile objects.
- Type:
- lep_downloader.downloader.gather_all_files(lep_episodes)¶
Skim list of episodes and collect all files.
- Parameters:
lep_episodes (LepEpisodeList) – List of LepEpisode objects.
- Returns:
Module’s container list files_box.
- Return type:
- lep_downloader.downloader.url_encoded_chars_to_lower_case(url)¶
Change %-escaped chars in string to lower case.
- Parameters:
url (str) – URL with uppercase unicode characters.
- Returns:
URL with lowercase unicode characters.
- Return type:
str
Example
>>> import lep_downloader.downloader
>>> url = "https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics-%E2%99%AC/"
>>> lep_downloader.downloader.url_encoded_chars_to_lower_case(url)
'https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics-%e2%99%ac/'
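One way to get this behavior is `re.sub` with a callable replacement, reusing the pattern documented for URL_ENCODED_CHARS_PATTERN. `lower_encoded_chars` is a hypothetical name, and the sketch is not necessarily how the library does it.

```python
import re

# Same pattern as URL_ENCODED_CHARS_PATTERN documented above.
PATTERN = re.compile("%[0-9A-Z]{2}")


def lower_encoded_chars(url):
    """Lowercase every %XX escape and leave the rest of the URL untouched."""
    return PATTERN.sub(lambda match: match.group(0).lower(), url)


url = "https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics-%E2%99%AC/"
lowered = lower_encoded_chars(url)
```

Only the matched escapes are passed to the callable, so uppercase letters elsewhere in the URL are preserved.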
lep_downloader.parser¶
LEP module for parsing logic.
- class lep_downloader.parser.Archive(url='https://teacherluke.co.uk/archive-of-episodes-1-149/', session=None, mode='fetch', with_html=False, html_path=None, log=None)¶
Bases:
Lep
Represent archive page object.
- Parameters:
url (str) – URL to LEP Archive page. Defaults to config.ARCHIVE_URL.
session (requests.Session) – Session to send requests. If None, defaults to super’s (global) session from lep.PROD_SES.
mode (str) – Parsing mode (“raw” | “fetch” | “pull”). Defaults to “fetch”.
with_html (bool) – Flag to save HTML file for parsed web page. Defaults to False.
html_path (str, optional) – Path to folder where HTML files will be saved. If None, it will be later replaced with config.PATH_TO_HTML_FILES.
log (LepLog, optional) – Log instance. If None, global (super’s) value LepLog() will be set (output to console only).
- collected_links: Dict[str, str]¶
Valid episodes links on archive page.
- deleted_links: Set[str]¶
Deleted (invalid) links.
- do_parsing_actions(json_url, json_name='')¶
Do parsing job.
- Parameters:
json_url (str) – URL to remote JSON database.
json_name (str) – Name for JSON local file (with parsing results).
- Return type:
None
- Raises:
NoEpisodesInDataBaseError – If JSON database has no episodes at all.
- episodes: LepEpisodeList¶
List of archive episodes.
- html_path: str | None¶
Path to folder for saving HTMLs.
- mode: str¶
Parsing mode.
- parse_each_episode(urls)¶
Parse each episode in dictionary of URLs.
- Parameters:
urls (Dict[str, str]) – Dictionary of differing URLs (or all URLs in case of “raw” mode).
- Return type:
None
- parser: ArchiveParser¶
Parser instance.
- take_updates(db_urls, archive_urls=None, mode='fetch')¶
Take differing URLs between database and archive page.
Difference is determined according to parsing mode: “fetch” or “pull”.
- Parameters:
db_urls (Dict[str, str]) – URLs dictionary of database.
archive_urls (Dict[str, str], optional) – URLs dictionary of archive. If None, takes attribute dictionary ‘collected_links’.
mode (str) – Parsing mode. Defaults to “fetch”.
- Returns:
Difference dictionary or None (for “fetch” mode when database contains more episodes than archive).
- Return type:
Any
- url: str¶
URL to LEP Archive page.
- used_indexes: Set[int]¶
Set of indexes.
- with_html: bool¶
Flag to save HTML files.
- write_text_to_html(text, file_stem, path=None, ext='.html')¶
Write text to HTML file.
- Parameters:
text (str) – Text (HTML content) to be written to file.
file_stem (str) – Name of the file (without extension).
path (str, optional) – Folder path where HTML files will be saved. If None, defaults to config.PATH_TO_HTML_FILES (the nested folder ./data_dump in the app folder).
ext (str) – Extension for HTML file. Defaults to “.html”.
- Return type:
None
- class lep_downloader.parser.ArchiveParser(archive_obj, url, session=None, log=None)¶
Bases:
LepParser
Parser for archive page.
- Parameters:
- collect_links()¶
Parse all links matching episode URL and their texts.
Repeated links are ignored. In the unlikely case that an archive page consists entirely of repeated links, the method silently skips them (as if there were no episodes at all).
- Raises:
NoEpisodeLinksError – If there are no episode links on archive page.
- Return type:
None
- do_post_parsing()¶
Remove irrelevant links and substitute short links.
- Return type:
None
- do_pre_parsing()¶
Substitute link with ‘.ukm’ misspelled TLD in HTML content.
- Return type:
None
- remove_irrelevant_links()¶
Delete known irrelevant links from dictionary.
First, irrelevant links are saved into the ‘deleted_links’ attribute before they are deleted from the dictionary. Then the dictionary is rebuilt ignoring irrelevant links.
- Return type:
None
- substitute_short_links()¶
Paste final URL location instead of short links.
- Return type:
None
- class lep_downloader.parser.EpisodeParser(archive_obj, page_url, session=None, post_title='', log=None)¶
Bases:
LepParser
Parser for episode page.
- Parameters:
archive_obj (Archive) – Archive instance.
page_url (str) – Target page URL.
session (requests.Session, optional) – Parsing session. Defaults to None. If None, takes global session from lep.PROD_SES.
post_title (str) – Link text for this episode.
log (LepLog, optional) – Log instance to output parsing messages. Defaults to None.
- collect_links()¶
Parse link(s) to episode audio(s).
Also parse datetime of episode publishing.
- Return type:
None
- do_post_parsing()¶
Post parsing actions for EpisodeParser.
No actions - just pass.
- Return type:
None
- do_pre_parsing()¶
Parse episode date, number, HTML title and generate index.
- Raises:
NotEpisodeURLError – If URL does not contain date.
LepEpisodeNotFoundError – If URL is not available.
- Return type:
None
- episode¶
Episode instance.
- used_indexes¶
Used indexes from archive instance.
- class lep_downloader.parser.LepParser(archive_obj, url, session=None, log=None)¶
Bases:
Lep
Base class for LEP parsers.
- Parameters:
- archive¶
Archive instance.
- collect_links()¶
Parse all links by parser own rules.
- Raises:
NotImplementedError – This method must be implemented.
- Return type:
None
- content: str¶
Page content.
- do_post_parsing()¶
Finalize and process parsing results.
- Raises:
NotImplementedError – This method must be implemented.
- Return type:
None
- do_pre_parsing()¶
Prepare for parsing.
It might be: extracting data from URL, clearing / replacement tags, etc.
- Raises:
NotImplementedError – This method must be implemented.
- Return type:
None
- final_location: str¶
Final location of target URL (in case of redirects).
- get_url()¶
Retrieve target web page.
Method results are saved in the attributes:
content
final_location
is_url_ok
- Return type:
None
- is_url_ok: bool¶
URL getting status.
- parse_dom_for_article_container()¶
Parse DOM for HTML’s <article> tag only.
This is common step for parsers.
- Raises:
NotEpisodeURLError – If target page has no HTML <article> tag.
- Return type:
None
- parse_url()¶
Perform parsing steps.
- Return type:
None
- soup: BeautifulSoup¶
Parsed HTML as BeautifulSoup object.
- url¶
Target page URL.
- lep_downloader.parser.convert_date_from_url(url)¶
Extract date from URL and then convert it to ‘datetime’ object.
- Parameters:
url (str) – URL to episode.
- Returns:
Naive datetime.
- Return type:
datetime
- lep_downloader.parser.extract_date_from_url(url)¶
Parse date from URL.
- Parameters:
url (str) – URL to episode.
- Returns:
Date in YYYY/MM/DD format. If date is not found, returns empty string.
- Return type:
str
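The two date helpers above can be approximated with a regular expression and `datetime.strptime`. The pattern below assumes the date appears as a /YYYY/MM/DD/ path segment, as in the URL used elsewhere in this reference; both function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed pattern: a /YYYY/MM/DD/ segment somewhere in the URL path.
DATE_IN_URL = re.compile(r"/(\d{4}/\d{2}/\d{2})/")


def sketch_extract_date(url):
    """Return the date as "YYYY/MM/DD", or "" when the URL has none."""
    match = DATE_IN_URL.search(url)
    return match.group(1) if match else ""


def sketch_convert_date(url):
    """Return a naive datetime built from the date in the URL."""
    return datetime.strptime(sketch_extract_date(url), "%Y/%m/%d")


url = "https://teacherluke.co.uk/2016/03/01/333-more-misheard-lyrics/"
date_text = sketch_extract_date(url)
date_obj = sketch_convert_date(url)
```

In this sketch, converting a URL without a date raises ValueError, because the empty string does not match the format; how the library handles that case is not stated here.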
- lep_downloader.parser.generate_post_index(post_url, indexes)¶
Generate index number for post from URL.
- Parameters:
post_url (str) – URL to episode.
indexes (Set[int]) – Already used indexes.
- Returns:
Index number. If URL is not valid, returns 0.
- Return type:
int
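The LepEpisode description calls the index a concatenation of the date from the URL and an increment. A rough sketch of that idea follows; the two-digit counter starting at 01, the recording of new indexes in the passed set, and the name `sketch_generate_index` are all assumptions.

```python
import re

# Assumed pattern: a /YYYY/MM/DD/ segment somewhere in the URL path.
DATE_IN_URL = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def sketch_generate_index(post_url, indexes):
    """Return date digits plus a two-digit counter, or 0 for a bad URL."""
    match = DATE_IN_URL.search(post_url)
    if match is None:
        return 0
    date_digits = "".join(match.groups())
    counter = 1
    # Two counter digits starting at 01 are an assumption of this sketch.
    while int(f"{date_digits}{counter:02d}") in indexes:
        counter += 1
    new_index = int(f"{date_digits}{counter:02d}")
    indexes.add(new_index)  # remember it so a later post gets the next one
    return new_index


used = set()
first = sketch_generate_index("https://teacherluke.co.uk/2016/03/01/a/", used)
second = sketch_generate_index("https://teacherluke.co.uk/2016/03/01/b/", used)
invalid = sketch_generate_index("https://teacherluke.co.uk/about/", used)
```

Two posts from the same day receive consecutive indexes, which is what makes the index usable as a tie-breaker when sorting by date.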
- lep_downloader.parser.has_tag_a_appropriate_audio(tag_a)¶
Check link text for “download” audio purpose.
Key words are revealed in advance and placed in regex.
- Parameters:
tag_a (Tag) – Tag object (<a>).
- Returns:
True for appropriate link, False otherwise.
- Return type:
bool
- lep_downloader.parser.is_tag_a_repeated(tag_a)¶
Check link to episode for repetition.
Repetitions are revealed in advance and placed in regex.
- Parameters:
tag_a (Tag) – Tag object (<a>).
- Returns:
True for repeated link, False otherwise.
- Return type:
bool
- lep_downloader.parser.parse_episode_number(post_title)¶
Parse episode number from post title.
- Parameters:
post_title (str) – Post title (link text).
- Returns:
Episode number. If number is not found, returns 0.
- Return type:
int
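Extracting a leading number from a title is a small regular-expression task. The sketch below assumes titles of the form "333. Title", with the number first and followed by a dot; the actual pattern used by the library is not documented here, and the function name is hypothetical.

```python
import re

# Assumed title shape: the number leads the title and ends with a dot.
NUMBER_IN_TITLE = re.compile(r"^\s*(\d+)\.")


def sketch_parse_episode_number(post_title):
    """Return the leading episode number, or 0 when the title has none."""
    match = NUMBER_IN_TITLE.match(post_title)
    return int(match.group(1)) if match else 0


number = sketch_parse_episode_number("333. More Misheard Lyrics")
missing = sketch_parse_episode_number("Phrasal Verb a Day")
```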
- lep_downloader.parser.parse_post_audio(soup)¶
Find links to audio(s) on episode page.
- Parameters:
soup (BeautifulSoup) – Parsed HTML document of episode page.
- Returns:
list of lists (for multi-part episode) with links to audio (or part).
- Return type:
List[List[str]]
- lep_downloader.parser.parse_post_publish_datetime(soup)¶
Extract value from HTML’s <time> tag.
- Parameters:
soup (BeautifulSoup) – Parsed HTML document.
- Returns:
Post datetime. If <time> tag is not found, returns default value 1999-01-01T01:01:01+02:00.
- Return type:
str
- lep_downloader.parser.write_parsed_episodes_to_json(lep_objects, json_path='')¶
Serialize list of episodes to JSON file.
- Parameters:
lep_objects (LepEpisodeList) – List of LepEpisode objects.
json_path (str) – Path to JSON file. Defaults to empty string.
- Return type:
None
lep_downloader.exceptions¶
Module for LEP custom exceptions.
- exception lep_downloader.exceptions.DataBaseUnavailableError(message='')¶
Bases:
LepExceptionError
Raised when JSON database file is not available.
- Parameters:
message (str) – Explanation of the error. Default is empty string.
- Return type:
None
- message: str¶
Explanation of the error.
- exception lep_downloader.exceptions.LepEpisodeNotFoundError(episode, message='')¶
Bases:
LepExceptionError
Raised when given episode URL is not available.
First argument serves to pass partially filled episode instance, in order to add it as ‘bad’ episode.
- Parameters:
episode (LepEpisode) – Episode instance.
message (str) – Explanation of the error. Default is empty string.
- Return type:
None
- bad_episode: LepEpisode¶
Episode instance.
- message: str¶
Explanation of the error.
- exception lep_downloader.exceptions.LepExceptionError¶
Bases:
Exception
Base class for exceptions in ‘lep_downloader’ package.
- exception lep_downloader.exceptions.NoEpisodeLinksError(url='', message='')¶
Bases:
LepExceptionError
Raised when no valid episode links on page.
- Parameters:
url (str) – URL which has no episode links. Default is empty string.
message (str) – Explanation of the error. Default is empty string.
- Return type:
None
- message: str¶
Explanation of the error.
- url: str¶
URL which has no episode links.
- exception lep_downloader.exceptions.NoEpisodesInDataBaseError(message='')¶
Bases:
LepExceptionError
Raised when JSON database has no valid episodes.
- Parameters:
message (str) – Explanation of the error. Default is empty string.
- Return type:
None
- message: str¶
Explanation of the error.
- exception lep_downloader.exceptions.NotEpisodeURLError(url='', message='')¶
Bases:
LepExceptionError
Raised when given URL is not episode / archive URL.
- Parameters:
url (str) – URL which has no <article> tag. Default is empty string.
message (str) – Explanation of the error. Default is empty string.
- Return type:
None
- message: str¶
Explanation of the error.
- url: str¶
URL which has no <article> tag.