Markdown Parser
This module parses Markdown-formatted text and converts its formatting entities into TLRPC objects suitable for the Telegram API.
The markdown_utils.py module turns text written with common MarkdownV2-style formatting into a plain-text string plus a set of formatting entities, which convert to TLRPC.MessageEntity objects. These can then be passed to client_utils.send_message or any other API method that accepts formatted text.
Core Components
The parser returns a ParsedMessage object, which has two main attributes:
- text: str: The plain-text content with all Markdown markers removed.
- entities: Tuple[RawEntity, ...]: A tuple of RawEntity objects, each representing one formatting instruction.
Each RawEntity object contains:
- type: TLEntityType: The type of the entity (e.g., bold, italic, code).
- offset: int: The starting position of the entity in text, in UTF-16 code units.
- length: int: The length of the formatted segment in text, in UTF-16 code units.
- language: Optional[str]: For pre (code block) entities, the specified language.
- url: Optional[str]: For text_link entities, the URL.
- document_id: Optional[int]: For custom_emoji entities, the ID of the custom emoji document.
To convert RawEntity objects into TLRPC.MessageEntity objects suitable for the Telegram API, call the to_tlrpc_object() method on each RawEntity.
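To make these shapes concrete, here is a stand-in built from dataclasses. It mirrors the attribute names listed above but is not the module's own definition: the enum values and the covered_text helper are hypothetical, and to_tlrpc_object() is left out because it depends on the app's TLRPC classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class TLEntityType(Enum):
    """Stand-in enum: member names follow this document; the values are invented."""

    BOLD = "bold"
    ITALIC = "italic"
    UNDERLINE = "underline"
    STRIKETHROUGH = "strikethrough"
    SPOILER = "spoiler"
    CODE = "code"
    PRE = "pre"
    TEXT_LINK = "text_link"
    CUSTOM_EMOJI = "custom_emoji"


@dataclass(frozen=True)
class RawEntity:
    type: TLEntityType
    offset: int  # start position in ParsedMessage.text, in UTF-16 code units
    length: int  # length of the covered segment, in UTF-16 code units
    language: Optional[str] = None  # set only for PRE
    url: Optional[str] = None  # set only for TEXT_LINK
    document_id: Optional[int] = None  # set only for CUSTOM_EMOJI
    # The real class also offers to_tlrpc_object(); omitted here (needs the app's TLRPC classes).


@dataclass(frozen=True)
class ParsedMessage:
    text: str
    entities: Tuple[RawEntity, ...]


def covered_text(message: ParsedMessage, entity: RawEntity) -> str:
    """Hypothetical helper: return the part of message.text that entity covers.

    Offsets are UTF-16 code units, so slice the UTF-16 encoding (2 bytes per unit)
    rather than the Python str, whose indices count code points.
    """
    data = message.text.encode("utf-16-le")
    start = 2 * entity.offset
    end = 2 * (entity.offset + entity.length)
    return data[start:end].decode("utf-16-le")
```

With the text "👍 ok" and a bold entity at offset 3, length 2, covered_text returns "ok", whereas slicing the Python string directly (msg.text[3:5]) returns "k": the emoji is one code point but two UTF-16 code units.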
Supported Entity Types (TLEntityType)
The parser supports the following entity types:
- BOLD (*bold*)
- ITALIC (_italic_)
- UNDERLINE (__underline__)
- STRIKETHROUGH (~strikethrough~)
- SPOILER (||spoiler||)
- CODE (`inline code`)
- PRE (```code block```): can include an optional language specifier.
- TEXT_LINK ([link text](http://example.com))
- CUSTOM_EMOJI ([alt text](document_id)): alt text becomes the content of the entity, and document_id is the emoji's ID.
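When message text is assembled from variables or user input, it helps to generate this syntax programmatically so stray characters are not read as markup. The sketch below emits the markers listed above and applies the backslash escaping described under Important Notes. All function names are hypothetical helpers, not part of markdown_utils, and backticks inside code and ")" inside URLs are deliberately not handled because this document does not specify how the parser treats them.

```python
from typing import Optional

# Characters listed as special in the Escaping note.
SPECIAL = set("*_~|`[]\\")

# Markers for the simple styles, taken from the supported-types list.
STYLE_MARKERS = {
    "bold": "*",
    "italic": "_",
    "underline": "__",
    "strikethrough": "~",
    "spoiler": "||",
}

FENCE = "`" * 3  # built this way so Markdown renderers do not mistake it for a fence


def escape(text: str) -> str:
    """Prefix every special character with a backslash so it is shown literally."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def style(kind: str, text: str) -> str:
    """Escape text and wrap it in the marker for kind, e.g. bold -> *text*."""
    marker = STYLE_MARKERS[kind]
    return f"{marker}{escape(text)}{marker}"


def _reject_backticks(snippet: str) -> None:
    # How backticks inside code are escaped is not documented, so refuse them.
    if "`" in snippet:
        raise ValueError("backticks inside code are not handled by this sketch")


def code(snippet: str) -> str:
    """Inline code between single backticks."""
    _reject_backticks(snippet)
    return f"`{snippet}`"


def pre(snippet: str, language: Optional[str] = None) -> str:
    """Fenced code block; the optional language goes right after the opening fence."""
    _reject_backticks(snippet)
    return f"{FENCE}{language or ''}\n{snippet}\n{FENCE}"


def link(text: str, url: str) -> str:
    """Text link; the URL is inserted as-is (a ')' inside it is not handled here)."""
    return f"[{escape(text)}]({url})"


def custom_emoji(alt: str, document_id: int) -> str:
    """Custom emoji: alt is the covered text, document_id the emoji's ID."""
    return f"[{escape(alt)}]({document_id})"
```

For example, style("bold", "5*3") produces *5\*3*, so the asterisk in the user's text stays literal instead of closing the bold span early.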
Usage Example
This example demonstrates how to parse a Markdown string and send it as a formatted message.
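A minimal sketch of that flow, including the error handling recommended under Important Notes. The parser and sender are passed in as arguments so the function runs outside the app; inside a plugin you would pass parse_markdown from markdown_utils and send_message from client_utils. The send_markdown name and the payload keys peer, message, and entities are assumptions, so check them against the client_utils documentation before relying on them.

```python
from typing import Any, Callable, Dict, List


def send_markdown(
    peer: Any,
    markdown: str,
    parse: Callable[[str], Any],
    send: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Hypothetical helper: parse markdown and send it as a formatted message.

    parse should behave like markdown_utils.parse_markdown (returning an object with
    .text and .entities); send stands in for client_utils.send_message.
    The payload keys used below are an assumption about send_message's parameters.
    """
    try:
        parsed = parse(markdown)
    except SyntaxError:
        # Malformed markup (e.g. an unclosed tag): send the raw text without formatting.
        return send({"peer": peer, "message": markdown})
    # Convert each RawEntity into the TLRPC.MessageEntity the API expects.
    entities: List[Any] = [entity.to_tlrpc_object() for entity in parsed.entities]
    return send({"peer": peer, "message": parsed.text, "entities": entities})
```

Falling back to the raw string on SyntaxError is one design choice: the recipient still gets the message, with the markers visible, instead of nothing. Re-raising the error is the stricter alternative.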
Important Notes
- UTF-16 Offsets & Lengths: The offset and length in RawEntity (and the resulting TLRPC.MessageEntity) are measured in UTF-16 code units, as the Telegram API requires. The parser handles this conversion automatically, so a character outside the Basic Multilingual Plane, such as 👍 (U+1F44D), counts as two units.
- Error Handling: If the Markdown syntax is invalid (e.g., an unclosed tag), parse_markdown raises a SyntaxError. It's good practice to wrap the call in a try-except block.
- Nesting: Basic nesting of styles (e.g., bold inside italic) is generally supported, but complex or ambiguous nesting may produce unexpected results.
- Escaping: Special Markdown characters (*, _, ~, |, `, [, ], \) can be escaped with a backslash (\) if you want them to appear as literal characters. For example, \*not bold\* renders as *not bold*.
- Code Blocks:
  - Inline code is surrounded by single backticks (`).
  - Fenced code blocks are surrounded by triple backticks (```).
  - An optional language identifier can be placed immediately after the opening triple backticks (e.g., ```python).
- Custom Emoji: The syntax is [alt text](document_id). The alt text (e.g., the emoji character itself) becomes the text segment covered by the TLRPC.TL_messageEntityCustomEmoji entity, and document_id is the ID of the custom emoji. You can obtain an emoji's ID by sending the emoji to @AdsMarkdownBot on Telegram.
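The UTF-16, escaping, and error-handling notes can be seen together in a deliberately small toy parser. It is not the code in markdown_utils.py: it understands only the single-character markers *, _ and ~ plus backslash escapes (so __underline__, spoilers, code, and links are out of scope), and its (type, offset, length) tuples are a simplified stand-in for RawEntity.

```python
from typing import Dict, List, Tuple

# Toy marker table: single-character markers only (no __underline__, ||spoiler||, code or links).
MARKERS = {"*": "bold", "_": "italic", "~": "strikethrough"}

Entity = Tuple[str, int, int]  # (type, offset, length), measured in UTF-16 code units


def utf16_len(s: str) -> int:
    """Count UTF-16 code units; characters outside the BMP take two."""
    return len(s.encode("utf-16-le")) // 2


def toy_parse(source: str) -> Tuple[str, List[Entity]]:
    """Strip markers from source and return (plain_text, entities sorted by offset).

    Raises SyntaxError if a marker is opened but never closed.
    """
    out: List[str] = []
    entities: List[Entity] = []
    open_at: Dict[str, int] = {}  # marker -> UTF-16 offset where it was opened
    pos = 0  # current length of the plain text, in UTF-16 code units
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source):
            # Backslash escape: keep the next character literally, even if it is a marker.
            literal = source[i + 1]
            out.append(literal)
            pos += utf16_len(literal)
            i += 2
            continue
        if ch in MARKERS:
            if ch in open_at:
                # Closing marker: the entity spans from the opening position to here.
                start = open_at.pop(ch)
                entities.append((MARKERS[ch], start, pos - start))
            else:
                open_at[ch] = pos
        else:
            out.append(ch)
            pos += utf16_len(ch)
        i += 1
    if open_at:
        raise SyntaxError("unclosed marker(s): " + " ".join(open_at))
    entities.sort(key=lambda e: e[1])
    return "".join(out), entities
```

For "👍 *ok*" the toy returns the text "👍 ok" with a bold entity at offset 3, length 2. The thumbs-up emoji is one Python character but two UTF-16 code units, so a code-point offset (2) would point at the wrong place for the Telegram API. Because each marker is tracked separately, "*bold _both_*" yields a bold entity (0, 9) containing an italic one (5, 4).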
This parser provides a convenient way to include rich text formatting in messages sent by your plugins.