Markdown Parser
This module provides the ability to parse markdown-formatted text and convert formatting entities to TLRPC objects suitable for the Telegram API.
The markdown_utils.py module allows you to easily convert text with common Markdown V2-style formatting into a plain text string and a list of TLRPC.MessageEntity objects. These entities can then be used with client_utils.send_message or other API methods that accept formatted text.
Core Components
The parser returns a ParsedMessage object, which has two main attributes:
- text: str: The plain text content with all Markdown markers removed.
- entities: Tuple[RawEntity, ...]: A tuple of RawEntity objects, each representing a formatting instruction.
Each RawEntity object contains:
- type: TLEntityType: The type of the entity (e.g., bold, italic, code).
- offset: int: The starting position of the entity in the text (UTF-16 code units).
- length: int: The length of the formatted segment in the text (UTF-16 code units).
- language: Optional[str]: For pre (code block) entities, the specified language.
- url: Optional[str]: For text_link entities, the URL.
- document_id: Optional[int]: For custom_emoji entities, the ID of the custom emoji document.
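Because offset and length count UTF-16 code units rather than Python characters, the two can differ whenever the text contains characters outside the Basic Multilingual Plane (most emoji, for instance). The sketch below shows one way to compute such values by hand; the helper names utf16_len and utf16_span are illustrative only and are not part of markdown_utils.

```python
def utf16_len(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode `text`."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids the 2-byte BOM.
    return len(text.encode("utf-16-le")) // 2


def utf16_span(text: str, start: int, end: int) -> tuple:
    """Convert the Python slice text[start:end] to (offset, length) in UTF-16 units."""
    return utf16_len(text[:start]), utf16_len(text[start:end])


text = "😀 bold"
start = text.index("bold")              # 2 when counted in Python characters
print(utf16_span(text, start, start + 4))  # (3, 4): the emoji takes two code units
```

The parser performs this conversion itself, so a helper like this is only relevant when entities are built or inspected manually.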
To convert RawEntity objects into TLRPC.MessageEntity objects suitable for the Telegram API, call the to_tlrpc_object() method on each RawEntity.
Supported Entity Types (TLEntityType)
The parser supports the following entity types:
- BOLD (*bold*)
- ITALIC (_italic_)
- UNDERLINE (__underline__)
- STRIKETHROUGH (~strikethrough~)
- SPOILER (||spoiler||)
- CODE (`inline code`)
- PRE (code block fenced by triple backticks) - can include an optional language specifier.
- TEXT_LINK ([link text](http://example.com))
- CUSTOM_EMOJI ([alt text](document_id)) - alt text becomes the content of the entity, document_id is the emoji's ID.
Usage Example
This example demonstrates how to parse a Markdown string and send it as a formatted message.
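A minimal sketch of that flow, under stated assumptions: the parameter dict keys (peer, message, entities) and the single-argument call shape of the send function are hypothetical, and the helper names build_send_params and send_markdown are made up for illustration. The parser and sender are passed in as callables so the snippet does not depend on how a plugin imports them.

```python
from typing import Any, Callable, Dict


def build_send_params(parsed: Any, peer: int) -> Dict[str, Any]:
    """Build a parameter dict from a ParsedMessage-like object.

    `parsed` only needs `.text` and `.entities`, where each entity provides
    to_tlrpc_object(). The dict keys are assumptions, not a documented schema.
    """
    return {
        "peer": peer,
        "message": parsed.text,
        "entities": [entity.to_tlrpc_object() for entity in parsed.entities],
    }


def send_markdown(
    markdown_text: str,
    peer: int,
    parse: Callable[[str], Any],
    send: Callable[[Dict[str, Any]], Any],
) -> bool:
    """Parse `markdown_text` and hand the result to `send`.

    Returns False instead of raising when the parser reports a SyntaxError.
    """
    try:
        parsed = parse(markdown_text)
    except SyntaxError as exc:
        # Raised by the parser for malformed input such as unclosed markers.
        print(f"Markdown parse error: {exc}")
        return False
    send(build_send_params(parsed, peer))
    return True
```

Inside a plugin this would presumably be wired up as send_markdown(text, peer_id, parse_markdown, send_message), with parse_markdown taken from markdown_utils and send_message from client_utils; check the actual send_message signature before relying on the dict shape used here.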
Important Notes
- UTF-16 Offsets & Lengths: The offset and length in RawEntity (and the resulting TLRPC.MessageEntity) are calculated based on UTF-16 code units, as required by the Telegram API. The parser handles this conversion automatically.
- Error Handling: If the Markdown syntax is incorrect (e.g., unclosed tags), parse_markdown will raise a SyntaxError. It's good practice to wrap the call in a try-except block.
- Nesting: Basic nesting of styles (e.g., bold inside italic) is generally supported, but complex or ambiguous nesting might lead to unexpected results.
- Escaping: Special Markdown characters (*, _, ~, |, `, [, ], \) can be escaped with a backslash (\) if you want them to appear as literal characters. For example, \*not bold\* will render as *not bold*.
- Code Blocks:
  - Inline code is surrounded by single backticks (`).
  - Fenced code blocks are surrounded by triple backticks.
  - An optional language identifier can be placed immediately after the opening triple backticks (e.g., ```python).
- Custom Emoji: The syntax [alt text](document_id) is used. The alt text (e.g., the emoji character itself) becomes the text segment covered by the TLRPC.TL_messageEntityCustomEmoji entity, and document_id is the ID of the custom emoji. You can obtain the emoji ID by sending the emoji to @AdsMarkdownBot on Telegram.
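When user-supplied text is embedded in a Markdown string, the escaping rule above can be applied programmatically. The following is a small sketch; escape_markdown is a hypothetical helper, not a function provided by markdown_utils, and it escapes exactly the characters listed in the Escaping note.

```python
import re

# Characters listed as special in the Escaping note: * _ ~ | ` [ ] \
_SPECIAL_CHARS = re.compile(r"([*_~|`\[\]\\])")


def escape_markdown(text: str) -> str:
    """Prefix every special Markdown character in `text` with a backslash."""
    return _SPECIAL_CHARS.sub(r"\\\1", text)


print(escape_markdown("*not bold*"))  # \*not bold\*
```

Escaping only the untrusted fragment, and not the whole message, keeps the formatting markers you add yourself intact.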
This parser provides a robust way to include rich text formatting in messages sent by your plugins.