Fix for a buffer overflow #167
…fuzzing (in C-string format): `"\xA9##r[](r[]("`.
"\n# h1\nc hh##e2ked\n\n A | rong__ ___strong \u0000\u0000\u0000\u0000\u0000\u0000\a\u0000\u0000\u0000\u0000\n# h1\nh# mity#2\n### h3\n#### h4\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\\\n##### h5\n#*#####\u0000\n6"
Codecov Report

| | master | #167 | +/- |
|---|---:|---:|---:|
| Coverage | 94.33% | 94.39% | +0.06% |
| Files | 3 | 3 | |
| Lines | 3088 | 3089 | +1 |
| Hits | 2913 | 2916 | +3 |
| Misses | 175 | 173 | -2 |

And another one, this time for:
Thank you. Will merge shortly. (And sorry for the late answer; I was away from GitHub for some months.)
## Version 0.5.1
Changes:
* LaTeX math extension (`MD_FLAG_LATEXMATHSPANS`) now requires that the opener
mark is not immediately preceded by an alphanumeric character and, similarly,
that the closer mark is not immediately followed by an alphanumeric character.
So, for example, `foo$ x + y = z $` is no longer recognized as a LaTeX equation
because there is no space between `foo` and the opening `$`.
* Table extension (`MD_FLAG_TABLES`) now recognizes only tables with no more
than 128 columns. This limit has been imposed to prevent a pathological
case of quadratic output size explosion, which could be used as a DoS attack
vector.
* We are now stricter with the `MD_FLAG_PERMISSIVExxxAUTOLINKS` family of
extensions with respect to non-alphanumeric characters, with the aim of
mitigating false positive detections.
Only a relatively small set of non-alphanumeric characters is now allowed in
permissive e-mail auto-links (`MD_FLAG_PERMISSIVEEMAILAUTOLINKS`):
- `.`, `-`, `_`, `+` in the user name part of the e-mail address; and
- `.`, `-`, `_` in the host part of the e-mail address.
Similarly, for URL and WWW auto-links (`MD_FLAG_PERMISSIVEURLAUTOLINKS` and
`MD_FLAG_PERMISSIVEWWWAUTOLINKS`):
- `.`, `-`, `_` in the host part of the URL;
- `/`, `.`, `-`, `_` in the path part of the URL;
- `&`, `.`, `-`, `+`, `_`, `=`, `(`, `)` in the query part of the URL
(additionally, if present, `(` and `)` must form balanced pairs); and
- `.`, `-`, `+`, `_` in the fragment part of the URL.
Furthermore, these characters (with some exceptions, e.g. where they serve as
delimiters, such as `/` in paths) are generally accepted only when
an alphanumeric character both precedes and follows them (i.e. they cannot
be "stacked" together).
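The two character-context rules above can be illustrated with a small, self-contained C sketch. The first is the LaTeX `$` flanking rule. The second is the "no stacking" rule for permissive e-mail auto-links. The helper names `latex_opener_allowed()`, `latex_closer_allowed()` and `email_part_allowed()` are hypothetical, and this is not md4c's actual implementation. The sketch assumes plain ASCII input and checks a single candidate mark or address part in isolation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers illustrating the 0.5.1 rules; not md4c's code.
   Input is treated as plain ASCII bytes. */
static bool is_alnum_byte(char c)
{
    return isalnum((unsigned char) c) != 0;
}

/* A '$' at text[off] may open a math span only if it is not
   immediately preceded by an alphanumeric character. */
static bool latex_opener_allowed(const char *text, size_t size, size_t off)
{
    assert(off < size && text[off] == '$');
    return off == 0 || !is_alnum_byte(text[off - 1]);
}

/* A '$' at text[off] may close a math span only if it is not
   immediately followed by an alphanumeric character. */
static bool latex_closer_allowed(const char *text, size_t size, size_t off)
{
    assert(off < size && text[off] == '$');
    return off + 1 >= size || !is_alnum_byte(text[off + 1]);
}

/* Checks one part of a permissive e-mail auto-link (user name or host).
   `extra` lists the allowed non-alphanumeric characters, e.g. ".-_+" for
   the user name or ".-_" for the host. Each such character must have an
   alphanumeric character on both sides. It therefore cannot start or end
   the part, and it cannot be "stacked" next to another such character. */
static bool email_part_allowed(const char *part, size_t len, const char *extra)
{
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        char c = part[i];
        if (is_alnum_byte(c))
            continue;
        /* strchr() would also match the terminating NUL, so reject it explicitly. */
        if (c == '\0' || strchr(extra, c) == NULL)
            return false;
        if (i == 0 || i + 1 == len)
            return false;
        if (!is_alnum_byte(part[i - 1]) || !is_alnum_byte(part[i + 1]))
            return false;
    }
    return true;
}
```

Consider `latex_opener_allowed("foo$ x + y = z $", 16, 3)`. It rejects the `$` because `o` precedes it. Similarly, `email_part_allowed("john..doe", 9, ".-_+")` rejects the user name because the two dots are stacked.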
Fixes:
* Fix several bugs where we did not properly respect already resolved spans
of higher precedence when handling the permissive auto-link extensions
(family of `MD_FLAG_PERMISSIVExxxAUTOLINKS` flags), the LaTeX math extension
(`MD_FLAG_LATEXMATHSPANS`) and the wiki-links extension (`MD_FLAG_WIKILINKS`)
of the form `[[label|text]]` (with pipe `|`). In some complex cases this
could lead to an invalid internal parser state and memory corruption.
Identified with [OSS-Fuzz](https://github.com/google/oss-fuzz).
* [#222](mity/md4c#222):
Fix the strike-through extension (`MD_FLAG_STRIKETHROUGH`), which did not
respect the same rules for pairing opener and closer marks as other emphasis spans.
* [#223](mity/md4c#223):
Fix incorrect handling of a new-line character at the very beginning and/or
end of a code span, where we were not correctly following the CommonMark
specification requirements.
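For reference, the CommonMark specification defines the code span normalization that #223 concerns. First, line endings are converted to spaces. Then, if the result both begins and ends with a space but does not consist only of spaces, one space is removed from each end. A minimal C sketch of just that step follows. `normalize_code_span()` is a hypothetical name, and the sketch assumes the opening and closing backtick strings have already been matched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: normalizes the raw content found between the
   opening and closing backtick strings of a code span, following the
   CommonMark "Code spans" rules:
     1. every line ending (LF, CR or CRLF) becomes a single space;
     2. if the result begins AND ends with a space but does not consist
        entirely of spaces, one space is removed from each end.
   `out` must provide room for at least `len` bytes. Returns the number of
   bytes written to `out`. */
static size_t normalize_code_span(const char *in, size_t len, char *out)
{
    size_t n = 0;
    bool all_spaces = true;

    assert(len == 0 || (in != NULL && out != NULL));

    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == '\r' || c == '\n') {
            if (c == '\r' && i + 1 < len && in[i + 1] == '\n')
                i++;                     /* CRLF counts as one line ending */
            c = ' ';
        }
        if (c != ' ')
            all_spaces = false;
        out[n++] = c;
    }

    if (n >= 2 && out[0] == ' ' && out[n - 1] == ' ' && !all_spaces) {
        memmove(out, out + 1, n - 2);    /* drop the leading space ... */
        n -= 2;                          /* ... and the trailing one */
    }
    return n;
}
```

With this rule, a code span whose content is a new-line, `foo` and another new-line renders as just `foo`. Content made only of spaces is kept unchanged.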
## Version 0.5.0
Changes:
* Changes mandated by CommonMark specification 0.30.
In fact, there are only very minor changes to the recognition of HTML blocks:
- The tag `<textarea>` now triggers an HTML block (of type 1 as per the
specification).
- An HTML declaration (HTML block type 4) is no longer required to begin with
an upper-case ASCII letter after the `<!`. Any ASCII letter is now
allowed. Also, it no longer requires whitespace before the closing `>`.
Other than that, the newest specification mainly improves test coverage and
clarifies its wording in some cases, without affecting the implementation.
Refer to [CommonMark
0.30 notes](https://github.com/commonmark/commonmark-spec/releases/tag/0.30)
for more info.
* Make Unicode-specific code compliant with Unicode 15.1.
* Update the list of entities known to the HTML renderer from
https://html.spec.whatwg.org/entities.json.
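CommonMark 0.30 changed two HTML block start conditions, and both can be sketched in C. The function names are hypothetical, not md4c's API. The sketch assumes the caller has already removed the line's indentation (up to three spaces) and its line ending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, not md4c's API. `line` points at the line content
   after indentation; `len` excludes the line ending. */

static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

static bool is_ascii_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Case-insensitive test whether line[0..len) starts with `prefix`. */
static bool starts_with_ci(const char *line, size_t len, const char *prefix)
{
    for (size_t i = 0; prefix[i] != '\0'; i++) {
        if (i >= len || ascii_lower(line[i]) != ascii_lower(prefix[i]))
            return false;
    }
    return true;
}

/* HTML block type 1: `<pre`, `<script`, `<style` or (new in 0.30)
   `<textarea`, in any case, followed by a space, a tab, `>` or the end
   of the line. */
static bool html_block_type1_start(const char *line, size_t len)
{
    static const char *const tags[] = { "<pre", "<script", "<style", "<textarea" };

    assert(line != NULL || len == 0);
    for (size_t t = 0; t < sizeof(tags) / sizeof(tags[0]); t++) {
        size_t n = strlen(tags[t]);
        if (!starts_with_ci(line, len, tags[t]))
            continue;
        if (n == len || line[n] == ' ' || line[n] == '\t' || line[n] == '>')
            return true;
    }
    return false;
}

/* HTML block type 4 (declaration): `<!` followed by an ASCII letter of
   either case; before 0.30 only an upper-case letter was accepted. */
static bool html_block_type4_start(const char *line, size_t len)
{
    return len >= 3 && line[0] == '<' && line[1] == '!' &&
           is_ascii_letter(line[2]);
}
```

For example, `<!doctype html>` now starts a type 4 block just like `<!DOCTYPE html>`. By contrast, `<!--` still fails the type 4 check because `-` is not a letter; comments are covered by a separate block type.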
New Features:
* Add an extension that treats all soft breaks as hard ones. It has to be
explicitly enabled with `MD_FLAG_HARD_SOFT_BREAKS`.
Contributed by [l-m](https://github.com/l1mey112).
* Structure `MD_SPAN_A_DETAIL` now has a new member `is_autolink`.
Contributed by [Jens Alfke](https://github.com/snej).
* The `md2html` utility now supports the command-line options `--html-title`
and `--html-css`.
Contributed by [Andreas Baumann](https://github.com/andreasbaumann).
Fixes:
* [#163](mity/md4c#163):
Make the HTML renderer emit `'\n'` after the root tag when in XHTML mode.
* [#165](mity/md4c#165):
Make the HTML renderer not percent-encode `'~'` in URLs. Although encoding it
works, it is not needed, and it can actually be confusing with URLs such as
`http://www.example.com/~johndoe/`.
* [#167](mity/md4c#167),
[#168](mity/md4c#168):
Fix multiple instances of various buffer overflow bugs, found mostly by
fuzz testing. Contributed by [dtldarek](https://github.com/dtldarek) and
[Thierry Coppey](https://github.com/TCKnet).
* [#169](mity/md4c#169):
The table underline no longer requires 3 characters per table column.
One dash (optionally with a leading and/or trailing `:`) is now sufficient.
This improves compatibility with GFM.
* [#172](mity/md4c#172):
Fix quadratic time behavior caused by an unnecessary lookup of a link
reference definition even when the potential label contains nested brackets.
* [#173](mity/md4c#173),
[#174](mity/md4c#174),
[#212](mity/md4c#212),
[#213](mity/md4c#213):
Multiple bugs identified with [OSS-Fuzz](https://github.com/google/oss-fuzz)
were fixed.
* [#190](mity/md4c#190),
[#200](mity/md4c#200),
[#201](mity/md4c#201):
Multiple fixes for incorrect interactions of an indented code block with a
preceding block.
* [#202](mity/md4c#202):
We were not correctly calling the `enter_block()` and `leave_block()`
callbacks when multiple HTML blocks followed one after another; instead, such
blocks were merged into one.
(This likely impacts only applications interested in the Markdown AST, not
those merely converting Markdown to other formats like HTML.)
* [#210](mity/md4c#210):
The `md2html` utility now handles nested images with optional titles
correctly.
* [#214](mity/md4c#214):
Tags `<h2>` ... `<h6>` incorrectly did not trigger an HTML block.
* [#215](mity/md4c#215):
The parser incorrectly did not accept optional tabs after a setext
header underline.
* [#217](mity/md4c#217):
The parser incorrectly resolved emphasis in some situations when the
emphasis marks were enclosed by punctuation characters.
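The relaxed table underline rule from #169 can be sketched as a check on a single delimiter cell, meaning the text between two pipes. The `DelimAlign` enum and `classify_delimiter_cell()` are hypothetical names for illustration, not md4c's internals. The sketch also reports the column alignment implied by the colons, following the usual GFM convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical alignment values, not md4c's own enum. */
typedef enum {
    DELIM_INVALID,   /* not a valid delimiter cell */
    DELIM_DEFAULT,   /* "-"   */
    DELIM_LEFT,      /* ":-"  */
    DELIM_RIGHT,     /* "-:"  */
    DELIM_CENTER     /* ":-:" */
} DelimAlign;

static bool is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/* Classifies one delimiter-row cell: optional blanks, an optional ':',
   at least ONE '-', an optional ':', optional blanks. */
static DelimAlign classify_delimiter_cell(const char *cell, size_t len)
{
    size_t beg = 0, end = len;

    assert(cell != NULL || len == 0);
    while (beg < end && is_blank(cell[beg]))
        beg++;
    while (end > beg && is_blank(cell[end - 1]))
        end--;

    bool left = (beg < end && cell[beg] == ':');
    if (left)
        beg++;
    bool right = (end > beg && cell[end - 1] == ':');
    if (right)
        end--;

    if (beg == end)
        return DELIM_INVALID;            /* no dash at all, e.g. ":" or "::" */
    for (size_t i = beg; i < end; i++) {
        if (cell[i] != '-')
            return DELIM_INVALID;
    }

    if (left && right)
        return DELIM_CENTER;
    if (left)
        return DELIM_LEFT;
    if (right)
        return DELIM_RIGHT;
    return DELIM_DEFAULT;
}
```

Under this check a single `-` is already a valid column underline, and `:-:` marks a centered column. A cell made only of colons is rejected because it contains no dash.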
Fuzzing found that the input `"\xA9##r[](r[]("` generates a buffer overflow (see the diff for the exact place); this is a small change that should fix it.