Sanitize control/format characters in console logger output across all formatters by Copilot · Pull Request #128741 · dotnet/runtime

Copilot · 2026-05-29T04:44:55Z

Console logging currently writes untrusted control characters verbatim, allowing terminal escape/control effects and ambiguous output. This change sanitizes control/format characters across Simple, Systemd, and Json formatter paths.

Behavioral change
- Escapes Unicode Cc/Cf characters as \uXXXX before writing log output.

Implementation
- Added shared sanitizer: ConsoleControlCharacterSanitizer.
- Applied sanitization to:
  - message text
  - exception text
  - category/scope text
  - JSON state/scopes keys and string values
  - JSON timestamp string payload when present

Tests
- Added coverage for default sanitization behavior across formatter types.
- Updated existing expectations where newline-containing exception output is now escaped.

builder.Logging.AddSimpleConsole(o =>
{
    o.SanitizeControlCharacters = true; // default
});

logger.LogInformation("Hello {Name}", "prefix\u001B[31mRED\u001B[0m");
// output contains escaped control chars (e.g. \u001B), not raw terminal sequences

dotnet-policy-service · 2026-05-29T05:05:01Z

Tagging subscribers to this area: @dotnet/area-extensions-logging
See info in area-owners.md if you want to be subscribed.

Co-authored-by: rosebyte <14963300+rosebyte@users.noreply.github.com>

tarekgh

(superseded by the comment below)

tarekgh

Review of the control character sanitization changes

The security motivation here is solid. Preventing terminal escape injection (ANSI sequences, bidi overrides, etc.) is worth doing. But the current implementation has several problems that need to be addressed before merging.

`\n`, `\r`, and `\t` should not be escaped

The sanitizer uses UnicodeCategory.Control which catches every character in U+0000-U+001F, including \n, \r, and \t. These are not security threats. They are structural formatting characters that the formatters depend on.

Both SimpleConsoleFormatter and SystemdConsoleFormatter have explicit downstream logic that operates on real newlines:

- SimpleConsoleFormatter.WriteMessage calls message.Replace(Environment.NewLine, _newLineWithMessagePadding) to add indentation padding after each newline in exception text.
- SystemdConsoleFormatter.WriteReplacingNewLine calls message.Replace(Environment.NewLine, " ") to flatten multi-line messages into a single line (required by systemd/journald).

Because the sanitizer runs before these calls, it converts \n to the literal text \u000A. The downstream Replace calls then find no real newlines and become no-ops. This breaks multi-line exception formatting in Simple mode (no padding) and breaks the single-line guarantee in Systemd mode.
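The ordering problem can be reproduced in isolation. The sketch below is not the PR's code: `EscapeAllControls` is a hypothetical stand-in that escapes everything in U+0000-U+001F, and `Pad` only mimics the padding `Replace` call described above, with an arbitrary six-space padding.

```csharp
using System;
using System.Text;

public static class NewlineOrderingDemo
{
    // Hypothetical stand-in for a sanitizer that escapes every character
    // in U+0000-U+001F, including \r and \n.
    public static string EscapeAllControls(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c < 0x20)
                sb.Append("\\u").Append(((int)c).ToString("X4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Mimics the padding step: indent whatever follows each real newline.
    // The padding width is an arbitrary choice for this illustration.
    public static string Pad(string message)
    {
        string newLineWithPadding = Environment.NewLine + "      ";
        return message.Replace(Environment.NewLine, newLineWithPadding);
    }
}
```

Calling `Pad` on `"first" + Environment.NewLine + "second"` inserts the padding. Calling `Pad(EscapeAllControls(...))` on the same text returns the sanitized string unchanged, because no real newline is left for `Replace` to find.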

The fix should target only the actually dangerous characters: ESC (\x1B), BEL (\x07), backspace (\x08), bidi overrides (\u202E, \u202D), and similar. Not \n/\r/\t.

Double-escaping on the JSON path

Utf8JsonWriter with JavaScriptEncoder.Default already escapes all control characters (U+0000-U+001F) and all non-BasicLatin characters (including \u202E). Pre-sanitizing the strings is redundant and produces double-escaped output.

For example, ESC (\x1B) would normally appear as \u001B in the JSON output. With the sanitizer, it becomes \\u001B, which is a literal backslash followed by u001B. JSON consumers parsing these logs would see the text \u001B instead of the actual ESC character. The test changes in JsonConsoleFormatterTests.cs confirm this: they switched to expecting \\\\u000D\\\\u000A.

The sanitizer should be skipped entirely for the JsonConsoleFormatter path, or at minimum should not run when Utf8JsonWriter is handling the escaping.
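A small round-trip shows the effect. This is a sketch with a default-configured Utf8JsonWriter; the class name and the "Message" property are illustrative and not taken from JsonConsoleFormatter.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

public static class JsonDoubleEscapeDemo
{
    // Writes {"Message": <message>} with a default-configured Utf8JsonWriter.
    public static string Serialize(string message)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartObject();
            writer.WriteString("Message", message);
            writer.WriteEndObject();
        } // disposing the writer flushes it into the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }

    // Serializes the message and reads it back the way a JSON log consumer would.
    public static string? RoundTrip(string message)
    {
        using JsonDocument doc = JsonDocument.Parse(Serialize(message));
        return doc.RootElement.GetProperty("Message").GetString();
    }
}
```

With the raw input `"a\u001Bb"`, the serialized text contains no raw ESC and `RoundTrip` returns the original string, ESC included. With a pre-sanitized input (`"a\\u001Bb"` as a C# literal), the writer escapes the backslash itself, so the consumer gets back the six literal characters `\u001B` instead of ESC.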

Breaking change with default `true`

Setting SanitizeControlCharacters = true by default changes the output format for every existing application without any opt-in. Exception stack traces go from properly formatted multi-line output to a single blob containing \u000A literals. This will break log parsing tools and dashboards that expect the current format.

Consider either defaulting to false or narrowing the escape set so that \n/\r/\t pass through unchanged (which would make the default safe).

Minor issues

- API review: adding a public property to ConsoleFormatterOptions requires going through the dotnet/runtime API review process.
- Allocations: every exception log triggers a StringBuilder allocation since exception strings always contain \n. Consider string.Create or ValueStringBuilder for the hot path.
- Test coverage: Log_ControlCharacters_SanitizationCanBeDisabled only tests Simple and Systemd formatters. JSON opt-out is not covered.
- Existing test expectations modified: the changes to ConsoleLoggerTest.cs normalize broken formatting as the new expected output rather than preserving the original behavior.

rosebyte · 2026-06-17T06:48:19Z

A note on JSON output and custom encoders:

JsonConsoleFormatter delegates character escaping to Utf8JsonWriter, which by default escapes all control characters. This means dangerous characters like ESC or bidi overrides will appear as \u001B etc. in the JSON output without us needing to do anything extra.

However, dotnet's JSON writer supports pluggable encoders, and one of them, JavaScriptEncoder.UnsafeRelaxedJsonEscaping, deliberately relaxes what gets escaped. If someone explicitly configures their JsonConsoleFormatterOptions.JsonWriterOptions to use this encoder, certain invisible/directional Unicode characters (the kind that can mislead someone reading raw text) will pass through unescaped.

Why we should leave it as it is for now:

• The encoder's name literally contains "Unsafe", so using it is a deliberate opt-in to relaxed behaviour.
• JSON logs are typically consumed by log aggregators (Seq, ELK, Datadog, etc.) which parse the JSON and display values in their own UI, not by someone displaying a file in a terminal.
• Adding pre-sanitisation for this edge case would complicate the default path and risk double-escaping for the majority of users who haven't changed the encoder.

If this proves to be a real-world concern, we can add targeted sanitisation to the JSON path later without any API change.

Copilot

Pull request overview

This PR introduces a shared sanitizer (ConsoleControlCharacterSanitizer) and applies it to the Simple and Systemd console formatter pipelines to reduce the risk of untrusted control / formatting characters influencing terminal output. It also adds new unit tests intended to validate sanitization behavior for non-JSON formatters.

Changes:

- Added ConsoleControlCharacterSanitizer and used it to sanitize message / exception / category / scope text in SimpleConsoleFormatter and SystemdConsoleFormatter.
- Updated JsonConsoleFormatter’s char state-property handling (but Json formatter still does not apply the new sanitizer).
- Added tests for sanitization behavior (currently for non-JSON formatters only) and updated minor test code comments.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
src/libraries/Microsoft.Extensions.Logging.Console/tests/Microsoft.Extensions.Logging.Console.Tests/ConsoleLoggerTest.cs	Removes Arrange/Act/Assert comments in an existing test.
src/libraries/Microsoft.Extensions.Logging.Console/tests/Microsoft.Extensions.Logging.Console.Tests/ConsoleFormatterTests.cs	Adds new sanitization tests (currently limited to non-JSON formatters).
src/libraries/Microsoft.Extensions.Logging.Console/src/SystemdConsoleFormatter.cs	Sanitizes message/exception/category and scope string output prior to writing.
src/libraries/Microsoft.Extensions.Logging.Console/src/SimpleConsoleFormatter.cs	Sanitizes message/exception/category and scope string output prior to writing.
src/libraries/Microsoft.Extensions.Logging.Console/src/Microsoft.Extensions.Logging.Console.csproj	Adds `ValueStringBuilder` to support the sanitizer implementation.
src/libraries/Microsoft.Extensions.Logging.Console/src/JsonConsoleFormatter.cs	Adjusts JSON state-property writing for `char` values (allocates) but does not add sanitizer usage.
src/libraries/Microsoft.Extensions.Logging.Console/src/ConsoleControlCharacterSanitizer.cs	New sanitizer that escapes a hardcoded set of characters to `\\uXXXX`.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

mrek-msft · 2026-06-17T13:10:17Z

+                >= '\u200B' and <= '\u200F' => true, // zero-width and directional marks
+                >= '\u202A' and <= '\u202E' => true, // bidi embedding/override
+                >= '\u2066' and <= '\u2069' => true, // bidi isolates
+                _ => false,


I am curious how we chose these ranges. There are many other questionable ranges, for example https://en.wikipedia.org/wiki/Tags_(Unicode_block). I think the more proper way would be to decide based on https://learn.microsoft.com/en-us/dotnet/api/system.char.getunicodecategory?view=net-10.0, but I did not check for edge cases with this approach. I also recommend reading https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding-introduction, which lists many interesting and possibly error-prone parts.

I agree. Can we leverage GetUnicodeCategory without giving up perf using SearchValues? At least for the fast non-sanitizing path? E.g:

// Built once. GetUnicodeCategory cost is paid here, not per log call.
private static readonly SearchValues<char> s_charsToEscape = SearchValues.Create(BuildEscapeSet());

private static char[] BuildEscapeSet()
{
    var chars = new List<char>(256);
    for (int c = 0; c <= 0xFFFF; c++)
    {
        if (c is '\t' or '\r' or '\n') continue;
        if (char.GetUnicodeCategory((char)c) is UnicodeCategory.Control or UnicodeCategory.Format)
            chars.Add((char)c);
    }
    return chars.ToArray();
}

public static string Sanitize(string value)
{
    int idx = value.AsSpan().IndexOfAny(s_charsToEscape); // vectorized
    if (idx < 0) return value;

mrek-msft · 2026-06-17T13:14:15Z

Are we sure that the Console Logger is the right place to do such sanitization? I think it will negatively impact performance: even when static constant strings are passed, sanitization will happen every time they are logged, which seems like a big overhead considering MECFG scale. I did not read the threat model or any related documents, but to me it seems that this should be the responsibility of the caller.

rosebyte · 2026-06-17T13:37:57Z

> Are we sure that the Console Logger is the right place to do such sanitization? I think it will negatively impact performance: even when static constant strings are passed, sanitization will happen every time they are logged, which seems like a big overhead considering MECFG scale. I did not read the threat model or any related documents, but to me it seems that this should be the responsibility of the caller.

@BrennanConroy, @halter73, do you have any thoughts? Honestly, the argument that users are supposed to sanitise the log message because they know the context makes complete sense. We can still provide a convenience method to help them.

svick · 2026-06-17T16:00:13Z

+                    sanitized.Append('\\');
+                    sanitized.Append('u');
+                    int codePoint = current;
+                    Span<char> hex = sanitized.AppendSpan(4);
+                    hex[0] = ToHexChar(codePoint >> 12);
+                    hex[1] = ToHexChar((codePoint >> 8) & 0xF);
+                    hex[2] = ToHexChar((codePoint >> 4) & 0xF);
+                    hex[3] = ToHexChar(codePoint & 0xF);


Nit: If AppendSpan() is better, should we use it for all 6 characters?

Suggested change

-                    sanitized.Append('\\');
-                    sanitized.Append('u');
-                    int codePoint = current;
-                    Span<char> hex = sanitized.AppendSpan(4);
-                    hex[0] = ToHexChar(codePoint >> 12);
-                    hex[1] = ToHexChar((codePoint >> 8) & 0xF);
-                    hex[2] = ToHexChar((codePoint >> 4) & 0xF);
-                    hex[3] = ToHexChar(codePoint & 0xF);
+                    Span<char> escaped = sanitized.AppendSpan(6);
+                    escaped[0] = '\\';
+                    escaped[1] = 'u';
+                    escaped[2] = ToHexChar(current >> 12);
+                    escaped[3] = ToHexChar((current >> 8) & 0xF);
+                    escaped[4] = ToHexChar((current >> 4) & 0xF);
+                    escaped[5] = ToHexChar(current & 0xF);

svick · 2026-06-17T16:05:25Z

+                    sanitized.Append('\\');
+                    sanitized.Append('u');
+                    int codePoint = current;
+                    Span<char> hex = sanitized.AppendSpan(4);
+                    hex[0] = ToHexChar(codePoint >> 12);
+                    hex[1] = ToHexChar((codePoint >> 8) & 0xF);
+                    hex[2] = ToHexChar((codePoint >> 4) & 0xF);
+                    hex[3] = ToHexChar(codePoint & 0xF);


I'm not against this approach, but a simpler alternative would be to replace everything with � a.k.a. '\uFFFD' a.k.a. U+FFFD REPLACEMENT CHARACTER.

I don't think � is a good replacement. Converting to hex number is way more "expected".

svick · 2026-06-17T16:09:48Z

+                >= '\u200B' and <= '\u200F' => true, // zero-width and directional marks
+                >= '\u202A' and <= '\u202E' => true, // bidi embedding/override
+                >= '\u2066' and <= '\u2069' => true, // bidi isolates
+                _ => false,


Are there no problematic code points outside the BMP (i.e. those that would be composed of two chars)?
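For illustration, the Tags block mentioned earlier sits outside the BMP and its characters have category Cf (for example U+E0001 LANGUAGE TAG), so a per-`char` check never sees them as a single code point. The sketch below classifies whole scalar values with `Rune` instead. It is not the PR's implementation: the escape policy (Cc/Cf except \t, \r, \n) and the choice to emit one \uXXXX per UTF-16 code unit are assumptions made for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SupplementaryPlaneSketch
{
    // Assumed policy for this example: escape Cc and Cf, keep \t, \r, \n.
    public static bool ShouldEscape(Rune rune)
    {
        if (rune.Value is '\t' or '\r' or '\n') return false;
        UnicodeCategory category = Rune.GetUnicodeCategory(rune);
        return category is UnicodeCategory.Control or UnicodeCategory.Format;
    }

    public static string Sanitize(string value)
    {
        var sb = new StringBuilder(value.Length);
        Span<char> units = stackalloc char[2];

        // EnumerateRunes yields whole scalar values; an unpaired surrogate
        // is surfaced as U+FFFD by the enumerator.
        foreach (Rune rune in value.EnumerateRunes())
        {
            if (!ShouldEscape(rune))
            {
                sb.Append(rune.ToString());
                continue;
            }

            // Emit one \uXXXX escape per UTF-16 code unit (two for non-BMP).
            int written = rune.EncodeToUtf16(units);
            for (int i = 0; i < written; i++)
                sb.Append("\\u").Append(((int)units[i]).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

Under these assumptions, `Sanitize("a\U000E0001b")` produces `a\uDB40\uDC01b` as literal text, while an emoji such as U+1F600 (category So) passes through unchanged.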

BrennanConroy · 2026-06-17T18:02:43Z

The logger implementation (Console) knows what is safe and unsafe input for its output; the logging callers (app code) do not know what is safe or unsafe.

tarekgh · 2026-06-17T21:04:44Z

Following up on the performance question raised above. Independent of where sanitization ultimately lives, if it stays in the Console logger it runs on the hot path for every message, exception, category, and scope on every log call. The current detection scan (GetFirstEscapedCharacterIndex calling the per-character ShouldEscape switch) is scalar, so even the common case (plain text with nothing to escape) pays a full O(n) branchy walk of every string.

The same escape set can be expressed as a static SearchValues<char> and scanned with a vectorized IndexOfAny. Measuring the detection scan only (both are zero-alloc on the common path), net8.0 x64:

Input	Length	Scalar switch	SearchValues	Speedup
ASCII	32	57.5 ns	4.4 ns	13x
ASCII	256	462 ns	15.7 ns	29x
ASCII	1024	1,832 ns	45.8 ns	40x
CJK (all >= 0x80)	256	389 ns	40.6 ns	9.6x
ESC at end	256	559 ns	18.6 ns	30x
ESC at end	1024	2,386 ns	47.3 ns	50x

(Local numbers, not CI hardware, so treat the absolute values as indicative.) The cost is pure CPU per log call, multiplied across the four sanitized values.

Suggestions:

- Use a static SearchValues<char> for the detection scan (value.AsSpan().IndexOfAny(...)), available on net8.0+, keeping the existing #if NET pattern for a scalar fallback on netstandard2.0/net462.
- In the escape-building path, use IndexOfAny to find the next character to escape and bulk-copy the safe run, rather than appending one character at a time.
- category is constant per logger instance but is re-scanned on every call; it could be sanitized once.

This keeps the point that the logger is the right place to know what is unsafe for its output, while removing the per-call overhead.
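The first two suggestions could look roughly like the sketch below (net8.0+ only). It is a minimal illustration, not the PR's code: the escape set is assumed to be all BMP Cc/Cf characters except \t, \r and \n, and a plain StringBuilder stands in for the runtime-internal ValueStringBuilder.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class SearchValuesSanitizerSketch
{
    // Built once; the GetUnicodeCategory cost is paid at type initialization.
    private static readonly SearchValues<char> s_charsToEscape =
        SearchValues.Create(BuildEscapeSet());

    // Assumed escape set: BMP Cc/Cf characters except \t, \r, \n.
    private static char[] BuildEscapeSet()
    {
        var chars = new List<char>();
        for (int c = 0; c <= 0xFFFF; c++)
        {
            if (c is '\t' or '\r' or '\n') continue;
            if (char.GetUnicodeCategory((char)c) is UnicodeCategory.Control or UnicodeCategory.Format)
                chars.Add((char)c);
        }
        return chars.ToArray();
    }

    public static string Sanitize(string value)
    {
        ReadOnlySpan<char> remaining = value.AsSpan();
        int idx = remaining.IndexOfAny(s_charsToEscape);
        if (idx < 0) return value; // common case: no allocation, same instance

        var sb = new StringBuilder(value.Length + 8);
        while (idx >= 0)
        {
            sb.Append(remaining.Slice(0, idx)); // bulk-copy the safe run
            sb.Append("\\u").Append(((int)remaining[idx]).ToString("X4"));
            remaining = remaining.Slice(idx + 1);
            idx = remaining.IndexOfAny(s_charsToEscape);
        }
        sb.Append(remaining); // trailing safe run
        return sb.ToString();
    }
}
```

With this set, `Sanitize("a\u001B[31mb\n")` yields `a\u001B[31mb` followed by a real newline, and a string with nothing to escape is returned as the same instance.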

cincuranet · 2026-06-18T07:49:33Z

+                    hex[0] = ToHexChar(codePoint >> 12);
+                    hex[1] = ToHexChar((codePoint >> 8) & 0xF);
+                    hex[2] = ToHexChar((codePoint >> 4) & 0xF);
+                    hex[3] = ToHexChar(codePoint & 0xF);


Remove the ToHexChar and use HexConverter.ToCharUpper (Common/System/HexConverter.cs).

Suggested change

-                    hex[0] = ToHexChar(codePoint >> 12);
-                    hex[1] = ToHexChar((codePoint >> 8) & 0xF);
-                    hex[2] = ToHexChar((codePoint >> 4) & 0xF);
-                    hex[3] = ToHexChar(codePoint & 0xF);
+                    hex[0] = HexConverter.ToCharUpper(current >> 12);
+                    hex[1] = HexConverter.ToCharUpper(current >> 8);
+                    hex[2] = HexConverter.ToCharUpper(current >> 4);
+                    hex[3] = HexConverter.ToCharUpper(current);

You don't need to mask the shifts - ToCharUpper does value &= 0xF internally, so the leftover high bits are discarded for free.

Or if you want to avoid hand-composing the values use ((int)current).TryFormat(hex, out _, "X4").
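As a standalone illustration of the TryFormat variant, the sketch below formats one character into a six-character \uXXXX escape. The `Escape` helper and the stack-allocated buffer are hypothetical; in the PR the destination would be the span returned by AppendSpan.

```csharp
using System;

public static class HexEscapeSketch
{
    // Formats a single UTF-16 code unit as a six-character \uXXXX escape.
    public static string Escape(char current)
    {
        Span<char> escaped = stackalloc char[6];
        escaped[0] = '\\';
        escaped[1] = 'u';

        // "X4" pads to four uppercase hex digits; a char never needs more.
        bool ok = ((int)current).TryFormat(escaped.Slice(2), out int written, "X4");
        if (!ok || written != 4)
            throw new InvalidOperationException("Unexpected hex formatting result.");

        return escaped.ToString();
    }
}
```

For example, `Escape('\u001B')` returns the literal text `\u001B`.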

Initial plan

9933fd5

Copilot AI assigned Copilot and rosebyte May 29, 2026

Copilot AI requested review from Copilot and removed request for Copilot May 29, 2026 04:44

Copilot AI linked an issue May 29, 2026 that may be closed by this pull request

Console logger should sanitize control characters in log messages #128727

Open

Copilot started work on behalf of rosebyte May 29, 2026 04:44 View session

github-actions Bot added the area-Extensions-Logging label May 29, 2026

Sanitize control characters in console formatter output

85172bc

Co-authored-by: rosebyte <14963300+rosebyte@users.noreply.github.com>

Copilot AI requested review from Copilot and removed request for Copilot May 29, 2026 05:14

Copilot AI changed the title ~~[WIP] Sanitize control characters in console logger log messages~~ Sanitize control/format characters in console logger output across all formatters May 29, 2026

Copilot finished work on behalf of rosebyte May 29, 2026 05:16

Copilot AI requested a review from rosebyte May 29, 2026 05:16

tarekgh reviewed May 29, 2026

rosebyte added 3 commits June 17, 2026 08:15

fix the pr

8e2eaec

use ValueStringBuilder

0268f1d

implement PR feedback

f112ef0

Copilot AI review requested due to automatic review settings June 17, 2026 06:48

Copilot started reviewing on behalf of rosebyte June 17, 2026 06:49 View session

Merge branch 'main' into copilot/sanitize-console-logger-control-char…

fa0ca04

…acters

Copilot AI reviewed Jun 17, 2026

clean up

8b81df2

Copilot AI review requested due to automatic review settings June 17, 2026 06:58

Copilot started reviewing on behalf of rosebyte June 17, 2026 06:59 View session

Copilot AI reviewed Jun 17, 2026

implement PR comments

4c8e0fa

rosebyte marked this pull request as ready for review June 17, 2026 07:54

Copilot AI review requested due to automatic review settings June 17, 2026 07:54

Copilot started reviewing on behalf of rosebyte June 17, 2026 07:54 View session

Copilot AI reviewed Jun 17, 2026

Potential fix for pull request finding

750109f

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 17, 2026 11:08

Copilot started reviewing on behalf of rosebyte June 17, 2026 11:09 View session

Merge branch 'main' into copilot/sanitize-console-logger-control-char…

d7946d5

…acters

Copilot AI reviewed Jun 17, 2026

rosebyte requested review from cincuranet, mrek-msft and svick June 17, 2026 11:27

mrek-msft reviewed Jun 17, 2026

This was referenced Jun 17, 2026

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

svick reviewed Jun 17, 2026

EgorBot mentioned this pull request Jun 17, 2026

Benchmarks for dotnet/runtime#128741 (for @tarekgh) EgorBot/Benchmarks#252

Open

cincuranet reviewed Jun 18, 2026
