Skip to content

fix: update mention room regex#840

Merged
Gonals merged 7 commits into
Expensify:mainfrom
truph01:fix/57195-regex-not-match-room-mention
Apr 29, 2025
Merged

fix: update mention room regex#840
Gonals merged 7 commits into
Expensify:mainfrom
truph01:fix/57195-regex-not-match-room-mention

Conversation

@truph01

@truph01 truph01 commented Mar 19, 2025

Copy link
Copy Markdown
Contributor

This PR fixes an issue where the regex for room mentions (#room-name) was not matching case when the # symbol was preceded by : character

Fixed Issues

Expensify/App#57195

Tests

  1. What unit/integration tests cover your change? What autoQA tests cover your change?
  • Verify the #test-room string matches the room name if the string is:
    :#test-room
  1. What tests did you perform that validates your changed worked?

Test steps:

1. Create a room named `test-room-name`
2. Open the admin room of the workspace in step 1
3. Send `:#test-room-name` message
4. Verify the `#test-room-name` is highlighted
Screen.Recording.2025-03-19.at.12.59.01.mov

QA

  1. What does QA need to do to validate your changes?

  2. What areas to they need to test for regressions?

@github-actions

github-actions Bot commented Mar 19, 2025

Copy link
Copy Markdown

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@truph01

truph01 commented Mar 19, 2025

Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

CLABotify added a commit to Expensify/CLA that referenced this pull request Mar 19, 2025
@truph01 truph01 marked this pull request as ready for review March 19, 2025 06:00
@truph01 truph01 requested a review from a team as a code owner March 19, 2025 06:00
@melvin-bot melvin-bot Bot requested review from Gonals and removed request for a team March 19, 2025 06:01

@rayane-d rayane-d left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing

@rayane-d rayane-d left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@truph01 Can you please add a unit test?

@rayane-d

Copy link
Copy Markdown

Note: The room mentions regex was added in PR #680 and edited later in PR #804

@truph01

truph01 commented Mar 31, 2025

Copy link
Copy Markdown
Contributor Author

@rayane-d I just added and updated the test

@truph01

truph01 commented Mar 31, 2025

Copy link
Copy Markdown
Contributor Author

@rayane-d Our current PR applies the mention report logic to all #room mentions, regardless of the character that comes before them. This broader matching can cause test failures in certain edge cases like the one below:

test('Test for inline code block with triple backtick with spaces as content', () => {
    const testString = '```   ```';
    const resultString = '&#x60;&#x60;<code>&nbsp;&nbsp;&nbsp;</code>&#x60;&#x60;';
    expect(parser.replace(testString)).toBe(resultString);
});
 Expected: "&#x60;&#x60;<code>&nbsp;&nbsp;&nbsp;</code>&#x60;&#x60;"
 Received: "&<mention-report>#x60</mention-report>;&<mention-report>#x60</mention-report>;<code>&nbsp;&nbsp;&nbsp;</code>&<mention-report>#x60</mention-report>;&<mention-report>#x60</mention-report>;"

In this case, the parser mistakenly treats parts of the encoded backticks as mentionable content (e.g. #x60), which breaks valid syntax highlighting and markdown rendering.

Do you think we should narrow the scope of the PR?

@rayane-d

Copy link
Copy Markdown

From testing, this appears to only affect ' and **** characters. Perhaps we could add a negative lookbehind (?<!&)to skip#` in encoded strings?

Screen.Recording.2025-03-31.at.10.59.43.PM.mov
Screenshot 2025-03-31 at 11 23 00 PM

@truph01

truph01 commented Apr 2, 2025

Copy link
Copy Markdown
Contributor Author

@rayane-d Thanks! I'm currently working on it and testing a few cases.

@truph01

truph01 commented Apr 2, 2025

Copy link
Copy Markdown
Contributor Author

@rayane-d I applied the new regex:
regex: /(?<=^|(?<!&)[*~_]?|[ \n][*~_]?)(#[\p{Ll}0-9-]{1,99})(?![^<]*(?:<\/pre>|<\/code>|<\/a>))/gimu,

It resolves the issue with encoded strings, but the bug still occurs in other cases:

Screenshot 2025-04-03 at 00 11 22

You can pull this PR and run npm run test locally to see the result.

@rayane-d

rayane-d commented Apr 7, 2025

Copy link
Copy Markdown

@truph01 I think this is happening because we're allowing any character before #, including alphanumeric characters. We should instead extend the current app logic to allow only special characters.

@truph01

truph01 commented Apr 8, 2025

Copy link
Copy Markdown
Contributor Author

@rayane-d Just to confirm—you're referring to special characters like ! @ $ % ^ & * ( ) _ + - = { } [ ] | \ : ; " ' < > , . ? / ~, correct?

Even if we limit the changes in this PR strictly to those characters, I still see a risk of regression. For example, we could reintroduce issues like the one I mentioned earlier with &#room-name:
#840 (comment)

@rayane-d

rayane-d commented Apr 8, 2025

Copy link
Copy Markdown

@rayane-d Just to confirm—you're referring to special characters like ! @ $ % ^ & * ( ) _ + - = { } [ ] | \ : ; " ' < > , . ? / ~, correct?

Yes, that is correct.

Even if we limit the changes in this PR strictly to those characters, I still see a risk of regression. For example, we could reintroduce issues like the one I mentioned earlier with &#room-name: #840 (comment)

I believe we can test them individually to ensure there are no regressions and eliminate the ones that cause issues, such as &.

@truph01

truph01 commented Apr 14, 2025

Copy link
Copy Markdown
Contributor Author

@rayane-d I'm currently working on this, and the target characters we're focusing on are: @ $ % ^ { } ? : (Since none of them appears in HTML encode or markdown syntax).

That said, I’d like to highlight a related bug discussed here: #840 (comment). In that case, a string like /path#fragment is incorrectly matched as a room mention.

Although our current logic already prevents matching room mentions with alphabetic characters, it doesn't account for cases like /path@#fragment. Since we can’t guarantee that users won’t include special characters before the @ (e.g., /path@#fragment), this pattern still incorrectly triggers the room mention regex. Similar false positives may occur with other special characters as well.

It will lead to the high risk of regression.

Could you share your thoughts on my concern?

@rayane-d

Copy link
Copy Markdown

@truph01 Can you please start a discussion in #expensify-open-source for more 👀, you can also tag @Robert Kozik who implemented this feature in #680

@truph01

truph01 commented Apr 21, 2025

Copy link
Copy Markdown
Contributor Author

@rayane-d PR is updated

@rayane-d

Copy link
Copy Markdown

@puneetlath Could you please approve the workflows so they can run?

@puneetlath

Copy link
Copy Markdown
Contributor

Triggered.

@rayane-d rayane-d left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and tests well 👍

Screenshot 2025-04-29 at 12 27 49 AM

@rayane-d

Copy link
Copy Markdown

@puneetlath all yours!

@Gonals Gonals merged commit c65593b into Expensify:main Apr 29, 2025
@os-botify

os-botify Bot commented Apr 29, 2025

Copy link
Copy Markdown
Contributor

🚀 Published to npm in 2.0.135 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants