MXP processing hang - unescaped ampersands

ulysses
Posts: 52
Joined: Fri Jan 05, 2018 7:43 pm

Re: Potential color handling bug in Mudlet?

Post by ulysses »

No reply yet from the admins on the game's message boards; I will try them on live chat and by email as well.

I wonder if the MXP spec says what the maximum length for an entity name is? If there is a limit (say, 255 characters), then Mudlet could:

- Warn the user if no semicolon is found within that limit after an ampersand
- Maybe display the buffer so far (including the ampersand) as plain text

I can't see that upsetting any existing MXP functionality. I would be happy to investigate the code modification myself.
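
A minimal Python sketch of the two bullets above, for illustration only. This is not Mudlet's code: the function name `split_entity` and the 255-character cap are assumptions, and the MXP spec is not known to define such a limit.

```python
MAX_ENTITY_NAME = 255  # assumed cap; not taken from the MXP spec


def split_entity(text, amp_index, max_name=MAX_ENTITY_NAME):
    """Look for ';' within max_name characters after the '&' at amp_index.

    Returns (kind, chunk, next_index, warning); kind is "entity" or "plain".
    """
    window_end = amp_index + 1 + max_name + 1  # name characters plus the ';'
    semi = text.find(";", amp_index + 1, window_end)
    if semi != -1:
        return "entity", text[amp_index:semi + 1], semi + 1, None
    # No terminator in range: hand back what was buffered, '&' included.
    chunk = text[amp_index:window_end]
    warning = f"no ';' within {max_name} characters of '&' at offset {amp_index}"
    return "plain", chunk, amp_index + len(chunk), warning
```

With a cap like this, a missing semicolon costs at most `max_name` characters of lookahead instead of swallowing everything up to the next `;`.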
Wod :mrgreen:
CthulhuMUD
www.cthulhumud.com
A hugely entertaining MUD based on the horror writings of HP Lovecraft.

ulysses
Posts: 52
Joined: Fri Jan 05, 2018 7:43 pm

Re: Potential color handling bug in Mudlet?

Post by ulysses »

p.s. Can we please change the title of this thread to "MXP processing hang - unescaped ampersands" - it might signpost the way for others in the future.

Jor'Mox
Posts: 1146
Joined: Wed Apr 03, 2013 2:19 am

Re: Potential color handling bug in Mudlet?

Post by Jor'Mox »

So, from what I can find about MXP [https://www.zuggsoft.com/zmud/mxp.htm], it is modeled after XML, which has the following rules regarding entity usage:
https://www.w3resource.com/xml/entities.php wrote:Rules for using legal Entity Markup
  • The entity must be declared in the DTD. If you are using an XML document which is not validated against a DTD or schema, then you have to declare one within the xml document itself and this must consist the entity you are using.
  • A general entity is referenced within an xml document must be surrounded by an ampersand (&) on one end and the semicolon (;) on the other (&myEntity;).
  • The name of an entity must begin with a letter or underscore (_) but can contain letters, underscores, whole numbers, colons, periods and/or hyphens.
  • An entity declaration cannot consist of markup that begins in the entity declaration and ends outside of it .
  • A parameter entity must be declared with a preceding percent sign (%) with a white space before and after the percent sign, and it must be referenced by a percent sing with no trailing white space. A typical parameter entity declaration looks like this: <!ENTITY % myParameterEntity "myElement">
So, any time you see an & followed, before a semicolon arrives, by any character outside the set listed above, it is not in fact an MXP entity and can be treated as plain text. In other words, a valid entity should match the following regex pattern: &[a-zA-Z_][a-zA-Z_0-9:.\-]*;
Last edited by Jor'Mox on Sun Apr 07, 2024 2:11 pm, edited 1 time in total.
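
To see how the pattern above behaves on the kinds of lines discussed in this thread, here is a small Python sketch using the standard `re` module. The helper name `find_entities` is made up for illustration.

```python
import re

# The pattern from the post above, written as a Python raw string.
ENTITY_RE = re.compile(r"&[a-zA-Z_][a-zA-Z_0-9:.\-]*;")


def find_entities(line):
    """Return every substring of line that looks like a well-formed entity."""
    return ENTITY_RE.findall(line)


print(find_entities("Church St & University Road"))  # no entity-shaped text
print(find_entities("5 &lt; 6 &amp; 7"))             # two entity-shaped matches
```

One limitation worth noting: a bare ampersand that happens to be followed only by name characters and then a semicolon (for example `Tom&Jerry;`) still matches, so the pattern narrows the problem rather than removing it.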

ulysses
Posts: 52
Joined: Fri Jan 05, 2018 7:43 pm

Re: Potential color handling bug in Mudlet?

Post by ulysses »

So that rule and regex helps the case where someone has written a naked ampersand in the form of "You are standing at the intersection of Church St & University Road" but not in the case "You flick on the TV and see the closing credits to the Tom&Jerry show". It also doesn't help if a MUD author did try to escape the & but forgot to close it with ; - in such a case Mudlet will swallow all the output until a ; is found.

According to https://stackoverflow.com/questions/125 ... lSquare%20.

&CounterClockwiseContourIntegral; (amp CounterClockwiseContourIntegral semi-colon) is the longest HTML entity, but of course in MXP people can define their own. It would be VERY useful here to have an upper limit.

Jor'Mox
Posts: 1146
Joined: Wed Apr 03, 2013 2:19 am

Re: Potential color handling bug in Mudlet?

Post by Jor'Mox »

ulysses wrote:
Fri Apr 05, 2024 12:08 am
So that rule and regex helps the case where someone has written a naked ampersand in the form of "You are standing at the intersection of Church St & University Road" but not in the case "You flick on the TV and see the closing credits to the Tom&Jerry show". It also doesn't help if a MUD author did try to escape the & but forgot to close it with ; - in such a case Mudlet will swallow all the output until a ; is found.
Tom&Jerry would be caught by the regex I proposed, because the name after the & is not terminated by a semicolon. Instead, a character outside the allowed class appears first, namely the space after the 'y' in Jerry. So, I think that in theory, upon seeing an &, you'd first check whether the next character is an underscore or a letter; if so, you'd then check each following character to see if it is a letter, number, underscore, hyphen, period, or colon. If that check ever fails before you reach a semicolon, then it isn't a valid MXP entity and should be treated as regular text.

Under normal use cases, where a proper MXP entity is encountered, we can expect relatively few characters in any given entity name, so such checks should be minimal in terms of processing power to validate entity names, character by character. And when there is a mistake that is missed by the game, making those checks is still likely to not take too long, but will save us from having Mudlet swallow all following output.
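
The character-by-character check described above could look roughly like this in Python. It is a sketch of the idea, not Mudlet's parser; `scan_entity` and the two character sets are hypothetical names.

```python
import string

NAME_START = set(string.ascii_letters + "_")         # first character of a name
NAME_CHARS = NAME_START | set(string.digits + ":.-")  # any later character


def scan_entity(text, amp_index):
    """Return the index just past ';' if a valid entity starts at amp_index.

    Returns None as soon as a character outside the allowed set is seen,
    or if the input ends before a ';' arrives.
    """
    i = amp_index + 1
    if i >= len(text) or text[i] not in NAME_START:
        return None
    i += 1
    while i < len(text):
        ch = text[i]
        if ch == ";":
            return i + 1
        if ch not in NAME_CHARS:
            return None  # not an entity: caller treats the '&' as plain text
        i += 1
    return None
```

The scan stops at the first disallowed character, so for text like `Tom&Jerry show` it gives up at the space and never looks at the rest of the output.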

ulysses
Posts: 52
Joined: Fri Jan 05, 2018 7:43 pm

Re: Potential color handling bug in Mudlet?

Post by ulysses »

Thank you, Jor'Mox. Yes, I wrote that too quickly last night under insomniac conditions :-). You are right that the rule to allow only those specific characters, i.e. the regex &[a-zA-Z_][a-zA-Z_0-9:.\-]*;, would work in most cases to flush out naked ampersands.

Could an entity start in one buffer and end in the next? Unlikely perhaps, but maybe that would have to be taken care of too. This might be trivial to implement though - in the entity processing code, if a character was found which wasn't in the list above prior to reading the terminal semi-colon, then it's not an entity and so the characters read so far are passed to the output in plaintext, and the buffer reading is taken out of entity-reading mode.
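
A rough Python sketch of that idea, keeping the partial entity between reads so it can span two buffers. The class `EntityFilter` and its methods are hypothetical and are not how Mudlet structures its MXP code.

```python
import string

NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits + ":.-")


class EntityFilter:
    """Hypothetical filter that remembers a partial entity across feed() calls."""

    def __init__(self):
        self.pending = ""  # "&" plus name characters read so far, or ""

    def feed(self, chunk):
        """Return ("text", s) and ("entity", s) tokens found in this chunk."""
        out, plain = [], []

        def flush_plain():
            if plain:
                out.append(("text", "".join(plain)))
                plain.clear()

        for ch in chunk:
            if not self.pending:
                if ch == "&":
                    self.pending = "&"  # enter entity-reading mode
                else:
                    plain.append(ch)
                continue
            allowed = NAME_START if self.pending == "&" else NAME_CHARS
            if ch == ";" and len(self.pending) > 1:
                flush_plain()
                out.append(("entity", self.pending + ";"))
                self.pending = ""
            elif ch in allowed:
                self.pending += ch
            else:
                # Not an entity: release what was read so far as plain text.
                plain.append(self.pending)
                self.pending = "&" if ch == "&" else ""
                if ch != "&":
                    plain.append(ch)
        flush_plain()
        return out

    def flush(self):
        """At end of stream, return any unfinished entity as plain text."""
        rest, self.pending = self.pending, ""
        return [("text", rest)] if rest else []
```

For example, feeding `"a &l"` and then `"t; b"` yields the text `"a "`, then the entity `"&lt;"` and the text `" b"`, while `"Tom&Jer"` followed by `"ry show"` comes out entirely as plain text.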

@Vadi, would you be prepared to consider this approach? I would be happy to work on it and raise a PR. I think it will be an easy fix.

ulysses
Posts: 52
Joined: Fri Jan 05, 2018 7:43 pm

Re: Potential color handling bug in Mudlet?

Post by ulysses »

Just had a chat with some of the admins and it seems they disagree that the MXP protocol states that bare ampersands must be escaped in MXP mode. Looking at the spec for the protocol here https://www.zuggsoft.com/zmud/mxp.htm I can't see where it says that. I think applying Jor'Mox's REGEX should be safe and allow both MXP entities and unescaped amps to co-exist.

Vadi
Posts: 5042
Joined: Sat Mar 14, 2009 3:13 pm

Re: MXP processing hang - unescaped ampersands

Post by Vadi »

MXP is modelled after XML as Jor'Mox posted, and to represent an ampersand in XML, you need to use &amp;. Here is a selection of links backing this for the game admins:

https://www.baeldung.com/xml-encode-spe ... l-entities
https://en.wikipedia.org/wiki/List_of_X ... references
https://stackoverflow.com/questions/145 ... l-document

I can provide more if necessary. This is basic behaviour of XML so there is lots of documentation backing this up.
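
For the game side, the XML escaping rule for text content can be illustrated with Python's standard library. This is only a sketch: the wrapper name `mxp_safe` is made up, and whether a given MUD codebase can use Python at all is an assumption.

```python
from xml.sax.saxutils import escape


def mxp_safe(text):
    """Escape &, < and > so literal text cannot be mistaken for markup."""
    return escape(text)


print(mxp_safe("the Tom&Jerry show"))  # prints "the Tom&amp;Jerry show"
```

The ampersand has to be escaped first (which `escape` does) so that the `&` in an already produced `&lt;` is not escaped a second time.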

That said, Mudlet should be better about not losing all of the output until a reconnect. A PR to only lose some of the output if the game doesn't properly escape the ampersand would be welcome, @ulysses.
