Add support for custom parsing of APC, SOS and PM sequences. #115

Open: wants to merge 5 commits into master
Changes from 1 commit
157 changes: 153 additions & 4 deletions src/lib.rs
@@ -179,7 +179,9 @@ impl<const OSC_RAW_BUF_SIZE: usize> Parser<OSC_RAW_BUF_SIZE> {
State::Escape => self.advance_esc(performer, byte),
State::EscapeIntermediate => self.advance_esc_intermediate(performer, byte),
State::OscString => self.advance_osc_string(performer, byte),
- State::SosPmApcString => self.anywhere(performer, byte),
+ State::SosString => self.advance_opaque_string(SosDispatch(performer), byte),
+ State::ApcString => self.advance_opaque_string(ApcDispatch(performer), byte),
+ State::PmString => self.advance_opaque_string(PmDispatch(performer), byte),
State::Ground => unreachable!(),
}
}
@@ -356,7 +358,12 @@ impl<const OSC_RAW_BUF_SIZE: usize> Parser<OSC_RAW_BUF_SIZE> {
performer.esc_dispatch(self.intermediates(), self.ignoring, byte);
self.state = State::Ground
},
- 0x58 => self.state = State::SosPmApcString,
+ 0x58 => {
+ self.state = {
+ performer.sos_start();
+ State::SosString
+ }
+ },
0x59..=0x5A => {
performer.esc_dispatch(self.intermediates(), self.ignoring, byte);
self.state = State::Ground
@@ -374,7 +381,14 @@ impl<const OSC_RAW_BUF_SIZE: usize> Parser<OSC_RAW_BUF_SIZE> {
self.osc_num_params = 0;
self.state = State::OscString
},
- 0x5E..=0x5F => self.state = State::SosPmApcString,
+ 0x5E => {
+ performer.pm_start();
+ self.state = State::PmString
+ },
+ 0x5F => {
+ performer.apc_start();
+ self.state = State::ApcString
+ },
0x60..=0x7E => {
performer.esc_dispatch(self.intermediates(), self.ignoring, byte);
self.state = State::Ground
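The two hunks above route the ESC final bytes X (0x58), ^ (0x5E) and _ (0x5F) to dedicated SOS, PM and APC states instead of the shared SosPmApcString state. A minimal standalone sketch of that mapping, assuming hypothetical names (OpaqueKind, opaque_introducer) that stand in for the parser's new State variants and are not part of the crate:

```rust
/// Hypothetical stand-in for the parser's new SosString/PmString/ApcString states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpaqueKind {
    Sos,
    Pm,
    Apc,
}

/// Map the byte that follows ESC to the opaque string it introduces, if any.
/// ESC X (0x58) starts SOS, ESC ^ (0x5E) starts PM and ESC _ (0x5F) starts APC.
/// Every other final byte returns None and keeps its existing handling.
pub fn opaque_introducer(final_byte: u8) -> Option<OpaqueKind> {
    match final_byte {
        0x58 => Some(OpaqueKind::Sos),
        0x5E => Some(OpaqueKind::Pm),
        0x5F => Some(OpaqueKind::Apc),
        _ => None,
    }
}
```

Final bytes that map to None, such as ] (0x5D), continue down their existing paths (OSC, CSI, DCS or a plain esc_dispatch), as in the unchanged context lines.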
@@ -434,6 +448,41 @@ impl<const OSC_RAW_BUF_SIZE: usize> Parser<OSC_RAW_BUF_SIZE> {
}
}

+ #[inline(always)]
+ fn advance_opaque_string<D: OpaqueDispatch>(&mut self, mut dispatcher: D, byte: u8) {
+ match byte {
+ 0x07 => {
+ // The standard only supports ST-terminated SOS/APC/PM strings, using either
+ // ESC-ST (ESC-\) or C1-ST (0x9C), but kitty (and probably some other
+ // terminals) also support bell-terminated strings. Some
Member

This comment is a bit silly because "some other terminals" includes Alacritty. This is mostly an extension to how other sequences like OSCs are handled, which also support the bell terminator and do so already in Alacritty. So this might need some rephrasing to not sound strange.

Author

I was referring to SOS/PM/APC sequences, which are not yet supported by Alacritty. The current behavior (treating them as the anywhere state) doesn't seem to support termination with BEL, so this would be a behavior change for Alacritty, wouldn't it?

But since, as you've said, this is how OSC sequences already behave, I think we should change this behavior to be more consistent.

As stated above, I added these comments only to explain my reasoning; I didn't intend them to be included in the code in the final version of the PR.

+ // terminals (including Kitty) do not support C1-ST (0x9C) as a
+ // terminator, which means every character from 0x20-0xFF can be
+ // used with this sequence in theory.
+ dispatcher.opaque_end();
+ self.state = State::Ground
+ },
+ 0x18 | 0x1A => {
+ // XTerm terminates SOS/APC/PM strings on C0 CAN (^X) and SUB (^Z). This is also
+ // the same behavior we implement for OSC strings.
Member

0x18/0x1A is just a general-purpose reset into ground from anywhere, the transition is unrelated to its origin state. There's not really any reason for this comment to exist.

Author

I was a bit confused, since not all terminals seem to do that (at least when handling SOS/PM/APC), but I'll remove the comment.

Member (chrisduerr, Mar 12, 2025)

Yeah, I'll refer to https://vt100.net/emu/dec_ansi_parser for "general" guidance on Alacritty's parser here (even though it doesn't handle opaque strings). This isn't really an SOS/PM/APC thing but more of an "anywhere" thing, so an SOS/PM/APC-specific comment is unnecessary.
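The "anywhere" behavior described in this thread can be sketched as a check that runs before any state-specific handling, so CAN, SUB and ESC act the same whatever the current state is. The State enum and advance function below are a simplified, hypothetical model for illustration only, not the crate's parser: it drops intermediates, parameters and the execute/dispatch callbacks.

```rust
/// Simplified, hypothetical parser states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Ground,
    Escape,
    OpaqueString,
}

/// Compute the next state for one byte.
pub fn advance(state: State, byte: u8) -> State {
    // "Anywhere" transitions are checked first and ignore the current state.
    // CAN (0x18) and SUB (0x1A) cancel any sequence and return to ground
    // (the real parser also executes the byte; that is omitted here).
    if byte == 0x18 || byte == 0x1A {
        return State::Ground;
    }
    // ESC (0x1B) always starts a new escape sequence.
    if byte == 0x1B {
        return State::Escape;
    }

    // State-specific handling only sees the remaining bytes.
    match (state, byte) {
        // ESC X, ESC ^ and ESC _ open an opaque string.
        (State::Escape, 0x58 | 0x5E | 0x5F) => State::OpaqueString,
        // Simplification: any other byte after ESC ends the sequence.
        (State::Escape, _) => State::Ground,
        // BEL terminates the opaque string; everything else stays inside it.
        (State::OpaqueString, 0x07) => State::Ground,
        (State::OpaqueString, _) => State::OpaqueString,
        (State::Ground, _) => State::Ground,
    }
}
```

Because the reset lives in one place, the opaque-string arm does not need its own explanation of why CAN or SUB end the string, which is the point the reviewer makes above.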

+ dispatcher.opaque_end();
+ dispatcher.execute(byte);
+ self.state = State::Ground
+ },
+ 0x1B => {
+ // Any escape code ends the SOS/APC/PM string. This is not standard behavior,
+ // but avoids having to keep additional state.
Member

This comment seems misleading. Escape resetting ongoing escape sequences from anywhere is a de-facto standard implemented by many terminal emulators. It might not be explicitly called for in older specifications, but is not incorrect behavior either, it's mostly an implementation detail really.

Author

Sorry if I didn't make my intentions clear. These comments are not meant to stay; they mostly document my thoughts while I look for feedback on this pull request. I have little experience with de-facto terminal implementations, so I can only refer to the specifications (which are a bit vague).
If escape sequences resetting the state is common de-facto behavior, I'm happy to keep it this way, since it keeps the implementation clean and simple.

+ dispatcher.opaque_end();
+ self.state = State::Escape
+ },
+ 0x20..=0xFF => {
+ // Only dispatch valid characters.
+ dispatcher.opaque_put(byte)
Member

Why is this comment on a method call that dispatches bytes indiscriminately? It might be more appropriate on the match arm instead, but even then I question whether the 0x80 to 0xFF range can be considered "valid characters", considering they're not printable characters (so effectively the same as any other byte).

Author

Sorry for that. This is a leftover comment from a previous implementation attempt that I gave up on and forgot to remove.
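Taken together, the match arms of advance_opaque_string amount to the per-byte classification sketched below. OpaqueAction and classify are hypothetical names used only for illustration. As the review points out, 0x80..=0xFF are forwarded as raw bytes rather than printable characters, and because C1-ST (0x9C) falls in that range it is passed through instead of ending the string.

```rust
/// What the parser does with one byte while inside an SOS/APC/PM string
/// (hypothetical summary of the advance_opaque_string match arms).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpaqueAction {
    /// BEL: call the end hook and return to ground.
    End,
    /// CAN/SUB: call the end hook, execute the byte, return to ground.
    EndAndExecute(u8),
    /// ESC: call the end hook and enter the escape state (ESC \ is ST).
    EndAndEscape,
    /// 0x20..=0xFF: forward the raw byte to the per-byte hook.
    Put(u8),
    /// Any other C0 control byte is ignored.
    Ignore,
}

pub fn classify(byte: u8) -> OpaqueAction {
    match byte {
        0x07 => OpaqueAction::End,
        0x18 | 0x1A => OpaqueAction::EndAndExecute(byte),
        0x1B => OpaqueAction::EndAndEscape,
        0x20..=0xFF => OpaqueAction::Put(byte),
        _ => OpaqueAction::Ignore,
    }
}
```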

+ },
+ // Ignore all other control codes
Member

Suggested change
- // Ignore all other control codes
+ // Ignore all other control bytes.

+ _ => (),
+ }
+ }
+
#[inline(always)]
fn anywhere<P: Perform>(&mut self, performer: &mut P, byte: u8) {
match byte {
@@ -743,7 +792,9 @@ enum State {
Escape,
EscapeIntermediate,
OscString,
- SosPmApcString,
+ SosString,
+ ApcString,
+ PmString,
#[default]
Ground,
}
@@ -811,6 +862,41 @@ pub trait Perform {
/// subsequent characters were ignored.
fn esc_dispatch(&mut self, _intermediates: &[u8], _ignore: bool, _byte: u8) {}

+ /// Invoked when the beginning of a new SOS (Start of String) sequence is
+ /// encountered.
+ fn sos_start(&mut self) {}
+
+ /// Invoked when the beginning of a new APC (Application Program Command)
+ /// sequence is encountered.
+ fn apc_start(&mut self) {}
+
+ /// Invoked when the beginning of a new PM (Privacy Message) sequence is
+ /// encountered.
+ fn pm_start(&mut self) {}
Member

Rather than sorting these by start/put/end, I'd prefer grouping them by sos/apc/pm. I think it makes more sense since the grouping into start/put/end is somewhat arbitrary considering they aren't really interconnected.

Author

Makes sense. I'll do it.

+
+ /// Invoked for every valid character (0x20-0xFF) in a SOS (Start of String)
+ /// sequence.
Member

Suggested change
- /// Invoked for every valid character (0x20-0xFF) in a SOS (Start of String)
- /// sequence.
+ /// Invoked for every byte (0x20-0xFF) in a SOS (Start of String) sequence.

Same comment applies for the other functions. Calling these "valid characters" could be misleading to consumers.

+ fn sos_dispatch(&mut self, _byte: u8) {}
+
+ /// Invoked for every valid character (0x20-0xFF) in an APC (Application
+ /// Program Command) sequence.
+ fn apc_dispatch(&mut self, _byte: u8) {}
+
+ /// Invoked for every valid character (0x20-0xFF) in a PM (Privacy Message)
+ /// sequence.
+ fn pm_dispatch(&mut self, _byte: u8) {}
Member

I'm not sure the naming _dispatch is consistent with the rest of our methods. The existing _dispatch functions generally represent the dispatch of an entire escape sequence in full, while this function just represents the dispatch of a single byte in the sequence.

I think this is much closer to the way DCS sequences work in VTE, where handling is split into hook, put, and unhook.

Other parts of this patch already make use of the _put nomenclature, so I think it's better to be consistent and rename these functions to sos/pm/apc_put.

Author

You're absolutely right, considering the method on the internal OpaqueDispatch trait is called opaque_put and then it ends up calling sos/apc/pm_dispatch()...
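Applying both review suggestions (grouping the hooks per sequence and renaming the per-byte hooks to _put) might give a trait shaped like the sketch below. OpaqueHooks is a hypothetical, trimmed-down stand-in for Perform that keeps the PR's *_string_end names, and ApcCollector is an illustrative consumer that buffers an APC payload until the end hook fires; neither exists in the crate.

```rust
/// Hypothetical subset of Perform with the opaque-string hooks grouped per
/// sequence and the per-byte hooks named *_put. All hooks default to no-ops.
pub trait OpaqueHooks {
    fn sos_start(&mut self) {}
    fn sos_put(&mut self, _byte: u8) {}
    fn sos_string_end(&mut self) {}

    fn apc_start(&mut self) {}
    fn apc_put(&mut self, _byte: u8) {}
    fn apc_string_end(&mut self) {}

    fn pm_start(&mut self) {}
    fn pm_put(&mut self, _byte: u8) {}
    fn pm_string_end(&mut self) {}
}

/// Illustrative consumer: buffers APC bytes and stores each finished payload.
#[derive(Default)]
pub struct ApcCollector {
    buf: Vec<u8>,
    pub payloads: Vec<Vec<u8>>,
}

impl OpaqueHooks for ApcCollector {
    fn apc_start(&mut self) {
        // Drop anything left over from an unterminated previous string.
        self.buf.clear();
    }

    fn apc_put(&mut self, byte: u8) {
        self.buf.push(byte);
    }

    fn apc_string_end(&mut self) {
        // Move the buffered bytes out, leaving an empty buffer behind.
        self.payloads.push(std::mem::take(&mut self.buf));
    }
}
```

SOS and PM bytes fall through to the default no-op hooks, so a consumer only pays attention to the sequence types it cares about.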

+
+ /// Invoked when the end of an SOS (Start of String) sequence is
+ /// encountered.
+ fn sos_string_end(&mut self) {}
+
+ /// Invoked when the end of an APC (Application Program Command) sequence is
+ /// encountered.
+ fn apc_string_end(&mut self) {}
+
+ /// Invoked when the end of a PM (Privacy Message) sequence is encountered.
+ fn pm_string_end(&mut self) {}
+
/// Whether the parser should terminate prematurely.
///
/// This can be used in conjunction with
@@ -825,6 +911,69 @@ pub trait Perform {
}
}

+ trait OpaqueDispatch {
+ fn execute(&mut self, byte: u8);
+ fn opaque_put(&mut self, byte: u8);
+ fn opaque_end(&mut self);
+ }
Comment on lines +902 to +906
Member

Honestly, this trait and its implementations are the main issue I have with this patch. It should work fine, since it all essentially gets inlined and optimized out, but it's still a whole lot of boilerplate.

I'm not sure I have any better ideas for now, but at the very least this trait needs a comment explaining that it's just a helper for dispatching over the opaque string escapes and in practice is inlined everywhere to function as static dispatch without conditional indirection.

Author

This trait was the best solution I could think of until now, which means it is the least terrible one. I'll add a comment.
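One possible way to cut the boilerplate raised here, sketched as an assumption rather than anything proposed in the PR, is to generate the three wrapper types with a macro_rules! macro. The Perform trait below is a hypothetical cut-down stand-in for the crate's trait (using the _put naming suggested earlier in the review), the opaque_dispatch macro name is made up, and Recorder exists only to show which hooks get called. The trait also carries the kind of explanatory comment the reviewer asked for.

```rust
/// Hypothetical cut-down stand-in for the crate's Perform trait.
pub trait Perform {
    fn execute(&mut self, _byte: u8) {}
    fn sos_put(&mut self, _byte: u8) {}
    fn sos_string_end(&mut self) {}
    fn apc_put(&mut self, _byte: u8) {}
    fn apc_string_end(&mut self) {}
    fn pm_put(&mut self, _byte: u8) {}
    fn pm_string_end(&mut self) {}
}

/// Helper for dispatching over the opaque string escapes (SOS/APC/PM).
/// Callers are generic over the wrapper type, so calls are statically
/// dispatched, and #[inline(always)] collapses each wrapper into a direct call.
trait OpaqueDispatch {
    fn execute(&mut self, byte: u8);
    fn opaque_put(&mut self, byte: u8);
    fn opaque_end(&mut self);
}

/// Generate a wrapper struct plus an OpaqueDispatch impl that forwards to the
/// given per-byte and end hooks of Perform.
macro_rules! opaque_dispatch {
    ($name:ident, $put:ident, $end:ident) => {
        #[allow(dead_code)]
        struct $name<'a, P: Perform>(&'a mut P);

        impl<P: Perform> OpaqueDispatch for $name<'_, P> {
            #[inline(always)]
            fn execute(&mut self, byte: u8) {
                self.0.execute(byte);
            }

            #[inline(always)]
            fn opaque_put(&mut self, byte: u8) {
                self.0.$put(byte);
            }

            #[inline(always)]
            fn opaque_end(&mut self) {
                self.0.$end();
            }
        }
    };
}

opaque_dispatch!(SosDispatch, sos_put, sos_string_end);
opaque_dispatch!(ApcDispatch, apc_put, apc_string_end);
opaque_dispatch!(PmDispatch, pm_put, pm_string_end);

/// Records which hooks were called, for demonstration only.
#[derive(Default)]
struct Recorder {
    events: Vec<String>,
}

impl Perform for Recorder {
    fn apc_put(&mut self, byte: u8) {
        self.events.push(format!("apc_put {}", byte as char));
    }

    fn apc_string_end(&mut self) {
        self.events.push("apc_string_end".to_string());
    }
}
```

Whether one macro is easier to maintain than three explicit impls is a judgment call; the macro removes the repetition but makes the forwarding slightly less obvious when reading the source.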

+
+ struct SosDispatch<'a, P: Perform>(&'a mut P);
+
+ impl<P: Perform> OpaqueDispatch for SosDispatch<'_, P> {
+ #[inline(always)]
+ fn execute(&mut self, byte: u8) {
+ self.0.execute(byte);
+ }
+
+ #[inline(always)]
+ fn opaque_put(&mut self, byte: u8) {
+ self.0.sos_dispatch(byte);
+ }
+
+ #[inline(always)]
+ fn opaque_end(&mut self) {
+ self.0.sos_string_end();
+ }
+ }
+
+ struct ApcDispatch<'a, P: Perform>(&'a mut P);
+
+ impl<P: Perform> OpaqueDispatch for ApcDispatch<'_, P> {
+ #[inline(always)]
+ fn execute(&mut self, byte: u8) {
+ self.0.execute(byte);
+ }
+
+ #[inline(always)]
+ fn opaque_put(&mut self, byte: u8) {
+ self.0.apc_dispatch(byte);
+ }
+
+ #[inline(always)]
+ fn opaque_end(&mut self) {
+ self.0.apc_string_end();
+ }
+ }
+
+ struct PmDispatch<'a, P: Perform>(&'a mut P);
+
+ impl<P: Perform> OpaqueDispatch for PmDispatch<'_, P> {
+ #[inline(always)]
+ fn execute(&mut self, byte: u8) {
+ self.0.execute(byte);
+ }
+
+ #[inline(always)]
+ fn opaque_put(&mut self, byte: u8) {
+ self.0.pm_dispatch(byte);
+ }
+
+ #[inline(always)]
+ fn opaque_end(&mut self) {
+ self.0.pm_string_end();
+ }
+ }
+
#[cfg(all(test, not(feature = "std")))]
#[macro_use]
extern crate std;