RFC: Per-line IO Extension #2101
text/0000-more-env-info.md
Outdated
# Motivation
[motivation]: #motivation

Programs sometimes need to know about their working environment to do their job properly. For example, linebreak convention differs for Windows and *NIX. Such discrepancy can lead to problems easily, especially when a program needs to communicate with aged third-party libraries.
Please escape the `*` here (`\*NIX`); the rest of the document is badly highlighted on GitHub's diff view because of this.
text/0000-more-env-info.md
Outdated
```rust
/// Platform architecture.
/// e.g. `arm`, `x86_64` and `i686`.
pub fn arch() -> String;
```
text/0000-more-env-info.md
Outdated
pub fn linebreak() -> String;
/// Current operating system.
/// e.g. `windows`, `linux` and `darwin`.
pub fn os() -> String;
text/0000-more-env-info.md
Outdated
pub fn os() -> String;
/// Word size in bits the program has been compiled into.
/// Commonly 32 or 64.
pub fn word_size() -> u32;
std::mem::size_of::<usize>() * 8
Is it, though? `usize` is defined as a 'pointer-sized unsigned integer type', which connects it to the bit width of the memory address space, not of a general-purpose register (which need not be the same). Given this definition, under the x32 ABI for x86-64 `usize` ought to be 32-bit, even though general-purpose registers are 64-bit. I also wonder how `usize` should be defined on 32-bit x86 with segmented pointers...
I think there should be a separate integer type with the bit width of a general-purpose/arithmetic register.
@fstirlitz If we use #1339 (comment), the `usize` on x32 should be 32-bit. If it were 64-bit, the pointer cast `(x: usize) as *const T` would not be a one-to-one mapping, which existing code does rely on.
A separate integer type would require a change to the provided `cfg` flags (e.g. `#[cfg(target_feature = "64-bit register")] type ifast = i64;`), which is out of scope for this RFC.
text/0000-more-env-info.md
Outdated
pub const LINEBREAK: &'static str;
/// Word size in bits the program has been compiled into.
/// Commonly 32 or 64.
pub const WORD_SIZE: &'static u32;
This should be a `u32` and not a `&u32`.
Also, I think it's much more straightforward to use `std::mem::size_of::<usize>()` (which gives you the bytes, but that's usually what you want anyway, I think); a `*8` doesn't warrant a new constant in the stdlib imo.
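As a rough sketch of the suggestion above, the bit count can be derived from `size_of::<usize>()` without any new standard library constant. The helper name `pointer_width_bits` is hypothetical and is not part of the RFC. As noted earlier in the thread, this measures pointer width, which is not necessarily the width of a general-purpose register.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of the RFC): width of `usize` in bits
/// for the compilation target. This is the pointer width, which may
/// differ from the general-purpose register width (e.g. on the x32 ABI).
pub const fn pointer_width_bits() -> u32 {
    (size_of::<usize>() * 8) as u32
}

fn main() {
    // `usize::BITS` reports the same value directly on recent toolchains.
    assert_eq!(pointer_width_bits(), usize::BITS);

    // The same information is available at compile time through `cfg`.
    if cfg!(target_pointer_width = "64") {
        assert_eq!(pointer_width_bits(), 64);
    }
    println!("pointer width: {} bits", pointer_width_bits());
}
```

Because the function is `const`, its result could also be bound to a user-defined constant in any crate that wants one.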
text/0000-more-env-info.md
Outdated
# Unresolved questions
[unresolved]: #unresolved-questions

The datatype returned by `word_size()` is not yet determined.
It's not a function anymore but a constant.
This makes me nervous. I would rather this information come through an API. Doing it through the compiler means that building a library on two different machines could now be guaranteed to not be reproducible.
@mark-i-m I don't think that's what "reproducible" means. Besides, what OP proposed is no more than adding

    #[cfg(windows)]
    pub const LINEBREAK: &str = "\r\n";
    #[cfg(not(windows))]
    pub const LINEBREAK: &str = "\n";

to libstd. The standard library already has a lot of
@kennytm Hmm... I see what you mean. That's not how I understood the RFC. My understanding was that these values would be more like language items filled in magically by the compiler (which would not be reproducible).
Why not off-load such things to
@mark-i-m, it does not matter how it is set, it only matters what it depends on. This shall clearly only depend on host ( @Evrey, @PENGUINLIONG, the usual way to use the platform newline is via the text flag when opening files. In Rust standard library design it does not fit in
@jan-hudec That's true. Thanks for the correction 👍 Still, I would prefer to minimize magic in the compiler.
FWIW, C# soured me on platform-specific newlines. Using I feel like some kind of newline normalizing reader/writer wrapper might be a better option here, so nearly all code can continue to just use
@jan-hudec Sometimes I have run into the problem that the remote app only accepts a certain type of line ending. Communicating with many of these is just annoying. I'm pretty sure it's absurd and not common, but it is a real problem. The wrapper provides a uniform interface for line-ending processing and it seems to work well, @scottmcm. With @kennytm's method we don't need to do any magic in the compiler, that's great! One more question: should we enumerate all the Unicode line terminators? BTW, the discussion is about line endings now. Should I propose a new RFC or just edit this one?
str::break_lines_with()
text/0000-break-lines-with.md
Outdated
# Detailed design
[design]: #detailed-design

This RFC mainly introduces method `break_lines_with()` to primitive `str`. The method replaces all recognizable linebreaks with new linebreak specified, and it will not consume any linebreaks. Here is a simplified implementation of it:
What advantage does this have over plain old `replace`? I don't see any.
Also, most practical code never wants to have the complete text in a single string anyway. Usually one either pulls lines one by one from a `BufRead` or pushes them one by one to a `BufWriter`.
I would say performance, especially when the incoming string's linebreaks are unknown. E.g. I want all `LF` in my string to be replaced by `CRLF`, but the method accepts not only `LF`-broken strings. In this case I can't simply replace `LF` with `CRLF`, since there is an `LF` in `CRLF` and the result will be like `CRCRLF`. Then I have to temporarily replace all `CRLF` with `LF` beforehand.
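The `CRCRLF` problem described above, and a single-pass alternative, can be sketched as follows. The function `normalize_linebreaks` is a hypothetical stand-in, not the RFC's `break_lines_with()`, and it assumes only `CRLF`, lone `CR` and lone `LF` count as linebreaks (other Unicode line terminators are ignored).

```rust
/// Hypothetical helper: rewrite CRLF, lone CR and lone LF to `newline`
/// in a single pass, so no temporary CRLF -> LF replacement is needed.
pub fn normalize_linebreaks(input: &str, newline: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // Treat CRLF as one linebreak by swallowing the LF.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push_str(newline);
            }
            '\n' => out.push_str(newline),
            other => out.push(other),
        }
    }
    out
}

fn main() {
    let mixed = "a\r\nb\nc\rd";
    // Naive replacement doubles the CR of an existing CRLF pair.
    assert_eq!(mixed.replace('\n', "\r\n"), "a\r\r\nb\r\nc\rd");
    // The single-pass version yields uniform CRLF line endings.
    assert_eq!(normalize_linebreaks(mixed, "\r\n"), "a\r\nb\r\nc\r\nd");
}
```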
I've been thinking about your suggestion also. I've tried to make a trait `WriteLn` that allows users to specify linebreaks and write a line without using `writeln!`, as its linebreak is always '\n'.
But my opinion is, Rust doesn't explicitly distinguish text and binary reading/writing, since `OpenOptions` doesn't have a method like `text()`, and `Read` and `Write` have both binary and text reading/writing methods. It's not `Read`'s or `Write`'s job to break lines. That's why I added the method to `str`.
I will update the RFC when I come up with something better.
@PENGUINLIONG There is `BufRead::read_line`.
I don't see any point introducing `WriteLn` since you could just write `write!(output, "blah{newline}", newline="\r\n");`
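A minimal sketch of how the existing pieces mentioned here could be combined for per-line conversion: read with `BufRead::lines` and write each line with an explicit terminator via `write!`. The helper `copy_lines` is hypothetical and not an API proposed in this thread. Note that, under this sketch, a final line without a terminator also gets one appended.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copy text line by line, ending every line
/// with the caller-chosen `newline`.
pub fn copy_lines<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    newline: &str,
) -> io::Result<()> {
    // `BufRead::lines` strips a trailing "\n" or "\r\n" from each line.
    for line in input.lines() {
        let line = line?;
        write!(output, "{line}{newline}", line = line, newline = newline)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let input = io::Cursor::new("first\nsecond\r\nthird");
    let mut output = Vec::new();
    copy_lines(input, &mut output, "\r\n")?;
    assert_eq!(output, b"first\r\nsecond\r\nthird\r\n");
    Ok(())
}
```

Since lines are streamed one at a time, the complete text never has to live in a single string.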
@kennytm Sorry for missing that.
But I think it somehow reflects the need for `WriteLn` and `ReadLn`: we have a way to read input per line, and we are looking for a way to write output per line conveniently. This abstraction of per-line functionality might be able to solve the problem.
str::break_lines_with()
@PENGUINLIONG Perhaps a pre-rfc issue might be useful to explore the design space?
This RFC extends `std::io` to suit the need for a uniform per-line reading/writing interface.