RFC: Per-line IO Extension #2101

PENGUINLIONG · 2017-08-09T06:36:23Z

This RFC extends std::io to suit the need for uniform per-line reading/writing interface.

kennytm · 2017-08-09T07:39:40Z

text/0000-more-env-info.md

+# Motivation
+[motivation]: #motivation
+
+Programs sometimes need to know about their working environment to do their job properly. For example, linebreak convention differs for Windows and *NIX. Such discrepancy can lead to problems easily, especially when a program needs to communicate with aged third-party libraries.


Please escape the * here ("\*NIX"), the rest of the document is badly highlighted on GitHub's diff view because of this.

kennytm · 2017-08-09T07:40:52Z

text/0000-more-env-info.md

+```rust
+/// Platform architecture.
+/// e.g. `arm`, `x86_64` and `i686`.
+pub fn arch() -> String;


std::env::consts::ARCH

kennytm · 2017-08-09T07:42:25Z

text/0000-more-env-info.md

+pub fn linebreak() -> String;
+/// Current operating system.
+/// e.g. `windows`, `linux` and `darwin`.
+pub fn os() -> String;


std::env::consts::OS

kennytm · 2017-08-09T07:42:56Z

text/0000-more-env-info.md

+pub fn os() -> String;
+/// Word size in bits the program has been compiled into.
+/// Commonly 32 or 64.
+pub fn word_size() -> u32;


std::mem::size_of::<usize>() * 8

Is it, though? usize is defined as a 'pointer-sized unsigned integer type', which connects it to the bit width of the memory address space, not of a general-purpose register (which need not be the same). Given this definition, under the x32 ABI for x86-64 usize ought to be 32-bit, even though general-purpose registers are 64-bit. I also wonder how usize should be defined on 32-bit x86 with segmented pointers...

I think there should be a separate integer type with the bit width of a general-purpose/arithmetic register.

@fstirlitz If we use #1339 (comment), the usize on x32 should be 32-bit. If it was 64-bit, the pointer-cast (x: usize) as *const T would not be a one-to-one mapping, which existing code do rely on.

A separate integer type would require a change of the provided cfg flags (e.g. #[cfg(target_feature = "64-bit register")] type ifast = i64;, which is out-of-scope for this RFC.

oli-obk · 2017-08-09T13:01:31Z

text/0000-more-env-info.md

+pub const LINEBREAK: &'static str;
+/// Word size in bits the program has been compiled into.
+/// Commonly 32 or 64.
+pub const WORD_SIZE: &'static u32;


This should be a u32 and not a &u32.

Also I think it's much more straightforward to use std::mem::size_of<usize>() (which gives you the bytes, but that's usually what you want anyway I think, also a *8 doesn't warrant a new constant in the stdlib imo.

oli-obk · 2017-08-09T13:02:11Z

text/0000-more-env-info.md

+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+The datatype returned by `word_size()` is not yet determined.


It's not a function anymore but a constant.

mark-i-m · 2017-08-14T15:20:28Z

This makes me nervous. I would rather this information come through an API. Doing it through the compiler means that building a library on two different machines could now be guaranteed to not be reproducible.

kennytm · 2017-08-15T08:07:05Z

@mark-i-m I don't think that's what "reproducible" means. Besides, what OP proposed is no more than adding

#[cfg(windows)]
pub const LINEBREAK: &str = "\r\n";
#[cfg(not(windows))]
pub const LINEBREAK: &str = "\n";

to libstd. The standard library already has a lot of #[cfg] stuff that reproducible builds must take into account.

mark-i-m · 2017-08-15T17:38:42Z

@kennytm Hmm... I see what you mean. That's not how I understood the RFC. My understanding was that these values would be more like language items filled in magically by the compiler (which would not be reproducible).

Evrey · 2017-08-16T19:00:00Z

Why not off-load such things to OsString? Have a file that might use \r\n on Win or \n on Tux? Load it into an OsString and make an explicit "with line-breaks"-conversion to String. This way, no Rust-library will have to put mental load into being careful about line-break-style in normal String/str code, meaning that cross platform code is easier to do. (Or just let \r die already, just as \v.)

jan-hudec · 2017-08-22T19:38:18Z

@mark-i-m, it does not matter how it is set, it only matters what it depends on. This shall clearly only depend on host (-target) platform and that is not a problem for reproducible builds, because binaries for different platforms are obviously not expected to be equal.

@Evrey, OsString is the type passed to and from native system calls and that is very different from what is stored in files by convention.

@PENGUINLIONG, the usual way to use the platform newline is via the text flag when opening files.

In Rust standard library design it does not fit in std::fs::File, but is handled in a wrapper. So either a configuration should be added to std::io::BufReader and std::io::BufWriter to set the newline to use, or extended wrappers (std::io::TextReader and std::io::TextWriter) should be added to handle newline conversion and in future also character set conversion.

mark-i-m · 2017-08-22T20:02:06Z

@jan-hudec That's true. Thanks for the correction 👍

Still, I would prefer to minimize magic in the compiler.

scottmcm · 2017-08-23T06:53:01Z

FWIW, C# soured me on platform-specific newlines. Using WriteLine on a System.IO.StringWriter put CRLF into my string, even though I was just making a string that had nothing to do with my current OS.

I feel like some kind of newline normalizing reader/writer wrapper might be a better option here, so nearly all code can continue to just use \n as normal, but places that need something specific can choose which kind of newline to output, be that LineEnd::Lf, LineEnd::CrLf, LineEnd::Cr, or LineEnd::Platform.

PENGUINLIONG · 2017-08-25T10:29:11Z

@jan-hudec Sometimes I just met the problem that the remote app only receive a certain type of line end. Communicating with many of these is just annoying. I'm pretty sure it's absurd and is not common, but it is an really existing problem.

The wrapper provided an uniform interface for line end processing and it seems to work well, @scottmcm . Through @kennytm 's method we don't need to do any magic in the compiler, that's great!

One more question. Should we enumerate all the Unicode line terminators?

BTW, the discussion is on line ends now. Should I re-propose an RFC or just edit it?

jan-hudec · 2017-08-29T16:13:20Z

text/0000-break-lines-with.md

+# Detailed design
+[design]: #detailed-design
+
+This RFC mainly introduces method `break_lines_with()` to primitive `str`. The method replaces all recognizable linebreaks with new linebreak specified, and it will not consume any linebreaks. Here is a simplified implementation of it:


What advantage does this have over plain old replace? I don't see any.

Also, most practical code never wants to have the complete text in a single string anyway. Usually one either pulls lines by one from a BufRead or push them by one to a BufWrite.

I would say performance, especially when the incoming string's libebreaks are unknown. e.g. I want all LF in my string to be replaced by CRLF but the method accepts not only LF-broken strings. In this case I can't simply replace LF with CRLF, since there is an LF in CRLF and the result will be like CRCRLF. Then I have to replace all CRLF to LF temporary beforehand.

I've been thinking about your suggestion also. I've tried to make a trait WriteLn that allows users to specify linebreaks and write a line without using writeln!, as it's linebreak is always '\n'.

But my opinion is, Rust didn't explicitly distinguish text and binary reading/writing, since OpenOption didn't have a method like text() and Read, Write have both binary and text reading/writing methods. It's not Read or Write's job to break lines. That's why I added the method to str.

I will update the RFC when I come up with something better.

@PENGUINLIONG There is BufRead::read_line.

I don't see any point introducing WriteLn since you could just write

write!(output, "blah{newline}", newline="\r\n");

@kennytm Sorry for missing that out.

But I think, it somehow reflected the need for WriteLn and ReadLn: We have a way to input per-line and we are seeking for a way to output per-line conveniently. This abstraction of per-line functionalities might be able to solve the problem.

mark-i-m · 2017-09-03T14:27:19Z

@PENGUINLIONG Perhaps a pre-rfc issue might be useful to explore the design space?

Proposed more-env-info

b17efb0

kennytm reviewed Aug 9, 2017

View reviewed changes

PENGUINLIONG added 3 commits August 9, 2017 16:42

Removed discussion about target triplet.

9f22f0c

Turn functions to fields.

db8c24e

Removed word_size

71114fe

oli-obk reviewed Aug 9, 2017

View reviewed changes

nrc added A-libs T-libs-api Relevant to the library API team, which will review and decide on the RFC. and removed A-libs labels Aug 10, 2017

Updated RFC

4d8570c

PENGUINLIONG changed the title ~~Proposed more-env-info~~ RFC: Unify string linebreaks with str::break_lines_with() Aug 29, 2017

jan-hudec reviewed Aug 29, 2017

View reviewed changes

Updated RFC

7e29faa

PENGUINLIONG changed the title ~~RFC: Unify string linebreaks with str::break_lines_with()~~ RFC: Per-line IO Extension Aug 31, 2017

Preserved macro writeln!()

9d5ff14

PENGUINLIONG closed this Sep 3, 2017

PENGUINLIONG mentioned this pull request Sep 4, 2017

Per-line IO extension and possible re-design of std::io components #2139

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Per-line IO Extension #2101

RFC: Per-line IO Extension #2101

PENGUINLIONG commented Aug 9, 2017 •

edited

Loading

kennytm Aug 9, 2017 •

edited

Loading

kennytm Aug 9, 2017

kennytm Aug 9, 2017

kennytm Aug 9, 2017

fstirlitz Aug 9, 2017

kennytm Aug 9, 2017

oli-obk Aug 9, 2017

oli-obk Aug 9, 2017

mark-i-m commented Aug 14, 2017

kennytm commented Aug 15, 2017

mark-i-m commented Aug 15, 2017

Evrey commented Aug 16, 2017 •

edited

Loading

jan-hudec commented Aug 22, 2017

mark-i-m commented Aug 22, 2017

scottmcm commented Aug 23, 2017

PENGUINLIONG commented Aug 25, 2017

jan-hudec Aug 29, 2017

PENGUINLIONG Aug 30, 2017

PENGUINLIONG Aug 30, 2017

kennytm Aug 30, 2017

PENGUINLIONG Aug 30, 2017

mark-i-m commented Sep 3, 2017

RFC: Per-line IO Extension #2101

RFC: Per-line IO Extension #2101

Conversation

PENGUINLIONG commented Aug 9, 2017 • edited Loading

kennytm Aug 9, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mark-i-m commented Aug 14, 2017

kennytm commented Aug 15, 2017

mark-i-m commented Aug 15, 2017

Evrey commented Aug 16, 2017 • edited Loading

jan-hudec commented Aug 22, 2017

mark-i-m commented Aug 22, 2017

scottmcm commented Aug 23, 2017

PENGUINLIONG commented Aug 25, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mark-i-m commented Sep 3, 2017

PENGUINLIONG commented Aug 9, 2017 •

edited

Loading

kennytm Aug 9, 2017 •

edited

Loading

Evrey commented Aug 16, 2017 •

edited

Loading