Implemented new more flexible readLineFrom #1801

Hejsil · 2018-11-28T10:20:49Z

I decided to break the API for this function as I think, taking a std.Buffer is a better approach.

It allows arbitrary sized lines
The read bytes are available through the buffer, even if error.EndOfStream is returned.
It can easily be wrapped to accept an []u8 or mem.Allocator

pub fn readLineSlice(slice: []u8) ![]u8 {
    var buf = try std.Buffer.fromOwnedSlice(debug.failing_allocator, slice);
    return try readLine(&buf);
}

pub fn readLineAlloc(allocator: *mem.Allocator) ![]u8 {
    var buf = try std.Buffer.initSize(allocator, 0);
    _ = try readLine(&buf);
    return buf.toOwnedSlice();
}

It now also returns the bytes read, instead of just a length.

daurnimator · 2018-11-29T01:00:20Z

Could you add an option to return the line ending as well?
Some line-based protocols need to know if the ending is \r\n vs \n.

andrewrk · 2018-11-29T01:10:49Z

This change breaks what I think is a reasonable use case: reading a line of input with no allocator. Previously one could pass a slice as a buffer, and get an error if the line of input is too long. The guess number example demonstrated this use case, and you can see that you had to add an allocator to make it work again, which I would consider a kludge.

Could we have both API interfaces?

Edit: I do think it's nice to return a slice instead of byte count.

daurnimator · 2018-11-29T01:26:33Z

This change breaks what I think is a reasonable use case: reading a line of input with no allocator.

Why is it a bad thing to get the user chose e.g. FixedBufferAllocator.init() here?
(Does that currently work at compile time? I think that might need fixing at the moment?)

I just had a closer look, why isn't this just a thin wrapper around InStream.readUntilDelimiterBuffer?

ghost · 2018-11-29T04:35:40Z

Why not read into an OutStream instead of a Buffer? That's what I did here https://github.com/dbandstra/zigutils/blob/master/src/LineReader.zig

This gives you the option of using SliceOutStream and thus no allocator.

Hejsil · 2018-11-29T06:37:58Z

@andrewrk A slice wrapper could easily be done with this code:

pub fn readLineSlice(slice: []u8) ![]u8 {
    var buf = try std.Buffer.fromOwnedSlice(debug.failing_allocator, slice);
    return try readLine(&buf);
}

This does no allocations, and returns error.OutOfMemory when the slice is full.

@daurnimator because of \r\n

@dbandstra Hmm, maybe this is a better idea. I could then make a BufferOutStream myself (which should probably be a thing in std anyways`)

Hejsil · 2018-11-29T07:27:36Z

@dbandstra If we wonna return a slice of the bytes read, then taking an OutStream is not really an option

Hejsil · 2018-11-29T08:19:43Z

Could you add an option to return the line ending as well?
Some line-based protocols need to know if the ending is \r\n vs \n.

We could always append the newlines, and force users to use mem.trimRight if they don't want it.

andrewrk · 2018-11-29T14:50:48Z

@Hejsil I'm fine with that implementation, I think all I want to be happy is

pub fn readLineSlice(slice: []u8) ![]u8 { ... }

and then the example updated to use it.

andrewrk · 2018-11-29T14:52:04Z

For the newlines, my suggestion would be:

public API which includes the newline(s)
public API with different name that calls the other one and does the trimming

Hejsil · 2018-11-29T14:58:20Z

This is turning into a large set of function.

fn readLineFullFrom(stream: var, buf: *std.Buffer) ![]u8 Returns the line + its acompanying newline chars
fn readLineFrom(stream: var, buf: *std.Buffer) ![]u8 Returns the line
fn readLineFull(buf: *std.Buffer) ![]u8 Uses stdin, Returns the line + its acompanying newline
fn readLine(buf: *std.Buffer) ![]u8 Uses stdin, Returns the line
Clones of these, but taking []u8 instead.

daurnimator · 2018-11-29T14:58:51Z

@daurnimator because of \r\n

Could we add a new readUntilDelimiterBuffer variety that takes a set of characters instead of just one? (compare std.mem.indexOfScalar vs std.mem.indexOf)

Edit: oops, I realised it was the old split that takes a set.

Hejsil · 2018-11-29T15:01:39Z

@daurnimator But we want to read both lines ending with \n and \r\n. Then we need a function that takes a set of sets, and returns once one of them match (I don't think it's a good idea).

daurnimator · 2018-11-29T15:04:48Z

@daurnimator But we want to read both lines ending with \n and \r\n. Then we need a function that takes a set of sets, and returns once one of them match (I don't think it's a good idea).

True. Okay lets think about what other languages do: often a file handle will contain a "normalise newlines" flag (e.g. in C's fopen you pass b in the mode to turn off normalisation). This should happen at the readByte level.

If that was done, we do only need a thin wrapper around readUntilDelimiterBuffer (possibly using a FixedBufferAllocator)

Hejsil · 2018-11-29T16:52:53Z

@daurnimator But if you normalize your newlines then:

Some line-based protocols need to know if the ending is \r\n vs \n.

andrewrk · 2018-11-29T18:02:30Z

By the way, I see std.io.readLine & related functions as being separate from InStream.readUntilDelimiter in one crucial way:

std.io.readLine is a user interface for the terminal. It would be appropriate to support fancy bells and whistles such as history, terminal codes, etc. readUntilDelimiter is for byte streams.

andrewrk · 2018-11-29T18:08:18Z

This has the implication that it should not distinguish between the different newline types.

Hejsil · 2018-11-29T18:11:57Z

By the way, I see std.io.readLine & related functions as being separate from InStream.readUntilDelimiter in one crucial way:

std.io.readLine is a user interface for the terminal. It would be appropriate to support fancy bells and whistles such as history, terminal codes, etc. readUntilDelimiter is for byte streams.

Makes sense. It would then also support deletion and stuff. For file reading, readUntilDelimiter is the way to go (this is how musl and rust does it).

I also just realized, that:

pub fn readLineSlice(slice: []u8) ![]u8 {
    var buf = try std.Buffer.fromOwnedSlice(debug.failing_allocator, slice);
    return try readLine(&buf);
}

or similar, is not the best wrapper, as std.Buffer wants to keep track of a null byte. This means that the callers buffer is one smaller than they expect.

andrewrk · 2018-11-29T18:18:39Z

That one byte smaller problem is solved by #265

This reverts commit 4b9a25a.

Hejsil · 2018-11-29T18:39:11Z

Alright. We don't include the newline chars in the buffer because

std.io.readLine is a user interface for the terminal.

This should be ready for merge now. Now I can go to all my projects and replace readLine with readUntilDelimiter :)

travisstaloch · 2018-12-02T17:53:06Z

Can anyone point me to or provide an example of using this with:

var file = try os.File.openRead(file_name);
defer file.close();
...

Or is there a better way to accomplish this? I just want to read a file line by line. My attempt isn't working. I am having some problems once buf memory runs out.

    var file = try os.File.openRead(file_name);
    defer file.close();

    var bytes: [128]u8 = undefined;
    var allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;

    var buf = try std.Buffer.initSize(allocator, 0);
    var file_in_stream = file.inStream();
    var buf_stream = io.BufferedInStream(os.File.ReadError).init(&file_in_stream.stream);
    var st = &buf_stream.stream;
    var lineno: u32 = 1;

    while (true) {
        if (io.readLineFrom(st, &buf)) |line| {
        ...
        } else |err| switch (err) {
            error.OutOfMemory => {
                warn("\n OUTOFMEMORY \n");
                allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
                buf = try std.Buffer.initSize(allocator, 0);
            },
            error.EndOfStream => {
                warn("\n ENDOFSTREAM \n");
                break;
            },
            else => break,
        }
    }

Hejsil · 2018-12-02T17:58:17Z

@travisstaloch InStream has readUntilDelimiterBuffer and readUntilDelimiterAlloc

travisstaloch · 2018-12-02T18:46:38Z

Thank you. I was able to use readUntilDelimiterAlloc which seems simpler but I'm still running out of memory:

    var file = try os.File.openRead(file_name);
    defer file.close();

    var bytes: [128]u8 = undefined;

    var allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;

    var lineno: u32 = 1;
    while (true) {
        var line = file.inStream().stream.readUntilDelimiterAlloc(allocator, '\n', 1024 * 10) catch |e| {
            switch (e) {
                error.EndOfStream => break,
                error.OutOfMemory => {
                    warn("\n OUTOFMEMORY");
                    break;
                },
                else => {
                    warn("\n {}", e);
                    break;
                },
            }
        };
        warn("\nline {} {}", lineno, line);
        lineno += 1;
    }

This reads the first 16 lines of my file (which just contains one number per line) and then runs out of memory, showing OUTOFMEMORY . The file is 1026 lines long. Do I need to handle the OutOfMemory error somehow? Maybe allocate more memory?

travisstaloch · 2018-12-02T19:07:53Z

Nevermind, it works if I do the following. Thank you.

            error.OutOfMemory => {
                    allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
                    warn("\n OUTOFMEMORY");
                    continue;
                },

If anyone comes across this and can recommend a better way to do this, I would like to know. Eventually there should be a much more succinct way to do this, such as a readlines() method that could be used with while for instance.

Hejsil · 2018-12-02T19:21:24Z

@travisstaloch You'd have to free the line you allocate. Sadly FixedBufferAllocator does not support freeing. ResettingFixedBufferAllocator is a solution, but it's not the prettiest (it could be better). I'd probably write it like this:

const std = @import("std");

const os = std.os;
const warn = std.debug.warn;

pub fn main() !void {
    var file = try os.File.openRead("test.t");
    defer file.close();

    var lineno: u32 = 1;
    var bytes: [128]u8 = undefined;
    var allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
    var line_buf = try std.Buffer.initSize(allocator, 0);
    defer line_buf.deinit();

    // readUntilDelimiterBuffer will resize the buffer, and append the the bytes to it.
    while (file.inStream().stream.readUntilDelimiterBuffer(&line_buf, '\n', 1024 * 10)) {
        // We can then use line_buf.toSlice() to get our bytes, and print them
        warn("line {} {}\n", lineno, line_buf.toSlice());
        lineno += 1;
    } else |err| switch (err) {
        error.EndOfStream => {},
        else => warn("{}\n", err),
    }
}

travisstaloch · 2018-12-02T19:46:51Z

Thank you. That works for me.

Only thing I can add is maybe its better to return the error, depending on context:

        else => {
            warn("{}\n", err);
            return err;
        },

Looks a little better. Hope someday this could be much easier.

Implemented new more flexible readLineFrom and made readLine a wrapper

d433e8e

Hejsil added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Nov 28, 2018

guess_number example now uses new readLine

4b9a25a

Hejsil added 2 commits November 29, 2018 19:29

Revert "guess_number example now uses new readLine"

da9088e

This reverts commit 4b9a25a.

Implemented readLineSlice and readLineSliceFrom

96f7f75

andrewrk approved these changes Nov 29, 2018

View reviewed changes

Hejsil merged commit 823969a into master Nov 29, 2018

Hejsil deleted the new-readline branch November 29, 2018 21:38

This was referenced Nov 30, 2018

io.readLine() unusable because EOF doesn't return current line #1579

Closed

Unexpected readLine() / printing behaviour #1055

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented new more flexible readLineFrom #1801

Implemented new more flexible readLineFrom #1801

Hejsil commented Nov 28, 2018

daurnimator commented Nov 29, 2018

andrewrk commented Nov 29, 2018 •

edited

Loading

daurnimator commented Nov 29, 2018

ghost commented Nov 29, 2018 •

edited by ghost

Loading

Hejsil commented Nov 29, 2018

Hejsil commented Nov 29, 2018

Hejsil commented Nov 29, 2018

andrewrk commented Nov 29, 2018

andrewrk commented Nov 29, 2018

Hejsil commented Nov 29, 2018

daurnimator commented Nov 29, 2018 •

edited

Loading

Hejsil commented Nov 29, 2018

daurnimator commented Nov 29, 2018

Hejsil commented Nov 29, 2018

andrewrk commented Nov 29, 2018

andrewrk commented Nov 29, 2018

Hejsil commented Nov 29, 2018

andrewrk commented Nov 29, 2018

Hejsil commented Nov 29, 2018

travisstaloch commented Dec 2, 2018

Hejsil commented Dec 2, 2018

travisstaloch commented Dec 2, 2018 •

edited

Loading

travisstaloch commented Dec 2, 2018

Hejsil commented Dec 2, 2018 •

edited

Loading

travisstaloch commented Dec 2, 2018

Implemented new more flexible readLineFrom #1801

Implemented new more flexible readLineFrom #1801

Conversation

Hejsil commented Nov 28, 2018

daurnimator commented Nov 29, 2018

andrewrk commented Nov 29, 2018 • edited Loading

daurnimator commented Nov 29, 2018

ghost commented Nov 29, 2018 • edited by ghost Loading

Hejsil commented Nov 29, 2018

Hejsil commented Nov 29, 2018

Hejsil commented Nov 29, 2018

andrewrk commented Nov 29, 2018

andrewrk commented Nov 29, 2018

Hejsil commented Nov 29, 2018

daurnimator commented Nov 29, 2018 • edited Loading

Hejsil commented Nov 29, 2018

daurnimator commented Nov 29, 2018

Hejsil commented Nov 29, 2018

andrewrk commented Nov 29, 2018

andrewrk commented Nov 29, 2018

Hejsil commented Nov 29, 2018

andrewrk commented Nov 29, 2018

Hejsil commented Nov 29, 2018

travisstaloch commented Dec 2, 2018

Hejsil commented Dec 2, 2018

travisstaloch commented Dec 2, 2018 • edited Loading

travisstaloch commented Dec 2, 2018

Hejsil commented Dec 2, 2018 • edited Loading

travisstaloch commented Dec 2, 2018

andrewrk commented Nov 29, 2018 •

edited

Loading

ghost commented Nov 29, 2018 •

edited by ghost

Loading

daurnimator commented Nov 29, 2018 •

edited

Loading

travisstaloch commented Dec 2, 2018 •

edited

Loading

Hejsil commented Dec 2, 2018 •

edited

Loading