Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented new more flexible readLineFrom #1801

Merged
merged 4 commits into from
Nov 29, 2018
Merged

Implemented new more flexible readLineFrom #1801

merged 4 commits into from
Nov 29, 2018

Conversation

Hejsil
Copy link
Contributor

@Hejsil Hejsil commented Nov 28, 2018

Fixes #1709, #1579, #1055

I decided to break the API for this function as I think, taking a std.Buffer is a better approach.

  • It allows arbitrary sized lines
  • The read bytes are available through the buffer, even if error.EndOfStream is returned.
  • It can easily be wrapped to accept an []u8 or mem.Allocator
pub fn readLineSlice(slice: []u8) ![]u8 {
    var buf = try std.Buffer.fromOwnedSlice(debug.failing_allocator, slice);
    return try readLine(&buf);
}

pub fn readLineAlloc(allocator: *mem.Allocator) ![]u8 {
    var buf = try std.Buffer.initSize(allocator, 0);
    _ = try readLine(&buf);
    return buf.toOwnedSlice();
}

It now also returns the bytes read, instead of just a length.

@Hejsil Hejsil added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Nov 28, 2018
@daurnimator
Copy link
Contributor

Could you add an option to return the line ending as well?
Some line-based protocols need to know if the ending is \r\n vs \n.

@andrewrk
Copy link
Member

andrewrk commented Nov 29, 2018

This change breaks what I think is a reasonable use case: reading a line of input with no allocator. Previously one could pass a slice as a buffer, and get an error if the line of input is too long. The guess number example demonstrated this use case, and you can see that you had to add an allocator to make it work again, which I would consider a kludge.

Could we have both API interfaces?

Edit: I do think it's nice to return a slice instead of byte count.

@daurnimator
Copy link
Contributor

This change breaks what I think is a reasonable use case: reading a line of input with no allocator.

Why is it a bad thing to get the user chose e.g. FixedBufferAllocator.init() here?
(Does that currently work at compile time? I think that might need fixing at the moment?)


I just had a closer look, why isn't this just a thin wrapper around InStream.readUntilDelimiterBuffer?

@ghost
Copy link

ghost commented Nov 29, 2018

Why not read into an OutStream instead of a Buffer? That's what I did here https://github.com/dbandstra/zigutils/blob/master/src/LineReader.zig

This gives you the option of using SliceOutStream and thus no allocator.

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

@andrewrk A slice wrapper could easily be done with this code:

pub fn readLineSlice(slice: []u8) ![]u8 {
    var buf = try std.Buffer.fromOwnedSlice(debug.failing_allocator, slice);
    return try readLine(&buf);
}

This does no allocations, and returns error.OutOfMemory when the slice is full.

@daurnimator because of \r\n

@dbandstra Hmm, maybe this is a better idea. I could then make a BufferOutStream myself (which should probably be a thing in std anyways`)

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

@dbandstra If we wonna return a slice of the bytes read, then taking an OutStream is not really an option

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

Could you add an option to return the line ending as well?
Some line-based protocols need to know if the ending is \r\n vs \n.

We could always append the newlines, and force users to use mem.trimRight if they don't want it.

@andrewrk
Copy link
Member

@Hejsil I'm fine with that implementation, I think all I want to be happy is

pub fn readLineSlice(slice: []u8) ![]u8 { ... }

and then the example updated to use it.

@andrewrk
Copy link
Member

For the newlines, my suggestion would be:

  • public API which includes the newline(s)
  • public API with different name that calls the other one and does the trimming

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

This is turning into a large set of function.

  • fn readLineFullFrom(stream: var, buf: *std.Buffer) ![]u8 Returns the line + its acompanying newline chars
  • fn readLineFrom(stream: var, buf: *std.Buffer) ![]u8 Returns the line
  • fn readLineFull(buf: *std.Buffer) ![]u8 Uses stdin, Returns the line + its acompanying newline
  • fn readLine(buf: *std.Buffer) ![]u8 Uses stdin, Returns the line
  • Clones of these, but taking []u8 instead.

@daurnimator
Copy link
Contributor

daurnimator commented Nov 29, 2018

@daurnimator because of \r\n

Could we add a new readUntilDelimiterBuffer variety that takes a set of characters instead of just one? (compare std.mem.indexOfScalar vs std.mem.indexOf)

Edit: oops, I realised it was the old split that takes a set.

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

@daurnimator But we want to read both lines ending with \n and \r\n. Then we need a function that takes a set of sets, and returns once one of them match (I don't think it's a good idea).

@daurnimator
Copy link
Contributor

@daurnimator But we want to read both lines ending with \n and \r\n. Then we need a function that takes a set of sets, and returns once one of them match (I don't think it's a good idea).

True. Okay lets think about what other languages do: often a file handle will contain a "normalise newlines" flag (e.g. in C's fopen you pass b in the mode to turn off normalisation). This should happen at the readByte level.

If that was done, we do only need a thin wrapper around readUntilDelimiterBuffer (possibly using a FixedBufferAllocator)

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

@daurnimator But if you normalize your newlines then:

Some line-based protocols need to know if the ending is \r\n vs \n.

@andrewrk
Copy link
Member

By the way, I see std.io.readLine & related functions as being separate from InStream.readUntilDelimiter in one crucial way:

std.io.readLine is a user interface for the terminal. It would be appropriate to support fancy bells and whistles such as history, terminal codes, etc. readUntilDelimiter is for byte streams.

@andrewrk
Copy link
Member

This has the implication that it should not distinguish between the different newline types.

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

By the way, I see std.io.readLine & related functions as being separate from InStream.readUntilDelimiter in one crucial way:

std.io.readLine is a user interface for the terminal. It would be appropriate to support fancy bells and whistles such as history, terminal codes, etc. readUntilDelimiter is for byte streams.

Makes sense. It would then also support deletion and stuff. For file reading, readUntilDelimiter is the way to go (this is how musl and rust does it).

I also just realized, that:

pub fn readLineSlice(slice: []u8) ![]u8 {
    var buf = try std.Buffer.fromOwnedSlice(debug.failing_allocator, slice);
    return try readLine(&buf);
}

or similar, is not the best wrapper, as std.Buffer wants to keep track of a null byte. This means that the callers buffer is one smaller than they expect.

@andrewrk
Copy link
Member

That one byte smaller problem is solved by #265

@Hejsil
Copy link
Contributor Author

Hejsil commented Nov 29, 2018

Alright. We don't include the newline chars in the buffer because

std.io.readLine is a user interface for the terminal.

This should be ready for merge now. Now I can go to all my projects and replace readLine with readUntilDelimiter :)

@travisstaloch
Copy link
Contributor

Can anyone point me to or provide an example of using this with:

var file = try os.File.openRead(file_name);
defer file.close();
... 

Or is there a better way to accomplish this? I just want to read a file line by line. My attempt isn't working. I am having some problems once buf memory runs out.

    var file = try os.File.openRead(file_name);
    defer file.close();

    var bytes: [128]u8 = undefined;
    var allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;

    var buf = try std.Buffer.initSize(allocator, 0);
    var file_in_stream = file.inStream();
    var buf_stream = io.BufferedInStream(os.File.ReadError).init(&file_in_stream.stream);
    var st = &buf_stream.stream;
    var lineno: u32 = 1;

    while (true) {
        if (io.readLineFrom(st, &buf)) |line| {
        ...
        } else |err| switch (err) {
            error.OutOfMemory => {
                warn("\n OUTOFMEMORY \n");
                allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
                buf = try std.Buffer.initSize(allocator, 0);
            },
            error.EndOfStream => {
                warn("\n ENDOFSTREAM \n");
                break;
            },
            else => break,
        }
    }

@Hejsil
Copy link
Contributor Author

Hejsil commented Dec 2, 2018

@travisstaloch InStream has readUntilDelimiterBuffer and readUntilDelimiterAlloc

@travisstaloch
Copy link
Contributor

travisstaloch commented Dec 2, 2018

Thank you. I was able to use readUntilDelimiterAlloc which seems simpler but I'm still running out of memory:

    var file = try os.File.openRead(file_name);
    defer file.close();

    var bytes: [128]u8 = undefined;

    var allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;

    var lineno: u32 = 1;
    while (true) {
        var line = file.inStream().stream.readUntilDelimiterAlloc(allocator, '\n', 1024 * 10) catch |e| {
            switch (e) {
                error.EndOfStream => break,
                error.OutOfMemory => {
                    warn("\n OUTOFMEMORY");
                    break;
                },
                else => {
                    warn("\n {}", e);
                    break;
                },
            }
        };
        warn("\nline {} {}", lineno, line);
        lineno += 1;
    }

This reads the first 16 lines of my file (which just contains one number per line) and then runs out of memory, showing OUTOFMEMORY . The file is 1026 lines long. Do I need to handle the OutOfMemory error somehow? Maybe allocate more memory?

@travisstaloch
Copy link
Contributor

Nevermind, it works if I do the following. Thank you.

            error.OutOfMemory => {
                    allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
                    warn("\n OUTOFMEMORY");
                    continue;
                },
    

If anyone comes across this and can recommend a better way to do this, I would like to know. Eventually there should be a much more succinct way to do this, such as a readlines() method that could be used with while for instance.

@Hejsil
Copy link
Contributor Author

Hejsil commented Dec 2, 2018

@travisstaloch You'd have to free the line you allocate. Sadly FixedBufferAllocator does not support freeing. ResettingFixedBufferAllocator is a solution, but it's not the prettiest (it could be better). I'd probably write it like this:

const std = @import("std");

const os = std.os;
const warn = std.debug.warn;

pub fn main() !void {
    var file = try os.File.openRead("test.t");
    defer file.close();

    var lineno: u32 = 1;
    var bytes: [128]u8 = undefined;
    var allocator = &std.heap.FixedBufferAllocator.init(bytes[0..]).allocator;
    var line_buf = try std.Buffer.initSize(allocator, 0);
    defer line_buf.deinit();

    // readUntilDelimiterBuffer will resize the buffer, and append the the bytes to it.
    while (file.inStream().stream.readUntilDelimiterBuffer(&line_buf, '\n', 1024 * 10)) {
        // We can then use line_buf.toSlice() to get our bytes, and print them
        warn("line {} {}\n", lineno, line_buf.toSlice());
        lineno += 1;
    } else |err| switch (err) {
        error.EndOfStream => {},
        else => warn("{}\n", err),
    }
}

@travisstaloch
Copy link
Contributor

Thank you. That works for me.

Only thing I can add is maybe its better to return the error, depending on context:

        else => {
            warn("{}\n", err);
            return err;
        },

Looks a little better. Hope someday this could be much easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants