Using Point.Builder is memory intensive - support plain line protocol insertion #149

billevans1963 · 2016-02-24T15:51:17Z

I have been using the maven central 2.1 version of influxdb-java and am finding that the Point.Builder mechanism to insert records is unnecessarily memory intensive: Every point allocates 2 new Maps to build a single String line protocol entry. When you are inserting hundreds of thousands of records at a time, this is not particularly efficient.

Yesterday I discovered that the latest github source contains methods to write Line Protocol directly, so I downloaded it and modified my code to write directly. It seems a much better way to go.

Two questions: Do you intend to develop the write(String) method further, by for example enabling a similar kind of batching that you use for Points? When do you plan to push the next version onto Maven Central?

andrewdodd · 2016-02-24T16:39:53Z

Hi @billevans1963

I guess that it a bit inefficient now that I think about it.

Could you shed any light on how you are building the line protocol strings without going via the builder?

Additionally, if you have any input on #148, especially around the implication of #126 to batching, it would be great if you could comment.

andrewdodd · 2016-02-24T16:51:22Z

Any chance you have an example test case that demonstrates this issue? I.e. a case using Point.Builder and a case without?

billevans1963 · 2016-02-24T18:03:32Z

Hi @andrewdodd,

I replaced Point.Builder and 'enableBatching' with the following very simple class:

import java.util.ArrayList;
import java.util.List;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDB.ConsistencyLevel;

class InfluxBatchInserter {
    //
    private final String database;
    private final String retentionPolicyName;
    private final ConsistencyLevel consistencyLevel = ConsistencyLevel.ANY;
    private final InfluxDB influxConx;
    private final int batchSize;
    //
    private List<String> linesToWrite = new ArrayList<String>();



    InfluxBatchInserter(InfluxDB influxConx, String database, String retentionPolicyName, int batchSize) {
        this.influxConx = influxConx;
        this.database = database;
        this.retentionPolicyName = retentionPolicyName;
        this.batchSize = batchSize;
    }   
    protected void flush() {
        if (linesToWrite.size() > 0) {
            this.influxConx.write(database, retentionPolicyName, consistencyLevel, linesToWrite);
            linesToWrite.clear();           
        }
    }
    protected void write(String lineProtocol, boolean flushRegardless) {
        this.linesToWrite.add(lineProtocol);
        if (flushRegardless || linesToWrite.size() >= batchSize) {
            this.flush();           
        }
    }   
}

Whenever I want to write data I call the write method with the data already formatted to the line protocol.

billevans1963 · 2016-02-24T18:09:35Z

What I have written is a very basic batching mechanism which I doubt is as resilient as the BatchPoints mechanism you employed. So maybe a further enhancement would be a "BatchStrings" class (or other name), that would essentially manage batches of Strings instead of Points. What do you think?

andrewdodd · 2016-02-24T18:10:42Z

So do you think the memory issue is in the storage of 'Point' objects that contain the 2 Map objects?

andrewdodd · 2016-02-24T18:21:32Z

The original reason for the 'bare / raw strings' write method was the performance of converting to the line protocol (heavily debated in issue 126).

So I think there are actually two problems we might be talking about here:

The performance of converting from a 'Point' to a line protocol formatted string
The performance of storing Point objects in memory

I still have my doubts about issue 126 and thus problem 1 above. The reason being that at some point you have convert your application domain objects to line-protocol-strings, and I don't believe that the current implementation can be that bad.

However, I do agree that the current design of storing the Point objects will have the following two effects:

The Point objects in memory will hold refs to 2 map objects, which is probably unecessary
The Point objects will only be converted to Strings at the time of sending (which will be repeated if they fail); though they could be converted earlier.

Is this your understanding?

billevans1963 · 2016-02-24T18:48:03Z

Yes, that would be my understanding.

In my opinion, by-passing the Builder.Point Object creation should save big on:

Memory allocations. 100,000 Points => 200,000 short-lived HashMaps.
CPU. Because GC has to do less work.

andrewdodd · 2016-03-01T09:11:55Z

@billevans1963 , any chance you could have a look at this PR and see if a change like that would help you?

In the test that I added, the memory performance (unsurprisingly) depends a lot on what kind of data you have.

billevans1963 · 2016-03-01T21:39:35Z

@andrewdodd I like the simplicity of new Point2 Object. However, my preference is to avoid the use of the Builder to build the lines because it still keeps allocating 2 new TreeMaps for every Point. If you made the Builder re-use it's maps it would be better.

wcork · 2016-09-30T17:12:42Z

I may create a PR for this. A quick and dirty fix would be to just use Trove maps instead. Not sure what Point looked like in March, but using

Point pnt = Point.Builder...... build();

Looks like it allocates another pair of maps when build() is called (from calling new Point()). Possibly adding another constructor: Point(Map<String, String> tags, Map<String, String> fields) and calling that from build() will solve the issue.

~~Expect a PR soon.~~

Edit: TreeMap is allocated only in Builder. List<Pair<String, String>> instead of TreeMap should be used due to size and has a similar toString(). Linear in traversal too (So are RB trees).

wcork · 2016-09-30T20:56:09Z

After writing it to be List<Pair<String, String/Object> compatible, I've decided that I've made this code harder to understand. There's a reason why Pair isn't made standard in Java (I had to import javax).

Playing around, I saw very small changes in RAM with switching this data-type.

However, in doing so, I read through the tests and found that InfluxDB-Java is, in fact, capable of using the line-protocol:

testWriteMultipleStringData()

...
    this.influxDB.write(dbName, rp, InfluxDB.ConsistencyLevel.ONE, "cpu,atag=test1 idle=100,usertime=10,system=1\ncpu,atag=test2 idle=200,usertime=20,system=2\ncpu,atag=test3 idle=300,usertime=30,system=3");
...

testWriteMultipleStringDataLines()

...
    this.influxDB.write(dbName, rp, InfluxDB.ConsistencyLevel.ONE, Arrays.asList(
                "cpu,atag=test1 idle=100,usertime=10,system=1",
                "cpu,atag=test2 idle=200,usertime=20,system=2",
                "cpu,atag=test3 idle=300,usertime=30,system=3"
        ));
...

Problems solved.

majst01 · 2016-10-03T08:39:14Z

duplicate of #151 closing now

majst01 mentioned this issue Oct 3, 2016

Investigation of options for Issue #149 #151

Closed

majst01 closed this as completed Oct 3, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using Point.Builder is memory intensive - support plain line protocol insertion #149

Using Point.Builder is memory intensive - support plain line protocol insertion #149

billevans1963 commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

billevans1963 commented Feb 24, 2016

billevans1963 commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

billevans1963 commented Feb 24, 2016

andrewdodd commented Mar 1, 2016

billevans1963 commented Mar 1, 2016

wcork commented Sep 30, 2016 •

edited

Loading

wcork commented Sep 30, 2016

majst01 commented Oct 3, 2016

Using Point.Builder is memory intensive - support plain line protocol insertion #149

Using Point.Builder is memory intensive - support plain line protocol insertion #149

Comments

billevans1963 commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

billevans1963 commented Feb 24, 2016

billevans1963 commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

andrewdodd commented Feb 24, 2016

billevans1963 commented Feb 24, 2016

andrewdodd commented Mar 1, 2016

billevans1963 commented Mar 1, 2016

wcork commented Sep 30, 2016 • edited Loading

wcork commented Sep 30, 2016

majst01 commented Oct 3, 2016

wcork commented Sep 30, 2016 •

edited

Loading