I want to use influx to store _statistics_ not _events_. Basically, my data points are tag-sets and counts.
There are several ways to achieve this; for example, you can send the events to influx and have continuous queries gather the statistics. That doesn't work well when you have a lot of events, or when they arrive out of order or with high latency.
So what you typically end up having to build is a stats thing that sits in front of influx, tracks the counts of events with particular tag sets in particular time buckets, and keeps uploading these to influx.
And there are two ways to do that:
1) you are not stateful and you keep uploading deltas and incrementing the nanoseconds to avoid data-point collision; you can then get the data out of influx with sum() on the fields and grouping by whatever the time bucket is. I tried this and influx grinds to a halt eventually.
2) you are stateful and track the totals outside influx, and keep uploading a newly-written data-point to overwrite the fields for that bucket in influx. This is much less data in influx and much easier to query, avoiding sum() etc. It's like I end up with something in front of influx doing what I want influx to do.
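To make the two options concrete, here is a rough Python sketch of what each uploader ends up looking like. The class and function names (`line`, `DeltaUploader`, `TotalsUploader`) are made up for illustration, the fields are assumed to be integer counts, and there is no tag or field escaping, so treat it as an outline rather than a real client.

```python
from collections import defaultdict


def line(measurement, tags, fields, ts_ns):
    """Render one line-protocol point with integer fields (no escaping; sketch only)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}i" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"


def bucket_key(measurement, tags, bucket_ns):
    """Hashable identity of one series in one time bucket."""
    return (measurement, tuple(sorted(tags.items())), bucket_ns)


class DeltaUploader:
    """Option 1: upload deltas, bumping the nanoseconds so points never collide.

    It keeps no totals, only a counter per bucket to pick the next free nanosecond.
    """

    def __init__(self):
        self._next_offset = defaultdict(int)

    def emit(self, measurement, tags, deltas, bucket_ns):
        key = bucket_key(measurement, tags, bucket_ns)
        offset = self._next_offset[key]
        self._next_offset[key] += 1
        return line(measurement, tags, deltas, bucket_ns + offset)


class TotalsUploader:
    """Option 2: track running totals and re-upload the point to overwrite it."""

    def __init__(self):
        self._totals = defaultdict(lambda: defaultdict(int))

    def emit(self, measurement, tags, deltas, bucket_ns):
        totals = self._totals[bucket_key(measurement, tags, bucket_ns)]
        for name, delta in deltas.items():
            totals[name] += delta
        return line(measurement, tags, dict(totals), bucket_ns)
```

With option 1 every delta becomes its own point in influx and has to be summed at query time; with option 2 there is one point per bucket, but the uploader has to hold every live total in memory.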
What would greatly simplify life is if the line format which looks like this:
measurement,tag1=x,tag2=y,tag3=z f1=total,f2=total timestamp
could look like this:
+measurement,tag1=x,tag2=y,tag3=z f1=delta,f2=delta timestamp
and in the second case, where the line is prefixed with a + sign, influx knows to add the fields if the data-point collides with another rather than overwrite them.
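To spell out the semantics I mean, here is a toy in-memory model in Python. The `+` prefix is the proposal in this post, not something influx supports today, and `apply_line` is a hypothetical helper that only handles integer fields and no escaping.

```python
def apply_line(store, raw):
    """Apply one line to `store`, a dict keyed by (series, timestamp).

    A leading '+' (the proposed syntax) adds to colliding fields;
    without it, colliding fields are overwritten.
    """
    additive = raw.startswith("+")
    if additive:
        raw = raw[1:]
    series, field_part, ts = raw.split(" ")
    point = store.setdefault((series, int(ts)), {})
    for pair in field_part.split(","):
        name, value = pair.split("=")
        number = int(value.rstrip("i"))  # integer fields only in this sketch
        if additive:
            point[name] = point.get(name, 0) + number
        else:
            point[name] = number
    return store


store = {}
apply_line(store, "hits,host=a count=2i 1000")   # plain write
apply_line(store, "+hits,host=a count=3i 1000")  # collides, so count becomes 5
```

A plain line sent to the same series and timestamp afterwards would still overwrite the field, so existing writers keep their current behaviour.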
This would mean that people trying to store statistics in influx could add to those statistics statelessly. A massive simplification.
I've had other problems too; for example, I have way more than 1M series. It's painful. My influx boxes hit iowait far too often, which is weird because the boxes have more RAM than the total dataset.