The way I would view it would be to outsource it until you can afford to hire someone to work on that full time. Search is a bitch. Elasticsearch makes it a bit easier, but if you're a startup and search isn't your primary business it's not a bad idea to outsource it to experts.
> The way I would view it would be to outsource it until you can afford to hire someone to work on that full time.
Exactly. I would outsource whatever problem can be outsourced (in this case, search to Algolia) until we are far enough along to tackle it ourselves.
I don't think there's a real equation for it until you get to the point where you can no longer afford it, but by then you've probably waited too long (except in cases where you've run into stratospheric growth).
I think as a team you should be looking ahead at growth estimates and making the judgement call on when to start bringing it in-house. Ideally you want the opportunity to run both side by side for a while.
And, honestly, what if you architect it well, or don't grow enough, so that the cost never becomes a pain point? As long as the service provider is doing a good job, you could use the opportunity to extend your product in other directions. Why build search if your focus is on something else and your provider is affordable?
I tend to think of time series data as being several orders of magnitude larger than 23 million data points per week (38 per second), but now I can't seem to find a good definition of time series data. Anyone have thoughts on the rough threshold between event data and time series data? I think of arrays of hundreds or thousands of individual sensors, each taking 10 measurements a second, as "different" from user-generated data that is time-ordered.
I agree, time series should be more like 1000 measurements taken 100 times a second. Industrial acquisition data is not the same thing as timestamped web log data.
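For a sense of scale, here is a quick back-of-the-envelope comparison of the two rates mentioned above. The "1000 measurements, 100 times a second" figure is just the hypothetical from this comment, not a measured workload, and the function names are made up for illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def per_second(points_per_week: float) -> float:
    """Average ingest rate for a given weekly volume."""
    return points_per_week / SECONDS_PER_WEEK


def per_week(sensors: int, samples_per_second: int) -> int:
    """Weekly volume for a sensor array sampled at a fixed rate."""
    return sensors * samples_per_second * SECONDS_PER_WEEK


# The event-data figure from the thread: 23 million points per week.
event_rate = per_second(23_000_000)

# The hypothetical industrial figure: 1000 measurements, 100 times a second.
sensor_rate = 1000 * 100

print(f"event data:  ~{event_rate:.0f} points/s")
print(f"sensor data: {sensor_rate} points/s, "
      f"{per_week(1000, 100):,} points/week")
print(f"ratio:       ~{sensor_rate / event_rate:.0f}x")
```

So the two cases differ by a bit more than three orders of magnitude, which fits the "several orders of magnitude" intuition above.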
Median might solve some of that issue. Unfortunately, a median is computationally heavy to maintain on a rolling basis, unlike an average. Part of the reason the filters work so quickly is that when I add or remove items from the active set, I can just add to or subtract from the total and the count for that one item. With median, I'd have to keep the active list sorted, which even with a binary tree under the covers is still more expensive than two math operations. The filtering library under the covers is crossfilter.
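To make the cost difference concrete, here is a minimal sketch in Python. It is not crossfilter's code or API; the class names are hypothetical, and a sorted list maintained with `bisect` stands in for the binary tree mentioned above (a list insert is O(n) because of shifting, while a balanced tree would be O(log n), but either way it is more work than two arithmetic operations).

```python
import bisect


class RunningMean:
    """Incremental average: each add/remove is two arithmetic operations."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        self.total += x
        self.count += 1

    def remove(self, x):
        self.total -= x
        self.count -= 1

    def value(self):
        return self.total / self.count if self.count else None


class RunningMedian:
    """Incremental median: the active set has to stay sorted."""

    def __init__(self):
        self.items = []  # always kept in ascending order

    def add(self, x):
        # Binary search for the slot, then shift elements to insert.
        bisect.insort(self.items, x)

    def remove(self, x):
        i = bisect.bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            raise ValueError(f"{x!r} is not in the active set")
        self.items.pop(i)

    def value(self):
        n = len(self.items)
        if n == 0:
            return None
        mid = n // 2
        if n % 2:
            return self.items[mid]
        return (self.items[mid - 1] + self.items[mid]) / 2


mean, median = RunningMean(), RunningMedian()
for x in (1, 2, 3, 100):
    mean.add(x)
    median.add(x)
print(mean.value(), median.value())  # 26.5 2.5

# Filtering the outlier back out of the active set.
mean.remove(100)
median.remove(100)
print(mean.value(), median.value())  # 2.0 2
```

The outlier drags the average to 26.5 while the median stays at 2.5, which is the appeal of median; the price is the sorted structure that every add and remove has to maintain.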
You don't need any add-ons to selectively block Flash in Firefox nowadays. Go to Menu > Add-ons > Plugins > Flash and set it to "Ask to activate" instead of "Always activate".