First Monthly Challenge: Elasticsearch (trackmaven.com)
37 points by fheisler on Oct 22, 2014 | 17 comments


> Provide real-time text search over a large corpus (i.e., some subset of Project Gutenberg, a bunch of product reviews, etc.)

That would be nice. But have you looked at the PG export yet? It's a million individual files in semantic web notation, with the author information duplicated, etc. I would LOVE for somebody to convert that into a full-fidelity form suitable for import into a search engine. So, please do it for real, not just as a 'possible'.

Another dataset with a similar problem is Google Mail takeout. Theoretically it is in mbox form, but there are apparently enough quirks in there that standalone third-party libraries cannot parse it. Somebody said Python might be able to do it, but I haven't seen confirmation specifically for Google Mail.
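For anyone who wants to test the "Python might be able to do it" claim, here is a minimal sketch using the standard library's `mailbox` module. It assumes a well-formed mbox file; whether it copes with the Google Mail takeout quirks mentioned above is untested here. The function names and the choice of fields are illustrative only.

```python
import mailbox


def first_text_part(msg):
    """Return the first text/plain part of a message as str, or ''."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def iter_mbox_messages(path):
    """Yield one plain dict per message in the mbox file at `path`."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield {
                "from": str(msg.get("From", "")),
                "subject": str(msg.get("Subject", "")),
                "date": str(msg.get("Date", "")),
                "body": first_text_part(msg),
            }
    finally:
        box.close()
```

Each yielded dict is plain data, so it can be serialized to JSON and handed to whatever indexer you use.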

If you manage to do either and document the steps/share the code, I'll personally sing your praises at the search engine workshops/meetups/lectures I do (usually Solr rather than ES).


I actually just loaded emails from both the GMail API and an mbox file into ES. Haven't gotten around to doing advanced analytics yet; mainly using it to load into Mixpanel / Intercom to track events such as "customer has replied to one of our emails". We then use this info to do funnel analysis in Mixpanel, and to exclude users we're currently in touch with via email from Intercom onboarding emails.
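The comment above does not say how the emails were loaded. One common route is the Elasticsearch bulk API, which takes newline-delimited JSON: an action line followed by a document line. The sketch below only builds that request body with the standard library and sends nothing. The index name `emails` and the document fields are hypothetical, and as far as I know the ES 1.x releases current at the time also expected a document type (in the URL or as `_type`), which is omitted here.

```python
import json


def build_bulk_payload(docs, index_name="emails"):
    """Return a newline-delimited JSON body for the _bulk endpoint.

    `docs` is an iterable of (doc_id, source_dict) pairs.
    `index_name` is a hypothetical index name chosen for illustration.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: index the document that follows under this id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be POSTed to the `_bulk` endpoint with any HTTP client.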


Well, don't just brag. Write a blog post, publish a GitHub repo. Share it somehow.


I love the idea of analyzing your own Gmail - will take a look into parsing MBOX data!



I think you can also dump it in json, which would be ideal for this.


Which one of these can you dump into JSON? Do you have references? Could be nice.


You could use the Inbox APIs! :)

More info w/ docs: https://www.inboxapp.com/

Open-source version: https://github.com/inboxapp/inbox


I use Elasticsearch + Python scripts (running as bolts on Storm) + Kibana (with better map) for analytics, and it's uber cool. The only things Kibana misses out on are 'unique counts', multiple dashboards across different document types, etc. I hope these get addressed in the near future.


Kibana 4 should have some of that:

http://www.elasticsearch.org/blog/kibana-4-beta-1-released/

It exposes quite a bit of the new aggregation functionality added to ES 1.x.
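For the 'unique counts' request specifically, the relevant ES 1.x aggregation is `cardinality`, which returns an approximate distinct count for a field. A minimal sketch of the request body follows; the field name `user_id` and the aggregation name are hypothetical, and the code only builds the body rather than talking to a cluster.

```python
import json


def unique_count_query(field, agg_name="unique_values"):
    """Build a search body that asks only for a distinct count of `field`."""
    return {
        # size 0: skip the hits, return only the aggregation result.
        "size": 0,
        "aggs": {agg_name: {"cardinality": {"field": field}}},
    }


# Hypothetical field name, for illustration only.
print(json.dumps(unique_count_query("user_id"), indent=2))
```

Note that the count is an estimate, so it can differ slightly from an exact distinct count on high-cardinality fields.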


Unfortunately Kibana 4 is missing a lot of functionality from 3.x. One feature that we really want back is the global search scope per dashboard.

Example: Say you've got a dashboard showing hits on a web service. You've got a pie chart showing HTTP return codes, a bar chart showing response times, and another few graphs and charts detailing various data out of the requests themselves.

You could click on, say, the "500" in your return code pie chart, and then every visualization on the page would redraw and show you stats for just the requests that were 500s. (What's unique about the requests that return server errors?)

Or turn it around - click on the section of a chart that denotes requests that took longer than 100ms to process, and now you see info about those requests only. (What makes these long-time requests so special?)

This was a jaw-droppingly awesome troubleshooting tool, and now it's gone. I hope they return it before Kibana 4 gets out of beta!
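The global search scope described above amounts to applying one shared filter to every aggregation on the dashboard. This is not how Kibana implements it, just a rough sketch of the idea as a single Elasticsearch request body. The field names `status` and `response_ms` are hypothetical, and the `bool`/`filter` form shown is the syntax of later ES versions (ES 1.x used a `filtered` query for the same purpose).

```python
def dashboard_query(global_filters):
    """Build one search body whose aggregations all share `global_filters`."""
    return {
        "size": 0,
        # Every aggregation below only sees documents passing these filters.
        "query": {"bool": {"filter": list(global_filters)}},
        "aggs": {
            # Pie chart: breakdown by HTTP return code.
            "status_codes": {"terms": {"field": "status"}},
            # Bar chart: response times in 50 ms buckets.
            "response_times": {
                "histogram": {"field": "response_ms", "interval": 50}
            },
        },
    }


# Clicking "500" in the pie chart: scope everything to server errors.
server_errors = dashboard_query([{"term": {"status": 500}}])

# Turning it around: scope everything to requests slower than 100 ms.
slow_requests = dashboard_query([{"range": {"response_ms": {"gt": 100}}}])
```

Swapping the filter list and re-running the same body is what makes every chart redraw against the same subset.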


That sounds like an interesting stack. I use Python quite a bit and have been dabbling with Elasticsearch, so I'm glad to see the options for combining the two!


Great idea, Fletcher. We <3 ElasticSearch here at Lanetix as well, and I think some of our developers who haven't worked with it as much will give your challenge a shot.


Awesome, send along links to anything they come up with! We're thinking of live streaming the meetup, so that might be a good way to share around as well.


Will do. And we'd love to watch if you live stream it.


Great article. I now feel the need to research Elasticsearch vs Postgres Full Text Search to see which one would benefit my side project.


I had the same question and went with Elasticsearch. Also check this re: postgres... http://blog.lostpropertyhq.com/postgres-full-text-search-is-...



