TechEcho

Elasticsearch for Beginners: Indexing Your GMail Inbox

216 points by SuperKlaus over 10 years ago

16 comments

gecko over 10 years ago
I've been doing a whole blog series on this as well: http://bitquabit.com/post/having-fun-python-and-elasticsearch-part-1/ . It's interesting to see a different take on it.
jptoto over 10 years ago
This is a totally shameless plug, but if you'd like to learn Elasticsearch from scratch, I've got an introductory course up on Pluralsight: http://www.pluralsight.com/courses/elasticsearch-for-dotnet-developers
jrgnsd over 10 years ago
This is the first time I've seen GitHub READMEs used as a blogging tool. Is this common? I've started to link to a Vagrant/Ansible repo for my setup- and code-intensive posts, but having the code and the text encapsulated as a repo is quite novel.
chdir over 10 years ago
There are a couple of libraries listed below. Would using any of them make life easier with ElasticSearch + Python?

- https://github.com/elasticsearch/elasticsearch-py (low level lib, from ES)
- https://github.com/elasticsearch/elasticsearch-dsl-py (high level lib, from ES)
- https://github.com/mozilla/elasticutils (high level lib from Mozilla)

There are a few more, but they are either obsolete or don't have much traction. There's also django-haystack, but that's specific to Django.
Fritsdehacker over 10 years ago
I've been thinking about making my own email searchable with Elasticsearch. The main thing holding me back is security. With Elasticsearch listening on localhost:9200, anyone with local access can read all your mail. Even on a computer over which you have full control, even a tiny breach would leak all your mail.

I realize this tutorial is just meant to get started with Elasticsearch and not meant as a tool to make your email searchable. Still, it would be interesting to take this to the next level.
spaceman10 over 10 years ago
Not sure if people are still here. I tried working through this and it appears to be failing on the import... I am running a Vagrant box and got everything installed just fine.

I don't know how to invoke the script properly...

I've tried so many ways. This one seems like it should give results... though it does nothing much:

    python index_emails.py test.mbox

Any help or tips are appreciated! This has been a fun project so far. Stumbling at the end. Thanks!
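One way to narrow down a run that "does nothing much" is to check whether Python can find any messages in the mbox file at all. This is a minimal sketch using only the standard-library `mailbox` module; `count_messages` is a hypothetical helper and is not part of the tutorial's `index_emails.py`.

```python
import mailbox


def count_messages(mbox_path):
    """Return how many messages the stdlib parser finds in an mbox file.

    create=False matters: with the default (create=True) a mistyped path
    silently yields a new, empty mailbox instead of an error.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


# Example use (the path is an assumption, taken from the command above):
#     print(count_messages("test.mbox"))
```

If this reports zero messages, or raises `mailbox.NoSuchMailboxError`, the problem is likely the file or its path rather than Elasticsearch.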
_4giw over 10 years ago
Just a word of caution: by default, Elasticsearch allows everyone access to the indexed data. If you're doing this on a world-reachable machine with sensitive data, you should probably lock it down or make sure it's locked down.

There are a number of authentication solutions, such as the jetty and elasticsearch-http-basic plugins, and they will require additional configuration.
Animats over 10 years ago
The whole point of GMail was supposed to be that it was searchable. Did Google break that, or what?

If there's a demand for this, it might be worthwhile to build IMAP servers with more indexing. It's easy to request searches with IMAP, but the performance can be a problem for IMAP servers that aren't real databases.
superasn over 10 years ago
Very interesting. This is a far more useful and practical way of learning something new than reading an article about it. I don't know Python, but I was able to understand each and every bit of it, and I will be coming back to this if I ever need to incorporate Elasticsearch.
gcr over 10 years ago
The 'notmuch' mail indexing system uses Xapian. I can grep through my 200k messages in seconds.

http://notmuchmail.org/

Since it's implemented as a "library" of sorts, there are interfaces for emacs, command line, GTK, mutt, ...
ladzoppelin over 10 years ago
Wow, little tutorials like this with easily attainable data are so helpful. Thanks for posting.
bluefox over 10 years ago
Analysing the "Turn mbox into JSON" section:

http://paste.lisp.org/display/145050
tterrace over 10 years ago
What was the performance like for those queries?
thrownaway2424 over 10 years ago
Couldn't this be "Indexing your mbox files"? It seems applicable to any mailbox that is in or can be in that format. Except for the x-gmail-labels part, of course.

Anyway, if you do feel like you want to accomplish the stated purpose of finding which emails are taking up space, you can search in Gmail with the word "larger", as in "larger:20MB".
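To illustrate the point that only the label header is Gmail-specific, here is a rough sketch of turning one raw message into a JSON-ready dict with the standard-library `email` module. The field names (`from`, `to`, `subject`, `size_bytes`, `labels`) and both helper functions are assumptions made for this example; they are not taken from the tutorial's code.

```python
import email


def _header(msg, name):
    """Return a header as a plain string, or None if it is absent."""
    value = msg.get(name)
    return None if value is None else str(value)


def message_to_doc(raw_bytes):
    """Build a JSON-serializable dict from one raw RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes)
    doc = {
        "from": _header(msg, "From"),
        "to": _header(msg, "To"),
        "subject": _header(msg, "Subject"),
        "size_bytes": len(raw_bytes),
    }
    # X-Gmail-Labels only exists in Gmail exports, so treat it as optional.
    labels = _header(msg, "X-Gmail-Labels")
    if labels is not None:
        doc["labels"] = [part.strip() for part in labels.split(",") if part.strip()]
    return doc


def larger_than(docs, min_bytes):
    """Local stand-in for a size filter: keep docs bigger than min_bytes."""
    return [doc for doc in docs if doc["size_bytes"] > min_bytes]
```

A message from a non-Gmail mailbox simply produces a dict without the `labels` key, so the same conversion works for either source.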
curiously over 10 years ago
So when should you use Elasticsearch? Can't you get away with doing

    SELECT id FROM pages WHERE title LIKE "%elastic"
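A toy sketch of the difference, under loud assumptions: a `LIKE` pattern with a leading wildcard generally has to examine every row, whereas a search engine looks terms up in an inverted index built ahead of time. The code below is an illustration only, not how Elasticsearch or any SQL database is implemented, and all function names are hypothetical.

```python
from collections import defaultdict


def like_suffix_scan(titles, suffix):
    """Mimic WHERE title LIKE '%<suffix>' by checking every row.

    Case-insensitive here for simplicity; real databases vary.
    """
    suffix = suffix.lower()
    return [doc_id for doc_id, title in titles.items()
            if title.lower().endswith(suffix)]


def build_inverted_index(titles):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for token in title.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

The scan only finds titles that end in the given string, while the index lookup finds the word anywhere in the title and can combine several words without rereading the rows.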
piratebroadcast over 10 years ago
Would LOVE to see this in Ruby rather than Python. My boss wants me to learn ElasticSearch.