TechEcho

5 comments

Strilancabout 9 years ago

This post starts off good, but...- The motivation behind this post is storing generated IDs for a site the author is working on. Generated IDs don't have the "smaller values are exponentially more likely" property that make varints useful in the first place.- The post rejects UTF-8's variable-length encoding because "it can only store numbers up to 1,112,064". That's the maximum number of unicode code points (...ish). The UTF-8 encoding's actual limit is >10^16. A much better reason to not use UTF-8's encoding is that it was created under pretty heavy backwards-compatibility restrictions that the author doesn't have. For example, there's no need for being self-synchronizing or to be compatible with Boyer-Moore substring searches.- The post goes with a length-prefixed encoding, but then uses that length prefix as part of the folder structure (it's using the generated IDs as filepaths). Which is a great way to create an exponential distribution in the sizes of your directories, instead of the uniform one that was the goal.It's not a bad post, there's facts in there, but I wouldn't recommend bothering with it.

评论 #11656987 未加载

JonathonWabout 9 years ago

Something probably not particularly important to the author (given that he's just using these essentially as a serial number), but worth mentioning for general use:> Varints can represent negative numbers too, by casting them to unsigned numbers first.But, assuming you're working with twos-complement numbers, it's going to be really inefficient, because small negative numbers are going to end up represented as extremely large positive numbers (e.g. -1 will map to MAX_INT - 1, where MAX_INT is the largest unsigned value representable in your underlying integer type).You also lose the unlimited length property if you need to handle twos-complement integers, and your encoding is dependent on the underlying integer type you're serializing from/deserializing to (especially problematic in languages where types are architecture-dependent, like C/C++).Protobuf handles this by doing what they call "ZigZag encoding" if you declare a value as a signed integer type. This maps small negative numbers to small positive numbers, alternating back and forth like so:<pre><code> Original Encoded 0 0 -1 1 1 2 -2 3 2 4 </code></pre> and so on.The catch is that you lose sortability if you do this... if you're going to have a bunch of negative numbers and need their encoded versions to sort correctly, you'll need to do something else. (Of course, you'll need to with twos-complement numbers, too, given that -1 will appear to be greater than 1).Protobuf documentation on ZigZag encoding: <a href="https://developers.google.com/protocol-buffers/docs/encoding#signed-integers" rel="nofollow">https://developers.google.com/protocol-buffers/docs/encoding...</a>

评论 #11656973 未加载

评论 #11657637 未加载

ccleveabout 9 years ago

The references in the article don't reflect the latest research. So far as I know, this is close to state of the art:<a href="https://github.com/lemire/JavaFastPFOR" rel="nofollow">https://github.com/lemire/JavaFastPFOR</a>The links to the papers in the readme are useful if you want to learn more.

hyperpapeabout 9 years ago

"No “u”, (and no “o” or “i”) so accidental profanity cannot happen."h0ly fvcking sh1t!It's a good try, and making all the letters lowercase makes the numbers as vowels less legible. But if you're creating random strings you will create profanity and slurs.

评论 #11657248 未加载

评论 #11657493 未加载

btraskabout 9 years ago

Varints are widely used in databases. <a href="https://www.sqlite.org/src4/doc/trunk/www/varint.wiki" rel="nofollow">https://www.sqlite.org/src4/doc/trunk/www/varint.wiki</a>

5 comments

Strilancabout 9 years ago

评论 #11656987 未加载

JonathonWabout 9 years ago

评论 #11656973 未加载

评论 #11657637 未加载

ccleveabout 9 years ago

hyperpapeabout 9 years ago

评论 #11657248 未加载

评论 #11657493 未加载

btraskabout 9 years ago

Varints are widely used in databases. <a href="https://www.sqlite.org/src4/doc/trunk/www/varint.wiki" rel="nofollow">https://www.sqlite.org/src4/doc/trunk/www/varint.wiki</a>

Let's Make a Varint

5 comments

Let's Make a Varint

5 comments