科技回声

7 comments

judofyr 3 months ago

I looked a bit into this a few years back and found it quite interesting. Despite them calling them "Tiny Pointers", I would say it's closer to an open-addressing hash map. You have a specific key, and then you can "allocate" an entry in the hash map. This gives you back a "pointer". You can then later use the original key and the pointer together to determine the index of the entry. There's also a slight chance that the allocation will fail (similar to a hash collision in a hash map). The "trick" here is that two different keys can end up having the exact same pointer (because we're always dereferencing with the key). This makes them more compact.

I was struggling a bit to come up with good use cases for it. Their examples are all around combining them with existing data structures, and they show that the space complexity is smaller, but it wasn't completely clear to me how feasible this would actually be in practice.
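To make the allocate/dereference idea concrete, here is a minimal sketch in Python. It is a toy fixed-bucket scheme, not the paper's construction: the class name `ToyTinyPointerTable`, the bucket layout, and the choice of BLAKE2 as the hash are all assumptions made for illustration. Each key hashes to a bucket, and the "tiny pointer" is only the slot number inside that bucket.

```python
import hashlib


class ToyTinyPointerTable:
    """Toy fixed-bucket scheme (hypothetical, NOT the paper's construction)."""

    def __init__(self, num_buckets=64, bucket_size=4):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        total = num_buckets * bucket_size
        self.slots = [None] * total   # stored values
        self.used = [False] * total   # occupancy flags

    def _bucket(self, key):
        # Deterministic hash of the key, reduced to a bucket number.
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_buckets

    def allocate(self, key, value):
        """Return a tiny pointer in range(bucket_size), or None if the bucket is full."""
        base = self._bucket(key) * self.bucket_size
        for p in range(self.bucket_size):
            if not self.used[base + p]:
                self.used[base + p] = True
                self.slots[base + p] = value
                return p
        return None  # allocation failed: every slot in this key's bucket is taken

    def dereference(self, key, p):
        """Combine the key and the tiny pointer into a full slot index."""
        return self._bucket(key) * self.bucket_size + p

    def free(self, key, p):
        index = self.dereference(key, p)
        self.used[index] = False
        self.slots[index] = None


table = ToyTinyPointerTable()
p_a = table.allocate("alice", "payload A")
p_b = table.allocate("bob", "payload B")
# The two tiny pointers may be equal, yet the full indices always differ.
print(p_a, table.dereference("alice", p_a))
print(p_b, table.dereference("bob", p_b))
```

In this sketch, two keys that land in different buckets can both receive the tiny pointer 0, and an allocation fails once a key's bucket is full, which mirrors the two properties described above.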

mont_tag 3 months ago

Didn't Python's compact dictionary implementation do this a decade ago?

"The dict type now uses a “compact” representation based on a proposal by Raymond Hettinger which was first implemented by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5." -- https://docs.python.org/3.6/whatsnew/3.6.html#whatsnew36-compactdict

"Note, the sizeof(index) can be as small as a single byte for small dicts, two bytes for bigger dicts and up to sizeof(Py_ssize_t) for huge dict." -- https://mail.python.org/pipermail/python-dev/2012-December/123028.html

The "tiny pointers" are in the _make_index method in the proof of concept code. -- https://code.activestate.com/recipes/578375-proof-of-concept-for-a-more-space-efficient-faster/

    @staticmethod
    def _make_index(n):
        'New sequence of indices using the smallest possible datatype'
        if n <= 2**7: return array.array('b', [FREE]) * n    # signed char
        if n <= 2**15: return array.array('h', [FREE]) * n   # signed short
        if n <= 2**31: return array.array('l', [FREE]) * n   # signed long
        return [FREE] * n

The logic is still present today in CPython. -- https://raw.githubusercontent.com/python/cpython/3e222e3a15959690a41847a1177ac424427815e5/Objects/dictobject.c

    dk_indices is actual hashtable. It holds index in entries, or DKIX_EMPTY(-1)
    or DKIX_DUMMY(-2).
    Size of indices is dk_size. Type of each index in indices varies with dk_size:

    * int8  for          dk_size <= 128
    * int16 for 256   <= dk_size <= 2**15
    * int32 for 2**16 <= dk_size <= 2**31
    * int64 for 2**32 <= dk_size
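As a rough illustration of what those thresholds save, the sketch below maps a table size to a per-index width and compares the result with always using 8-byte indices. The function name `index_width_bytes` is hypothetical; the thresholds are taken from the comment quoted above, and the sketch assumes table sizes are powers of two, which would explain why that comment skips 129 through 255.

```python
def index_width_bytes(dk_size):
    """Bytes per index slot, following the thresholds in the quoted comment."""
    if dk_size <= 2**7:
        return 1   # int8
    if dk_size <= 2**15:
        return 2   # int16
    if dk_size <= 2**31:
        return 4   # int32
    return 8       # int64


for dk_size in (8, 128, 256, 2**15, 2**16, 2**32):
    narrow = dk_size * index_width_bytes(dk_size)
    wide = dk_size * 8
    print(f"dk_size={dk_size}: {narrow} bytes vs {wide} bytes with 8-byte indices")
```

This only accounts for the index table; the entries array that the indices point into is a separate cost.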

kragen 3 months ago

Note that this is not the paper by Krapivin that https://news.ycombinator.com/item?id=43002511 is about.

kazinator 3 months ago

> How large do the pointers need to be? The natural answer is that each pointer uses log n bits. However, the fact that each pointer has a distinct owner makes it possible to compress the pointers to o(log n) bits.

What if you have to debug the whole situation, such that you don't always know who is the owner of a pointer you are looking at?

> A user k can call Allocate(k) in order to get a tiny pointer p; they can dereference the tiny pointer p by computing a function Dereference(k,p) whose value depends only on k, p, and random bits; and they can free a tiny pointer p by calling a function Free(k,p).

That is tantamount to saying that the pointer is not actually p but the tuple <k, p>, and so its size consists of the number of bits in k, the number of bits in p, plus an indication of where the division between these bits lies: where k ends and p begins.

We can abbreviate <k, p> to p in contexts where k can be implicitly understood.
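The bit accounting behind that abbreviation can be sketched with a little arithmetic. The example assumes a toy layout in which n slots are split into fixed-size buckets and the key alone selects the bucket; this is an illustration only, not the paper's scheme, and the names `bits_needed`, `n_slots`, and `bucket_size` are hypothetical.

```python
def bits_needed(count):
    """Bits required to distinguish `count` different values."""
    return (count - 1).bit_length()


n_slots = 2**20      # total slots in the store (assumed)
bucket_size = 16     # slots per bucket in the toy layout (assumed)

full_pointer_bits = bits_needed(n_slots)      # a plain index into all slots
tiny_pointer_bits = bits_needed(bucket_size)  # the offset p inside the owner's bucket

print(f"full pointer: {full_pointer_bits} bits")
print(f"tiny pointer: {tiny_pointer_bits} bits, plus the key the owner already holds")
```

The saving only counts when the owner stores k anyway, which is the condition under which <k, p> can be shortened to p.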

BenoitEssiambre 3 months ago

Here's a somewhat related post where I argue that logic that _can_ have small pointers has lower entropy and is more optimal as a Bayesian model of the domain: https://benoitessiambre.com/entropy.html

levzettelin 3 months ago

Can someone ELI5?

kittikitti 3 months ago

Is there a peer-reviewed version of this, or does Hacker News exclusively post non-peer-reviewed works?
