
Increasing the D Compiler Speed by Over 75%

174 points by andralex · almost 12 years ago

9 comments

nkurz · almost 12 years ago

"Time to stop guessing where the speed problems were, and start instrumenting. Time to trot out gprof, the Gnu profiler. I fired it up on the slow example, and waited. And waited, and waited, and waited. I waited overnight. It pole-axed my Ubuntu box, I had to cold boot it. The test case was so big that it plus gprof was too much. gprof slows things down a lot, so I had to cut things way down to get it to work." (quoting the article)

It's been a long time since I've used gprof. I switched to Valgrind and OProfile about 10 years ago, and more recently to 'perf' and 'likwid'. If the goal is finding hot spots, these last might be more convenient since they run with minimal overhead -- a couple of percent rather than 100x.

Are there benefits to gprof that I've forgotten?

Are there newer and better profiling tools I don't know about?
WalterBright · almost 12 years ago

This article chronicles some fun I had boosting DMD's speed by making some simple changes.
chondl · almost 12 years ago

Have you considered or tested using either closed hashing or linear array lookups as a replacement for your linked-list open hashing implementation? Years ago I significantly improved the speed of a color quantization operation that several other engineers had already optimized by replacing it with a simpler closed hashing algorithm straight out of Knuth. More recently I've had success for small collections using arrays and performing linear search. This technique is used in Redis (see http://redis.io/topics/memory-optimization).
aidenn0 · almost 12 years ago

A lot of people underestimate the performance impact of malloc(). It is dog slow. In addition, if you use a poor malloc() implementation heavily with varying-sized data, you can easily end up using more memory than had you used a copying GC!
jongraehl · almost 12 years ago

I wondered why you don't store the reciprocal with the hash table object. Obviously it wastes some space, but it wouldn't be any slower than your specific checks for 4 and 31, I think. (If most of the tables have size 4 or 31, then I'd use your code.)
martin_ · almost 12 years ago

Changing the modulus to use known constants is an awesome trick! Great read.
shasta · almost 12 years ago

Walter, could you explain why lexing was a bottleneck? That's very surprising to me. You don't re-lex template instantiations, do you?
gridspy · almost 12 years ago

A massive advantage of your new linear allocator is that it keeps your memory accesses contiguous. This means the processor is more likely to have the most recently used memory locations already in cache.

You might see further improvements if you split your allocations between two (or more) allocators: one for memory you expect to remain hot (core to the compiler) and one for stuff you think is one-off. That might improve access locality further.
oh_teh_meows · almost 12 years ago

Does your compiler perform any transformations at all? I imagine it can run out of memory pretty quickly if you're performing multiple transformations in succession on a large code base, unless you recycle some of that used memory.

Granted, since you explicitly stated that your compiler focuses on compile speed, I guess optimized code generation isn't your main concern, since the two are more or less mutually exclusive.