TechEcho

7 comments

dpkp, almost 9 years ago

kafka-python maintainer here. Our library is designed to be correct first, easy to use second, and fast third. It should not be surprising to anyone that using C extensions improves Python performance.

I have avoided requiring C compilation in kafka-python primarily because I've found that very few Python users care about processing >10K messages per second per core (remember that in Python without C extensions you are generally bound to a single CPU, so spinning up multiple processes usually improves performance; see multiprocessing). I've also found the Python infrastructure for distributing C extensions to be not easy (see goal #2 above). But that is changing! I would definitely consider leveraging C extensions for wire protocol decoding given the recent improvements to wheel distribution on Linux.

I'm not sure whether I would go so far as to delegate the entire client to a C extension. Part of the fun of Python is that you can play with all of the guts at runtime. I've found users are very willing to hack up kafka-python internals to help debug issues. I don't think I could expect the same community involvement if it was all distributed as a compiled C extension. But I could be wrong.

Anyways, always fun to read benchmarks. I hope kafka-python makes someone out there smile. That's the best benchmark in my book.
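A minimal sketch of the multi-process point above, using only the standard library. It is not kafka-python code: `encode`, `decode_and_sum`, and `process_in_parallel` are hypothetical names, and the struct-packed integers are only a stand-in for a real wire format. The idea is that each worker process has its own interpreter, so CPU-bound decoding in pure Python is no longer limited to one core.

```python
import multiprocessing as mp
import struct


def encode(values):
    """Pack integers as 4-byte big-endian fields (a stand-in for a wire format)."""
    return b"".join(struct.pack(">i", v) for v in values)


def decode_and_sum(payload):
    """CPU-bound stand-in for pure-Python protocol decoding."""
    total = 0
    for (value,) in struct.iter_unpack(">i", payload):
        total += value
    return total


def process_in_parallel(payloads, workers=2):
    """Fan payloads out to worker processes and collect results in input order.

    Call this from under an `if __name__ == "__main__":` guard so that
    platforms which spawn fresh interpreters do not re-run it in the workers.
    """
    with mp.Pool(processes=workers) as pool:
        return pool.map(decode_and_sum, payloads)
```

Under a main guard, something like `process_in_parallel([encode(range(n)) for n in (10, 100, 1000)])` would decode the three payloads in separate processes. With a real consumer, the rough equivalent would presumably be one consumer per process rather than one shared consumer, but that is an assumption and not something the comment above spells out.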

pixelmonkey, almost 9 years ago

My team at Parse.ly also did a benchmark comparing pykafka (pure Python) to pykafka with the librdkafka C extension enabled. That C module is clearly a huge win for Kafka consumer/producer performance on Python and other dynamic languages.

http://blog.parsely.com/post/3886/pykafka-now/

Unfortunately, as the OP illustrates, there are now 2 widely-used Python + Kafka drivers (pykafka and kafka-python), and as of recently, a third, confluent-kafka-python, which is a thin wrapper over librdkafka.

The reason for all this fragmentation is that Kafka was quite the moving target for non-JVM languages for the past three years. We have used it in production since Kafka 0.7, so we've had to live through it all blow-by-blow. I'm hoping that with Kafka 0.10 recently released, we can finally unify the community around a single driver (somehow).

iamspoilt, almost 9 years ago

I ran a couple of Kafka client benchmarks using Python, Jython and Java and got pretty interesting results. Check them here: http://mrafayaleem.com/2016/03/31/apache-kafka-producer-benchmarks/

willvarfar, almost 9 years ago

Ah, this reminds me of one of the trickiest bugs I ever tracked down: https://github.com/dsully/pykafka/pull/15

fluential, almost 9 years ago

After a quick glance, the first thing that strikes me is the use of Docker to measure the performance of a network-bound application. Docker handles networking differently across versions, and by default it may have quite a significant impact on your results. A good example comes from the Percona guys: https://www.percona.com/blog/2016/02/05/measuring-docker-cpu-network-overhead/

I wonder what the results would be without Docker, or with Docker using --net=host.

nerdwaller, almost 9 years ago

Has anyone tried much with the aiokafka library for asyncio (https://github.com/aio-libs/aiokafka)?

sheeshkebab, almost 9 years ago

> I ran these tests within Vagrant hosted on a MacBook Pro 2.2Ghz i7.

Good ole laptop benchmarks

Python Kafka Client Benchmarking
