
The Full Stack, Part I

243 points by dimm | over 14 years ago

8 comments

nerfhammer · over 14 years ago

> Well, if the table is in InnoDB format it will result in one disk-seek, because the data is stored next to the primary key and can be deleted in one operation. If the table is MyISAM it will result in at least two seeks, because indexes and data are stored in different files. A hard drive can only do one seek at a time, so this detail can make the difference between 1X or 2X transactions per second.

Nonsense.

InnoDB has clustered primary keys, which means that the row data is attached to the leaf nodes of the primary key index, as the author correctly states. However, the leaf nodes and the non-leaf index nodes are actually stored in different segments of the tablespace! While they live in the same (giant) file, it is unlikely that they would ever be contiguous enough on disk to be read in a single random IO operation.

But it's more complicated than that: if any of the index pages or data pages have been read recently they will probably still be in the buffer pool, which means they will require no disk operations.

But that's just the seek operation to find the row. The write operation is a different story yet.

What InnoDB will do is modify the row by marking it with the transaction id in which it was deleted. It will keep the row in place so readers with an older transaction id will still see it until all those transactions are complete. The change to the row and the row page will be written to the copies of the affected pages in memory only. Eventually the data pages and any affected index pages will be flushed to disk, potentially grouped with other changes to the same pages. IO operations occur at the level of reading and overwriting whole pages only, if not more.

Concurrently it will record in the log buffer every change it makes to the pages in memory. This won't get written to disk right away either; in the default configuration it flushes the log buffer to disk once per second.

So there are many more potential disk operations required of InnoDB than of MyISAM. Generally InnoDB is preferable because it is vastly more reliable, and because it can handle concurrent reads/writes to the same data -- MyISAM basically can't. MyISAM will in fact generally be FASTER for any single operation than InnoDB, because it simply does less.
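The delete-marking behavior described above can be sketched as a toy MVCC model (all names here are hypothetical illustrations; real InnoDB row versioning is far more involved):

```python
# Toy model of MVCC delete visibility, as described in the comment above:
# a delete only stamps the row with the deleting transaction's id;
# readers with an older transaction id still see the row.

class Row:
    def __init__(self, data):
        self.data = data
        self.deleted_by = None  # txn id that deleted this row, if any

def delete(row, txn_id):
    row.deleted_by = txn_id  # mark in place; no physical removal yet

def visible(row, reader_txn_id):
    # The row is visible unless a transaction no newer than the reader
    # has already deleted it.
    return row.deleted_by is None or row.deleted_by > reader_txn_id

row = Row("payload")
delete(row, txn_id=10)
print(visible(row, reader_txn_id=5))   # True: older reader still sees it
print(visible(row, reader_txn_id=12))  # False: newer reader does not
```

Physical removal of such rows is deferred to background cleanup, which is why the single-seek claim quoted at the top does not hold.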
tom_b · over 14 years ago
I started as a performance software person, from memory- and cache-conscious algorithm design in grad school, to network stack testing, to web server/db server/client-side performance testing and optimization. It is an *excellent* way to develop that feel for the big picture of how things are working. In enterprise software there used to be a lot of low-hanging fruit, and it was fun to get "heroic" results with simple SQL tuning.

The flipside of that experience and mindset is that now, as I try to shift towards more functional and declarative programming styles, I sometimes get sucked into the premature-optimization black hole - I overthink rather than just doing some simple exploratory programming.

I keep telling myself I'm going to implement some fairly large project using nothing but lists and a Lisp to break it down.
tedunangst · over 14 years ago
I don't think I agree with the conclusion that performance will "depend almost completely on the random seek time." You can store the entire 10TB library on five 2TB spinning drives. Five drives can easily serve up 500mbps (that's only 60MB/s, one-drive territory). So, on to seeking.

2000 streams, 5 drives, that's 400 per disk. Let's say we have the world's worst disks, which can only do 10 seeks per second. 400 / 10 means we have to buffer 40 seconds of data per stream (per seek), and we have to read it in 0.1 seconds before moving on to the next stream. 300kbps * 40s / 8 = 1500K of data. 1.5M at 60MB/s disk transfer takes 0.025 seconds, well under 0.1.

I guess that's alluded to by "non standard prefetching", but I don't think it's that advanced. Especially since in a streaming video application the client software is already going to be doing buffering for you. The bottleneck is bandwidth.

Check my math please? :)
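The arithmetic in that comment checks out; here is the same back-of-the-envelope calculation spelled out (all figures are the comment's own assumptions: 2000 streams at 300 kbps, 5 drives, a pessimistic 10 seeks/sec, 60 MB/s sequential transfer):

```python
# Verify the seek-vs-bandwidth math from the comment above.

streams = 2000
drives = 5
streams_per_drive = streams / drives            # 400 streams per disk

seeks_per_sec = 10                              # "world's worst disks"
# Each stream is visited once every streams_per_drive / seeks_per_sec
# seconds, so we must buffer that many seconds of video per visit.
buffer_seconds = streams_per_drive / seeks_per_sec   # 40 s

bitrate_kbps = 300
chunk_kb = bitrate_kbps * buffer_seconds / 8    # 1500 KB read per visit

transfer_mb_per_s = 60
read_time = (chunk_kb / 1000) / transfer_mb_per_s    # 0.025 s

# The read must finish within the 1/seeks_per_sec = 0.1 s seek budget.
print(buffer_seconds, chunk_kb, read_time)      # 40.0 1500.0 0.025
```

Since each 1.5 MB read takes 0.025 s against a 0.1 s budget, the disks spend most of their time idle or seeking, and bandwidth, not seek time, is the limit.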
brown9-2 · over 14 years ago
Reading this makes me really curious how Netflix's Instant Watching service is architected. Anyone have any details?
aw3c2 · over 14 years ago
This is very interesting and well written, so someone "dumb" like me can learn a lot about planning and optimisation from it. Right in the third paragraph I just realised why some SQL query I wrote is so slow. Embarrassing but true, heh!
ajays · over 14 years ago
A much more readable, FB-free version of this is at his blog: http://carlos.bueno.org/
efsavage · over 14 years ago
A company well-stocked with full-stack engineers who can communicate makes the execution of good ideas so frictionless, and the death of bad ones so quick and painless, that it cannot help but succeed.

If this is the caliber of engineering talent that Facebook values, there should be no surprise that they are taking over the world.
gcb · over 14 years ago
From the ex-Yahoo who got "famous" by using a microwave near his wifi laptop to simulate dropped packets.