Hacker News

> Even if the DB server selectively read columns of each row (none of the common open source SQL databases do), they do so via the OS, which works in terms of pages. Reading a single byte of a page will cause a minimum of 4kb of IO to be made to the disk.

This is why I said it was high level, but it hopefully illustrates the point. In addition to the disk page size, there is all the metadata associated with the file(s), so reading a byte from a page can imply reading even more data than the block size (currently 4 KiB).
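A small sketch of the point above: userspace can ask for a single byte, but the kernel services the read in filesystem-block units, whose preferred size is visible as `st_blksize`. (This uses a throwaway temp file purely for illustration.)

```python
import os
import tempfile

# Write a small scratch file, then read a single byte back from it.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 16384)
    path = f.name

st = os.stat(path)
# st_blksize is the filesystem's preferred IO block size; even a
# 1-byte read is serviced by the kernel in units of at least this size.
print("preferred block size:", st.st_blksize)

with open(path, "rb") as f:
    data = f.read(1)  # userspace asks for 1 byte...
# ...but the device sees at least one full block of IO.
print("bytes returned to userspace:", len(data))

os.unlink(path)
```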

> Now, unless you're using a DB server that uses O_DIRECT or POSIX_FADV_RANDOM (I just checked and Postgres doesn't), Linux will aggressively readahead at least (it's tunable) 128kb for any random read by default, so even issuing a one byte read to the kernel, device IO will still only occur in a minimum of 128kb chunks, with the remainder living in the page cache until userspace requests it.

AFAIK, Linux only reads ahead if it detects a sequential access pattern, or if you specify POSIX_FADV_SEQUENTIAL (which doubles the normal readahead window). But as far as the query is concerned, any data read that isn't needed is effectively subtracted from the overall throughput.
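For reference, the hints discussed above are issued via posix_fadvise, exposed in Python as `os.posix_fadvise` (Linux only; the scratch file here is just a stand-in for a database file):

```python
import os
import tempfile

# Create a scratch file to advise on (a hypothetical stand-in for a DB file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (1 << 20))
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Declare random access: the kernel disables readahead on this fd,
    # so a small read fills only the containing page rather than a
    # larger readahead window.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    os.pread(fd, 1, 4096)  # reads 1 byte at offset 4096

    # POSIX_FADV_SEQUENTIAL instead doubles the readahead window.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
finally:
    os.close(fd)
    os.unlink(path)
```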

I was trying to illustrate the importance of seek latency (~80us for flash vs. ~9-14ms for spinning disks), but yes, there are myriad other concerns when you're trying to maximize disk throughput.
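A back-of-envelope calculation shows why that latency gap dominates: for random 4 KiB reads where seek time dwarfs transfer time, effective throughput is roughly page size divided by seek latency (figures below are the rough numbers from this thread, ~80us flash vs. ~10ms spinning disk):

```python
# Effective throughput of random 4 KiB reads when seek latency dominates.
PAGE = 4096  # bytes fetched per random read

def throughput(seek_seconds):
    # Ignore transfer time; for small random reads the seek dominates.
    return PAGE / seek_seconds  # bytes/second

ssd = throughput(80e-6)  # flash, ~80 us per seek
hdd = throughput(10e-3)  # spinning disk, ~10 ms per seek
print(f"flash: {ssd / 1e6:.1f} MB/s, spinning disk: {hdd / 1e6:.2f} MB/s")
```

With these assumed figures, flash sustains about 51 MB/s of useful random reads while the spinning disk manages about 0.4 MB/s, a roughly 125x gap from seek latency alone.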


