2 Comments

Reading the bit about using write-optimised data stores. At what point do write perf requirements make you think about using log-structured storage vs something based on B-trees? Any numbers for write throughput etc. that can be used to reason about this?


This is a good question, and there probably isn't a single throughput number you can compare against to decide when to start considering one alternative over the other. I would instead suggest thinking in terms of the access patterns on your data.

If the writes in your application outnumber the reads by orders of magnitude, then log-structured storage may be a natural fit for your use case: writes become sequential appends, whereas the write amplification of B-trees (in-place page rewrites, node splits, rebalancing) can become the bottleneck.
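
To make that concrete, here is a toy sketch of the log-structured write path (my own simplified example, not how any real engine is implemented): every put is a sequential append to a log file plus an update to an in-memory index. Real engines such as RocksDB or LevelDB layer memtables, SSTables and compaction on top of this idea, but the write path stays append-only.

```python
import os

class AppendOnlyStore:
    """Toy log-structured store: sequential appends plus an in-memory index."""

    def __init__(self, path):
        self.f = open(path, "a+b")           # in append mode, writes always go to the end
        self.index = {}                      # key -> (offset, length) in the log

    def put(self, key, value):
        # Toy record format: "key<TAB>value\n" (assumes no tabs/newlines in the data).
        record = f"{key}\t{value}\n".encode()
        self.f.seek(0, os.SEEK_END)          # current end of the log
        offset = self.f.tell()
        self.f.write(record)                 # one sequential append, no page rewrite
        self.f.flush()
        self.index[key] = (offset, len(record))

    def get(self, key):
        loc = self.index.get(key)
        if loc is None:
            return None
        self.f.seek(loc[0])
        record = self.f.read(loc[1]).decode()
        return record.split("\t", 1)[1].rstrip("\n")

store = AppendOnlyStore("toy.log")
store.put("user:1", "alice")
print(store.get("user:1"))                   # alice
```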

For a transactional system where reads and writes are comparable, a B-tree based solution is usually the better fit, since random read performance for B-trees is generally better: a point lookup walks a single tree, whereas a log-structured store may have to consult the memtable and several SSTable levels for the same key.
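
As a small illustration of the B-tree side, SQLite stores its tables and indexes as B-trees, so a point lookup on the primary key is served directly from the tree. Something like the following (a throwaway example, not a benchmark) shows the access pattern I mean:

```python
import sqlite3

# SQLite tables and indexes are B-trees; a primary-key lookup is a single tree descent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(f"user:{i}", f"payload-{i}") for i in range(10_000)])
conn.commit()

# Random point read served by the primary-key B-tree.
row = conn.execute("SELECT v FROM kv WHERE k = ?", ("user:4242",)).fetchone()
print(row[0])                                # payload-4242
```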

Finally, empirical analysis against your own workload is probably the best way to judge the suitability of either solution for your use case, since the background operations in each system (compaction in log-structured stores, page splits and checkpointing in B-tree engines) may affect your specific usage patterns in different ways. A rough harness for that is sketched below.
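
Something along these lines is usually enough to start with (a sketch under my own assumptions about the interface): put your candidate engines behind the same put/get interface and measure them with your own key sizes, value sizes and read/write mix. The `store` argument here is anything with `put`/`get` methods, e.g. the toy examples above or a thin wrapper you write around a real engine such as RocksDB, LevelDB or LMDB.

```python
import random
import time

def bench(store, n_writes=100_000, n_reads=100_000, value_size=128):
    keys = [f"k{i}" for i in range(n_writes)]
    value = "x" * value_size

    # Write phase: sequential key generation, fixed-size values.
    t0 = time.perf_counter()
    for k in keys:
        store.put(k, value)
    write_s = time.perf_counter() - t0

    # Read phase: random point lookups over the written keys.
    t0 = time.perf_counter()
    for _ in range(n_reads):
        store.get(random.choice(keys))
    read_s = time.perf_counter() - t0

    print(f"{type(store).__name__}: "
          f"{n_writes / write_s:,.0f} writes/s, {n_reads / read_s:,.0f} reads/s")
```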
