Why is Kafka FAST? Part 1

By ByteByteGo

Key Concepts:

  • High Throughput
  • Sequential I/O
  • Random I/O
  • Append-only Log
  • Hard Disk Drives (HDDs)
  • Solid State Drives (SSDs)
  • Latency

Why Kafka is Fast: Sequential I/O

In the context of Kafka, "fast" refers to high throughput: the ability to move a large number of records in a short amount of time. This is distinct from latency, the time a single record takes to get through the system. Kafka is optimized for throughput.

Sequential vs. Random I/O

A key design decision behind Kafka's performance is its reliance on sequential I/O. Disk access is generally considered slow compared with memory access, but how slow it is depends heavily on the access pattern: random or sequential.

  • Random I/O: Accesses data at non-contiguous locations. On a hard drive, the physical arm must move to a different position on the magnetic platters for each access, which makes it slow.
  • Sequential I/O: Accesses data contiguously, one block after another. This is much faster because the arm does not need to jump around.
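The difference between the two patterns can be sketched in a few lines of Python. This is a minimal illustration, not a benchmark: the block size, block count, and function names are assumptions chosen so the script runs quickly, and the only thing that changes between the two runs is the order in which the same blocks are written.

```python
import os
import random
import tempfile
import time

BLOCK_SIZE = 4096   # bytes per write (assumed value)
NUM_BLOCKS = 2048   # 8 MiB in total (assumed, small enough to run quickly)


def write_blocks(path, order):
    """Write one block at each index in `order`, then force it to the device."""
    payload = b"x" * BLOCK_SIZE
    start = time.perf_counter()
    with open(path, "r+b") as f:
        for index in order:
            f.seek(index * BLOCK_SIZE)  # jump to the block's position
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


def compare_patterns():
    """Time the same set of block writes in sequential and shuffled order."""
    sequential = list(range(NUM_BLOCKS))
    shuffled = sequential[:]
    random.shuffle(shuffled)
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, order in (("sequential", sequential), ("random", shuffled)):
            path = os.path.join(tmp, name + ".bin")
            # Pre-size the file so both runs overwrite existing blocks.
            with open(path, "wb") as f:
                f.truncate(NUM_BLOCKS * BLOCK_SIZE)
            results[name] = write_blocks(path, order)
    return results


if __name__ == "__main__":
    for name, seconds in compare_patterns().items():
        print(f"{name}: {seconds:.4f} s")
```

On an SSD, or when the operating system's page cache absorbs the writes, the two timings may be close. The large gap described here applies to spinning disks with much larger files, where the arm has to move between writes.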

Kafka's Append-only Log

Kafka uses an append-only log as its primary data structure. New data is only ever added to the end of the file, so writes follow a sequential access pattern.
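A toy version of an append-only log shows why the access pattern stays sequential. This is a sketch under stated assumptions, not Kafka's actual on-disk format: the class name AppendOnlyLog, the 4-byte length prefix, and the in-memory position list are all hypothetical, and the sketch assumes it starts from a new, empty file.

```python
import os
import struct
import tempfile

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix (assumed framing)


class AppendOnlyLog:
    """Toy single-file log; a record's offset is its sequence number."""

    def __init__(self, path):
        self.path = path
        # Append mode: every write lands at the current end of the file.
        self._file = open(path, "ab")
        self._end = os.path.getsize(path)  # next write position in bytes
        self._positions = []               # byte position of each record

    def append(self, payload: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        record = HEADER.pack(len(payload)) + payload
        self._file.write(record)
        self._file.flush()
        self._positions.append(self._end)
        self._end += len(record)
        return len(self._positions) - 1

    def read(self, offset: int) -> bytes:
        """Return the payload stored at the given offset."""
        with open(self.path, "rb") as f:
            f.seek(self._positions[offset])
            (length,) = HEADER.unpack(f.read(HEADER.size))
            return f.read(length)

    def close(self):
        self._file.close()


def demo():
    """Append three records to a temporary log and read one back."""
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendOnlyLog(os.path.join(tmp, "topic-0.log"))
        offsets = [log.append(m) for m in (b"first", b"second", b"third")]
        second = log.read(1)
        log.close()
    return offsets, second


if __name__ == "__main__":
    print(demo())
```

Existing records are never modified or moved, so the writer only ever touches the tail of the file.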

Performance Numbers and Cost Considerations

On modern hardware with an array of hard disks, sequential writes can reach hundreds of megabytes per second, while random writes are measured in hundreds of kilobytes per second. Sequential access is therefore about three orders of magnitude faster.
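The size of that gap can be checked with a short calculation. The specific rates below (300 MB/s and 300 KB/s) and the 10 GB workload are assumed round numbers standing in for "hundreds of megabytes" and "hundreds of kilobytes" per second, not measurements.

```python
import math

# Assumed illustrative rates; real hardware will differ.
SEQUENTIAL_BYTES_PER_SEC = 300 * 1000**2  # 300 MB/s
RANDOM_BYTES_PER_SEC = 300 * 1000         # 300 KB/s
WORKLOAD_BYTES = 10 * 1000**3             # 10 GB of records (assumed)


def seconds_to_write(total_bytes, bytes_per_second):
    """Time needed to write `total_bytes` at a steady rate."""
    return total_bytes / bytes_per_second


speedup = SEQUENTIAL_BYTES_PER_SEC / RANDOM_BYTES_PER_SEC
orders_of_magnitude = math.log10(speedup)
sequential_seconds = seconds_to_write(WORKLOAD_BYTES, SEQUENTIAL_BYTES_PER_SEC)
random_seconds = seconds_to_write(WORKLOAD_BYTES, RANDOM_BYTES_PER_SEC)

if __name__ == "__main__":
    print(f"speedup: {speedup:.0f}x ({orders_of_magnitude:.1f} orders of magnitude)")
    print(f"sequential: {sequential_seconds:.1f} s")
    print(f"random: {random_seconds / 3600:.1f} h")
```

With these assumed rates the ratio is 1000x: a workload that takes about half a minute to write sequentially would take several hours with random writes.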

Hard disk drives (HDDs) also have a cost advantage over solid-state drives (SSDs): they come at about one-third of the price with about three times the capacity. Taken together, these figures put the cost per unit of storage at roughly one-ninth that of SSDs. This allows Kafka to cost-effectively retain messages for a long period of time, a feature that was uncommon in messaging systems before Kafka.

Conclusion

Kafka's speed, specifically its high throughput, is largely attributable to sequential I/O, which it achieves through its append-only log data structure. This design choice, combined with the cost-effectiveness of hard disk drives, allows Kafka to handle large volumes of data efficiently and retain messages for extended periods.
