...
My primary concerns would be (1) not introducing any new bugs, and (2) getting acceptable performance out of the index implementation.
Implementation Phases
- Physical changes to the message and log formats to retain the key and offset. This should ideally go out with 0.8, since it is a compatibility-breaking change.
- Change to logical offsets.
- Implement simple "brute force" log compaction for keyed messages
- Nice-to-have:
  - Implement a generational compaction scheme to reduce the I/O cost of compaction. This could be done much later.
  - Key-based auditing
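The "brute force" compaction phase can be pictured as a two-pass scan: the first pass records the most recent offset for each key, and the second copies only those messages. This is a minimal sketch, assuming the whole log fits in memory; `Message` and `compact` are hypothetical names, not part of the codebase. It also assumes compaction must preserve each surviving message's original logical offset (no renumbering), so the compacted log can contain gaps, and that unkeyed messages are simply kept.

```python
from typing import List, NamedTuple, Optional


class Message(NamedTuple):
    # Hypothetical message record: logical offset, optional key, payload.
    offset: int
    key: Optional[bytes]
    value: bytes


def compact(log: List[Message]) -> List[Message]:
    """Brute-force compaction: keep only the last message for each key.

    Assumption: messages without a key are kept unchanged.
    """
    # Pass 1: remember the offset of the most recent message per key.
    latest = {}
    for msg in log:
        if msg.key is not None:
            latest[msg.key] = msg.offset
    # Pass 2: copy the survivors in their original order.
    # Offsets are not renumbered, so the result may have gaps.
    return [
        msg for msg in log
        if msg.key is None or latest[msg.key] == msg.offset
    ]


log = [
    Message(0, b"a", b"1"),
    Message(1, b"b", b"2"),
    Message(2, b"a", b"3"),
    Message(3, None, b"x"),
]
print([m.offset for m in compact(log)])  # [1, 2, 3]
```

The per-key map grows with the number of distinct keys rather than with the number of messages; a real implementation would stream segment files instead of holding the log in a list, and rewriting the whole log on every pass is the I/O cost a generational scheme would aim to reduce.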
Questions and Comments
- What are the implications for replication? I am not 100% sure that the log search is the only place where we conflate the use of the offset as a byte offset and as a message identifier.
- Is it odd to have a Message.key() field? If our message class were named Record, it would seem natural for a Record to have a primary key, but perhaps less so for a Message. The alternative would be to make the key a field in the log rather than in the message. I am not sure which is better.
- Changing to sequential offsets would imply we should change the terminology from "offset" to a more typical "message id" or "log change number" or something like that, but changing all references to offset is sort of overwhelming.
- I am not completely clear on the interaction with our compression implementation.
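Once offsets are logical message identifiers rather than byte positions, the log search can no longer seek straight to a file position; it needs an index from offset to position, and that index is where the performance concern lies. A minimal sketch, assuming a sparse, sorted, in-memory index with one entry every few messages; `OffsetIndex`, `append`, and `lookup` are hypothetical names. A lookup returns the greatest indexed offset at or below the target, from which a reader would scan forward through the segment.

```python
import bisect
from typing import List, Optional, Tuple


class OffsetIndex:
    """Hypothetical sparse index mapping logical offsets to byte positions.

    Entries must be appended in strictly increasing offset order.
    """

    def __init__(self) -> None:
        self.offsets: List[int] = []    # logical message offsets
        self.positions: List[int] = []  # byte positions in the segment file

    def append(self, offset: int, position: int) -> None:
        if self.offsets and offset <= self.offsets[-1]:
            raise ValueError("offsets must be strictly increasing")
        self.offsets.append(offset)
        self.positions.append(position)

    def lookup(self, target: int) -> Optional[Tuple[int, int]]:
        """Return (offset, position) of the last entry <= target, or None."""
        # bisect_right returns the insertion point after any equal entry,
        # so the entry just before it holds the greatest offset <= target.
        i = bisect.bisect_right(self.offsets, target)
        if i == 0:
            return None  # target precedes the first indexed message
        return self.offsets[i - 1], self.positions[i - 1]


index = OffsetIndex()
index.append(0, 0)
index.append(100, 4096)
index.append(200, 8192)
print(index.lookup(150))  # (100, 4096)
```

Binary search keeps each lookup at O(log n) in the number of index entries; the index density trades memory for the length of the forward scan after each lookup. Because the search only needs an entry at or below the target, it also tolerates the offset gaps that compaction leaves behind.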