...
Unfortunately, the script does not use the slice method of the FileRecords object, which could read just a part of the segment log file instead of the whole file; today it reads the whole segment log file(s) and then outputs the result.
Reading the whole file(s) drastically reduces the usefulness of this script, since it can affect a production environment when several files are read in a short period of time, while in the end reading just a few MB or a few batches is enough to understand the current pattern on the topic.
...
The kafka-dump-log.sh script uses DumpLogSegments.scala, which relies on the FileRecords object to read the content of the segment log.
I use the slice method, which supports opening a segment log while passing the number of bytes to be read (the end parameter, already supported by the FileRecords object).
The batch iterator will then read only the number of bytes passed as a parameter instead of the whole file, as the sketch below illustrates.
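To make this concrete, here is a minimal, self-contained sketch of that flow; the segment path and the 1 MB limit are placeholder values chosen for illustration only, not taken from the PR:

```scala
import java.io.File
import org.apache.kafka.common.record.FileRecords
import scala.jdk.CollectionConverters._

object DumpSliceExample {
  def main(args: Array[String]): Unit = {
    // Placeholder segment path and byte limit, used only for this example.
    val segment = new File("/tmp/kafka-logs/my-topic-0/00000000000000000000.log")
    val maxBytes = 1024 * 1024 // read at most ~1 MB instead of the whole file

    // Open read-only, then restrict the readable region to the first maxBytes bytes.
    val fileRecords = FileRecords.open(segment, false).slice(0, maxBytes)

    // The batch iterator only walks the sliced region, so large segments are never fully read.
    fileRecords.batches.asScala.foreach { batch =>
      println(s"baseOffset: ${batch.baseOffset} lastOffset: ${batch.lastOffset} sizeInBytes: ${batch.sizeInBytes}")
    }
  }
}
```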
...
As I mentioned above, the change requires calling the slice method of the FileRecords class in order to pass the end parameter.
```scala
// Allows limiting the batches read from the file record, in bytes
val fileRecords = FileRecords.open(file, false, false, 0, false).slice(startOffset, maxMessageSize)
```
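As a usage note on the snippet above: FileRecords.slice validates the start position and clamps the requested size to the actual file size, so asking for more bytes than the segment contains simply degenerates to reading the whole segment. A small illustrative sketch (the path is a placeholder, not from the PR):

```scala
import java.io.File
import org.apache.kafka.common.record.FileRecords

object SliceBoundsExample {
  def main(args: Array[String]): Unit = {
    val file = new File("/tmp/kafka-logs/my-topic-0/00000000000000000000.log") // placeholder path
    val records = FileRecords.open(file, false)

    // Requesting more bytes than the segment holds is clamped to the file size,
    // so a very large size is effectively "read the whole file".
    val clamped = records.slice(0, Integer.MAX_VALUE)
    println(s"slice covers ${clamped.sizeInBytes} of ${records.sizeInBytes} bytes")

    // A start position past the end of the file is rejected by FileRecords.slice:
    // records.slice(records.sizeInBytes + 1, 1024) // throws IllegalArgumentException
  }
}
```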
- The code changes can be seen here in the open PR
...
- There is no impact on existing users; this change only adds a new feature.
- I am using Integer.MAX_VALUE as the default value because FileRecords accepts an Integer as the end parameter, which means this limit already existed. When the new parameter is not passed, Integer.MAX_VALUE is used, which is equivalent to reading the whole file (as long as it does not exceed 2 GB); based on the FileRecords class, this limitation already exists today.
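To illustrate that default, below is a hedged sketch of how such an option could be declared with joptsimple (the same option-parsing library DumpLogSegments uses); the option name max-bytes and its description are assumptions for this example and may differ from the actual PR:

```scala
import joptsimple.OptionParser

object MaxBytesOptionExample {
  def main(args: Array[String]): Unit = {
    val parser = new OptionParser(false)
    // Hypothetical option name; it defaults to Integer.MAX_VALUE so that omitting it
    // keeps today's behaviour of reading the whole segment (bounded at 2 GB by FileRecords).
    val maxBytesOpt = parser.accepts("max-bytes", "Limit the number of bytes read from each segment")
      .withRequiredArg
      .describedAs("size")
      .ofType(classOf[java.lang.Integer])
      .defaultsTo(Integer.valueOf(Integer.MAX_VALUE))

    val options = parser.parse(args: _*)
    val maxBytes: Int = options.valueOf(maxBytesOpt).intValue()
    println(s"Will read at most $maxBytes bytes per segment")
  }
}
```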