...
Here's a link to a page about avoiding unnecessary allocation in Java code; the same principles apply to Scala code: http://blog.takipi.com/5-coding-hacks-to-reduce-gc-overhead/
Avoid Unnecessary Allocation
Many things in Scala cause allocation of objects on the heap. Heap allocation involves quite a lot of overhead: the object must be allocated (with extra header space beyond the members), its memory initialized, its constructor called, etc.
...
When writing in Scala, it usually feels natural to treat arrays (Array[T]) as a sequence of T's and strings as a sequence of characters. For example:
...
While this is convenient and feels like "correct" Scala, in such cases Scala will implicitly box the String with a StringOps to provide that extra functionality, which requires an allocation. Similarly, using Seq-like functions on an Array will box the underlying array with an ArrayOps, again requiring allocation. Note that even simple things like the String apply() function, e.g. str(4), will cause such boxing. Instead, you should use the equivalent str.charAt(4). This is also a key difference between calling .size on an array vs .length. The size method requires allocating an ArrayOps, while length will directly access the length of the underlying Java array primitive.
Note that in most cases these allocations are so cheap that they likely won't affect performance. However, they could have an effect in a tight inner loop, and at the very least, avoiding them reduces noise when profiling.
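As an illustration, the following minimal sketch contrasts forms that trigger implicit boxing with equivalents that avoid it (the function and parameter names are just for illustration; the boxing behavior described is that of Scala 2.x implicit conversions):

```scala
// Sketch: boxing vs non-boxing ways to access strings and arrays.
def process(str: String, arr: Array[Int]): Int = {
  val c1 = str(4)        // boxes str in a StringOps to call apply()
  val c2 = str.charAt(4) // direct call on java.lang.String, no allocation

  val n1 = arr.size      // boxes arr in an ArrayOps to call size
  val n2 = arr.length    // reads the JVM array length directly, no allocation

  n2 + c2.toInt
}
```

Both forms in each pair return the same value; only the allocation behavior differs.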
Consider Cloning vs Creating a New Instance
In some cases, it can be relatively expensive to create a new instance of an object. In such cases, it might be worth considering whether clone()ing an existing instance and mutating it is faster.
One case where this appears to be beneficial is with ICU4J Calendar objects. Creating a new Calendar object via Calendar.getInstance(...) is a fairly expensive process with lots of different object allocations. Instead, it is recommended to consider something like the following, which minimizes allocations and initialization computations:
object SomeClass {
  val emptyCalendar = {
    val c = Calendar.getInstance(TimeZone.UNKNOWN_ZONE)
    c.clear()
    c
  }
}

def functionThatNeedsANewCalendar = {
  val cal = SomeClass.emptyCalendar.clone.asInstanceOf[Calendar]
  ...
  cal.set(...)
  ...
  cal
}
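As a rough way to check that cloning actually wins, one can time both approaches. The sketch below is a crude micro-benchmark, not a rigorous one (it assumes ICU4J's com.ibm.icu.util.Calendar and TimeZone are on the classpath; the object and iteration count are illustrative):

```scala
import com.ibm.icu.util.{Calendar, TimeZone}

object CalendarCloneBench {
  // Shared pre-initialized instance, as in the pattern above
  val emptyCalendar = {
    val c = Calendar.getInstance(TimeZone.UNKNOWN_ZONE)
    c.clear()
    c
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    var i = 0
    val t1 = System.nanoTime()
    while (i < n) { Calendar.getInstance(TimeZone.UNKNOWN_ZONE); i += 1 }
    val t2 = System.nanoTime()
    i = 0
    while (i < n) { emptyCalendar.clone.asInstanceOf[Calendar]; i += 1 }
    val t3 = System.nanoTime()
    println(s"getInstance: ${(t2 - t1) / 1e6} ms, clone: ${(t3 - t2) / 1e6} ms")
  }
}
```

The exact numbers will vary by machine and JIT warmup, so treat the output as a sanity check rather than a measurement.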
Examining Bytecode
As is apparent from many of the above suggestions, minimizing allocations is often key to improving Daffodil performance and making profiling less noisy. Oftentimes an allocation occurs but it isn't clear from the source why it is happening. In these cases, it is often necessary to inspect the bytecode. To do so, the javap tool can be invaluable. The following will disassemble a class file, including some helpful bytecode interpretations in comments:
javap -p -c path/to/class/file.class
It can also be useful to search the entire code base for certain allocations by looking through the disassembled code. A useful command to disassemble all class files is the following:
find daffodil.git -name '*.class' -exec javap -p -c '{}' \; > disassembled.txt
From there, you can grep this file and determine where unexpected allocations may be taking place. For example, to find allocations of java.math.BigInteger:
grep -a "new" -n disassembled.txt | grep "java/math/BigInteger"
Profiling & Timing
Often it is useful to use a profiler to examine memory allocations and CPU usage to determine where to target optimizations. However, due to the nested nature of Daffodil parsers/unparsers, some profilers can make it difficult to determine how long certain sections of code take, or they incur too much overhead and skew the results. For this reason, a special timer has been added to Daffodil's utilities to track sections of code. This timer is the TimeTracker in Timer.scala. A common use of this timer is to track the time of all the parsers. To enable this, adjust the parse1() method in Parser.scala to look like this:
TimeTracker.track(parserName) {
  parse(pstate)
}
Then add the following to the end of whatever triggers parsing (e.g. Daffodil CLI code, a unit test, a performance rig):
TimeTracker.logTimes(LogLevel.Error)
This will result in output like the following, where Time is in seconds, Average is in nanoseconds, and Count is the number of times that section was executed.
[error] Name Time Pct Average Count
[error] LiteralNilDelimitedEndOfDataParser 3.330 34.03% 4030 826140
[error] StringDelimitedParser 2.455 25.09% 4184 586640
[error] DelimiterTextParser 1.038 10.61% 879 1180480
[error] SimpleNilOrValueParser 0.985 10.07% 1192 826140
[error] OrderedSeparatedSequenceParser 0.806 8.23% 10232 78720
[error] ElementParser 0.404 4.13% 342 1180520
[error] DelimiterStackParser 0.308 3.15% 244 1259220
[error] ChoiceParser 0.226 2.31% 5750 39360
[error] SeqCompParser 0.113 1.15% 318 354300
[error] ConvertTextNumberParser 0.060 0.61% 1489652 40
[error] OrderedUnseparatedSequenceParser 0.058 0.60% 2922016 20
[error] ConvertTextCombinatorParser 0.000 0.00% 8825 40
This gives a clear breakdown of how much time was spent in each parser (excluding nested child parsers) and gives a rough idea of where to focus optimizations. Note that it often helps to add additional tracked sections within a parser to determine which parts of a parser are the bottlenecks.
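For example, to narrow down a hot parser, one might wrap individual steps in their own tracked sections. In this sketch the section names and helper methods (scanForDelimiter, convertField) are hypothetical; only the TimeTracker.track call pattern comes from the usage shown above:

```scala
// Hypothetical parser body with extra tracked sections to isolate the bottleneck
def parse(pstate: PState): Unit = {
  TimeTracker.track("delimiterScan") {
    scanForDelimiter(pstate) // hypothetical step
  }
  TimeTracker.track("fieldConversion") {
    convertField(pstate) // hypothetical step
  }
}
```

Each named section then appears as its own row in the logTimes output, making it easy to see which step dominates.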
...