...

Here's a link to a page about Java performance coding and avoiding unnecessary allocation in Java code. The same principles apply to Scala code: http://blog.takipi.com/5-coding-hacks-to-reduce-gc-overhead/

Avoid Unnecessary Allocation

Many things in Scala cause allocation of objects on the heap. Allocating an object involves quite a lot of overhead: allocating the memory (which includes extra locations beyond the members), initializing that memory, calling the constructor, etc.
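As a sketch of why this matters, compare allocating a new object on every call with reusing a single pre-allocated one (the names here are illustrative, not from the Daffodil code base):

```scala
object AllocationExample {
  // Allocates a new StringBuilder on every call: memory allocation,
  // initialization, and constructor overhead are paid each time.
  def joinAllocating(items: Seq[Int]): String = {
    val sb = new StringBuilder()
    items.foreach(i => sb.append(i).append(','))
    sb.toString
  }

  // Reuses one pre-allocated StringBuilder, paying the allocation
  // cost only once. (Note: this version is not thread-safe.)
  private val shared = new StringBuilder()
  def joinReusing(items: Seq[Int]): String = {
    shared.clear()
    items.foreach(i => shared.append(i).append(','))
    shared.toString
  }
}
```

Both produce the same result; the second simply avoids a heap allocation per call, at the cost of shared mutable state.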

...

When writing in Scala, it usually feels natural to treat arrays (Array[T]) as a sequence of T's and Strings as a sequence of characters. For example

...

While this is convenient and feels like "correct" Scala, in such cases Scala will implicitly box the String with a StringOps to provide that extra functionality, which requires an allocation. Similarly, using Seq-like functions on an Array will box the underlying array with an ArrayOps, again requiring an allocation. Note that even simple things like the String apply() function, e.g. str(4), will cause such boxing. Instead, you should use the equivalent str.charAt(4). This is also a key difference between calling .size on an array vs .length. The size method requires allocating an ArrayOps, while length will directly access the length from the Java array primitive.
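A small sketch of the boxed vs. unboxed calls described above; each pair returns the same value, but the first of each pair goes through an implicit wrapper:

```scala
object BoxingExample {
  def demo(): (Char, Char, Int, Int) = {
    val str = "hello"
    val arr = Array(1, 2, 3)

    // str(4) implicitly wraps the String in a StringOps before calling
    // apply; charAt goes straight to java.lang.String.
    val boxed   = str(4)        // boxes with StringOps
    val unboxed = str.charAt(4) // no wrapper

    // Similarly, .size wraps the array in an ArrayOps, while .length
    // reads the JVM array length directly.
    val viaSize   = arr.size    // boxes with ArrayOps
    val viaLength = arr.length  // direct access

    (boxed, unboxed, viaSize, viaLength)
  }
}
```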

Note that in most cases, these allocations are so efficient that they likely won't affect performance. However, they could have an effect in a tight inner loop, and at the very least, avoiding them reduces noise when profiling.

...

In some cases, it can be relatively expensive to create a new instance of an object. In such cases, it might be worth considering if clone()ing an existing instance and mutating it is faster.

One case where this appears to be beneficial is with ICU4J Calendar objects. Creating a new Calendar object via Calendar.getInstance(...) is a fairly expensive process with lots of different object allocations. Instead, it is recommended that one consider something like the following, which minimizes allocations and initialization computations:

Code Block
scala
object SomeClass {
  val emptyCalendar = {
    val c = Calendar.getInstance(TimeZone.UNKNOWN_ZONE)
    c.clear()
    c
  }
}

def functionThatNeedsANewCalendar = {
  val cal = SomeClass.emptyCalendar.clone.asInstanceOf[Calendar]
  ...
  cal.set(...)
  ...
  cal
}

Examining Bytecode

As is apparent from many of the above suggestions, minimizing allocations is often key to improving Daffodil performance and making profiling less noisy. Oftentimes an allocation will occur, but it isn't clear from the source why it is happening. In these cases, it is often necessary to inspect the bytecode, and the javap tool can be invaluable for doing so. The following will disassemble a class to bytecode, including some helpful bytecode interpretations in comments:

Code Block
bash
javap -p -c path/to/class/file.class

It can also be useful to search the entire code base for certain allocations by looking through the disassembled code. A useful script to disassemble all class files is the following:

Code Block
bash
find daffodil.git -name '*.class' -exec javap -p -c '{}' \; > disassembled.txt

From there, you can grep this file and determine where unexpected allocations may be taking place. For example, to find allocations of java.math.BigInteger:

Code Block
bash
grep -a "new" -n disassembled.txt | grep "java/math/BigInteger"

Profiling & Timing

Often it is useful to use a profiler to examine memory allocations and CPU usage to determine where to target optimizations. However, due to the nested nature of Daffodil parsers/unparsers, some profilers can make it difficult to determine how long certain sections of code take, or they incur too much overhead and skew the results. For this reason, a special timer was added to Daffodil's utilities to track sections of code. This timer is the TimeTracker in Timer.scala. A common use of this timer is to track the time of all the parsers. To enable this, adjust the parse1() method in Parser.scala to look like this:

Code Block
scala
TimeTracker.track(parserName) {
  parse(pstate)
}

Then add this section to the end of however you are triggering parsing (e.g. Daffodil CLI code, unit test, performance rig):

Code Block
scala
TimeTracker.logTimes(LogLevel.Error)

This will result in something that looks like the following, where the time is in seconds, the average is in nanoseconds, and the count is the number of times that section was executed.

Code Block
[error] Name                                 Time     Pct  Average    Count
[error] LiteralNilDelimitedEndOfDataParser  3.330  34.03%     4030   826140
[error] StringDelimitedParser               2.455  25.09%     4184   586640
[error] DelimiterTextParser                 1.038  10.61%      879  1180480
[error] SimpleNilOrValueParser              0.985  10.07%     1192   826140
[error] OrderedSeparatedSequenceParser      0.806   8.23%    10232    78720
[error] ElementParser                       0.404   4.13%      342  1180520
[error] DelimiterStackParser                0.308   3.15%      244  1259220
[error] ChoiceParser                        0.226   2.31%     5750    39360
[error] SeqCompParser                       0.113   1.15%      318   354300
[error] ConvertTextNumberParser             0.060   0.61%  1489652       40
[error] OrderedUnseparatedSequenceParser    0.058   0.60%  2922016       20
[error] ConvertTextCombinatorParser         0.000   0.00%     8825       40

This gives a clear breakdown of how much time was spent in each parser (excluding nested child parsers) and gives a rough idea of where to focus optimizations. Note that it sometimes helps to add additional tracked sections within a parser to determine what parts of a parser are the bottlenecks.
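The real TimeTracker lives in Daffodil's Timer.scala; as a self-contained illustration of the idea (not the actual Daffodil API), a minimal named-section timer that accumulates per-name totals and counts might look like this:

```scala
import scala.collection.mutable

// Minimal sketch in the spirit of Daffodil's TimeTracker: wraps a block
// of code, accumulating total nanoseconds and call counts per name.
object MiniTracker {
  private val totals = mutable.Map.empty[String, Long].withDefaultValue(0L)
  private val counts = mutable.Map.empty[String, Long].withDefaultValue(0L)

  def track[A](name: String)(body: => A): A = {
    val start = System.nanoTime()
    try body
    finally {
      totals(name) += System.nanoTime() - start
      counts(name) += 1
    }
  }

  def count(name: String): Long = counts(name)
  def totalNanos(name: String): Long = totals(name)
}
```

With something like this, wrapping an inner section of a parser in `MiniTracker.track("innerSection") { ... }` attributes its time separately from the enclosing parser's total.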