Distributed Test Recommendations

Note: All examples can be found here.

The goals of the Green Team

  • Improve green consistency of distributed tests
  • Mass test runs determined which tests we worked on
    • Prioritized any tests that failed more than once in our mass test runs
  • Fixed about 35 issues (working on more)
  • Spent about a quarter of our time fixing product bugs
  • Anecdotal improvements for Geode developers
    • “CIO board is cleaner”
    • “Seeing more Green pull requests”

Green Team Progress

"If I had a nickel for every thread sleep related failure..."

What was helpful vs unhelpful

Helpful

  • Clean test code
  • All callback and domain classes are specific to test and organized as inner classes within the test class
  • Use of non-deprecated APIs
  • Use of AssertJ instead of Assert

Unhelpful

  • Asynchronous activity
  • Thread.sleep or Wait.pause calls
  • Major refactorings such as modularization
  • Non-thread-safe static product test hooks
  • TestHelper classes and custom startup rules that configure processes or perform actions
  • Generalized callback and domain classes reused by more than one test
  • Tests that use dunit working directories instead of TemporaryFolder Rule
  • Catching unexpected exceptions
  • Instantiating a dunit class again from a static method

"What you can do to make things better?"

Avoid thread sleeping

  • Awaitility -- await for some state to change
  • CountDownLatch -- coordinate between VMs and/or threads
  • Mockito Spy and Verify
    • Verify with timeout can be used instead of Awaitility
    • Wrap anything with spy or use it to implement a callback such as CacheListener
    • Verify is used to validate number of invocations with optional timeout
  • All timeouts should use GeodeAwaitility.getTimeout()
  • Possible exceptions to the rule
    • Busy waiting in loops (Awaitility is usually a better option than custom looping)
    • Slow receiver/listener type testing (there’s probably a better way to do it)

Examples: SleepDistributedTest, MultiThreadedIntegrationTest

"It’s not polite to eat all the exceptions..."

Exception handling

  • Expected vs unexpected exceptions in tests
  • Never catch unexpected exceptions. That is bad... way bad...
    • Special case: Asynchronous actions may need to catch checked exceptions and use ErrorCollector
  • Use method throws clause for unexpected exceptions 
    • The alternative serves to make failures more opaque. 
    • Fixing test failures (as you will have to do at some point) is all about visibility into the problems of the test. 
  • Use AssertJ catchThrowable or assertThatThrown for expected exceptions
    • Avoid using JUnit expected exception support such as ExpectedException Rule or Test annotation expected element. Example: @Test(expected = Exception.class) 
    • Remember to catch specific exceptions, not whole parent classes. If someone adds an additional exception, will it break your test?

Examples: ErrorHandlingTest, ExpectedExceptionTest

"Sometimes you think they’re helping..."

Minimize helper classes

  • All test code should be in the test class
  • Test should directly use Geode User APIs without helper classes
    • Most Geode developers already know the Geode APIs or should learn them
    • See how Geode is configured and what’s happening directly in the test code
    • Easier to debug especially if product bugs are suspected
    • Developers understand usability issues in the Geode User APIs
    • Every layer between the test and what you are testing obscures the test and make debugging more difficult for you and everyone else
  • Avoid generalizing classes such as CacheListeners for many tests
    • Use specialized inner classes that are specific to the tests in the test class
    • Flexibility and generalization are sources of complexity
    • Complexity is a source of bugs and difficult debugging
    • Complexity and generalization is a wall that prevents people from making educated changes
    • All code including configuration should be in the test class
    • Write your code for the next person to look at it.
  • Helper classes cause problems
    • Obscure what’s happening
    • Proliferate bad anti-patterns
    • Combine test classes that should be separate

"Challenges when handling things asynchronously..."

Use ExecutorServiceRule for multithreading

  • Use ExecutorServiceRule or DistributedExcutorServiceRule
    • Provides debugging support for hangs
    • Cleans up threads on tear down
    • Test task code should be interruptible
  • Use Future or CompletableFuture for submitted runnable/callable
  • Always invoke get() on any Future or CompletableFuture

Examples: AsyncErrorHandlingTest, ExecutorSyncHangIntegrationTest, ExecutorLockHangIntegrationTest, ExecutorErrorHandlingDistributedTest

Handling exceptions in callbacks

Examples: AsyncErrorHandlingIntegrationTest, AsyncErrorHandlingDistributedTest

AsyncInvocation usage

  • Timeout now gets a remote stack trace to use as the cause and dumps stack traces for that JVM’s threads
  • Always use await() or get()
    • Both check and throw any remote exceptions
    • Both use GeodeAwaitility Timeout and will throw TimeoutException if it’s exceeded
  • Use await() for Void types and get() when expecting a non-null value

Examples: AsyncInvocationDistributedTest, AsyncInvocationHangDistributedTest

Know your DUnit Rules

Testing tools for Cloud testing

Mass test run pipeline

  • Find and prioritize problematic tests
  • Defined under Geode CI
  • Matches the CI pipelines

gemfire/gemfire-deployments/dunitrunner

Thoughts to take with you

  • Use clean code with meaningful names
  • Describe the point of the test in a comment
  • Use Awaitility and CountDownLatches instead of sleeping/pausing
  • Use AssertJ instead of JUnit Assert
  • Use Geode defaults or configure using Geode APIs directly within the test
  • Use parameterization instead of test class inheritance
  • Avoid sharing callback or domain classes across multiple tests
  • Use ErrorCollector for assertions and exception handling in callbacks instead of rethrowing or setting state to check later
  • Use factory based injection instead of mutable/static product test hooks
  • Use Geode user APIs instead of adapter/helper/junit-rule APIs
  • Avoid unnecessary use of invokeAsync or multithreading
  • Use AsyncInvocation or Future to wait when using invokeAsync
    • Use AsyncInvocation.await() for void types
    • Use AsyncInvocation.get() for non-void types
  • Use only non-deprecated Geode user APIs in tests (unless the test is specifically for deprecated APIs)
  • Use custom logging when debugging hard-to-reproduce problems
  • Know if the call you are making is asynchronous and wait for it, you would be surprised at how many are asynchronous


  • No labels