The Green Report

Distributed Test Recommendations

Note: All examples can be found here.

The goals of the Green Team

Improve green consistency of distributed tests
Mass test runs determined which tests we worked on

Prioritized any tests that failed more than once in our mass test runs

Fixed about 35 issues (working on more)
Spent about a quarter of our time fixing product bugs
Anecdotal improvements for Geode developers

“CIO board is cleaner”
“Seeing more Green pull requests”

Green Team Progress

If I had a nickel for every thread sleep related failure...

What was helpful vs unhelpful

Helpful

Clean test code
All callback and domain classes are specific to test and organized as inner classes within the test class
Use of non-deprecated APIs
Use of AssertJ instead of Assert

Unhelpful

Asynchronous activity
Thread.sleep or Wait.pause calls
Major refactorings such as modularization
Non-thread-safe static product test hooks
TestHelper classes and custom startup rules that configure processes or perform actions
Generalized callback and domain classes reused by more than one test
Tests that use dunit working directories instead of TemporaryFolder Rule
Catching unexpected exceptions
Instantiating a dunit class again from a static method

What you can do to make things better?

Avoid thread sleeping

Awaitility -- await for some state to change
CountDownLatch -- coordinate between VMs and/or threads
Mockito Spy and Verify
- Verify with timeout can be used instead of Awaitility

Wrap anything with spy or use it to implement a callback such as CacheListener
Verify is used to validate number of invocations with optional timeout

All timeouts should use GeodeAwaitility.getTimeout()
Possible exceptions to the rule

Busy waiting in loops (Awaitility is usually a better option than custom looping)
Slow receiver/listener type testing (there’s probably a better way to do it)

Examples: SleepDistributedTest, MultiThreadedIntegrationTest

It’s not polite to eat all the exceptions...

Exception handling

Expected vs unexpected exceptions in tests
Never catch unexpected exceptions. That is bad... way bad...

Special case: Asynchronous actions may need to catch checked exceptions and use ErrorCollector

Use method throws clause for unexpected exceptions

The alternative serves to make failures more opaque.
Fixing test failures (as you will have to do at some point) is all about visibility into the problems of the test.

Use AssertJ catchThrowable or assertThatThrown for expected exceptions

Avoid using JUnit expected exception support such as ExpectedException Rule or Test annotation expected element. Example: @Test(expected = Exception.class)
Remember to catch specific exceptions, not whole parent classes. If someone adds an additional exception, will it break your test?

Examples: ErrorHandlingTest, ExpectedExceptionTest

Sometimes you think they’re helping...

Minimize helper classes

All test code should be in the test class
Test should directly use Geode User APIs without helper classes

Most Geode developers already know the Geode APIs or should learn them
See how Geode is configured and what’s happening directly in the test code
Easier to debug especially if product bugs are suspected
Developers understand usability issues in the Geode User APIs
Every layer between the test and what you are testing obscures the test and make debugging more difficult for you and everyone else

Avoid generalizing classes such as CacheListeners for many tests
- Use specialized inner classes that are specific to the tests in the test class
- Flexibility and generalization are sources of complexity
- Complexity is a source of bugs and difficult debugging
- Complexity and generalization is a wall that prevents people from making educated changes
- All code including configuration should be in the test class
- Write your code for the next person to look at it.
Helper classes cause problems
- Obscure what’s happening
- Proliferate bad anti-patterns
- Combine test classes that should be separate

Challenges when handling things asynchronously...

Use ExecutorServiceRule for multithreading

Use ExecutorServiceRule or DistributedExcutorServiceRule

Provides debugging support for hangs
Cleans up threads on tear down
Test task code should be interruptible

Use Future or CompletableFuture for submitted runnable/callable
Always invoke get() on any Future or CompletableFuture

Examples: AsyncErrorHandlingTest, ExecutorSyncHangIntegrationTest, ExecutorLockHangIntegrationTest, ExecutorErrorHandlingDistributedTest

Handling exceptions in callbacks

Avoid catching exception and setting some test state to check later
Use ErrorCollector in unit and integration tests
Use SharedErrorCollector in distributed tests

Examples: AsyncErrorHandlingIntegrationTest, AsyncErrorHandlingDistributedTest

AsyncInvocation usage

Timeout now gets a remote stack trace to use as the cause and dumps stack traces for that JVM’s threads
Always use await() or get()

Both check and throw any remote exceptions
Both use GeodeAwaitility Timeout and will throw TimeoutException if it’s exceeded

Use await() for Void types and get() when expecting a non-null value

Examples: AsyncInvocationDistributedTest, AsyncInvocationHangDistributedTest

Know your DUnit Rules

DistributedRule -- simply launches DUnit and greps for suspect strings after each test method
DistributedExecutorServiceRule -- provides an ExecutorService for all VMs
DistributedRestoreSystemProperties -- restores system properties in all VMs
SharedCountersRule -- shares counters across all VMs
SharedErrorCollector -- shares one JUnit ErrorCollector across all VMs

Testing tools - Cloud Testing

Mass test run pipeline

Find and prioritize problematic tests
Defined under Geode CI
Matches the CI pipelines

gemfire/gemfire-deployments/dunitrunner

Scripts for running and debugging tests in GCP
See https://sites.google.com/a/pivotal.io/gem-engineering/home/how-to/testing-non-h/dunit-testing-using-gcloud

Thoughts to take with you

Use clean code with meaningful names
Describe the point of the test in a comment
Use Awaitility and CountDownLatches instead of sleeping/pausing
Use AssertJ instead of JUnit Assert
Use Geode defaults or configure using Geode APIs directly within the test
Use parameterization instead of test class inheritance
Avoid sharing callback or domain classes across multiple tests
Use ErrorCollector for assertions and exception handling in callbacks instead of rethrowing or setting state to check later
Use factory based injection instead of mutable/static product test hooks
Use Geode user APIs instead of adapter/helper/junit-rule APIs
Avoid unnecessary use of invokeAsync or multithreading
Use AsyncInvocation or Future to wait when using invokeAsync
- Use AsyncInvocation.await() for void types
- Use AsyncInvocation.get() for non-void types
Use only non-deprecated Geode user APIs in tests (unless the test is specifically for deprecated APIs)
Use custom logging when debugging hard-to-reproduce problems
Know if the call you are making is asynchronous and wait for it, you would be surprised at how many are asynchronous

Space shortcuts

Page tree

Distributed Test Recommendations

Green Team Progress

`If I had a nickel for every thread sleep related failure...`

What was helpful vs unhelpful

What you can do to make things better?

Avoid thread sleeping

`It’s not polite to eat all the exceptions...`

Exception handling

`Sometimes you think they’re helping...`

Minimize helper classes

`Challenges when handling things asynchronously...`

Use ExecutorServiceRule for multithreading

Handling exceptions in callbacks

AsyncInvocation usage

Know your DUnit Rules

Testing tools - Cloud Testing

Mass test run pipeline

gemfire/gemfire-deployments/dunitrunner

Thoughts to take with you