Distributed Test Recommendations
Note: All examples can be found here.
The goals of the Green Team
- Improve green consistency of distributed tests
- Mass test runs determined which tests we worked on
- Prioritized any tests that failed more than once in our mass test runs
- Fixed about 35 issues (working on more)
- Spent about a quarter of our time fixing product bugs
- Anecdotal improvements for Geode developers
- “CIO board is cleaner”
- “Seeing more Green pull requests”
Green Team Progress
If I had a nickel for every thread sleep related failure...
What was helpful vs unhelpful
Helpful
| Unhelpful
|
---|
What you can do to make things better?
Avoid thread sleeping
- Awaitility -- await for some state to change
- CountDownLatch -- coordinate between VMs and/or threads
- Mockito Spy and Verify
- Verify with timeout can be used instead of Awaitility
- Wrap anything with spy or use it to implement a callback such as CacheListener
- Verify is used to validate number of invocations with optional timeout
- All timeouts should use GeodeAwaitility.getTimeout()
- Possible exceptions to the rule
- Busy waiting in loops (Awaitility is usually a better option than custom looping)
- Slow receiver/listener type testing (there’s probably a better way to do it)
Examples: SleepDistributedTest, MultiThreadedIntegrationTest
It’s not polite to eat all the exceptions...
Exception handling
- Expected vs unexpected exceptions in tests
- Never catch unexpected exceptions. That is bad... way bad...
- Special case: Asynchronous actions may need to catch checked exceptions and use ErrorCollector
- Use method throws clause for unexpected exceptions
- The alternative serves to make failures more opaque.
- Fixing test failures (as you will have to do at some point) is all about visibility into the problems of the test.
- Use AssertJ catchThrowable or assertThatThrown for expected exceptions
- Avoid using JUnit expected exception support such as ExpectedException Rule or Test annotation expected element. Example: @Test(expected = Exception.class)
- Remember to catch specific exceptions, not whole parent classes. If someone adds an additional exception, will it break your test?
Examples: ErrorHandlingTest, ExpectedExceptionTest
Sometimes you think they’re helping...
Minimize helper classes
- All test code should be in the test class
- Test should directly use Geode User APIs without helper classes
- Most Geode developers already know the Geode APIs or should learn them
- See how Geode is configured and what’s happening directly in the test code
- Easier to debug especially if product bugs are suspected
- Developers understand usability issues in the Geode User APIs
- Every layer between the test and what you are testing obscures the test and make debugging more difficult for you and everyone else
- Avoid generalizing classes such as CacheListeners for many tests
- Use specialized inner classes that are specific to the tests in the test class
- Flexibility and generalization are sources of complexity
- Complexity is a source of bugs and difficult debugging
- Complexity and generalization is a wall that prevents people from making educated changes
- All code including configuration should be in the test class
- Write your code for the next person to look at it.
- Helper classes cause problems
- Obscure what’s happening
- Proliferate bad anti-patterns
- Combine test classes that should be separate
Challenges when handling things asynchronously...
Use ExecutorServiceRule for multithreading
- Use ExecutorServiceRule or DistributedExcutorServiceRule
- Provides debugging support for hangs
- Cleans up threads on tear down
- Test task code should be interruptible
- Use Future or CompletableFuture for submitted runnable/callable
- Always invoke get() on any Future or CompletableFuture
Examples: AsyncErrorHandlingTest, ExecutorSyncHangIntegrationTest, ExecutorLockHangIntegrationTest, ExecutorErrorHandlingDistributedTest
Handling exceptions in callbacks
- Avoid catching exception and setting some test state to check later
- Use ErrorCollector in unit and integration tests
- Use SharedErrorCollector in distributed tests
Examples: AsyncErrorHandlingIntegrationTest, AsyncErrorHandlingDistributedTest
AsyncInvocation usage
- Timeout now gets a remote stack trace to use as the cause and dumps stack traces for that JVM’s threads
- Always use await() or get()
- Both check and throw any remote exceptions
- Both use GeodeAwaitility Timeout and will throw TimeoutException if it’s exceeded
- Use await() for Void types and get() when expecting a non-null value
Examples: AsyncInvocationDistributedTest, AsyncInvocationHangDistributedTest
Know your DUnit Rules
- DistributedRule -- simply launches DUnit and greps for suspect strings after each test method
- DistributedExecutorServiceRule -- provides an ExecutorService for all VMs
- DistributedRestoreSystemProperties -- restores system properties in all VMs
- SharedCountersRule -- shares counters across all VMs
- SharedErrorCollector -- shares one JUnit ErrorCollector across all VMs
Testing tools - Cloud Testing
Mass test run pipeline
- Find and prioritize problematic tests
- Defined under Geode CI
- Matches the CI pipelines
gemfire/gemfire-deployments/dunitrunner
- Scripts for running and debugging tests in GCP
- See https://sites.google.com/a/pivotal.io/gem-engineering/home/how-to/testing-non-h/dunit-testing-using-gcloud
Thoughts to take with you
- Use clean code with meaningful names
- Describe the point of the test in a comment
- Use Awaitility and CountDownLatches instead of sleeping/pausing
- Use AssertJ instead of JUnit Assert
- Use Geode defaults or configure using Geode APIs directly within the test
- Use parameterization instead of test class inheritance
- Avoid sharing callback or domain classes across multiple tests
- Use ErrorCollector for assertions and exception handling in callbacks instead of rethrowing or setting state to check later
- Use factory based injection instead of mutable/static product test hooks
- Use Geode user APIs instead of adapter/helper/junit-rule APIs
- Avoid unnecessary use of invokeAsync or multithreading
- Use AsyncInvocation or Future to wait when using invokeAsync
- Use AsyncInvocation.await() for void types
- Use AsyncInvocation.get() for non-void types
- Use only non-deprecated Geode user APIs in tests (unless the test is specifically for deprecated APIs)
- Use custom logging when debugging hard-to-reproduce problems
- Know if the call you are making is asynchronous and wait for it, you would be surprised at how many are asynchronous