This is a list of key concepts that you should be aware of before doing too much Solr development.

The Rules of Solr

1. The Rules of Solr may be changed by consensus.
2. Community over code.
We will consider our fellow and future developers before our spiffy code change/fix/feature. We share this project.
3. Develop for long term health.
We will work towards solutions that we can maintain - via enforcement, documentation, and consideration. I'm not as smart as you, I don't have as much time as you, and I'm not here for the same reasons as you - but there is work I need to do.
4. Code reviews for anything beyond trivial.
We will not be a kitchen sink or a pet project or test bed because of ill timing or current developer level health.

Topics to Familiarize Yourself With.

To be successful with Solr, all you really need is a good attitude. But if you want to go deeper, this is the required reading list:

  1. Java Concurrency
  2. Good Java Practices
  3. Lucene
  4. HTTP 1.1 and 2
  5. Jetty
  6. RandomizedTesting
  7. Gradle
  8. Apache Solr Reference Guide
  9. Developer Docs

Java Concurrency

This is a very complex topic and I recommend you check out a classic book on the topic. A reminder to everyone to please follow the proper rules for object publication - the sharing of an object between two threads. Even if those threads do not access the object concurrently, we must follow these rules. Concurrency is a topic we cannot meet halfway.

Proper publication requires that an object's reference AND state are made visible at the same time, with the state fully constructed.

A properly constructed object can be shared in one of these ways:

Initializing the object via a static initializer.
Using the final keyword on a field to ensure that other threads see a fully constructed object after the constructor returns. The value of the field must be effectively immutable or thread safe. Use Collections.unmodifiable and the like to ensure immutability.
Using the volatile keyword to ensure that threads read the most up to date value. The volatile keyword can be tricky, but is very cheap when updates are rare.
Guarding the object via another memory barrier like synchronized or a Lock.
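The four options above can be sketched in one small class. This is a minimal illustration, not Solr code: SafePublication, SafePublicationDemo, and all field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class for illustration; not part of Solr.
final class SafePublication {

  // Static initializer: class initialization makes DEFAULT visible to all threads.
  static final SafePublication DEFAULT = new SafePublication(List.of("id", "title"));

  // final field: other threads see it fully constructed once the constructor
  // returns, provided 'this' does not escape during construction.
  private final List<String> fields;

  // volatile field: a read sees the most recent write from any thread.
  private volatile String status = "new";

  // Guarded by the intrinsic lock on 'this'; only touched in synchronized methods.
  private int hits;

  SafePublication(List<String> fields) {
    // Copy, then wrap, so no caller keeps a mutable handle on the state.
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }

  List<String> fields() { return fields; }

  String status() { return status; }

  void setStatus(String newStatus) { status = newStatus; }

  synchronized void hit() { hits++; }

  synchronized int hits() { return hits; }
}

final class SafePublicationDemo {

  // Runs 4 threads that each record 1000 hits, then returns the total.
  static int concurrentHits() {
    SafePublication shared = new SafePublication(List.of("id"));
    Thread[] threads = new Thread[4];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> {
        for (int n = 0; n < 1000; n++) {
          shared.hit();
        }
      });
      threads[i].start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return shared.hits();
  }

  public static void main(String[] args) {
    System.out.println(concurrentHits());
  }
}
```

Note that the unmodifiable wrapper only gives immutability here because the constructor copies the list first. Wrapping a list that the caller can still modify would not be enough.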

Solr uses two annotations to indicate to developers whether a class is thread safe or intended for single threaded use.

Good Java Practices

When we take shortcuts, like violating encapsulation, we create situations that may be reasonable now but cause problems later. We don't want to be pedantic, but many of the lessons from object-oriented programming prevent problems down the line.

We also need to pay attention to our class dependency graph - complicated back-and-forth dependencies can lead to code that is tough to follow and change.

We should generally not start threads or do heavy resource manipulation in an object's constructor. Starting threads and triggering various interactions before the object is even fully constructed can be very problematic. Prefer a start() method.
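A minimal sketch of the start() pattern, assuming a hypothetical Poller component (the class, its fields, and its join helper are illustrative names, not Solr classes). The constructor only assigns state, and the thread is created once the object is fully constructed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical component for illustration; not part of Solr.
final class Poller implements AutoCloseable {

  private final Runnable task;
  private final AtomicBoolean started = new AtomicBoolean();
  private volatile Thread worker;

  Poller(Runnable task) {
    // The constructor only assigns state: no threads, no I/O, no 'this' escaping.
    this.task = task;
  }

  // Starts the worker thread. Calling start() a second time is an error.
  void start() {
    if (!started.compareAndSet(false, true)) {
      throw new IllegalStateException("already started");
    }
    Thread t = new Thread(task, "poller");
    t.setDaemon(true);
    worker = t;
    t.start();
  }

  boolean isStarted() {
    return started.get();
  }

  // Waits up to the given time for the worker to finish; true if it has finished.
  boolean join(long millis) {
    Thread t = worker;
    if (t == null) {
      return false;
    }
    try {
      t.join(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !t.isAlive();
  }

  @Override
  public void close() {
    Thread t = worker;
    if (t != null) {
      t.interrupt();
    }
  }
}
```

A caller would construct the object, finish any wiring, and only then call start(), so no thread can observe a half-built Poller.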

HTTP 1.1 and 2

We have to take special care with our use of HTTP as it was not necessarily designed for our use case. Importantly, we don't want to close connections, because we really need and want to reuse them. This means we don't want to close Servlet response streams or use response.sendError calls. Instead we should return an error to the client in the format it asked Solr for, along with the proper response code. We also don't want to flush the response because it will interfere with chunked encoding. We also want to avoid our clients running into stale connections (they hit a connection reset exception) - with HTTP 1.1 we can only do this by setting the Jetty idle timeout higher than the client idle timeout so that the clients control the connection. The other options, which try to detect a stale connection, involve an intrinsic race and are not solid.

HTTP 2 does not have this race problem and has much hardier connections that multiple requests can be multiplexed over. Solr currently uses a combination of HTTP 1.1 and 2.
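As a rough sketch of returning an error in the requested format instead of calling response.sendError, the helper below builds a status code, content type, and body. Everything here is an assumption for illustration: ErrorBody, its render method, the "json"/"xml" format names, and the body layouts are hypothetical and are not Solr's actual error schema.

```java
// Hypothetical helper for illustration; the body layouts are NOT Solr's real error format.
final class ErrorBody {

  final int status;
  final String contentType;
  final String body;

  private ErrorBody(int status, String contentType, String body) {
    this.status = status;
    this.contentType = contentType;
    this.body = body;
  }

  // Builds an error payload in the format the client asked for (JSON by default).
  static ErrorBody render(int status, String message, String format) {
    if ("xml".equals(format)) {
      String xml = "<error><code>" + status + "</code><msg>" + escapeXml(message) + "</msg></error>";
      return new ErrorBody(status, "application/xml", xml);
    }
    String json = "{\"error\":{\"code\":" + status + ",\"msg\":\"" + escapeJson(message) + "\"}}";
    return new ErrorBody(status, "application/json", json);
  }

  // Minimal escaping only; a real implementation would use a proper JSON writer.
  private static String escapeJson(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
  }

  // Minimal escaping only; '&' must be replaced first.
  private static String escapeXml(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }
}

// In a servlet, the caller would then do roughly the following and simply return,
// without close(), flush(), or sendError(), so the container can reuse the connection:
//   response.setStatus(err.status);
//   response.setContentType(err.contentType);
//   response.getWriter().write(err.body);
```

The point of the design is that the error travels as an ordinary response body with the right status code, which leaves connection handling to the container.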

Lucene

https://lucene.apache.org/

Jetty

Get started here

https://www.eclipse.org/jetty/documentation/current/index.html

https://www.eclipse.org/jetty/documentation/current/optimizing.html

RandomizedTesting

https://labs.carrotsearch.com/randomizedtesting.html

Gradle

Apache Solr Reference Guide

https://lucene.apache.org/solr/guide/

Developer Docs (will link out)
