This is a list of key concepts that you should be aware of before doing too much Solr development.

The Rules of Solr

1. The Rules of Solr may be changed by consensus.
2. Community over code.
We will consider our fellow and future developers before our spiffy code change/fix/feature. We share this project.
3. Develop for long term health.
We will work towards solutions that can be maintained, via enforcement, documentation, good logging, and consideration. I'm not as smart as you, I don't have as much time as you, and I'm not here for the same reasons as you, but there is work I need to do.
4. Code reviews for anything beyond trivial.
We will not become a kitchen sink, a pet project, or a test bed because of ill timing or the current level of developer health.
5. Code to your level.
We will not commit code we don't understand just because tests pass, and we will not be reckless in areas where we are not yet qualified to be reckless.

Topics to Familiarize Yourself With

To be successful with Solr, all you really need is a good attitude. But if you want to go deeper, this is the required reading list:

  1. Java Concurrency
  2. Good Java Practices
  3. Lucene
  4. HTTP 1.1 and 2
  5. Jetty
  6. RandomizedTesting
  7. Gradle
  8. Zookeeper
  9. Apache Solr Reference Guide
  10. Developer Docs

Java Concurrency

This is a very complex topic, and I recommend you check out a classic book on it, such as Java Concurrency in Practice.

Object Publishing

A reminder to everyone: please follow the proper rules for object publication, that is, the sharing of an object between two threads. Even if those threads never access the object concurrently, we must follow these rules. Concurrency is a topic we cannot meet halfway.

Proper publication requires that an object's reference AND state are made visible to other threads at the same time, with the state fully constructed.

A properly constructed object can be shared in one of these ways:

Initializing the object in a static initializer.
Using the final keyword on a field, which ensures that other threads see a fully constructed object after the constructor returns (provided the reference to the object under construction does not escape the constructor). The value of the field must be effectively immutable or thread safe. Use the Collections.unmodifiable* methods and the like to ensure immutability.
Using the volatile keyword, which ensures that threads read the most up-to-date value. The volatile keyword can be tricky, but it is very cheap when updates are rare.
Guarding the object with a lock, such as synchronized or a java.util.concurrent.locks.Lock.

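(Note: the Java Language Specification does not guarantee that writes to non-volatile 64-bit primitives, long and double, are atomic, so a reader may see half of one write and half of another. Reads and writes of references and other primitives are atomic, but atomicity alone does not make them visible to other threads.)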
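The four publication idioms above can be sketched in one small class. This is a minimal illustration, not Solr code: the class name SafePublication, the nested Settings type, and the field names are all hypothetical. It shows a static initializer, a final field holding an immutable copy, a volatile field that is replaced wholesale on rare updates, and mutable state that is only touched while holding a Lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical class showing the four safe-publication idioms (not part of Solr). */
public final class SafePublication {

  /** Immutable value: all fields final, so instances are safe to share once constructed. */
  public static final class Settings {
    public final int timeoutMs;

    Settings(int timeoutMs) {
      this.timeoutMs = timeoutMs;
    }
  }

  // 1. Static initializer: the JVM performs class initialization under a lock,
  //    so every thread that uses the class sees the fully built, unmodifiable map.
  public static final Map<String, Integer> DEFAULTS;

  static {
    Map<String, Integer> m = new HashMap<>();
    m.put("rows", 10);
    DEFAULTS = Collections.unmodifiableMap(m);
  }

  // 2. final field holding an immutable copy: visible to other threads once the
  //    constructor returns, as long as "this" does not escape during construction.
  private final List<String> names;

  // 3. volatile field: a reader sees the most recently written reference, and the
  //    Settings object behind it is fully constructed.
  private volatile Settings settings = new Settings(1000);

  // 4. Mutable state guarded by a lock: every read and write holds the same lock.
  private final Lock lock = new ReentrantLock();
  private final Map<String, String> cache = new HashMap<>(); // guarded by lock

  public SafePublication(List<String> names) {
    this.names = List.copyOf(names);
  }

  public List<String> names() {
    return names;
  }

  public Settings settings() {
    return settings;
  }

  /** Rare update: publish a new immutable Settings instead of mutating the old one. */
  public void updateSettings(int timeoutMs) {
    settings = new Settings(timeoutMs);
  }

  public void put(String key, String value) {
    lock.lock();
    try {
      cache.put(key, value);
    } finally {
      lock.unlock();
    }
  }

  public String get(String key) {
    lock.lock();
    try {
      return cache.get(key);
    } finally {
      lock.unlock();
    }
  }
}
```

Replacing an immutable object through a volatile reference, rather than mutating shared state, is what keeps the volatile idiom cheap when updates are rare: readers never need a lock.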

Class Annotations

Solr uses two annotations to indicate to developers whether a class is thread safe or intended for single threaded use. TODO

Efficiency

We should try to use concurrency strategies that are efficient. For example, see ConcurrentHashMap and ConcurrentHashMap#newKeySet, which allow concurrent reads and updates without locking the whole collection.
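As a rough sketch of what this looks like in practice, the hypothetical class below keeps per-key counters in a ConcurrentHashMap, updated with merge (which the JDK performs atomically per key), and a concurrent set created with ConcurrentHashMap.newKeySet(). The class, method, and key names are illustrative only.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical example: counters and a set shared across threads without a global lock. */
public final class ConcurrentCollectionsExample {

  // Per-key counters; merge() updates a single entry atomically.
  private final Map<String, Long> requestCounts = new ConcurrentHashMap<>();

  // A thread-safe Set backed by a ConcurrentHashMap.
  private final Set<String> liveNodes = ConcurrentHashMap.newKeySet();

  public void recordRequest(String collection) {
    requestCounts.merge(collection, 1L, Long::sum);
  }

  public void nodeUp(String node) {
    liveNodes.add(node);
  }

  public void nodeDown(String node) {
    liveNodes.remove(node);
  }

  public long count(String collection) {
    return requestCounts.getOrDefault(collection, 0L);
  }

  public Set<String> liveNodes() {
    return liveNodes;
  }

  public static void main(String[] args) throws InterruptedException {
    ConcurrentCollectionsExample stats = new ConcurrentCollectionsExample();
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 1000; i++) {
      final int n = i;
      pool.execute(() -> {
        stats.recordRequest("techproducts");
        stats.nodeUp("node" + (n % 3));
      });
    }
    pool.shutdown();
    if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
      throw new IllegalStateException("tasks did not finish");
    }
    System.out.println(stats.count("techproducts")); // 1000
    System.out.println(stats.liveNodes().size()); // 3
  }
}
```

Compared with wrapping a HashMap in synchronized blocks, threads updating different keys rarely contend with each other here.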

Good Java Practices

When we take shortcuts, like violating encapsulation, we create situations that may be reasonable now but cause problems later. We don't want to be pedantic, but many of the lessons from object-oriented programming prevent problems down the line.

We also need to pay attention to our class dependency graph: complicated, back-and-forth dependencies between classes make code hard to follow and hard to change.

We should generally not start threads or do heavy resource manipulation in an object's constructor. Threads that start up and interact with the object before it is even fully constructed can be very problematic. Prefer a start() method.
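A minimal sketch of the start() pattern, assuming a hypothetical background worker (IndexWorker and its methods are illustrative, not a Solr class). The constructor only initializes fields and creates the Thread object; the thread is started in start(), after construction has finished, so the new thread can never observe a partially built instance.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical background worker that separates construction from startup. */
public final class IndexWorker implements AutoCloseable {

  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final Thread thread;
  private volatile boolean running;
  // Written only by the worker thread, so ++ on a volatile is safe here.
  private volatile int processed;

  public IndexWorker(String name) {
    // Cheap field initialization only. Creating the Thread is fine; starting it
    // here would let the new thread see "this" before construction completes.
    this.thread = new Thread(this::runLoop, name);
  }

  /** Starts the background thread once the object is fully constructed. */
  public synchronized void start() {
    if (running) {
      throw new IllegalStateException("already started");
    }
    running = true;
    thread.start();
  }

  public void submit(String doc) {
    queue.add(doc);
  }

  private void runLoop() {
    // Keep draining the queue until we are stopped and nothing is left.
    while (running || !queue.isEmpty()) {
      try {
        String doc = queue.poll(100, TimeUnit.MILLISECONDS);
        if (doc != null) {
          processed++;
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /** Stops the worker after it drains pending work and waits for it to exit. */
  @Override
  public void close() {
    running = false;
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public int processed() {
    return processed;
  }
}
```

Usage: construct the worker, call start(), submit work, and call close() when done. Thread.start() guarantees that everything the constructing thread did before the call, including finishing the constructor, is visible to the new thread.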

HTTP 1.1 and 2

We have to take special care with our use of HTTP, because it was not necessarily designed for our use case. Most importantly, we don't want to close connections, because we really need and want to reuse them. This means we don't want to close Servlet response streams or call response.sendError. Instead, we should return an error to the client in the format it requested from Solr, along with the proper response code. We also don't want to flush the response, because that interferes with chunked encoding.

We also want to keep our clients from running into stale connections (they hit a connection reset exception). With HTTP 1.1, the only way to do this is to set the Jetty idle timeout higher than the client idle timeout, so that the clients control the connection. The other options, which try to detect a stale connection, involve an intrinsic race and are not good enough for our use case.

HTTP/2 does not have this race problem (a server can announce that it is closing a connection with a GOAWAY frame), and it has much hardier connections over which multiple requests can be multiplexed. Solr currently uses a combination of HTTP 1.1 and HTTP/2.
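The HTTP 1.1 timeout rule can be expressed as a simple invariant. The sketch below is a hypothetical startup check, not a Solr or Jetty API: IdleTimeoutCheck and requireClientClosesFirst are illustrative names, and the durations are example values rather than Solr defaults. It rejects any configuration in which the server could close an idle connection before the client does.

```java
import java.time.Duration;

/** Hypothetical startup check: the client must be the side that closes idle HTTP 1.1 connections. */
public final class IdleTimeoutCheck {

  private IdleTimeoutCheck() {}

  /**
   * Throws if the server idle timeout is not strictly longer than the client idle timeout.
   * If the server closed an idle connection first, a client reusing that pooled connection
   * would race with the close and could get a connection reset.
   */
  public static void requireClientClosesFirst(Duration serverIdleTimeout, Duration clientIdleTimeout) {
    if (serverIdleTimeout.compareTo(clientIdleTimeout) <= 0) {
      throw new IllegalArgumentException(
          "server idle timeout (" + serverIdleTimeout + ") must exceed client idle timeout ("
              + clientIdleTimeout + ")");
    }
  }

  public static void main(String[] args) {
    // Example values only; not Solr's actual defaults.
    requireClientClosesFirst(Duration.ofSeconds(120), Duration.ofSeconds(60)); // accepted
    try {
      requireClientClosesFirst(Duration.ofSeconds(30), Duration.ofSeconds(60));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the client's idle timer always fires first, the client discards an idle connection before the server would close it, so it never sends a request on a connection the server is tearing down.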

Lucene

https://lucene.apache.org/

Jetty

Get started here

https://www.eclipse.org/jetty/documentation/current/index.html

https://www.eclipse.org/jetty/documentation/current/optimizing.html

RandomizedTesting

https://labs.carrotsearch.com/randomizedtesting.html

Gradle

Apache Solr Reference Guide

https://lucene.apache.org/solr/guide/

Developer Docs (will link out)
