Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

This guide does not contain info specific to Solr installs that are using HDFS for index storage. If anyone has any info about how to achieve effective index data caching with HDFS, please share that information with the solr-user mailing list or the #solr IRC channel on freenode channel in the ApacheSolr slack space so it can be incorporated here.

...

There is a performance bug that makes *everything* slow in versions 6.4.0 and 6.4.1. The problem is fixed in 6.4.2. It is described by SOLR-10130. This is highly version specificspecifica and only applies to VERY old versions, so if you are not running one of the affected versions, don't worry about it. The rest of this document outside of this one paragraph is not specific to ANY version of Solr.

...

Regardless of the number of nodes or available resources, SolrCloud begins to have stability problems when the number of collections reaches the low hundreds. With thousands of collections, any little problem or change to the cluster can cause a stability death spiral that may not recover for tens of minutes. Try to keep the number of collections as low as possible. When each collection has many shards, the problem can multiply.  These problems are due to how SolrCloud updates cluster state in ZooKeeper in response to cluster changes. Work is underway to try and improve this situation. This problem surfaced in Solr 4.x where the state is kept in a single "clusterstate.json" file. Subsequent Solr versions (5x and above) by default store each collection's data in an individual "state.json" as a child of each collection's znode (e.g. /collections/my_collection/state.json). If you started with a Solr 4x installation, the MIGRATESTATE command will change to the newer, more scalable state. That said, the load on Zookeeper certainly increases as the number of collections (and replicas) increases. Recent Solr versions perform well with thousands of replicas.

Because SolrCloud relies heavily on ZooKeeper, it can be very unstable if you have underlying performance issues that result in operations taking longer than the zkClientTimeout. Increasing that timeout can help, but addressing the underlying performance issues will yield better results. The default timeout (15 seconds internally, and 30 seconds in most recent example configs) is quite long and should be more than enough for a well-tuned SolrCloud install.

...

Other possible problems that cause slow indexing include committing after every update request, sending one document at a time in each update request instead of batching them, and only using one thread/connection to index. These are problems that are external to Solr. Possible workaround is using the IgnoreCommitOptimizeUpdateProcessorFactory to ignore all commits from client and instead setup autoCommit.

One part of that needs emphasis.  Using only one thread or process for indexing is very often the cause of slow indexing.  Using multiple threads or processes to do indexing in parallel will be faster than a single thread.  For best results, both the indexing system and the system(s) running Solr should have as many CPU cores as you can get.

Further help

If you need additional help with any of the issues discussed on this page, Solr has a very active community. Be sure that you can provide relevant information before asking for help.

...