...

  • Why is Solr performance so bad?
  • Why does Solr take so long to start up?
  • Why is SolrCloud acting like my servers are failing when they are fine?

This is an attempt to give basic information only. For a better understanding of the issues involved, read the included links, look for other resources, and ask well thought out questions via Solr support resources.

This guide does not contain info specific to Solr installs that are using HDFS for index storage. If anyone has any info about how to achieve effective index data caching with HDFS, please share that information with the solr-user mailing list or the #solr channel in the ApacheSolr Slack space so it can be incorporated here.

Table of Contents

General information

There is a performance bug that makes *everything* slow in versions 6.4.0 and 6.4.1. The problem is fixed in 6.4.2. It is described by SOLR-10130. This is highly version specific and only applies to VERY old versions, so if you are not running one of the affected versions, don't worry about it. The rest of this document outside of this one paragraph is not specific to ANY version of Solr.

...

It is strongly recommended that Solr runs on a 64-bit Java. A 64-bit Java requires a 64-bit operating system, and a 64-bit operating system requires a 64-bit CPU. There's nothing wrong with 32-bit software or hardware, but a 32-bit Java is limited to a 2GB heap, which can result in artificial limitations that don't exist with a larger heap. It is very easy to build an index that will not function at all if the heap cannot be made larger than 2GB. The Java heap is discussed in a later section of this page.

...

Regardless of the number of nodes or available resources, SolrCloud begins to have stability problems when the number of collections reaches the low hundreds. With thousands of collections, any little problem or change to the cluster can cause a stability death spiral that may not recover for tens of minutes. Try to keep the number of collections as low as possible; when each collection has many shards, the problem multiplies. These problems are due to how SolrCloud updates cluster state in ZooKeeper in response to cluster changes, and work is underway to improve the situation. The problem surfaced in Solr 4.x, where the state is kept in a single "clusterstate.json" file. Later Solr versions (5.x and above) by default store each collection's state in an individual "state.json" as a child of that collection's znode (e.g. /collections/my_collection/state.json). If you started with a Solr 4.x installation, the MIGRATESTATEFORMAT collections API command will convert the cluster to the newer, more scalable format. That said, the load on ZooKeeper certainly increases as the number of collections (and replicas) increases. Recent Solr versions perform well with thousands of replicas.

Because SolrCloud relies heavily on ZooKeeper, it can be very unstable if you have underlying performance issues that result in operations taking longer than the zkClientTimeout. Increasing that timeout can help, but addressing the underlying performance issues will yield better results. The default timeout (15 seconds internally, and 30 seconds in most recent example configs) is quite long and should be more than enough for a well-tuned SolrCloud install.
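
For reference, a minimal sketch of where this timeout is set in solr.xml (the 30000 millisecond value shown is illustrative and matches recent example configs, not a recommendation for every install):

No Format

<solr>
  <solrcloud>
    <!-- ZooKeeper session timeout in milliseconds -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>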

...

  • A large index.
  • Frequent updates.
  • Super large documents.
  • Extensive use of faceting.
  • Using a lot of different sort parameters.
  • Very large Solr caches.
  • A large RAMBufferSizeMB.
  • Use of Lucene's RAMDirectoryFactory.

How much heap space do I need?

...

  • Take a large index and make it distributed - break your index into multiple shards.
    • One very easy way to do this is to switch to SolrCloud. You may need to reindex but SolrCloud will handle all the sharding for you.  This doesn't actually reduce the overall memory requirement for a large index (it may actually increase it slightly), but a sharded index can be spread across multiple servers, with each server having lower memory requirements. For redundancy, there should be multiple replicas on different servers.
    • If the query rate is very low, putting multiple shards on a single server will perform well. As the query rate increases, it becomes important to only have one shard replica per server.
  • Don't store all your fields, especially the really big ones.
    • Instead, have your application retrieve detail data from the original data source, not Solr.
    • Note that doing this will mean that you cannot use Atomic Updates.
  • You can also enable docValues on fields used for sorting/facets and reindex (see the schema sketch after this list).
  • Reduce the number of different sort parameters. Just like for facets, docValues can have a positive impact on both performance and memory usage for sorting.
  • Reduce the size of your Solr caches.
  • Reduce RAMBufferSizeMB. The default in recent Solr versions is 100.
    • This value can be particularly important if you have a lot of cores, because a buffer will be used for each core.
  • Don't use RAMDirectoryFactory - instead, use the default and install enough system RAM so the OS can cache your entire index as discussed above.
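
As a sketch of the docValues suggestion above (the field and type names here are hypothetical, and a full reindex is required after the change), a sorting/faceting field in the schema might look like this:

No Format

<!-- docValues moves sorting/faceting data out of the Java heap and into the OS disk cache -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="category" type="string" indexed="true" stored="false" docValues="true"/>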

GC pause problems

When you have a large heap (larger than 2GB), garbage collection pauses can be a major problem. This is usually caused by occasionally required full garbage collections that must "stop the world" – pause all program execution to clean up memory. There are two main solutions: One is to use a commercial low-pause JVM like Zing, which does come with a price tag. The other is to tune the free JVM you've already got. GC tuning is an art form, and what works for one person may not work for you.

...

Manually tuning the sizes of the various heap generations is very important with CMS. The G1 collector automatically tunes the sizes of the generations as it runs, and forcing the sizes will generally result in lower performance.

...

Asking for millions of rows with e.g. rows=9999999 in combination with a high query rate is a known combination that can cause lots of full GC problems on moderate size indexes (5-10 million documents). Even if the number of actual hits is very low, the fact that the client requests a huge number of rows will cause the allocation of tons of Java objects (one ScoreDoc per row requested) and also reserve valuable RAM (28 bytes per row). So asking for "all" docs using a high rows param does not come for free. You will see lots of garbage collection going on, and memory consumption rising until the point where a full GC is triggered. Increasing the heap may help sometimes, but eventually you'll end up with long pauses, so the root problem needs to be fixed. Read Toke Eskildsen's blog post about the details of the problem and his suggestions for improving Solr's code.

The simple solution is to ask for fewer rows, or if you need to get a huge number of docs, switch to /export, cursorMark, or streaming.

If you have no control over the client, you can instead try to set rows in an invariants section of solrconfig.xml, or if it needs to be dynamic, set a cap on the max value allowed through a custom SearchComponent, such as the RequestSanitizerComponent.
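
As an illustration of the invariants approach (the handler name and the value of 100 are only examples), the relevant piece of solrconfig.xml might look like this:

No Format

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- invariants override whatever the client sends, so rows is always forced to this value -->
  <lst name="invariants">
    <int name="rows">100</int>
  </lst>
</requestHandler>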

...

SSD

Solid State Disks are amazing. They have high transfer rates and pretty much eliminate the latency problems associated with randomly accessing data.

...

One potential problem with SSD is that operating system TRIM support is required for good long-term performance. For single disks, TRIM is usually well supported, but if you add any kind of hardware RAID (and most software RAID as well), TRIM support disappears. At the time of this writing, it seems that only Intel supports a solution, and that is limited to Windows 7 or later and RAID 0. One way to make this less of a problem with Solr is to put your OS and Solr itself on a RAID of regular disks, and put your index data on a lone SSD. On a proper Solr setup, if the SSD fails, your redundant server(s) will still be there to handle requests.

...

This will involve the utility named "top". There are some other variants of this program available, like htop, which do not provide the information desired. Run the "top" utility. If it's the version of top produced by the GNU project, you can press shift-M to sort the listing by the %MEM column, descending. If it's another version of top, getting the appropriate sort may require research. Once the correct sort is achieved, grab a screenshot. Share the screenshot using a file sharing website.

Example, with 28GB heap and over 700 GB of index data:

(screenshot: linux-top-screenshot.png)

Process listing on Windows

...

This screenshot example is from a machine that's NOT actually running Solr, but other than that detail, shows what's required:

(screenshot: windows-memory-screenshot.png)

Extreme scaling

...

Turning on autoCommit in your solrconfig.xml update handler definition is the solution:

No Format

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit after 25000 docs or 300000 ms (five minutes), whichever comes first -->
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <!-- don't open a new searcher, so these automatic commits don't affect visibility or caches -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog />
</updateHandler>

...

  • Large autowarmCount values on Solr caches.
  • Heap size issues. Problems from the heap being too big will tend to be infrequent, while problems from the heap being too small will tend to happen consistently.
  • Extremely frequent commits.
  • Not enough OS memory for disk caching, discussed above.

If you have large autowarmCount values on your Solr caches, it can take a very long time to do that cache warming. The filterCache is particularly slow to warm. The solution is to reduce the autowarmCount, reduce the complexity of your queries, or both.
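
As a sketch of where autowarmCount lives in solrconfig.xml (the sizes and counts below are illustrative only, and the cache class names vary between Solr versions):

No Format

<!-- small autowarmCount values keep cache warming fast after each commit -->
<filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="16"/>
<queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="16"/>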

...

Other possible problems that cause slow indexing include committing after every update request, sending one document at a time in each update request instead of batching them, and only using one thread/connection to index. These are problems that are external to Solr. A possible workaround is to use the IgnoreCommitOptimizeUpdateProcessorFactory to ignore all commits from clients and instead set up autoCommit.
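
A minimal sketch of that workaround in solrconfig.xml (the chain name and status code here are illustrative):

No Format

<updateRequestProcessorChain name="ignore-commit-from-client" default="true">
  <!-- swallow explicit commit/optimize requests from clients and report success -->
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <int name="statusCode">200</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

With this in place, commits happen only on the autoCommit schedule discussed earlier, no matter what clients send.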

One part of that needs emphasis.  Using only one thread or process for indexing is very often the cause of slow indexing.  Using multiple threads or processes to do indexing in parallel will be faster than a single thread.  For best results, both the indexing system and the system(s) running Solr should have as many CPU cores as you can get.

Further help

If you need additional help with any of the issues discussed on this page, Solr has a very active community. Be sure that you can provide relevant information before asking for help.

...