...

Current state: Under Discussion

Discussion threads:

Main: https://lists.apache.org/thread.html/r3bf5981894e5b94f1897f4946b1652f068d3090dbccc84bfe90e130d%40%3Cdev.lucene.apache.org%3E
RefGuide dogfooding: https://lists.apache.org/thread.html/r7b4f3acce10fc3e1eceedb32cbd7349ace4af70632ec6f4def16f4ab%40%3Cdev.lucene.apache.org%3E

JIRA: SOLR-14726, many others, TBD

Released: TBD (target 9.0)

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast). Confluence supports inline comments that can also be used.

...


Motivation

As Solr has grown, the examples have become a mix of ancient documents, kitchen-sink additions with a complicated (and often confusing) interplay of definitions, and left-over configurations that conflict with or are out of sync with the documentation. The "more info" links mostly point into the legacy wiki, which is two generations of redirects behind the current Reference Guide. Solutions were introduced with a fix-the-pain approach, which has also caused magic paths or pushed demonstration configurations into consolidated defaults. New features are often not demonstrated, because adding a new example requires understanding the existing ones.

...

Go through the default configuration files line by line.
1. Ensure that any documentation and explanation not yet in the Reference Guide are moved there. Delete any significant passages and replace them with Ref Guide links to ensure a single source of truth (SOLR-11875, SOLR-14841, SOLR-14834)
2. Delete any default blocks that do not use parameter substitutions; point instead to the relevant RefGuide section and, as appropriate, to the API that reports the real defaults
3. Delete legacy sections that 'no longer work' (e.g. jmx, possibly EditorialMarkerFactory)
4. Delete workaround explanations for those migrating from Solr versions prior to Solr 7? (Document them in the RefGuide?)
Review the current state of directory layouts
1. Compare:
  1. Out-of-the-box for default install
  2. Out-of-the-box example install and hacks (e.g. in bin/solr)
  3. serviceinstall scripts
  4. Docker setup (SOLR-11245)
  5. Existing issues: SOLR-13035, SOLR-6671
2. Clarify naming for locations of:
  1. Static O/S global part of running solr
  2. Writable O/S global part of running solr (only pid file or more?)
  3. Server/Node level information (start.in.sh? logs? configsets? solr.xml) - there may be several of these on a physical server, such as in the cloud example. Or put all of those in solr.home and have cores one level lower under coreRootDirectory (in solr.xml, but see SOLR-14097)
  4. Collection/Core level information (core.properties)
  5. Individual directories per core (conf, data) - some of these already can be in other locations
Refactor example directory and associated commands to reduce magic
1. This mainly affects log configuration and logging directory locations and figuring out what is the directory above solr home
2. May also involve exploration about configsets and environmental override directories
Create new examples (SOLR-10329, testable? SOLR-11352)
1. Create a base learning config that is either based on the default or is an even simpler one of its own (SOLR-13652)
2. Set up a new dataset (https://www.fakenamegenerator.com can generate 100k records with many interesting fields under a CC license, https://creativecommons.org/licenses/by-sa/3.0/us/, similar to the CC license already used by the films example)
  1. Split records into different formats to demonstrate XML, CSV, multiple JSONs, nested records, etc
3. Create a number of additive configurations+examples, that augment base configuration to demonstrate specific features with point precision
4. Move non-essential schema definitions (e.g. languages) from the default into an alternative schema (new kitchen-sink). Whether this should be copy/paste XML or API commands is to be explored (SOLR-11033)
5. Update documentation to use new examples to demonstrate features that used to use older configsets
6. Use short names for analyzer/filter/tokenizer wherever possible (SOLR-13691) - make sure they are easily discoverable in documentation as well
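
The dataset-splitting step above could be sketched roughly as follows. This is only an illustration, not a proposed tool: the field names (id, name, city), the sample rows, and the round-robin split are assumptions made for the example, and nested records are left out. The XML output follows Solr's add/doc/field update format.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample; a real dataset would come from the generator's CSV export.
SAMPLE = "id,name,city\n1,Ann Lee,Oslo\n2,Bo Chan,Lima\n3,Cy Diaz,Rome\n"


def to_csv(records, fieldnames):
    """Serialize records back to CSV with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def to_json(records):
    """Serialize records as a JSON array of documents."""
    return json.dumps(records, indent=2)


def to_xml(records):
    """Serialize records as <add><doc><field name="...">value</field></doc></add>."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


def split_by_format(csv_text):
    """Deal records round-robin into three slices, one per output format."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    records = list(reader)
    return {
        "csv": to_csv(records[0::3], fieldnames),
        "json": to_json(records[1::3]),
        "xml": to_xml(records[2::3]),
    }


parts = split_by_format(SAMPLE)
```

Each slice can then be posted separately, so that one dataset exercises several of the input formats.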
Rewrite the Getting Started guide to focus on the simplest path through
1. Start from standalone mode
2. Explain what is happening with cross-references for more details (teach troubleshooting skills early)
3. Use API as much as possible, but not at a cost of readability/comprehension
4. Demonstrate recent APIs/features
5. Build up to the cloud example
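
As a rough idea of what an "API first" step in such a guide might look like, the sketch below builds (without sending) a V1 Schema API request that adds one field to a standalone core. The core name `learning`, the field name `city`, and the local URL are assumptions for illustration; the `string` field type is assumed to exist in the configset in use.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: local standalone install on the default port
CORE = "learning"  # hypothetical core name


def add_field_request(core, name, field_type, stored=True, indexed=True):
    """Build, but do not send, a Schema API request that adds a single field."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return urllib.request.Request(
        url=f"{SOLR_URL}/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = add_field_request(CORE, "city", "string")
# Against a running Solr, the request could be sent with:
#   urllib.request.urlopen(request)
```

Keeping the payload visible as plain JSON is deliberate: the same body can be shown next to an equivalent curl command, which bears on the readability concern in point 3.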
Bigger changes that need further discussion
1. Delete ALL DIH examples in bulk - DONE (JIRA TBC SOLR-14066, SOLR-14783)
2. Delete Tika configuration and refer to the manual for configuration and warning (JIRA TBC SOLR-13973)
3. Move schemaless mode into learning chain (JIRA TBC SOLR-14701, SOLR-11741)
4. Delete (refactor) techproducts example and its files (but what about tests?)
5. Delete Velocity example (discussed somewhere else? SOLR-14065)
6. V2 vs V1 API for examples (V2 is not available for standalone mode in 8.6.1)
7. post tool vs curl
8. Interplay with Admin UI changes in progress (e.g. how much to leverage/demonstrate it)
9. Neither default nor techproducts is a realistic production schema - a whole separate but related discussion (Jira exists?)
10. It seems that even though Velocity/DIH/others have been deprecated, they have not actually been removed from code/documentation for 9.0 yet. Are there Jiras for that already?
Other cleanup
1. Fix the dead/legacy wiki.apache.org links (SOLR-14834)

Compatibility, Deprecation, and Migration Plan

...

It may be possible to create just a minimal learning schema and/or a couple of examples, but this would still not address the problem that a person who goes on to add new functionality or test new features is left unsupported. Nor would it address kitchen-sink production deploys.

Related previous explorations and feature tests

Learning vs Production vs kitchen sink setup

Learning config

Should be as small as possible and still load in both standalone and cloud configurations
Every line should have a purpose and be explained with RefGuide references
managed-schema should be ordered in the order of reading comprehension (fieldType, related fields, uniqueKey declaration next to ID)
Additional examples should layer on top of learning schema to demonstrate different features
schemaless mode (to be rewritten to be learning mode) is a separate example
Related issues:
- SOLR-13652

Production config

managed-schema should be minimal to allow users to include what is actually needed
solrconfig.xml
- should be fairly comprehensive, but obscure defaults and detailed explanation should live in RefGuide. From experience, nobody updates the schema files unless forced to (it still points to wiki)
- there should be some easy way to tell where in the nested solrconfig.xml structure a new configuration needs to go (or focus on configoverlay and the Config API, if it is fixed)
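
To illustrate the Config API alternative mentioned above: its `set-property` command addresses a setting by a dotted name that mirrors the nesting in solrconfig.xml, and the result is stored in the config overlay rather than in the file itself. The helper below is a hypothetical sketch of that mapping; only a documented subset of properties can be set this way, and `updateHandler.autoSoftCommit.maxTime` is used here as one known example.

```python
import json


def set_property_command(path, value):
    """Turn a nested solrconfig.xml path into a Config API set-property command."""
    return {"set-property": {".".join(path): value}}


# The nested elements updateHandler > autoSoftCommit > maxTime become one dotted name.
command = set_property_command(["updateHandler", "autoSoftCommit", "maxTime"], 15000)
body = json.dumps(command)  # JSON body to POST to the core's /config endpoint
```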

Kitchen sink config

Is there a point in having a kitchen sink config that is basically a reference of field type definitions? That's where all the language variants could go.
managed-schema points
- having kitchen-sink default configset allows us to put some inline comments that make no sense in either production or learning schema as their files may get rewritten on use
- may be write locked to clearly indicate it is not for real use
- kitchen sink may be the only one with commented out analyzer lines

Lessons learned

From DIH Cleanup (SOLR-14783)

To get DIH to work, we had to add permissions into solr/server/etc/security.policy, which is very low-level. Is it going to be an issue? Do we need a way for packages to explain such needs on install? Are there more examples like that? Also, it is great that somebody commented it properly; otherwise it would just be sitting there forever.
