Repository Scanning in Archiva
A repository is scanned periodically to determine what has changed in it.
- On the first scan, the entire repository is scanned.
- On subsequent scans, only content that is new or changed since the last scan is picked up.
- The scan is required to pick up content that arrives in the repository via non-monitored means.
- Content that arrives via a WebDAV PUT is automatically processed.
- Content that arrives via a Proxy Request is automatically processed.
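The first-scan versus subsequent-scan behaviour described above can be sketched as a filter on file modification times. This is only a minimal illustration: the function name `scan_changed` and the use of the file's modification time as the change signal are assumptions, not Archiva's actual implementation.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def scan_changed(repo_root: Path, last_scan: Optional[float]) -> List[Path]:
    """Return files under repo_root modified after last_scan.

    last_scan is a POSIX timestamp, or None for the first (full) scan.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            path = Path(dirpath) / name
            # First scan: take everything. Later scans: only newer files.
            if last_scan is None or path.stat().st_mtime > last_scan:
                changed.append(path)
    return sorted(changed)


# Usage: two files, one older and one newer than a pretend last scan at t=2000.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old.jar").write_bytes(b"old")
    (root / "new.jar").write_bytes(b"new")
    os.utime(root / "old.jar", (1000, 1000))
    os.utime(root / "new.jar", (3000, 3000))

    print([p.name for p in scan_changed(root, None)])   # ['new.jar', 'old.jar']
    print([p.name for p in scan_changed(root, 2000)])   # ['new.jar']
```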
The Scan Lifecycle
- All content falls into one of three categories: CONSUMED, IGNORED, or UNKNOWN.
- CONSUMED content is content that is managed by Archiva.
- IGNORED content consists of generated content or transient content.
- UNKNOWN content is whatever falls through the cracks of the two categories above. Typically, this means the content does not conform to the repository structure or is otherwise unrecognized.
The lifecycle of a scan is as follows.
- Perform a SCAN with an inclusion filter of "*/" and an exclusion filter containing those elements predetermined to be IGNORED.
- On identification of a file, attempt to resolve it to an Artifact object.
- If a valid Artifact object is created, flag the file as CONSUMED and store the Artifact in the database.
- If the file cannot be converted to an Artifact object, flag it as UNKNOWN and create a report entry in the ARTIFACT_HEALTH database table.
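The lifecycle steps above can be sketched roughly as follows. Everything here is illustrative: the exclusion patterns in `EXCLUDES` are hypothetical stand-ins for the configured IGNORED list, `to_artifact` only understands the plain Maven 2 default layout (`groupId/artifactId/version/artifactId-version.ext`, no classifiers or timestamped snapshots), and the two returned lists merely stand in for the database and the ARTIFACT_HEALTH table.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical exclusion filter; the real IGNORED patterns are Archiva's own.
EXCLUDES = ["*.sh", "*.rb", ".svn/*", "*/.svn/*"]


class Artifact(NamedTuple):
    group_id: str
    artifact_id: str
    version: str
    extension: str


def to_artifact(path: str) -> Optional[Artifact]:
    """Try to resolve a repository-relative path to an Artifact."""
    parts = path.split("/")
    if len(parts) < 4:
        return None
    *group, artifact_id, version, filename = parts
    prefix = f"{artifact_id}-{version}."
    if not filename.startswith(prefix) or len(filename) == len(prefix):
        return None
    return Artifact(".".join(group), artifact_id, version, filename[len(prefix):])


def scan(paths: Iterable[str]) -> Tuple[List[Artifact], List[str]]:
    """Return (consumed artifacts, unknown paths); excluded paths are skipped."""
    database: List[Artifact] = []      # stands in for the artifact database
    artifact_health: List[str] = []    # stands in for ARTIFACT_HEALTH entries
    for path in paths:
        if any(fnmatchcase(path, pattern) for pattern in EXCLUDES):
            continue                   # IGNORED: never identified by the scan
        artifact = to_artifact(path)
        if artifact is not None:
            database.append(artifact)  # CONSUMED
        else:
            artifact_health.append(path)  # UNKNOWN
    return database, artifact_health


consumed, unknown = scan([
    "org/apache/foo/1.0/foo-1.0.jar",
    "org/apache/foo/.svn/entries",
    "org/apache/foo/1.0/readme.bak",
])
print(consumed)  # one Artifact for foo-1.0.jar
print(unknown)   # ['org/apache/foo/1.0/readme.bak']
```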
CONSUMED Files
| Include Pattern | Type | Consumed By |
|---|---|---|
|  | MavenProject | Convert to Project Model. |
|  | Artifact (jar) | Convert to Artifact Model. |
|  | Artifact (ear) | (same as jar) |
|  | Artifact (war) | (same as jar) |
|  | Artifact (car) | (same as jar) |
|  | Artifact (sar) | (same as jar) |
|  | Artifact (mar) | (same as jar) |
|  | Artifact (rar) | (same as jar) |
|  | Artifact (dtd) | Convert to Artifact Model. |
|  | Artifact (dtd) | Convert to Artifact Model. |
|  | Artifact (distribution) | Convert to Artifact Model. |
|  | Artifact (distribution) | (same as *.tar.gz) |
|  | Artifact (distribution) | (same as *.tar.gz) |
|  | Hashcode | Report on Saved Hashcode to Actual Hashcode. |
|  | Hashcode | Report on Saved Hashcode to Actual Hashcode. |
|  | Signature | Report on signature validation. |
|  | Repository Metadata | Convert to Repository Model. |
|  | Site Metadata | Lucene file contents. |
|  | Xml Content | Lucene file contents. |
|  | Html Content | Lucene file contents. |
|  | Auto-Xml/Text Content | Lucene file contents. |
|  | Auto-Xml/Text Content | Lucene file contents. |
|  | Xml Content | Lucene file contents. |
|  | Text Content | Lucene file contents. |
|  | Text Content | Lucene file contents. |
|  | Binary Content | - no direct consumption - |
|  | Binary Content | - no direct consumption - |
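As an illustration of the Hashcode rows, the comparison of a saved hashcode against the actual one might look like the sketch below. It assumes the saved value lives in a sidecar file next to the artifact (for example `foo-1.0.jar.sha1`), as is conventional in Maven repositories; the function name `check_hashcode` and the three result strings are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def check_hashcode(artifact: Path, algorithm: str = "sha1") -> str:
    """Compare the sidecar checksum with the artifact's actual digest.

    Returns "ok", "mismatch", or "missing" (no sidecar file found).
    """
    sidecar = artifact.with_name(artifact.name + "." + algorithm)
    if not sidecar.exists():
        return "missing"
    # Sidecar files may hold "<hex digest>  <filename>"; keep the first token.
    tokens = sidecar.read_text().split()
    saved = tokens[0].lower() if tokens else ""
    actual = hashlib.new(algorithm, artifact.read_bytes()).hexdigest()
    return "ok" if saved == actual else "mismatch"


with tempfile.TemporaryDirectory() as tmp:
    jar = Path(tmp) / "foo-1.0.jar"
    jar.write_bytes(b"example content")
    print(check_hashcode(jar))  # missing

    (Path(tmp) / "foo-1.0.jar.sha1").write_text(
        hashlib.sha1(b"example content").hexdigest() + "  foo-1.0.jar\n"
    )
    print(check_hashcode(jar))  # ok

    jar.write_bytes(b"tampered content")
    print(check_hashcode(jar))  # mismatch
```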
IGNORED Content
Content in this category is never indexed, nor reported as bad or unknown. It exists on disk solely for the benefit of the client using Archiva.
| Pattern | Reason |
|---|---|
|  | Web server specific content control mechanism. |
|  | GPG Signatures File. Not used by Archiva directly. |
|  | Ruby script file. |
|  | Shell script file. |
|  | Subversion Control Directory. |
|  | DAV Server Control Directory. |
UNKNOWN / BAD Content
Content that does not fit into the above categories is automatically placed into this category.
However, some UNKNOWN / BAD content is well understood and can have a 'Quick Fix' associated with it.
| Pattern | Type | Quick Fix Option |
|---|---|---|
|  | Backup File | Remove from repository |
|  | Backup File | Remove from repository |
|  | Backup File | Remove from repository |
|  | Distribution Artifact from M1 | Rename to *.tar.gz |
|  | Distribution Artifact from M1 | Rename to *.zip |
|  | Plugin from M1 | Rename to *.jar |
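A quick-fix lookup of this kind could be modelled as a small suffix table. The sketch below is hypothetical throughout: the suffixes in `QUICK_FIX_RULES` are assumed placeholders chosen for illustration, not the actual patterns Archiva matches, and `quick_fix` only proposes an action without touching any files.

```python
from typing import List, Optional, Tuple

# (suffix to match, replacement suffix); None means "remove from repository".
# All suffixes here are assumed placeholders, not Archiva's real patterns.
QUICK_FIX_RULES: List[Tuple[str, Optional[str]]] = [
    (".bak", None),
    (".distribution-tgz", ".tar.gz"),
    (".distribution-zip", ".zip"),
    (".plugin", ".jar"),
]


def quick_fix(path: str) -> Optional[Tuple[str, Optional[str]]]:
    """Propose ("remove", None) or ("rename", new_path), or None if no fix applies."""
    for suffix, replacement in QUICK_FIX_RULES:
        if path.endswith(suffix):
            if replacement is None:
                return ("remove", None)
            return ("rename", path[: -len(suffix)] + replacement)
    return None


print(quick_fix("foo/foo-1.0.distribution-tgz"))  # ('rename', 'foo/foo-1.0.tar.gz')
print(quick_fix("foo/pom.xml.bak"))               # ('remove', None)
print(quick_fix("foo/notes.txt"))                 # None
```

Keeping the rules in a data table rather than in branching code makes it straightforward to add or retire a quick fix without changing the lookup logic.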