Repository Scanning in Archiva

Archiva scans each repository periodically to determine what has changed in it.

  1. On the first scan, the entire repository is scanned.
  2. On subsequent scans, only content that is new or changed since the last scan is picked up.
  3. The scan is required to pick up content that arrives in the repository via non-monitored means. By contrast, content that arrives via monitored means is processed automatically:
    • Content that arrives via a WebDAV PUT is automatically processed.
    • Content that arrives via a Proxy Request is automatically processed.
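
The difference between the first scan and later scans can be sketched as a file walk filtered by modification time. This is a minimal illustration, not Archiva's implementation: the class `IncrementalScan`, the method `changedSince`, and the use of the file's last-modified timestamp as the change signal are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not an Archiva class: lists files changed since the last scan.
final class IncrementalScan {

    // Returns regular files under repoRoot modified strictly after lastScan.
    // Passing Instant.EPOCH models the first scan, which picks up the whole repository.
    static List<Path> changedSince(Path repoRoot, Instant lastScan) throws IOException {
        try (Stream<Path> paths = Files.walk(repoRoot)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> modifiedAfter(p, lastScan))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    private static boolean modifiedAfter(Path file, Instant lastScan) {
        try {
            return Files.getLastModifiedTime(file).toInstant().isAfter(lastScan);
        } catch (IOException e) {
            // Assumption: an unreadable timestamp counts as "changed" so the file is not silently skipped.
            return true;
        }
    }
}
```

A caller would record the start time of each scan and pass it as `lastScan` on the next run.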

The Scan Lifecycle

  1. All content falls into one of 3 categories: CONSUMED, IGNORED, or UNKNOWN.
    • CONSUMED content is content that is managed by Archiva.
    • IGNORED content consists of generated content or transient content.
    • UNKNOWN content is whatever falls through the cracks of the 2 categories above. Typically, this means the content doesn't conform to the repository structure, or is otherwise unrecognized.

...

  1. Perform a SCAN with an inclusion filter of "*/" and an exclusion filter containing those elements predetermined to be IGNORED.
  2. On identification of a file, attempt to resolve it to an Artifact object.
    1. If a valid Artifact object is created, flag the file as CONSUMED and store the Artifact in the database.
    2. If the file cannot be converted to an Artifact object, flag it as UNKNOWN and create a report entry in the ARTIFACT_HEALTH database table.
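
The resolve step above can be sketched as follows. This is an illustration under stated assumptions rather than Archiva's code: `ScanLifecycle`, `Artifact`, `resolve`, and `process` are hypothetical names, the two lists merely stand in for the database and the ARTIFACT_HEALTH table, and the path is assumed to follow the Maven 2 default layout (`group/dirs/artifactId/version/artifactId-version*.ext`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the resolve step; none of these types are Archiva classes.
final class ScanLifecycle {

    enum State { CONSUMED, UNKNOWN }

    // Minimal stand-in for an Artifact object.
    static final class Artifact {
        final String groupId;
        final String artifactId;
        final String version;

        Artifact(String groupId, String artifactId, String version) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
        }
    }

    final List<Artifact> database = new ArrayList<>();      // stand-in for the database
    final List<String> artifactHealth = new ArrayList<>();  // stand-in for ARTIFACT_HEALTH

    // Assumes the Maven 2 default layout. Timestamped snapshot file names are
    // not handled by this simplified check.
    static Optional<Artifact> resolve(String relativePath) {
        String[] parts = relativePath.split("/");
        if (parts.length < 4) {
            return Optional.empty();
        }
        String file = parts[parts.length - 1];
        String version = parts[parts.length - 2];
        String artifactId = parts[parts.length - 3];
        if (!file.startsWith(artifactId + "-" + version)) {
            return Optional.empty();
        }
        String groupId = String.join(".", Arrays.copyOfRange(parts, 0, parts.length - 3));
        return Optional.of(new Artifact(groupId, artifactId, version));
    }

    // Flags the file as CONSUMED or UNKNOWN and records the outcome.
    State process(String relativePath) {
        Optional<Artifact> artifact = resolve(relativePath);
        if (artifact.isPresent()) {
            database.add(artifact.get());
            return State.CONSUMED;
        }
        artifactHealth.add("Unresolvable path: " + relativePath);
        return State.UNKNOWN;
    }
}
```

With this sketch, `org/apache/maven/maven-model/2.0/maven-model-2.0.pom` resolves to group `org.apache.maven` and is flagged CONSUMED, while a stray path such as `stray/notes.jar` is flagged UNKNOWN and reported.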

CONSUMED Files

| Include Pattern | Type | Consumed By |
|---|---|---|
| `**/*.pom` | MavenProject | Convert to Project Model. Save Model to Database. Auto Convert embedded <repositories>. Auto Convert embedded <pluginRepositories>. Lucene XML contents. Lucene Effective POM contents. |
| `**/*.jar` | Artifact (jar) | Convert to Artifact Model. Generate Missing Hashcodes. Compute JDK Revision. Determine Sealed. Save Model to Database. Lucene Archive TOC. Lucene Classnames. Lucene Public Methods. |
| `**/*.ear` | Artifact (ear) | (same as jar) |
| `**/*.war` | Artifact (war) | (same as jar) |
| `**/*.car` | Artifact (car) | (same as jar) |
| `**/*.sar` | Artifact (sar) | (same as jar) |
| `**/*.mar` | Artifact (mar) | (same as jar) |
| `**/*.rar` | Artifact (rar) | (same as jar) |
| `**/*.dtd` | Artifact (dtd) | Convert to Artifact Model. Generate Missing Hashcodes. Save Model to Database. Lucene DTD contents. |
| `**/*.tld` | Artifact (tld) | Convert to Artifact Model. Generate Missing Hashcodes. Save Model to Database. Lucene TLD contents. |
| `**/*.tar.gz` | Artifact (distribution) | Convert to Artifact Model. Generate Missing Hashcodes. Save Model to Database. Lucene Archive TOC. |
| `**/*.tar.bz2` | Artifact (distribution) | (same as *.tar.gz) |
| `**/*.zip` | Artifact (distribution) | (same as *.tar.gz) |
| `**/*.sha1` | Hashcode | Report on Saved Hashcode versus Actual Hashcode. |
| `**/*.md5` | Hashcode | Report on Saved Hashcode versus Actual Hashcode. |
| `**/*.asc` | Signature | Report on signature validation. |
| `**/maven-metadata.xml` | Repository Metadata | Convert to Repository Model. Cross-validate listed versions against the versions available in the repository. Save Model to Database. Lucene XML contents. |
| `**/*-site.xml` | Site Metadata | Lucene file contents. |
| `**/*.xml` | Xml Content | Lucene file contents. |
| `**/*.html` | Html Content | Lucene file contents. |
| `**/*.block` | Auto-Xml/Text Content | Lucene file contents. |
| `**/*.config` | Auto-Xml/Text Content | Lucene file contents. |
| `**/*.xsd` | Xml Content | Lucene file contents. |
| `**/*.txt` | Text Content | Lucene file contents. |
| `**/*.TXT` | Text Content | Lucene file contents. |
| `**/*.bar` | Binary Content | - no direct consumption - |
| `**/*.nbm` | Binary Content | - no direct consumption - |
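
The pattern-to-type mapping above can be sketched as an ordered rule list. The sketch makes several assumptions that the source does not state: the class `ConsumedTypes` and method `typeOf` are hypothetical, only a subset of the rows is included, and the first matching rule is assumed to win (which is why `**/maven-metadata.xml` and `**/*-site.xml` are registered before `**/*.xml`). It uses Java's built-in glob matcher for brevity; note that in a Java glob `**/` needs at least one directory separator, which may differ from the pattern semantics Archiva applies.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup from include pattern to content type; not an Archiva class.
final class ConsumedTypes {

    // LinkedHashMap keeps insertion order, so earlier rules take precedence.
    private static final Map<PathMatcher, String> RULES = new LinkedHashMap<>();

    private static void rule(String glob, String type) {
        RULES.put(FileSystems.getDefault().getPathMatcher("glob:" + glob), type);
    }

    static {
        // A subset of the table, with specific XML patterns ahead of the generic one.
        rule("**/*.pom", "MavenProject");
        rule("**/*.jar", "Artifact (jar)");
        rule("**/*.sha1", "Hashcode");
        rule("**/maven-metadata.xml", "Repository Metadata");
        rule("**/*-site.xml", "Site Metadata");
        rule("**/*.xml", "Xml Content");
    }

    // Returns the type of the first rule that matches, or empty if none does.
    static Optional<String> typeOf(String relativePath) {
        for (Map.Entry<PathMatcher, String> entry : RULES.entrySet()) {
            if (entry.getKey().matches(Paths.get(relativePath))) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

Under the first-match assumption, `org/foo/1.0/foo-1.0-site.xml` is reported as Site Metadata even though it also matches `**/*.xml`.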

IGNORED Content

Content in this category is never indexed, nor reported as bad or unknown. It exists on disk solely for the benefit of the client using Archiva.

| Pattern | Reason |
|---|---|
| `**/.htaccess` | Web server specific content control mechanism. |
| `**/KEYS` | GPG signatures file. Not used by Archiva directly. |
| `**/*.rb` | Ruby script file. |
| `**/*.sh` | Shell script file. |
| `**/.svn/**/*` | Subversion control directory. |
| `**/.DAV/**/*` | DAV server control directory. |
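
The exclusion filter built from these patterns can be sketched as below. The sketch assumes Ant-like semantics, where `**/` matches zero or more directories, `*` matches within one path segment, and `?` matches a single character; the source does not say which matcher Archiva uses, and `IgnoredFilter`, `toRegex`, and `isIgnored` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical exclusion filter; the class and method names are illustrative only.
final class IgnoredFilter {

    private static final String[] IGNORED = {
        "**/.htaccess", "**/KEYS", "**/*.rb", "**/*.sh", "**/.svn/**/*", "**/.DAV/**/*"
    };

    private static final List<Pattern> COMPILED = new ArrayList<>();

    static {
        for (String pattern : IGNORED) {
            COMPILED.add(toRegex(pattern));
        }
    }

    // Translates an Ant-like pattern into a regular expression (simplified).
    static Pattern toRegex(String antPattern) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < antPattern.length()) {
            if (antPattern.startsWith("**/", i)) {
                regex.append("(?:.*/)?");   // zero or more directories
                i += 3;
            } else if (antPattern.startsWith("**", i)) {
                regex.append(".*");         // anything, across directories
                i += 2;
            } else if (antPattern.charAt(i) == '*') {
                regex.append("[^/]*");      // anything within one path segment
                i++;
            } else if (antPattern.charAt(i) == '?') {
                regex.append("[^/]");       // exactly one non-separator character
                i++;
            } else {
                regex.append(Pattern.quote(String.valueOf(antPattern.charAt(i))));
                i++;
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if the repository-relative path (using '/' separators) matches any ignored pattern.
    static boolean isIgnored(String relativePath) {
        for (Pattern pattern : COMPILED) {
            if (pattern.matcher(relativePath).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

A scanner could call `isIgnored` before any further processing, so that ignored files are neither indexed nor reported.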

UNKNOWN / BAD Content

Content that does not fit into the above categories is automatically placed into this category.
However, some UNKNOWN / BAD content is well understood, and can have a 'Quick Fix' associated with it.

...