High Availability Archiva

General considerations for running multiple instances (cluster)

What options/strategies do we have for clustering Archiva instances?

Option 1: master-slave-strategy (not really a cluster)

  • one primary server, one secondary server
  • the fs of the primary server is replicated/synchronized to the secondary server every n minutes (e.g. using rsync)
  • if the primary server goes down, developers need to modify their settings.xml to point to the secondary server

advantage: easy to set up
drawback: no transparent fail-over, no load-balancing
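
The periodic synchronization in Option 1 could be sketched roughly as follows. This is only a minimal sketch, assuming rsync is installed on the primary server and can reach the secondary; the repository path, the host name archiva-secondary and the 10-minute interval are hypothetical placeholders, not values from a real installation.

```python
import subprocess
import time

# Hypothetical values (assumptions); adjust to the actual installation.
REPO_DIR = "/var/archiva/repositories/"
SECONDARY = "archiva-secondary:/var/archiva/repositories/"
INTERVAL_MINUTES = 10


def build_rsync_command(source, destination):
    """Return the rsync argument list that mirrors source to destination."""
    # -a preserves permissions and timestamps; --delete removes files on the
    # secondary that no longer exist on the primary.
    return ["rsync", "-a", "--delete", source, destination]


def sync_forever(interval_minutes=INTERVAL_MINUTES):
    """Mirror the repository to the secondary server every n minutes."""
    while True:
        result = subprocess.run(build_rsync_command(REPO_DIR, SECONDARY))
        if result.returncode != 0:
            print("rsync failed with exit code", result.returncode)
        time.sleep(interval_minutes * 60)
```

Calling sync_forever() on the primary starts the loop; a cron entry that runs the rsync command directly would serve the same purpose.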

Option 2: two Archiva-Servers + httpd-balancer + shared fs

Can it work with multiple instances using the same filesystem?

Yes, but keep the index files (and probably the database) separate.
Note that if you use separate file systems instead, you must keep them synchronized.

What things can be shared, and what must remain separate?
Repository filesystem can be shared
Lucene indexes should not be shared (too much file locking to make it work)

How can upgrades be handled to minimize downtime?
There is no general solution; it depends on two factors:

  • release changes
  • customization

Example architectures

Example configuration

Two servers running Archiva with a virtual IP device in front. Shared filesystem. Everyone uses
a single URL that goes to the VIP, which monitors port 8080 on the servers to see which one is available. Right now it's done as
failover to a hot standby, but we eventually want it load balanced, and will add more repository servers as necessary. We've also
switched to MySQL for the user database, to take advantage of its replication and admin features.
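
The availability check that the VIP performs can be illustrated with a small sketch. It assumes that "available" simply means that a TCP connection to port 8080 succeeds; the function names and the host names in the usage note are hypothetical, and a real VIP device will have its own health-check mechanism.

```python
import socket


def is_port_open(host, port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_active(servers, check=is_port_open):
    """Return the first server (in priority order) that passes the check.

    With the primary listed first, this models failover to a hot standby.
    Returns None if no server is available.
    """
    for host in servers:
        if check(host):
            return host
    return None
```

For example, pick_active(["archiva1", "archiva2"]) returns the primary while it answers on port 8080 and falls back to the standby otherwise.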

Example architecture: two Archiva-Servers + httpd-balancer

  • two Archiva-servers (tomcat) running with identical config
  • both servers connect to the same mysql-database; the mysql-database may itself be clustered
  • both servers share the fs for the managed repos; this fs is RAID5-based and therefore tolerates the failure of a single disk
  • it is important to ensure that only one of those instances performs write operations (to the fs and db)
  • a single httpd server configured to send read operations to both servers (load-balancing and fail-over), but write operations only to one designated server
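
The routing rule in the last bullet can be sketched as follows. This is an illustration of the idea only, not an httpd configuration; it assumes that write operations arrive as HTTP PUT, POST or DELETE requests and everything else is a read. The class name and the server names are hypothetical.

```python
import itertools

# Assumption: these HTTP methods are treated as write operations.
WRITE_METHODS = {"PUT", "POST", "DELETE"}


class Balancer:
    """Round-robin reads over all servers; writes go to one fixed server."""

    def __init__(self, servers, writer):
        if writer not in servers:
            raise ValueError("writer must be one of the servers")
        self.servers = list(servers)
        self.writer = writer
        self._cycle = itertools.cycle(self.servers)

    def route(self, method, available):
        """Return the server that should handle the request, or None."""
        if method.upper() in WRITE_METHODS:
            # Writes are never redirected to the other server, so there
            # is no fail-over for them.
            return self.writer if self.writer in available else None
        # Reads: take the next available server in round-robin order.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in available:
                return candidate
        return None
```

With Balancer(["archiva1", "archiva2"], writer="archiva1"), GET requests alternate between both servers, while PUT requests always go to archiva1 and are rejected if it is down. Reads therefore fail over, writes do not.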

Benefits:

  • distributed HA-architecture (2 Archiva-servers, mysql-cluster, RAID5)
  • both Archiva-instances are always in the same state - no replication necessary
  • load-balancing spreads read requests across both servers
  • no concurrency issues because only one server does write operations at a time

What are write operations?
They are usually triggered by deploy goals.
The release plugin might also perform some write operations.

Brief Explanation of the Repository Index

Please see http://www.nabble.com/Location-of-index-files-in-1.0--tf3901605s177.html#a11064078
