Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

There are currently three stats system on "trunk" in Apache Traffic Server. I'd like to eliminate one of them, and deprecate another, leaving a single stats system used by all components. As it turns out, this "migration" had already started a long time ago, and I'd say 95% of the code is already on the standard stats and config system; librecords. I'm not arguing that librecords doesn't need some serious rework and improvements, but a step in the right direction is to convert all stats system into one, using librecords only. This document is a proposal for this cleanup.

The problem:

...

plugin stats ("InkAPI")

The only main problematic stats area is InkAPIin the plugin APIs, which currently has one old (and very rather poor) stats system, as well as a new attempt at a rewrite. The first should simply be deprecated for the v2.2 release, which is an easy task (see bug TS-266). The new rewrite of a stats system for InkAPI (lovingly referred to as "V2") is very scalable (10's of thousands of stats can be handled), but suffers from a few problems currently (these could obviously be fixed if time is spent on it):

  • It does not integrate with the existing librecords stats at all. This makes XML synthesized stats less than functional for example, i.e. you can't combine stats from the two systems using the already existing features of Apache TS.
  • It runs its own "control port" (TCP), and does not use the CLI as provided by librecords. This makes for a poor management experience (basically, two systems to manage the stats and configs).
  • It does not support persistent stats, so every restart will reset these V2 stats. This might not seem like a serious problem, but many counters really benefit from continuously increasing, and it can make stats collection much easier (i.e. you don't have to deal with the stats going from a large value to zero). It's also the way many of our current stats metrics work, where you can specify if a specific statistics should be persistent or not, preserving this functionality going forward is important IMO.
  • It does not support any "synchronization" handler other than the "SUM". librecords allows for specifying different synchronization handlers, where when the stats thread collects stats from the various threads, it can apply different logic on how the aggregation should be handled, e.g. SUM, AVG etc.' This is again a feature the ATS core uses heavily, and that we should continue to support.
  • V2 does not support the same "data types" for stats that librecord does. I'm a little torn on this issue, I think it might be acceptable to limit to only "integer" stats, if people have strong opinions on this, I'll certainly not oppose removing this feature from the new InkAPI proposal now, and from ATS core later.

I believe that creating new APIs for InkAPI that exposes the librecords APIs is the way to go, and it allows us to focus on one area further down the road to improve the stats and configuration system (which at least ought to be separated I think, if we're going to move along with the "virtual host" style configs). Making a new set of stable APIs is important (so we don't have to keep deprecating APIs constantly), if the underlying implementation changes the APIs should hopefully not.

My proposal will keep the "V2" code as is, but disabled by default. It can be enabled via a configure option for anyone already using it, and it also allows us to reuse the code and ideas that "V2" implements in a future rewrite of the librecords stats system.

Feature comparison

The following table summarizes the features (or lack thereof) for the various InkAPI "stats" features. Please see the section below on how librecords can be improved to implement some of the features the "V2" implementation adds. I firmly believe that adding these features to librecords will improve the ATS stats system in general, and also make the InkAPI stats APIs based on librecords very flexible. 

Image Added

Proposal 1: Deprecate old InkAPI stats

...

Going forward, if the "V2" implementation matures, and can replace librecords (both internal and for APIs), it would be good to rename / replace the APIs above with the API names from Proposal #3 (below). This avoids having to migrate to yet another stats API, so while reviewing this, make sure the proposed APIs below will suffice.

Proposal

...

3: Make new librecords based stats APIs

Finally, I'd like to propose adding a new set of APIs to ts/ts.h, exposing the librecords APIs to plugin writers. My proposed APIs are as follows:

Code Block
  typedef enum
    {
      TS_STAT_TYPE_NULLERROR = -1,
      TS_SUCCESS = 0,
    } TSReturnCode;

  typedef enum
    {
      TS_STAT_TYPE_INT = 1,
      // The following are currently not supported.
      TS_STAT_TYPE_FLOAT,
      TS_STAT_TYPE_STRING,
      TS_STAT_TYPE_COUNTER,
    } TSStatDataType;

  typedef enum
    {
      TS_STAT_PERSISTENT_NULL = 01,
      TS_STAT_PERSISTENT,
      TS_STAT_NON_PERSISTENT
    } TSStatPersistence;

  typedef enum
    {
      TS_STAT_SYNC_SUM = 0,
      TS_STAT_SYNC_COUNT,
      TS_STAT_SYNC_AVG,
      TS_STAT_SYNC_TIMEAVG,
    } TSStatSync;

  inkapi int TSRegisterStatTSStatCreate(const char *the_name, TSStatDataType the_type, TSStatPersistence persist, TSStatSync sync);

  inkapi INKReturnCodeTSReturnCode TSStatIntIncrement(int the_stat, INK64 amount);
  inkapi INKReturnCodeTSReturnCode TSStatIntDecrement(int the_stat, INK64 amount);
  // Currently not supported.
  // inkapi INKReturnCodeTSReturnCode TSStatFloatIncrement(int the_stat, float amount);
  // inkapi INKReturnCodeTSReturnCode TSStatFloatDecrement(int the_stat, float amount);

  inkapi INKReturnCodeTSReturnCode TSStatIntGet(int the_stat, INK64* value);
  inkapi INKReturnCodeTSReturnCode TSStatIntSet(int the_stat, INK64 value);
  // Currently not supported
  // inkapi INKReturnCodeTSReturnCode TSStatFloatGet(int the_stat, float* value);
  // inkapi INKReturnCodeTSReturnCode TSStatFloatSet(int the_stat, float value);

  inkapi int TSStatFindName(const char* name);

I'm using the "TS" prefix here for the APIs, to distinguish them from the old APIs. This also avoids a "rename" later on, but I'd be open to changing this to INK* if that is preferable. In addition, my proposal adds a new records.config setting:

...

Both the API proposed above and the V2 APIs beat the old (hopefully soon deprecated) APIs in both scalability and performance. The old APIs are limited to 253 stats as well.

Problems with librecords

I have an implementation for the APIs above that does work. There are four issues with my patch, that would need to be addressed long term (but I think it's OK for a v2.2 release). Most of these are very easy (a few lines of code).

  1. The librecords stats system has an internal (static) limitation of 3000 stats. This is easy to bump up to say 108,000 (which my patch does), but it does waste a little memory. There are some multi-dimensional arrays that allocates unnecessary "slots" I think, which we could optimize. These static arrays needs to be modified to be dynamically allocated (and preferably "optimized" to waste less memory), based on the proxy.config.stat_api.max_stats_allowed setting above. This is merely an issue of changing a static allocation (stats[8000]) to dynamic allocation at startup (malloc(8000 * sizeof(stats-bucket)).
  2. My patch disables the new stats for per hook / plugin counters (they are tied to enabling the V2 stats system). My suggestion would be to rewrite these areas to use the new stats APIs above, once it has been landed. Or even better, use the internal librecords APIs instead, at least where possible (we generally shouldn't use the public InkAPI interfaces in "core" code).
  3. I've noticed a noticeable (~30s) time to create 50,000 stats using the new APIs. it's fine for 10,000 stats (which takes less than 0.5 seconds), so it gets progressively worse as more stats are added. This should be examined for sure.
  4. There are no librecords APIs for "searching" for a stat by name. We should add that, and expose it via InkAPI as well. The CLI also needs to support listing all stats, preferably with substrings and regexes filtering.