...

  • Distribute/Partition/Replicate the NN functionality across multiple computers
    • Read-only replicas of the name node
      • What is the ratio of reads (Rs) to writes (Ws)? Get data from Simon
      • Note: RO replicas can be useful for the HA solution and for checkpoint rolling
    • Partition by function (also scales namespace and addressable storage space)
      • E.g., move block management and processing to a slave NN
      • E.g., move replica management to a slave NN
    • Partition by namespace - i.e., different parts of the namespace are handled by different NNs (see below)
      • This helps scale both NN performance and the namespace itself
  • RPC and Timeout issues
    • When load spikes occur, clients time out and retry, and the spiral of death occurs
    • See Hadoop Protocol RPC
  • Higher concurrency in Namespace access (more sophisticated Namespace locking)
    • This is probably an issue only on NN restart, not during normal operation
    • Improving concurrency is hard since it will require redesign and testing
      • Better to do this when NN is being redesigned for other reasons.
  • Journaling and Sync
    • *Benefits*: improved latency, better client utilization, fewer timeouts, greater throughput
    • Improve remote syncs
      • Approach 1 - an NVRAM-backed NFS file system - investigate this
      • Approach 2 - if a flush on NFS pushes the data to the NFS server, this may be good enough as long as there is a local sync - investigate
    • Lazy syncs - need to investigate the benefit and cost (latency); see the group-commit sketch after this list
      • Delay the reply by a few milliseconds to allow for more bunching of syncs
      • This increases the latency
    • NVRAM for journal
    • Async syncs [No!!!]
      • Reply as soon as memory is updated
      • This changes semantics
        • If it is good enough for Unix, then isn't it good enough for HDFS?
          • For a single machine, its failure implies failure of the client and the fs *together*
          • In a distributed file system, there is partial failure; furthermore, one expects an HA'ed NN not to lose data
  • Move more functionality to data node
    • Distributed replica creation - not simple
  • Improve Block report processing HADOOP-2448
    2K nodes mean a block report every 3 sec.
    • Currently: each DN sends a full BR as an array of longs every hour; the initial BR has a random backoff (configurable)
    • Incremental and event-based block reports - HADOOP-1079
      • E.g., when a disk is lost, blocks are deleted, etc.
      • The DN can determine what, if anything, has changed and send a report only if there are changes
    • Send only checksums (see the block-report checksum sketch after this list)
      • The NN recalculates the checksum, or maintains a rolling checksum
    • Make the initial block report's random backoff dynamically settable by the NN when DNs register - HADOOP-2444
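To make the "lazy syncs" item above concrete, below is a minimal group-commit sketch: each handler appends its edit, then waits a few milliseconds so that a single fsync can cover the edits of several handlers. The class and method names (BatchedJournal, logEdit, logSync) are hypothetical, not the actual FSEditLog code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class BatchedJournal {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition synced = lock.newCondition();
    private long lastWrittenTxId = 0;               // highest txid appended to the in-memory buffer
    private long lastSyncedTxId = 0;                // highest txid known to be durable
    private static final long BUNCH_DELAY_MS = 5;   // extra latency accepted to bunch syncs

    /** Append one edit to the in-memory journal buffer and return its transaction id. */
    long logEdit(byte[] edit) {
        lock.lock();
        try {
            appendToBuffer(edit);
            return ++lastWrittenTxId;
        } finally {
            lock.unlock();
        }
    }

    /** Block until the given transaction is durable; waits briefly so syncs are batched. */
    void logSync(long txId) throws InterruptedException {
        lock.lock();
        try {
            if (txId <= lastSyncedTxId) {
                return;                              // already covered by an earlier batch
            }
            // Delay the reply a few milliseconds so other handlers can add edits
            // (await releases the lock while waiting).
            synced.await(BUNCH_DELAY_MS, TimeUnit.MILLISECONDS);
            if (txId <= lastSyncedTxId) {
                return;                              // covered while we were waiting
            }
            long syncUpTo = lastWrittenTxId;
            flushAndSyncBuffer();                    // one fsync covers every edit up to syncUpTo
            lastSyncedTxId = syncUpTo;
            synced.signalAll();                      // release every handler waiting on this batch
        } finally {
            lock.unlock();
        }
    }

    private void appendToBuffer(byte[] edit) { /* write the edit into the journal buffer */ }
    private void flushAndSyncBuffer()        { /* flush the buffer and fsync the journal file */ }
}
```

The few-millisecond wait is exactly the latency/throughput trade-off noted in the bullets above.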

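For the "send only checksums" item, a small illustrative sketch of one possible scheme: the DN sends a CRC over its sorted block-id list, and the NN requests a full report only when that CRC disagrees with a CRC over its own view of that DN. The class and method names are invented, not the real DN/NN protocol.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

class BlockReportChecksum {

    /** Checksum over a canonically ordered list of block ids (DN and NN must agree on the order). */
    static long checksumOf(long[] blockIds) {
        long[] sorted = blockIds.clone();
        Arrays.sort(sorted);
        CRC32 crc = new CRC32();
        for (long id : sorted) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                crc.update((int) (id >>> shift) & 0xff);
            }
        }
        return crc.getValue();
    }

    /** NN side: a full block report is needed only if the DN's checksum disagrees with the NN's view. */
    static boolean fullReportNeeded(long dnChecksum, long[] blocksNnBelievesDnHas) {
        return dnChecksum != checksumOf(blocksNnBelievesDnHas);
    }

    public static void main(String[] args) {
        long[] dnBlocks = {101L, 102L, 103L};
        long[] nnView   = {101L, 103L};            // NN is missing block 102 for this DN
        System.out.println("full report needed: "
            + fullReportNeeded(checksumOf(dnBlocks), nnView));   // prints: true
    }
}
```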


...

  • Statically Partition the namespace hierarchically and mount the volumes
    • In this scheme, there are multiple namespace volumes in a cluster.
    • All the namespace volumes share the physical block storage (i.e., one storage pool)
    • Optionally, all namespaces (i.e., volumes) are mounted at the top level using an automounter-like approach (see the mount-table sketch after this list)
    • A namespace can be explicitly mounted onto a node in another namespace (a la mount in POSIX)
      • Note: the Ceph file system [ref] partitions the namespace automatically and mounts the partitions

  • A truly distributed name service that partitions the namespace dynamically.
  • Only keep part of the namespace in memory (see the paging sketch after this list)
    • This is like a traditional file system, where the entire namespace is stored on secondary storage and paged in as needed.
  • Reduce accidental space growth - name space quotas
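For the automounter-like mounting of namespace volumes, a minimal client-side sketch of a mount table that resolves a path to the NameNode owning the longest matching volume prefix. The prefixes, host names, and class names are made up for illustration; this is not an existing HDFS API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MountTable {
    // Volume prefix -> address of the NameNode that owns that volume.
    private final Map<String, String> mounts = new LinkedHashMap<>();

    void addMount(String prefix, String namenode) {
        mounts.put(prefix, namenode);
    }

    /** Resolve an absolute path to the NameNode owning the longest matching volume prefix. */
    String resolve(String path) {
        String bestPrefix = null;
        for (String prefix : mounts.keySet()) {
            boolean matches = path.equals(prefix)
                || path.startsWith(prefix.endsWith("/") ? prefix : prefix + "/");
            if (matches && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        if (bestPrefix == null) {
            throw new IllegalArgumentException("no volume mounted for " + path);
        }
        return mounts.get(bestPrefix);
    }

    public static void main(String[] args) {
        MountTable table = new MountTable();
        table.addMount("/user", "nn1.example.com:8020");       // volume owned by one NN
        table.addMount("/projects", "nn2.example.com:8020");   // volume owned by another NN
        System.out.println(table.resolve("/user/alice/data")); // -> nn1.example.com:8020
    }
}
```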

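For keeping only part of the namespace in memory, a rough sketch of an LRU inode cache backed by secondary storage: cold inodes are paged in on demand and written back before eviction. The Inode type and the store methods are placeholders, not real NameNode classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PagedNamespace {
    static class Inode { /* name, permissions, block list, ... */ }

    private final int capacity;
    private final LinkedHashMap<String, Inode> cache;

    PagedNamespace(int capacity) {
        this.capacity = capacity;
        // accessOrder=true keeps least-recently-used entries first, so they are evicted first.
        this.cache = new LinkedHashMap<String, Inode>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Inode> eldest) {
                if (size() > PagedNamespace.this.capacity) {
                    writeBack(eldest.getKey(), eldest.getValue());  // persist before eviction
                    return true;
                }
                return false;
            }
        };
    }

    /** Look up an inode, paging it in from secondary storage on a cache miss. */
    Inode getInode(String path) {
        Inode inode = cache.get(path);
        if (inode == null) {
            inode = readFromStore(path);   // page-in, like a traditional file system
            cache.put(path, inode);
        }
        return inode;
    }

    private Inode readFromStore(String path)         { return new Inode(); }
    private void writeBack(String path, Inode inode) { /* write a dirty inode back to disk */ }
}
```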
...