...
- Distribute/Partition/Replicate the NN functionality across multiple computers
- Read-only replicas of the name node
- What is the ratio of reads to writes? - get data from Simon
- Note: RO replicas can be useful for the HA solution and for checkpoint rolling
- Partition by function (also scales namespace and addressable storage space)
- E.g. move block management and processing to a slave NN.
- E.g. move replica management to a slave NN.
- Partition by name space - i.e. different parts of the name space are handled by different NNs (see below)
- this helps in scaling both the performance of the NN and the name space itself
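The read-only-replica idea above amounts to steering read RPCs to replicas while all mutations go to the primary. A minimal sketch of such client-side routing, with entirely hypothetical endpoint names and operation names (this is not an actual HDFS client API):

```python
import itertools

class NameNodeRouter:
    """Toy router: read operations round-robin across read-only NN
    replicas; writes always go to the primary. Names are illustrative."""

    READ_OPS = {"getFileInfo", "getBlockLocations", "listStatus"}

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas) if replicas else None

    def route(self, op):
        if op in self.READ_OPS and self._replica_cycle:
            return next(self._replica_cycle)
        return self.primary  # writes and unknown ops go to the primary

router = NameNodeRouter("nn-primary:8020", ["nn-ro-1:8020", "nn-ro-2:8020"])
assert router.route("mkdirs") == "nn-primary:8020"
assert router.route("getFileInfo") in {"nn-ro-1:8020", "nn-ro-2:8020"}
```

The higher the read-to-write ratio (the "Rs to Ws" question above), the more load this offloads from the primary; stale reads are the price, which is why such replicas also suit checkpoint rolling.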
- RPC and Timeout issues
- When load spikes occur, clients time out and a spiral of death ensues (timeouts trigger retries, which add yet more load)
- See Hadoop Protocol RPC
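One standard way to damp the retry spiral described above is exponential backoff with jitter on the client side, so that a load spike does not produce a synchronized retry storm. A small sketch (parameter names and defaults are illustrative, not HDFS configuration):

```python
import random

def retry_delays(base=0.2, cap=30.0, attempts=6, rng=random.random):
    """Exponential backoff with full jitter: the k-th retry waits a
    random amount in [0, min(cap, base * 2**k)) seconds, spreading
    client retries out in time instead of stacking them."""
    return [rng() * min(cap, base * (2 ** a)) for a in range(attempts)]

delays = retry_delays()
assert len(delays) == 6
assert all(0 <= d < 30.0 for d in delays)
```

Jitter matters more than the exponent here: without it, all clients that timed out together retry together.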
- Higher concurrency in Namespace access (more sophisticated Namespace locking)
- This is probably an issue only on NN restart, not during normal operation
- Improving concurrency is hard since it will require redesign and testing
- Better to do this when NN is being redesigned for other reasons.
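To make the "more sophisticated Namespace locking" idea concrete: one direction is to replace the single namespace-wide lock with a lock per subtree. A toy sketch, assuming (hypothetically) one lock per top-level directory; the hard part the text alludes to - operations spanning partitions, such as a cross-directory rename - is exactly what this sketch does not solve:

```python
import threading

class PartitionedNamespaceLock:
    """Sketch of coarser-than-global locking: one lock per top-level
    directory instead of one lock for the whole namespace, so ops under
    /user and /tmp can proceed concurrently. Cross-partition operations
    would still need a separate ordering protocol."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def lock_for(self, path):
        top = "/" + path.strip("/").split("/", 1)[0]
        with self._guard:
            return self._locks.setdefault(top, threading.Lock())

ns = PartitionedNamespaceLock()
assert ns.lock_for("/user/simon/a") is ns.lock_for("/user/other/b")
assert ns.lock_for("/user/x") is not ns.lock_for("/tmp/x")
```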
- Journaling and Sync
- *Benefits*: improved latency, better client utilization, fewer timeouts, greater throughput
- Improve Remote syncs
- Approach 1 - NVRAM-backed NFS file system - investigate this
- Approach 2 - If flush on NFS pushes the data to the NFS server, this may be good enough if there is a local sync - investigate
- Lazy syncs - need to investigate the benefit and cost (latency)
- Delay the reply by a few milliseconds to allow for more bunching of syncs
- This increases latency
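The "delay the reply to bunch syncs" idea is essentially group commit. A minimal sketch, assuming a hypothetical journal wrapper (none of these names are HDFS classes): writers append to an in-memory batch and block; one sync covers the whole batch before all waiting writers are released, trading a few milliseconds of latency for throughput.

```python
import threading

class GroupCommitJournal:
    """Sketch of group commit: append_and_wait blocks until a single
    sync call has made the caller's entry (and everything batched with
    it) durable. One sync now covers many operations."""

    def __init__(self, sync_fn, delay=0.005):
        self._sync_fn = sync_fn   # e.g. write + fsync of the batch
        self._delay = delay       # max extra latency accepted per reply
        self._cond = threading.Condition()
        self._pending = []
        self._seq = 0             # entries appended so far
        self._synced = 0          # entries made durable so far

    def append_and_wait(self, entry):
        with self._cond:
            self._pending.append(entry)
            self._seq += 1
            my_seq = self._seq
            while self._synced < my_seq:
                self._cond.wait(timeout=self._delay)
                if self._synced < my_seq:
                    self._flush_locked()

    def _flush_locked(self):
        batch, self._pending = self._pending, []
        self._sync_fn(batch)             # one sync for the whole batch
        self._synced += len(batch)
        self._cond.notify_all()          # release every batched writer

log = []
journal = GroupCommitJournal(log.append, delay=0.001)
journal.append_and_wait("mkdir /a")
assert log == [["mkdir /a"]]
```

Under concurrent load, entries arriving within the delay window share one sync; with a single writer the delay is the only cost, which is the latency increase noted above.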
- NVRAM for journal
- Async syncs [No!!!]
- reply as soon as memory is updated
- This changes semantics
- If it is good enough for Unix then isn't it good enough for HDFS?
- For a single machine, its failure implies failure of the client and the fs *together*
- In a distributed file system there is partial failure; furthermore, one expects an HA'ed NN not to lose data
- Move more functionality to data node
- Distributed replica creation - not simple
- Improve Block report processing HADOOP-2448
- Currently: each DN sends a full BR, as an array of longs, every hour; 2K nodes mean a block report arriving every 3 sec. The initial BR has a random backoff (configurable)
- Incremental and Event based B-reports - HADOOP-1079
- E.g. when a disk is lost, or blocks are deleted, etc.
- The DN can determine what, if anything, has changed and send a report only if there are changes
- Send only checksums
- NN recalculates the checksum, OR has rolling checksum
- Make the initial block report's random backoff dynamically settable by the NN when DNs register - HADOOP-2444
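The "send only checksums" idea above can be sketched in a few lines: the DN keeps a digest of its block list and sends the full report only when the digest differs from what the NN last acknowledged. This is an illustrative sketch, not the HADOOP-1079 design; function names are made up.

```python
import hashlib

def block_report_digest(block_ids):
    """Digest over the sorted block-id list; order-independent for a
    given set of blocks."""
    h = hashlib.sha256()
    for b in sorted(block_ids):
        h.update(b.to_bytes(8, "big"))
    return h.hexdigest()

def maybe_send_report(block_ids, last_acked_digest):
    """Return (digest, full_report). full_report is None when nothing
    changed, so only the small checksum crosses the wire."""
    d = block_report_digest(block_ids)
    if d == last_acked_digest:
        return d, None                 # unchanged: digest only
    return d, sorted(block_ids)        # changed: send the full report

d1, full = maybe_send_report({101, 102, 103}, last_acked_digest=None)
assert full == [101, 102, 103]
d2, full = maybe_send_report({101, 102, 103}, last_acked_digest=d1)
assert full is None
```

On the NN side this pairs with either recomputing the checksum from its own view or maintaining a rolling checksum, as noted above.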
...
- Statically Partition the namespace hierarchically and mount the volumes
- In this scheme, there are multiple namespace volumes in a cluster.
- All the name space volumes share the physical block storage (i.e. One storage pool)
- Optionally, all namespaces (i.e. volumes) are mounted at the top level using an automounter-like approach
- A namespace can be explicitly mounted onto a node in another namespace (a la mount in POSIX)
- Note: the Ceph file system [ref] partitions the namespace automatically and mounts the partitions
- A truly distributed name service that partitions the namespace dynamically.
- Only keep part of the namespace in memory.
- This is like a traditional file system where the entire namespace is stored in secondary storage and paged in as needed
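The static-partitioning scheme above boils down to a client-side mount table: path prefixes map to namespace volumes (each served by its own NN) over one shared block-storage pool, with longest-prefix match picking the volume. A toy resolver, with hypothetical volume names:

```python
class MountTable:
    """Sketch of a client-side mount table for statically partitioned
    namespace volumes. Longest-prefix match decides which NN serves a
    given path; all volumes share the same block storage pool."""

    def __init__(self, mounts):
        # longest prefix first, so /user/project wins over /user
        self._mounts = sorted(mounts.items(), key=lambda kv: -len(kv[0]))

    def resolve(self, path):
        for prefix, volume in self._mounts:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return volume
        raise KeyError(path)

mt = MountTable({"/user": "nn1", "/tmp": "nn2", "/user/project": "nn3"})
assert mt.resolve("/user/alice/f") == "nn1"
assert mt.resolve("/user/project/x") == "nn3"
```

Each volume's NN scales independently; operations crossing a mount boundary (e.g. rename) are the awkward case this sketch ignores.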
- Reduce accidental space growth - name space quotas
...