Introduction

This proposal aims to enable users to visualize MXNet data using TensorFlow's TensorBoard. We plan to develop a logging tool, bundled in the MXNet Python package, that lets users log data in a format TensorBoard can later render in a browser. The typical flow of using the logging tool is explained in the following figure. To visualize the data, users need to install both MXNet and TensorFlow's TensorBoard. The project will be divided into two phases:

1. Synchronized logger. This is a straightforward implementation in Python. The downside is that logging NDArrays blocks the main Python thread, because the logger internally calls asnumpy() to convert NDArrays to numpy.ndarrays before logging them.

2. Asynchronous logger. This implementation requires much more engineering work in C++ and still has many unresolved difficulties to be discussed.
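In the synchronized logger, the blocking point is the conversion of an NDArray to host memory. The sketch below is a minimal conversion helper that assumes duck typing on `asnumpy()`. The name `to_loggable` is hypothetical and is not an MXNet API. Any value exposing `asnumpy()` is converted in the calling thread, which is exactly where the main Python thread waits.

```python
from typing import Any


def to_loggable(value: Any) -> Any:
    """Convert a value into a host-side object a logger can serialize.

    Hypothetical helper for illustration; not part of MXNet.
    """
    # MXNet NDArrays expose asnumpy(). Calling it waits for pending
    # asynchronous computation on the array and copies the result to host
    # memory, so a pure-Python logger blocks the calling thread here.
    if hasattr(value, "asnumpy"):
        return value.asnumpy()
    # Plain Python scalars and sequences are already loggable as-is.
    if isinstance(value, (int, float, list, tuple)):
        return value
    raise TypeError(f"unsupported type for logging: {type(value).__name__}")
```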

We will focus our efforts on the first phase and further explore the possibility of implementing an asynchronous logger.

Synchronized Logger

This work builds upon contributions from the following three GitHub repositories, whose authors we credit here.

TeamHG-Memex/tensorboard_logger. The author of this repo implemented the encoding algorithm for writing data to the event files that TensorBoard loads for rendering. This is the key piece that enables us to develop a logger independent of TensorFlow.
dmlc/tensorboard. Zihao Zheng, a DMLC member, is the primary author of this repo. The idea of making a simple logging tool came out of our multiple discussions with him. He extracted the necessary protobuf definitions from TensorFlow and designed low-level logging interfaces for building a standalone logging and rendering tool.
lanpa/tensorboard-pytorch. The author of this repo adopted the idea from dmlc/tensorboard and implemented a standalone logging tool for PyTorch users. Our synchronized logger will follow the basic design of this tool, extended to support MXNet data types.
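TensorBoard reads event files that use TensorFlow's TFRecord framing. Each serialized Event protobuf is stored as four consecutive parts:

* its length, as a little-endian unsigned 64-bit integer;
* a masked CRC-32C of those 8 length bytes;
* the payload itself;
* a masked CRC-32C of the payload.

The following is our own minimal pure-Python sketch of that framing, not code taken from the repositories above. The function names are hypothetical. Producing the Event protobuf bytes (the payload) is out of scope here.

```python
import struct


def _make_crc32c_table():
    # Table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one serialized event (payload) in TFRecord framing."""
    header = struct.pack("<Q", len(payload))
    return (header
            + struct.pack("<I", masked_crc32c(header))
            + payload
            + struct.pack("<I", masked_crc32c(payload)))


def read_record(buf: bytes, offset: int = 0):
    """Return (payload, next_offset) for the record starting at offset."""
    (length,) = struct.unpack_from("<Q", buf, offset)
    (len_crc,) = struct.unpack_from("<I", buf, offset + 8)
    if len_crc != masked_crc32c(buf[offset:offset + 8]):
        raise ValueError("corrupted length header")
    start = offset + 12
    payload = buf[start:start + length]
    (data_crc,) = struct.unpack_from("<I", buf, start + length)
    if data_crc != masked_crc32c(payload):
        raise ValueError("corrupted payload")
    return payload, start + length + 4
```

An event file is then a concatenation of such records. The checksums let a reader such as TensorBoard detect partially written or corrupted records.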

High Level Design

We plan to support most of the data types in TensorBoard: audio, embedding, histogram, image, scalar, text, and graph. The interface for logging graphs is TBD, since it depends on the implementation of conversion between MXNet symbols and the ONNX format. The user-level APIs are defined in the following figure. The naming follows TensorFlow's conventions.
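Each data type needs to be reduced to the fields TensorBoard expects before it can be logged. The histogram type is an example. The sketch below summarizes a flat list of values using field names that mirror TensorFlow's HistogramProto: min, max, num, sum, sum_squares, bucket_limit and bucket. It assumes equal-width bins, and both `make_histogram` and `HistogramSummary` are hypothetical names. A real implementation would operate on the numpy array obtained from the NDArray.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class HistogramSummary:
    # Field names mirror TensorFlow's HistogramProto.
    min: float
    max: float
    num: int
    sum: float
    sum_squares: float
    bucket_limit: List[float] = field(default_factory=list)  # right edge of each bucket
    bucket: List[float] = field(default_factory=list)        # count per bucket


def make_histogram(values: Sequence[float], bins: int = 10) -> HistogramSummary:
    """Summarize values into `bins` equal-width buckets (hypothetical helper)."""
    if not values:
        raise ValueError("values must be non-empty")
    if bins < 1:
        raise ValueError("bins must be positive")
    lo, hi = min(values), max(values)
    # If all values are equal, use unit width so every value lands in bucket 0.
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0.0] * bins
    for v in values:
        # Buckets are half-open [left, right), except the last one, which
        # also includes the maximum (the same convention as numpy.histogram).
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    limits = [lo + width * (i + 1) for i in range(bins)]
    return HistogramSummary(
        min=lo, max=hi, num=len(values),
        sum=float(sum(values)),
        sum_squares=float(sum(v * v for v in values)),
        bucket_limit=limits, bucket=counts,
    )
```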

summary: A container for any loggable MXNet object (an NDArray, a scalar, or a symbol) together with its metadata.
event: A container for objects to be written to an event file. It may contain a summary, a LogMessage, a SessionLog, etc. Although our use case only involves summary data (the last two are TensorFlow-specific), we keep the naming aligned with TensorFlow.
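The relationship between the two objects can be sketched with plain Python dataclasses that stand in for TensorFlow's protobuf messages. In TensorFlow, an Event carries a wall time, a step, and one of several payloads, a Summary being one of them. A Summary holds a list of tagged values. This is a simplified illustration under that assumption. Only the scalar payload is modeled, and `scalar_event` is a hypothetical helper.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SummaryValue:
    tag: str
    # Scalar payload only; image, histogram, audio, etc. are omitted here.
    simple_value: Optional[float] = None


@dataclass
class Summary:
    value: List[SummaryValue] = field(default_factory=list)


@dataclass
class Event:
    wall_time: float
    step: int
    # In TensorFlow's Event message this is one branch of a oneof that also
    # covers LogMessage, SessionLog, and other payloads.
    summary: Optional[Summary] = None


def scalar_event(tag: str, value: float, step: int,
                 wall_time: Optional[float] = None) -> Event:
    """Wrap a single scalar in a Summary inside an Event (hypothetical helper)."""
    return Event(
        wall_time=time.time() if wall_time is None else wall_time,
        step=step,
        summary=Summary(value=[SummaryValue(tag=tag, simple_value=float(value))]),
    )
```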

The workflow is as follows:

1. The user creates a SummaryWriter instance, passing the constructor the path of the directory where the data will be logged. For example: sw = SummaryWriter(logdir='./logs').
2. The user calls the corresponding API to push the data to be logged into the event queue. For example: sw.add_histogram(tag='my_hist', values=grad, bins=100). Once the loggable item is pushed into the event queue, the function returns and the main Python thread continues running the rest of the code.
3. In parallel, a logging thread consumes the event queue. When the queue is not empty, the thread pops an item and writes it to the event file. When the queue is empty, the thread blocks until new items are pushed into it.
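The steps above form a producer-consumer pattern. The sketch below shows it using Python's standard `queue` and `threading` modules. `QueuedSummaryWriter` is a hypothetical, simplified stand-in for the planned SummaryWriter, and only a scalar API is modeled. Events are written as JSON lines in place of serialized Event protobufs in TFRecord framing.

```python
import json
import os
import queue
import threading
import time


class QueuedSummaryWriter:
    """Hypothetical writer illustrating the event queue + logging thread design."""

    _STOP = object()  # sentinel that tells the logging thread to exit

    def __init__(self, logdir):
        os.makedirs(logdir, exist_ok=True)
        # Stand-in file format: one JSON object per line.
        self.path = os.path.join(logdir, "events.jsonl")
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add_scalar(self, tag, value, step):
        # Push the item and return immediately; file I/O happens on the
        # logging thread, so the caller does not wait for the write.
        self._queue.put({"wall_time": time.time(), "step": step,
                         "tag": tag, "value": float(value)})

    def _run(self):
        with open(self.path, "a", encoding="utf-8") as f:
            while True:
                item = self._queue.get()  # blocks while the queue is empty
                if item is self._STOP:
                    break
                f.write(json.dumps(item) + "\n")
                f.flush()

    def close(self):
        # The queue is FIFO, so every item pushed before the sentinel is
        # written before the thread exits.
        self._queue.put(self._STOP)
        self._thread.join()
```

Typical usage is `sw = QueuedSummaryWriter('./logs')`, then one `sw.add_scalar('loss', loss, step)` per iteration, then `sw.close()` at the end of training. Without the final `close()`, items still in the queue may be lost when the process exits, because the logging thread is a daemon.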
