...

Code Block
terminate called after throwing an instance of 'dmlc::Error'
  what():  [02:08:14] ../src/io/iter_csv.cc:125: Check failed: row.length == shape.Size() (4 vs. 40) The data size in CSV do not match size of shape: specified shape=[4,10], the csv row-length=4
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZN4dmlc10StackTraceB5cxx11Ev+0x54) [0x7febfb693f5b]
[bt] (1) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x2a) [0x7febfb694242]
[bt] (2) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZN5mxnet2io7CSVIter7AsTBlobERKN4dmlc3RowIjEERKN4nnvm6TShapeE+0x14a) [0x7febfe0d9832]
[bt] (3) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZN5mxnet2io7CSVIter4NextEv+0x25e) [0x7febfe0d9312]
[bt] (4) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZN5mxnet2io11BatchLoader4NextEv+0xa1) [0x7febfe0653f3]
[bt] (5) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZZN5mxnet2io14PrefetcherIter4InitERKSt6vectorISt4pairINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_ESaISA_EEENKUlPPNS_9DataBatchEE_clESH_+0x50) [0x7febfe04bf98]
[bt] (6) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZNSt17_Function_handlerIFbPPN5mxnet9DataBatchEEZNS0_2io14PrefetcherIter4InitERKSt6vectorISt4pairINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESE_ESaISF_EEEUlS3_E_E9_M_invokeERKSt9_Any_dataOS3_+0x37) [0x7febfe053473]
[bt] (7) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZNKSt8functionIFbPPN5mxnet9DataBatchEEEclES3_+0x49) [0x7febfe053797]
[bt] (8) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZZN4dmlc12ThreadedIterIN5mxnet9DataBatchEE4InitESt8functionIFbPPS2_EES4_IFvvEEENKUlvE_clEv+0x311) [0x7febfe0512eb]
[bt] (9) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(_ZNSt12_Bind_simpleIFZN4dmlc12ThreadedIterIN5mxnet9DataBatchEE4InitESt8functionIFbPPS3_EES5_IFvvEEEUlvE_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE+0x28) [0x7febfe05a188]

 

Why is this a problem?

 

Proper exception handling and propagation in MXNet is important for two types of use case. The first is MXNet users who use one of our APIs to build or test a model; the second is MXNet service owners who run MXNet in production for DL-enabled services.

From the perspective of an MXNet user (especially a casual or new user), the crash shown above makes for a poor user experience. The experience is worse in a non-terminal environment such as a Jupyter notebook or a Docker container. Even if the user understands the error, they cannot respond to it from a high-level language like Python, because MXNet currently doesn't allow users to handle exceptions and exit gracefully (or retry, or perform some other action).
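To make the point concrete, here is a minimal pure-Python sketch of the behaviour a Python user would reasonably expect. It does not call MXNet: `ShapeMismatchError`, `shape_size`, `check_row`, `load_rows` and `load_with_fallback` are hypothetical names introduced only for illustration, and the check merely mirrors the `row.length == shape.Size()` condition reported in the log above.

```python
from functools import reduce
from operator import mul


class ShapeMismatchError(ValueError):
    """Hypothetical error type standing in for a propagated backend error."""


def shape_size(shape):
    # Number of elements implied by a shape, e.g. (4, 10) -> 40.
    return reduce(mul, shape, 1)


def check_row(row, shape):
    # Mirrors the condition in the log: row length must equal the shape size.
    expected = shape_size(shape)
    if len(row) != expected:
        raise ShapeMismatchError(
            f"row length {len(row)} does not match shape {list(shape)} "
            f"(expected {expected} values)"
        )
    return row


def load_rows(csv_text, shape):
    # Parse comma-separated rows and validate each one against the shape.
    rows = []
    for line in csv_text.strip().splitlines():
        values = [float(v) for v in line.split(",")]
        rows.append(check_row(values, shape))
    return rows


def load_with_fallback(csv_text, shape, fallback_shape):
    # The caller decides how to recover: here, retry with another shape.
    try:
        return load_rows(csv_text, shape), shape
    except ShapeMismatchError:
        return load_rows(csv_text, fallback_shape), fallback_shape
```

With an error raised as an ordinary exception, the caller can choose to retry, log and continue, or exit cleanly, rather than having the whole process terminated.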

From a service owner's perspective, when we don't properly propagate errors through our language bindings it becomes extremely difficult to debug and support our service. For example, crashing instead of propagating errors has a negative effect on highly available services, and many services would page on-call staff when this occurs. Additionally, although MXNet has a largely asynchronous API, it currently does not allow services to handle exceptions on a per-request basis. If one mis-shaped request is in the processing queue alongside 50 other in-flight requests, that single request will crash the entire process. Services must therefore implement logic to monitor and retry the other in-flight requests, which increases latency and may in turn violate SLAs.
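The per-request handling described above can be sketched with Python's standard `concurrent.futures` module. This is only an illustration of the desired isolation, not MXNet code: `handle_request` and `process_all` are hypothetical names, and the length check is a stand-in for a real shape check.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload, expected_len=4):
    # Hypothetical request handler: rejects mis-shaped input with an exception.
    if len(payload) != expected_len:
        raise ValueError(f"expected {expected_len} values, got {len(payload)}")
    return sum(payload)


def process_all(payloads):
    # Each request gets its own future, so a failure is recorded for that
    # request only and the remaining in-flight requests still complete.
    results = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(handle_request, p) for p in payloads]
        for future in futures:
            error = future.exception()  # blocks until this request finishes
            if error is not None:
                results.append(("error", str(error)))
            else:
                results.append(("ok", future.result()))
    return results


if __name__ == "__main__":
    # 50 well-formed requests plus one mis-shaped request.
    payloads = [[1, 2, 3, 4]] * 50 + [[1, 2]]
    outcomes = process_all(payloads)
    print(sum(1 for status, _ in outcomes if status == "ok"), "succeeded")
    print(sum(1 for status, _ in outcomes if status == "error"), "failed")
```

In this sketch the mis-shaped request yields an error entry while the other 50 finish normally, so no monitoring or retry logic is needed for the requests that were never at fault.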

We have had multiple customer requests to fix this:

Look at the community requests here: https://github.com/apache/incubator-mxnet/issues/7335

Kellen Sunderland: "There's a reasonable expectation that MXNet will be / is being used as a core library for a wide variety of machine learning services.  When we don't properly propagate errors through our language bindings it becomes extremely difficult to debug these services.  Crashing instead of propagating errors is the worst thing we could possibly do.  Imagine trying to support an SLA on a service where the core library is crashing every time it encounters a non-fatal error.  To me the usability for researchers and jupyter notebook issues should be noted, but crashing a production service for a non-fatal error should be the primary reason to develop a comprehensive fix."

Exception Handling for Iterators

...