This page describes FileNotFoundExceptions that the S3A connector may throw when reading a file after a successful open().

Example Stack Trace

java.io.FileNotFoundException: Reopen at position 0 on s3a://bucket-name/test/some-file: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 58552EC03A3499D7), S3 Extended Request ID: thYjg0cDPGceq5M3n5T2nLmRDfFnoAeyiVMx8rOvYv/IHDPZiBnL5oAOPjdw44rQgzngDk4wELY=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4221)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168)
        at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1378)
        at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:160)
        at org.apache.hadoop.fs.s3a.S3AInputStream.onReadFailure(S3AInputStream.java:350)
        at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:323)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)

Causes of This Error

  1. The file was deleted after the client successfully opened it. When the client later tries to read the data, it finds that the file is no longer present.
  2. If you are running with S3Guard enabled, there are two possible causes:
    1. Eventual consistency: the open() succeeded because the file's metadata was found in S3Guard's MetadataStore, but the file was not yet available in S3 when the client tried to read its data. This is expected to be rare, because GET requests on S3 are generally consistent. One possible case is a GET of the same path issued before the file existed (a "negative GET"). S3 infrastructure may cache that result and incorrectly keep treating the file as missing. Unfortunately, S3 documentation on this behavior is hard to find.
    2. S3Guard MetadataStore is out of sync with S3. This can happen when clients that do not have S3Guard enabled modify the same bucket, because their changes are never recorded in the MetadataStore. It can also happen if the S3A client crashes after updating S3 but before updating the S3Guard MetadataStore. To resolve it, clear the MetadataStore (for example, drop the DynamoDB table) and then re-run your job.
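
The causes above can be illustrated with a small, self-contained Java toy model. This is a sketch under stated assumptions, not S3A code. Two in-memory maps stand in for S3 and for the S3Guard MetadataStore. The helpers open(), put() and openThenRead() are hypothetical. As the stack trace suggests for S3A (getObject is called from inside read()), open() here only consults metadata, and the object data is fetched later, on the first read(). That gap is where the file can go missing.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LazyOpenDemo {

    // Hypothetical stand-in for S3: object key -> object bytes.
    static final Map<String, byte[]> objectStore = new HashMap<>();
    // Hypothetical stand-in for S3Guard's MetadataStore: key -> file length.
    static final Map<String, Integer> metadataStore = new HashMap<>();

    // Toy stream: open() only checked metadata; the data "GET" is deferred to read().
    static final class LazyStream {
        private final String key;
        private byte[] data; // null until the first read() fetches the object
        private int pos;

        LazyStream(String key) {
            this.key = key;
        }

        int read() throws IOException {
            if (data == null) {
                data = objectStore.get(key); // the deferred fetch of the object data
                if (data == null) {
                    throw new FileNotFoundException("object missing at read time: " + key);
                }
            }
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    // open() succeeds whenever the metadata store lists the key, as with S3Guard enabled.
    static LazyStream open(String key) throws FileNotFoundException {
        if (!metadataStore.containsKey(key)) {
            throw new FileNotFoundException("no metadata for: " + key);
        }
        return new LazyStream(key);
    }

    // A well-behaved client that updates both stores.
    static void put(String key, String contents) {
        byte[] bytes = contents.getBytes(StandardCharsets.UTF_8);
        objectStore.put(key, bytes);
        metadataStore.put(key, bytes.length);
    }

    // Opens key, runs `between` (another client's action), then reads one byte.
    static String openThenRead(String key, Runnable between) {
        try {
            LazyStream in = open(key);
            between.run();
            return "read " + in.read();
        } catch (FileNotFoundException e) {
            return "FileNotFoundException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        put("test/some-file", "hello");

        // Nothing changes between open() and read(): prints "read 104" ('h').
        System.out.println(openThenRead("test/some-file", () -> {}));

        // Cause 1: another client deletes the file after open(); read() fails.
        System.out.println(openThenRead("test/some-file", () -> {
            objectStore.remove("test/some-file");
            metadataStore.remove("test/some-file");
        }));

        // Cause 2.2: a client without S3Guard deletes only the S3 object, so the
        // metadata store still lists it: open() succeeds, but read() fails.
        put("test/other-file", "data");
        objectStore.remove("test/other-file");
        System.out.println(openThenRead("test/other-file", () -> {}));
    }
}
```

In this toy, cause 2.1 (eventual consistency) looks the same to the client as cause 2.2: the metadata lists the file, but the data store does not return it at read time. In both cases, open() succeeding says nothing about whether the later read will find the object.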
