Conversation
Walkthrough

Added defensive error handling and logging to getHDFSDefaultBlockSizeMB: the function now imports Logger/LoggerFactory and NonFatal, wraps resolution and block-size retrieval in a try/catch, logs NonFatal exceptions at debug level, and returns None on error or non-positive sizes; it returns Some(blockSizeInMB) when the size is positive.

Changes
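Reconstructed from the walkthrough and the review comments below, the revised function plausibly looks something like this sketch (the exact signature, the `log` field, and the `bytesInMegabyte` constant are assumptions, not the actual source):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.slf4j.LoggerFactory

import scala.util.control.NonFatal

object HDFSUtilsSketch {
  private val log = LoggerFactory.getLogger(this.getClass)
  private val bytesInMegabyte = 1024L * 1024L

  /** Returns the default block size in MB for the given path,
    * or None if it cannot be determined (e.g. unsupported filesystems). */
  def getHDFSDefaultBlockSizeMB(fs: FileSystem, hdfsPath: Path): Option[Int] = {
    try {
      val blockSizeInBytes = fs.getDefaultBlockSize(hdfsPath)
      val blockSizeInMB = (blockSizeInBytes / bytesInMegabyte).toInt
      if (blockSizeInMB > 0) {
        Some(blockSizeInMB)
      } else {
        None
      }
    } catch {
      case NonFatal(ex) =>
        log.debug(s"Failed to get HDFS default block size for path: $hdfsPath.", ex)
        None
    }
  }
}
```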
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant HDFSUtils
    participant HDFS as HDFS API
    Caller->>HDFSUtils: getHDFSDefaultBlockSizeMB(path)
    activate HDFSUtils
    rect rgb(230,245,255)
    Note over HDFSUtils: try — resolve path & fetch default block size
    HDFSUtils->>HDFS: resolve(path) / getDefaultBlockSize
    end
    alt Success
        HDFS-->>HDFSUtils: blockSizeBytes
        HDFSUtils->>HDFSUtils: blockSizeInMB = bytes / (1024*1024)
        alt blockSizeInMB > 0
            HDFSUtils-->>Caller: Some(blockSizeInMB)
        else
            HDFSUtils-->>Caller: None
        end
    else NonFatal Exception
        rect rgb(255,235,235)
        Note over HDFSUtils: catch NonFatal → log debug, return None
        HDFS--x HDFSUtils: Exception
        HDFSUtils-->>Caller: None
        end
    end
    deactivate HDFSUtils
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches

🧪 Generate unit tests (beta)
JaCoCo code coverage report - 'cobol-parser'

JaCoCo code coverage report - 'spark-cobol'
Actionable comments posted: 0
🧹 Nitpick comments (2)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2)
75-91: Consider logging the caught exception for debugging.

While the try/catch block correctly returns None on errors (as per the PR objective), silently swallowing exceptions makes debugging difficult. If there's a legitimate HDFS access issue, operators have no visibility into what went wrong.
Consider adding a log statement before returning None:

```diff
 } catch {
   case NonFatal(ex) =>
+    // Log at debug or warn level depending on your logging framework
+    // Example: logger.debug(s"Failed to get HDFS default block size: ${ex.getMessage}", ex)
     None
 }
```
79-80: Minor: Variable naming and formatting.

Two small improvements:

- Line 79: Variable name `blockSizeInBM` should be `blockSizeInMB` for consistency
- Line 80: Missing spaces around the `>` operator

```diff
- val blockSizeInBM = (blockSizeInBytes / bytesInMegabyte).toInt
- if (blockSizeInBM>0) {
-   Some (blockSizeInBM)
+ val blockSizeInMB = (blockSizeInBytes / bytesInMegabyte).toInt
+ if (blockSizeInMB > 0) {
+   Some(blockSizeInMB)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: Spark 3.5.7 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.13.17
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (1)
25-25: LGTM!

The NonFatal import is appropriate for catching operational exceptions while allowing fatal errors to propagate.
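For readers unfamiliar with `NonFatal`: it matches ordinary exceptions but deliberately excludes JVM-fatal throwables such as `OutOfMemoryError` and `InterruptedException`, so those still propagate. A minimal illustration, unrelated to the actual HDFSUtils code:

```scala
import scala.util.control.NonFatal

// NonFatal(e) is true for e.g. ArithmeticException,
// but false for fatal errors like OutOfMemoryError.
def safeDivide(a: Int, b: Int): Option[Int] =
  try Some(a / b)
  catch { case NonFatal(_) => None }

// safeDivide(10, 2) == Some(5); safeDivide(1, 0) == None
```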
f12b239 to 03037ba
Actionable comments posted: 0
🧹 Nitpick comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (1)
77-94: Good defensive error handling; optional log message enhancement.

The try-catch block properly addresses the PR objective by gracefully handling exceptions from unsupported filesystems like OBS/OSS. The use of NonFatal and debug-level logging is appropriate.
Verification confirms that the single caller in `SparkUtils.scala` (line 500) properly handles the `None` return value with a pattern match, logging an informational message when the block size cannot be obtained.

Optional enhancement: consider including the path in the log message for easier troubleshooting:
```diff
- log.debug(s"Failed to get HDFS default block size.", ex)
+ log.debug(s"Failed to get HDFS default block size for path: ${path.getOrElse("/")}.", ex)
```
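The caller-side handling the review describes might look roughly like this (a sketch only; the actual `SparkUtils` code is not shown in this thread, so the names here are illustrative):

```scala
// Illustrative only: pattern-matching on the Option result,
// logging an informational message when the size is unavailable.
def describeBlockSize(blockSizeMB: Option[Int]): String =
  blockSizeMB match {
    case Some(mb) => s"HDFS default block size: $mb MB"
    case None     => "Could not obtain the HDFS default block size; using defaults."
  }
```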
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.13.17
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (1)
22-22: LGTM: Standard logging and error handling setup.

The SLF4J logging and NonFatal exception handling imports are idiomatic Scala best practices. The private logger instance is properly scoped.
Also applies to: 26-26, 32-32
03037ba to 6ce6fcc
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.5.7 on Scala 2.13.17
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.12.20
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (3)
22-22: LGTM! Logging and error handling imports are appropriate.

The SLF4J logging facade and NonFatal exception handling are standard best practices for Scala projects.
Also applies to: 26-26
32-32: LGTM! Logger initialization follows best practices.

The private logger instance is correctly initialized using the standard SLF4J pattern.
78-89: Excellent defensive error handling that addresses issue #796.

The try-catch with NonFatal correctly handles exceptions from filesystem implementations (like OBS/OSS) that may not support `getDefaultBlockSize` on certain paths. The nested defensive checks for positive values are appropriate, and returning `None` on error aligns with the function's documented behavior.
```scala
        None
      } catch {
        case NonFatal(ex) =>
          log.debug(s"Failed to get HDFS default block size for path: $hdfsPath..", ex)
```
Fix grammatical issue in log message.
The double period after $hdfsPath.. appears to be a typo.
Apply this diff to fix the grammar:

```diff
- log.debug(s"Failed to get HDFS default block size for path: $hdfsPath..", ex)
+ log.debug(s"Failed to get HDFS default block size for path: $hdfsPath", ex)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- log.debug(s"Failed to get HDFS default block size for path: $hdfsPath..", ex)
+ log.debug(s"Failed to get HDFS default block size for path: $hdfsPath", ex)
```
🤖 Prompt for AI Agents
In `spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala` around line 92, the log message mistakenly contains a double period after the hdfsPath string; update the log.debug call to remove the extra period so the message reads with a single period (or otherwise proper punctuation) after the path, preserving the exception parameter.
Closes #796
Summary by CodeRabbit