Conversation
Walkthrough

Added defensive error handling and logging to getHDFSDefaultBlockSizeMB: the function now imports Logger/LoggerFactory and NonFatal, wraps resolution and block-size retrieval in a try/catch, logs NonFatal exceptions at debug level, and returns None on error or non-positive sizes; it returns Some(blockSizeInMB) when the size is positive.

Changes
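Reconstructed from the walkthrough and the review comments below, the revised function plausibly looks something like this sketch (the exact signature, the `log` field, and the `bytesInMegabyte` constant are assumptions, not the actual source):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.slf4j.LoggerFactory

import scala.util.control.NonFatal

object HDFSUtilsSketch {
  private val log = LoggerFactory.getLogger(this.getClass)
  private val bytesInMegabyte = 1024L * 1024L

  /** Returns the default block size in MB for the given path,
    * or None if it cannot be determined (e.g. unsupported filesystems). */
  def getHDFSDefaultBlockSizeMB(fs: FileSystem, hdfsPath: Path): Option[Int] = {
    try {
      val blockSizeInBytes = fs.getDefaultBlockSize(hdfsPath)
      val blockSizeInMB = (blockSizeInBytes / bytesInMegabyte).toInt
      if (blockSizeInMB > 0) {
        Some(blockSizeInMB)
      } else {
        None
      }
    } catch {
      case NonFatal(ex) =>
        log.debug(s"Failed to get HDFS default block size for path: $hdfsPath.", ex)
        None
    }
  }
}
```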
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant HDFSUtils
    participant HDFS as HDFS API
    Caller->>HDFSUtils: getHDFSDefaultBlockSizeMB(path)
    activate HDFSUtils
    rect rgb(230,245,255)
    Note over HDFSUtils: try — resolve path & fetch default block size
    HDFSUtils->>HDFS: resolve(path) / getDefaultBlockSize
    end
    alt Success
        HDFS-->>HDFSUtils: blockSizeBytes
        HDFSUtils->>HDFSUtils: blockSizeInMB = bytes / (1024*1024)
        alt blockSizeInMB > 0
            HDFSUtils-->>Caller: Some(blockSizeInMB)
        else
            HDFSUtils-->>Caller: None
        end
    else NonFatal Exception
        rect rgb(255,235,235)
        Note over HDFSUtils: catch NonFatal → log debug, return None
        HDFS--x HDFSUtils: Exception
        HDFSUtils-->>Caller: None
        end
    end
    deactivate HDFSUtils
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches

🧪 Generate unit tests (beta)
JaCoCo code coverage report - 'cobol-parser'

JaCoCo code coverage report - 'spark-cobol'
Actionable comments posted: 0
🧹 Nitpick comments (2)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2)
75-91: Consider logging the caught exception for debugging.

While the try/catch block correctly returns None on errors (as per the PR objective), silently swallowing exceptions makes debugging difficult. If there's a legitimate HDFS access issue, operators have no visibility into what went wrong.
Consider adding a log statement before returning None:

```diff
 } catch {
   case NonFatal(ex) =>
+    // Log at debug or warn level depending on your logging framework
+    // Example: logger.debug(s"Failed to get HDFS default block size: ${ex.getMessage}", ex)
     None
 }
```
79-80: Minor: Variable naming and formatting.

Two small improvements:

- Line 79: Variable name `blockSizeInBM` should be `blockSizeInMB` for consistency
- Line 80: Missing spaces around the `>` operator

```diff
- val blockSizeInBM = (blockSizeInBytes / bytesInMegabyte).toInt
- if (blockSizeInBM>0) {
-   Some (blockSizeInBM)
+ val blockSizeInMB = (blockSizeInBytes / bytesInMegabyte).toInt
+ if (blockSizeInMB > 0) {
+   Some(blockSizeInMB)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: Spark 3.5.7 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.13.17
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (1)
25-25: LGTM!

The NonFatal import is appropriate for catching operational exceptions while allowing fatal errors to propagate.
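For readers unfamiliar with `NonFatal`: it matches ordinary exceptions but deliberately excludes JVM-fatal throwables such as `OutOfMemoryError` and `InterruptedException`, so those still propagate. A minimal illustration, unrelated to the actual HDFSUtils code:

```scala
import scala.util.control.NonFatal

// NonFatal(e) is true for e.g. ArithmeticException,
// but false for fatal errors like OutOfMemoryError.
def safeDivide(a: Int, b: Int): Option[Int] =
  try Some(a / b)
  catch { case NonFatal(_) => None }

// safeDivide(10, 2) == Some(5); safeDivide(1, 0) == None
```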
f12b239 to 03037ba
Actionable comments posted: 0
🧹 Nitpick comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (1)
77-94: Good defensive error handling; optional log message enhancement.

The try-catch block properly addresses the PR objective by gracefully handling exceptions from unsupported filesystems like OBS/OSS. The use of NonFatal and debug-level logging is appropriate.
Verification confirms that the single caller in `SparkUtils.scala` (line 500) properly handles the `None` return value with a pattern match, logging an informational message when the block size cannot be obtained.

Optional enhancement: consider including the path in the log message for easier troubleshooting:
```diff
- log.debug(s"Failed to get HDFS default block size.", ex)
+ log.debug(s"Failed to get HDFS default block size for path: ${path.getOrElse("/")}.", ex)
```
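The caller-side handling the review describes might look roughly like this (a sketch only; the actual `SparkUtils` code is not shown in this thread, so the names here are illustrative):

```scala
// Illustrative only: pattern-matching on the Option result,
// logging an informational message when the size is unavailable.
def describeBlockSize(blockSizeMB: Option[Int]): String =
  blockSizeMB match {
    case Some(mb) => s"HDFS default block size: $mb MB"
    case None     => "Could not obtain the HDFS default block size; using defaults."
  }
```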
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.13.17
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (1)
22-22: LGTM: Standard logging and error handling setup.

The SLF4J logging and NonFatal exception handling imports are idiomatic Scala best practices. The private logger instance is properly scoped.
Also applies to: 26-26, 32-32
03037ba to 6ce6fcc
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.5.7 on Scala 2.13.17
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.12.20
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala (3)
22-22: LGTM! Logging and error handling imports are appropriate.

The SLF4J logging facade and NonFatal exception handling are standard best practices for Scala projects.
Also applies to: 26-26
32-32: LGTM! Logger initialization follows best practices.

The private logger instance is correctly initialized using the standard SLF4J pattern.
78-89: Excellent defensive error handling that addresses issue #796.

The try-catch with NonFatal correctly handles exceptions from filesystem implementations (like OBS/OSS) that may not support `getDefaultBlockSize` on certain paths. The nested defensive checks for positive values are appropriate, and returning `None` on error aligns with the function's documented behavior.
```scala
        None
      } catch {
        case NonFatal(ex) =>
          log.debug(s"Failed to get HDFS default block size for path: $hdfsPath..", ex)
```
Fix grammatical issue in log message.
The double period after $hdfsPath.. appears to be a typo.
Apply this diff to fix the grammar:

```diff
- log.debug(s"Failed to get HDFS default block size for path: $hdfsPath..", ex)
+ log.debug(s"Failed to get HDFS default block size for path: $hdfsPath", ex)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- log.debug(s"Failed to get HDFS default block size for path: $hdfsPath..", ex)
+ log.debug(s"Failed to get HDFS default block size for path: $hdfsPath", ex)
```
🤖 Prompt for AI Agents
In `spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/HDFSUtils.scala` around line 92, the log message mistakenly contains a double period after the hdfsPath string; update the log.debug call to remove the extra period so the message reads with a single period (or otherwise proper punctuation) after the path, preserving the exception parameter.
Closes #796
Summary by CodeRabbit