Skip to content

Conversation

@COLDTURNIP
Copy link
Contributor

Which issue(s) this PR fixes:

Issue longhorn/longhorn#12006

What this PR does / why we need it:

Ensure the target mount point is unmounted and removed while node server handling a failed publish volume request.

Special notes for your reviewer:

Additional documentation or context

@COLDTURNIP COLDTURNIP self-assigned this Nov 7, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 7, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Prepare and validate bind-mount targets for block volumes, add ensureBindMountPoint and lazy unmount handling, change mount failure handling to call unmountAndCleanupMountPoint with explicit cleanup/logging, and update NodePublishBlockVolume messages to report bind-mount failure details.

Changes

Cohort / File(s) Summary
Node server mount/error path
csi/node_server.go
Replace direct target file creation with ensureBindMountPoint, perform bind mount via mounter.Mount(...), on mount failure log the mount error, invoke unmountAndCleanupMountPoint (using a fresh mounter) to clean up, log cleanup outcome, and return an Internal error describing the bind-mount failure with device and target.
Mount-point utilities and cleanup
csi/util.go
Add ensureBindMountPoint(path string, mounter mount.Interface) error; add lazyUnmount(path string) using umount -l (os/exec); rework unmount to treat certain errors as non‑fatal and attempt lazy unmount fallback; update unmountAndCleanupMountPoint to unmount then remove the mount-point file directly (handling broken symlinks) instead of CleanupMountPoint; add isNotMountedError helper and adjust makeFile comment and behavior to create a regular file for bind mounts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas needing extra attention:
    • Race conditions between ensureBindMountPoint (which unmounts/creates file) and concurrent mount operations.
    • Correctness and security/permissions for lazyUnmount (umount -l) fallback and its error propagation.
    • Ensuring removal of mount-point paths does not mask the original mount error or leave resources busy.
    • Logging and error-wrapping clarity to avoid leaking internal details.

Possibly related PRs

Suggested reviewers

  • Vicente-Cheng
  • derekbit
  • WebberHuang1118

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: cleaning up the mount point when node fails to publish a volume.
Description check ✅ Passed The description is related to the changeset and explains the purpose: ensuring mount point cleanup during failed publish volume requests.
Linked Issues check ✅ Passed The PR implements both key requirements from issue #12006: ensures mount point cleanup and surfaces the mount error instead of shadowing it with cleanup errors.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing mount error handling during CSI node publish; no unrelated modifications detected.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from 87abeed to 69f056f Compare November 11, 2025 09:45
@COLDTURNIP COLDTURNIP marked this pull request as ready for review November 11, 2025 09:45
@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from 69f056f to d8cd841 Compare November 11, 2025 09:48
@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from d8cd841 to 3f953da Compare November 12, 2025 03:24
@COLDTURNIP
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
csi/util.go (1)

467-481: Update misleading comment for makeFile.

The comment at Line 468 states "If pathname already exists, whether a file or directory, no error is returned," but this is inaccurate. When pathname is an existing directory, os.OpenFile with os.O_CREATE returns an error (typically EISDIR), which is not os.IsExist, so the function returns that error rather than nil.

Apply this diff to correct the comment:

 // makeFile creates an empty regular file.
-// If pathname already exists, whether a file or directory, no error is returned.
+// If pathname already exists as a regular file, no error is returned.
+// If pathname already exists as a directory or other non-regular file type, an error is returned.
 func makeFile(pathname string) error {
 	f, err := os.OpenFile(pathname, os.O_CREATE, os.FileMode(0644))
 	if f != nil {
 		err = f.Close()
 		return err
 	}
 	if err != nil {
 		if !os.IsExist(err) {
 			return err
 		}
 	}
 	return nil
 }
🧹 Nitpick comments (1)
csi/util.go (1)

341-364: Good addition for bind mount point validation.

The new ensureBindMountPoint function correctly validates and prepares bind mount points by removing corrupt entries and ensuring a regular file exists. The error handling and logging are appropriate.

However, note that if path is an existing directory, makeFile will fail (Line 360) because os.OpenFile on a directory returns an EISDIR error. This is likely the intended behavior (requiring a regular file for bind mounts), but consider adding an explicit check and a clearer error message for this case to improve debuggability.

Optional enhancement:

 func ensureBindMountPoint(path string) error {
 	logrus.Infof("Trying to ensure bind mount point %v", path)
 	info, statErr := os.Lstat(path)
 	if statErr != nil {
 		if !os.IsNotExist(statErr) {
 			return errors.Wrapf(statErr, "failed to check bind mount at %s", path)
 		}
 	} else {
+		if info.Mode().IsDir() {
+			return errors.Errorf("bind mount point at %s is a directory, expected regular file", path)
+		}
 		if !info.Mode().IsRegular() && !info.Mode().IsDir() {
-			logrus.WithField("fileMode", info.Mode().String()).Infof("Bind mount point %v exists but is not a valid file, clean up", path)
+			logrus.WithField("fileMode", info.Mode().String()).Infof("Bind mount point %v exists but is not a regular file (type: %s), cleaning up", path, info.Mode().Type())
 			if rmErr := os.Remove(path); rmErr != nil {
 				return errors.Wrapf(rmErr, "failed to remove corrupt bind mount point at %s", path)
 			}
 		}
 	}
 
 	if makeFileErr := makeFile(path); makeFileErr != nil {
 		return errors.Wrapf(makeFileErr, "failed to create regular file for bind mount at %s", path)
 	}
 	return nil
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3ec01a and 3f953da.

📒 Files selected for processing (2)
  • csi/node_server.go (1 hunks)
  • csi/util.go (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: csi/controller_server.go:1070-1071
Timestamp: 2024-10-30T07:07:32.174Z
Learning: In the method `cleanupBackupVolume` in `csi/controller_server.go`, errors are returned directly and handled at the upper-level method `DeleteSnapshot`. Therefore, it's acceptable to return errors directly in such helper methods without wrapping them.
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 3648
File: upgrade/v18xto190/upgrade.go:67-86
Timestamp: 2025-04-21T06:56:13.427Z
Learning: In `upgrade/v18xto190/upgrade.go`, changes made to volumes in the `upgradeVolumes` function after calling `ListAndUpdateVolumesInProvidedCache` are expected to be persisted by the caller, not within the function itself.
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: manager/volume.go:1180-1182
Timestamp: 2024-10-29T12:51:34.772Z
Learning: Ensure that when reviewing the `restoreBackingImage` function in `manager/volume.go`, verify that variables referenced in error messages are actually undefined before suggesting corrections.
📚 Learning: 2024-10-30T12:47:32.309Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3627-3636
Timestamp: 2024-10-30T12:47:32.309Z
Learning: In the file `controller/volume_controller.go`, avoid suggesting additional logging statements if they are unnecessary or redundant. Only add logging when it provides significant value.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T08:47:14.968Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:404-404
Timestamp: 2024-10-29T08:47:14.968Z
Learning: In `controller/backup_controller.go`, be aware that after setting the backup status to error when a volume is not found, the code already includes `return nil`, so no additional return statement is needed.

Applied to files:

  • csi/node_server.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build binaries
  • GitHub Check: Summary
🔇 Additional comments (1)
csi/node_server.go (1)

384-390: Improved mount failure cleanup.

The change to use unmountAndCleanupMountPoint instead of os.Remove is the correct approach for cleaning up failed mount attempts. This ensures the mount point is properly unmounted before removal, preventing the original mount error from being shadowed by cleanup errors. The logging of cleanup failures is also appropriate.

Copy link
Contributor

@WebberHuang1118 WebberHuang1118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions and suggestions — please check it, thanks.

@WebberHuang1118 WebberHuang1118 self-requested a review November 12, 2025 04:05
@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch 4 times, most recently from 958a8c2 to 928b613 Compare November 13, 2025 07:38
@COLDTURNIP
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 13, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f953da and 928b613.

📒 Files selected for processing (2)
  • csi/node_server.go (1 hunks)
  • csi/util.go (5 hunks)
🧰 Additional context used
🧠 Learnings (14)
📚 Learning: 2024-10-29T08:47:14.968Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:404-404
Timestamp: 2024-10-29T08:47:14.968Z
Learning: In `controller/backup_controller.go`, be aware that after setting the backup status to error when a volume is not found, the code already includes `return nil`, so no additional return statement is needed.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:47:32.309Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3627-3636
Timestamp: 2024-10-30T12:47:32.309Z
Learning: In the file `controller/volume_controller.go`, avoid suggesting additional logging statements if they are unnecessary or redundant. Only add logging when it provides significant value.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:51:18.255Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3652-3669
Timestamp: 2024-10-30T12:51:18.255Z
Learning: In `volume_controller.go`, within the `getBackupVolumeInfo` method, avoid adding `nil` checks for the `volume` variable `v` because it is already guaranteed to be non-`nil` by an earlier check at the beginning of the `syncVolume` method.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T12:51:34.772Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: manager/volume.go:1180-1182
Timestamp: 2024-10-29T12:51:34.772Z
Learning: Ensure that when reviewing the `restoreBackingImage` function in `manager/volume.go`, verify that variables referenced in error messages are actually undefined before suggesting corrections.

Applied to files:

  • csi/node_server.go
📚 Learning: 2025-04-21T06:56:13.427Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 3648
File: upgrade/v18xto190/upgrade.go:67-86
Timestamp: 2025-04-21T06:56:13.427Z
Learning: In `upgrade/v18xto190/upgrade.go`, changes made to volumes in the `upgradeVolumes` function after calling `ListAndUpdateVolumesInProvidedCache` are expected to be persisted by the caller, not within the function itself.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-11-07T08:45:46.186Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: webhook/resources/backuptarget/validator.go:0-0
Timestamp: 2024-11-07T08:45:46.186Z
Learning: In the `validateDRVolume` function within `webhook/resources/backuptarget/validator.go`, prefer comparing backup target names via labels instead of URLs when validating DR volumes to account for potential URL formatting differences.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-28T13:21:51.980Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: api/backup.go:123-128
Timestamp: 2024-10-28T13:21:51.980Z
Learning: In `api/backup.go`, the method `s.m.DeleteBackupTarget` already checks for the existence of the backup target and handles errors. Therefore, it's unnecessary to add an existence check before calling this method.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-24T12:04:44.187Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:325-326
Timestamp: 2024-10-24T12:04:44.187Z
Learning: In `controller/backup_controller.go`, `backupTargetClient` is guaranteed to be non-`nil` when there is no error, so additional `nil` checks before its usage are unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:31:25.782Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: upgrade/v17xto180/upgrade.go:68-110
Timestamp: 2024-10-30T12:31:25.782Z
Learning: In `upgrade/v17xto180/upgrade.go`, the `backupTargetName` can be trusted to be validated if it is not an empty string. Therefore, additional validation in `ensureBackupTargetSet` and `ensureLabel` is unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-28T01:25:02.898Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backing_image_data_source_controller.go:714-714
Timestamp: 2024-10-28T01:25:02.898Z
Learning: In `controller/backing_image_data_source_controller.go`, the method `GetBackupTargetRO` already handles error checking for `bids.Spec.Parameters[longhorn.DataSourceTypeRestoreParameterBackupTargetName]`, so explicit checks before calling it are unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:42:20.034Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:4776-4778
Timestamp: 2024-10-30T12:42:20.034Z
Learning: In the `controller/volume_controller.go` file's `ReconcileBackupVolumeState` function, when using `errors.Wrapf(err, ...)`, avoid including `err` in the format string, since `errors.Wrapf` already includes the original error `err`. This helps to prevent redundancy in error messages.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T04:23:30.091Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_volume_controller.go:240-242
Timestamp: 2024-10-29T04:23:30.091Z
Learning: In `controller/backup_volume_controller.go`, the `BackupVolumeDelete` method of `backupTargetClient` returns `nil` if the backup volume is not found, so it's unnecessary to check for 'not found' errors after calling this method.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:46:06.911Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3884-3894
Timestamp: 2024-10-30T12:46:06.911Z
Learning: In `controller/volume_controller.go`, within the `restoreVolumeRecurringJobs` function, avoid adding redundant checks for `v.Spec.BackupTargetName == ""`, as `GetBackupVolumeWithBackupTargetAndVolumeRO` already performs this validation, and `v.Spec.BackupTargetName` defaults to `default` when the custom resource is created.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-11-25T23:55:02.080Z
Learnt from: derekbit
Repo: longhorn/longhorn-manager PR: 3282
File: controller/monitor/node_upgrade_monitor.go:351-357
Timestamp: 2024-11-25T23:55:02.080Z
Learning: `GetVolumeRO` guarantees that `volume` is non-nil when `err == nil`, so explicit nil checks after error handling are not needed.

Applied to files:

  • csi/node_server.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build binaries
  • GitHub Check: Summary

@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from 928b613 to ad77400 Compare November 13, 2025 09:57
Copy link
Contributor

@WebberHuang1118 WebberHuang1118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the comments, thanks.

@WebberHuang1118 WebberHuang1118 self-requested a review November 14, 2025 07:59
@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch 2 times, most recently from a80ea57 to 3fb706f Compare November 18, 2025 07:49
@COLDTURNIP
Copy link
Contributor Author

PR updated per review comment. Please help review again, thanks.

@COLDTURNIP
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 18, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
csi/util.go (1)

383-416: unmount still returns the original error even after successful lazy unmount

If the initial Unmount/UnmountWithForce call returns an error and IsMountPoint reports the path is still mounted, a lazy unmount with umount -l is performed. When that lazy unmount succeeds, the mount is released, but the function still returns the earlier error at line 416. This has two effects:

  1. unmountAndCleanupMountPoint receives an error and skips the subsequent Lstat+Remove cleanup (lines 428-440), leaving the mount point file unremovedMicrosoft.
  2. Node operations like NodeUnpublishVolume propagate this error and fail, reporting a failure despite the mount point having been successfully detached.

Fix by returning nil instead of the earlier err after a successful lazy unmount (line 416).

🧹 Nitpick comments (1)
csi/node_server.go (1)

365-385: Block publish flow now cleans up correctly and surfaces the real mount error

The updated nodePublishBlockVolume logic looks solid:

  • ensureDirectory(filepath.Dir(targetPath)) makes sure the parent directory exists before any bind work.
  • ensureBindMountPoint(targetPath, mounter) centralizes “clear and recreate bind target” logic, reusing the new unmount/cleanup helpers and guaranteeing a regular file.
  • On mount failure you:
    • Log mountErr.
    • Call unmountAndCleanupMountPoint to clean up the target, logging but ignoring cleanup failures.
    • Return a codes.Internal error that includes devicePath, targetPath, and mountErr, so the underlying bind-mount failure is visible to kubelet instead of being hidden by cleanup.

That matches the issue’s goal of cleaning up the target path without letting cleanup errors shadow the original mount failure. The only minor nit is that the cleanup path uses mount.New("") rather than the passed-in mounter, but for block volumes today that’s effectively the same implementation.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 928b613 and 3fb706f.

📒 Files selected for processing (2)
  • csi/node_server.go (1 hunks)
  • csi/util.go (5 hunks)
🧰 Additional context used
🧠 Learnings (16)
📚 Learning: 2024-10-30T12:47:32.309Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3627-3636
Timestamp: 2024-10-30T12:47:32.309Z
Learning: In the file `controller/volume_controller.go`, avoid suggesting additional logging statements if they are unnecessary or redundant. Only add logging when it provides significant value.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T08:47:14.968Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:404-404
Timestamp: 2024-10-29T08:47:14.968Z
Learning: In `controller/backup_controller.go`, be aware that after setting the backup status to error when a volume is not found, the code already includes `return nil`, so no additional return statement is needed.

Applied to files:

  • csi/node_server.go
  • csi/util.go
📚 Learning: 2025-04-21T06:56:13.427Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 3648
File: upgrade/v18xto190/upgrade.go:67-86
Timestamp: 2025-04-21T06:56:13.427Z
Learning: In `upgrade/v18xto190/upgrade.go`, changes made to volumes in the `upgradeVolumes` function after calling `ListAndUpdateVolumesInProvidedCache` are expected to be persisted by the caller, not within the function itself.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T12:51:34.772Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: manager/volume.go:1180-1182
Timestamp: 2024-10-29T12:51:34.772Z
Learning: Ensure that when reviewing the `restoreBackingImage` function in `manager/volume.go`, verify that variables referenced in error messages are actually undefined before suggesting corrections.

Applied to files:

  • csi/node_server.go
  • csi/util.go
📚 Learning: 2024-10-28T13:21:51.980Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: api/backup.go:123-128
Timestamp: 2024-10-28T13:21:51.980Z
Learning: In `api/backup.go`, the method `s.m.DeleteBackupTarget` already checks for the existence of the backup target and handles errors. Therefore, it's unnecessary to add an existence check before calling this method.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-24T12:04:44.187Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:325-326
Timestamp: 2024-10-24T12:04:44.187Z
Learning: In `controller/backup_controller.go`, `backupTargetClient` is guaranteed to be non-`nil` when there is no error, so additional `nil` checks before its usage are unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:31:25.782Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: upgrade/v17xto180/upgrade.go:68-110
Timestamp: 2024-10-30T12:31:25.782Z
Learning: In `upgrade/v17xto180/upgrade.go`, the `backupTargetName` can be trusted to be validated if it is not an empty string. Therefore, additional validation in `ensureBackupTargetSet` and `ensureLabel` is unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-28T01:25:02.898Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backing_image_data_source_controller.go:714-714
Timestamp: 2024-10-28T01:25:02.898Z
Learning: In `controller/backing_image_data_source_controller.go`, the method `GetBackupTargetRO` already handles error checking for `bids.Spec.Parameters[longhorn.DataSourceTypeRestoreParameterBackupTargetName]`, so explicit checks before calling it are unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:42:20.034Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:4776-4778
Timestamp: 2024-10-30T12:42:20.034Z
Learning: In the `controller/volume_controller.go` file's `ReconcileBackupVolumeState` function, when using `errors.Wrapf(err, ...)`, avoid including `err` in the format string, since `errors.Wrapf` already includes the original error `err`. This helps to prevent redundancy in error messages.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T04:23:30.091Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_volume_controller.go:240-242
Timestamp: 2024-10-29T04:23:30.091Z
Learning: In `controller/backup_volume_controller.go`, the `BackupVolumeDelete` method of `backupTargetClient` returns `nil` if the backup volume is not found, so it's unnecessary to check for 'not found' errors after calling this method.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:46:06.911Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3884-3894
Timestamp: 2024-10-30T12:46:06.911Z
Learning: In `controller/volume_controller.go`, within the `restoreVolumeRecurringJobs` function, avoid adding redundant checks for `v.Spec.BackupTargetName == ""`, as `GetBackupVolumeWithBackupTargetAndVolumeRO` already performs this validation, and `v.Spec.BackupTargetName` defaults to `default` when the custom resource is created.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-11-25T23:55:02.080Z
Learnt from: derekbit
Repo: longhorn/longhorn-manager PR: 3282
File: controller/monitor/node_upgrade_monitor.go:351-357
Timestamp: 2024-11-25T23:55:02.080Z
Learning: `GetVolumeRO` guarantees that `volume` is non-nil when `err == nil`, so explicit nil checks after error handling are not needed.

Applied to files:

  • csi/node_server.go
  • csi/util.go
📚 Learning: 2024-10-29T08:51:32.394Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:310-310
Timestamp: 2024-10-29T08:51:32.394Z
Learning: In `controller/backup_controller.go`, when retrieving `backupVolume` using `GetBackupVolumeWithBackupTargetAndVolume`, it can be `nil` without being an error, and the code should handle this case appropriately.

Applied to files:

  • csi/util.go
📚 Learning: 2024-10-30T12:51:18.255Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3652-3669
Timestamp: 2024-10-30T12:51:18.255Z
Learning: In `volume_controller.go`, within the `getBackupVolumeInfo` method, avoid adding `nil` checks for the `volume` variable `v` because it is already guaranteed to be non-`nil` by an earlier check at the beginning of the `syncVolume` method.

Applied to files:

  • csi/util.go
📚 Learning: 2024-10-28T15:09:08.570Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: webhook/resources/volume/mutator.go:115-117
Timestamp: 2024-10-28T15:09:08.570Z
Learning: In `webhook/resources/volume/mutator.go`, within the `Create` method of `volumeMutator`, it's acceptable for `bv` to be `nil` after calling `v.ds.GetBackupVolumeWithBackupTargetAndVolumeRO`, as a `nil` `bv` is handled by the volume and engine controllers. Explicit `nil` checks in this context are not necessary.

Applied to files:

  • csi/util.go
📚 Learning: 2024-10-28T14:41:50.732Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/uninstall_controller.go:530-535
Timestamp: 2024-10-28T14:41:50.732Z
Learning: The duplicate backup target deletion code in the `UninstallController` is intentional, and the `deleteRecreatedCRs` method may be removed instead.

Applied to files:

  • csi/util.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build binaries
  • GitHub Check: Summary
🔇 Additional comments (3)
csi/util.go (3)

342-358: ensureBindMountPoint behavior looks correct and idempotent

The flow of unconditionally calling unmountAndCleanupMountPoint and then makeFile gives a clean, regular-file bind target on first and subsequent calls, while treating non-existent or already-unmounted paths as success via the underlying helpers. This matches the block bind-mount requirements and avoids leaking stale symlinks or files.


419-442: Cleanup path correctly handles broken symlinks and ENOENT

The updated unmountAndCleanupMountPoint uses the new unmount behavior and then os.Lstat + os.Remove, treating os.IsNotExist as success and logging the file mode when removal is needed. This is a good fit for:

  • Handling broken symlinks reliably.
  • Keeping Node* calls idempotent when the path is already gone.

Once unmount’s lazy-unmount return behavior is adjusted as suggested, this function should fully honor the “best-effort cleanup without hiding the original mount failure” goal.


487-501: makeFile doc comment matches current usage

The clarified comment (“creates an empty regular file”) aligns with how makeFile is used after unmountAndCleanupMountPoint to materialize a bind-mount target. The behavior of tolerating existing paths without error remains compatible with previous callers.

@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch 2 times, most recently from 08dfe31 to fa1a17e Compare November 18, 2025 08:17
Copy link
Contributor

@WebberHuang1118 WebberHuang1118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the comments, thanks.

@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from fa1a17e to 372da43 Compare November 21, 2025 08:59
@COLDTURNIP
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 24, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from 372da43 to f7bb24e Compare November 24, 2025 07:39
@COLDTURNIP
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 24, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
csi/util.go (2)

373-389: Consider validating mount state before blind cleanup.

The function unconditionally unmounts and removes the target path, even if it's already a valid bind mount point. While this ensures a clean state, it may cause unnecessary work and potential disruption if the mount point is already correctly configured. Consider checking the current state first (e.g., whether the path exists and is a regular file, and whether it's already mounted to the correct device) before performing cleanup.

However, if the "blind cleanup" approach is intentional to ensure a completely fresh state (aligning with the PR objective of thorough cleanup), this is acceptable.


414-424: Consider using utilexec for consistency and testability.

The function uses exec.Command directly, while the rest of the codebase uses the utilexec package (e.g., line 23 imports utilexec, and it's used in getDiskFormat at line 485). Using utilexec would improve consistency and testability through mock executors.

Apply this diff if you want to align with the existing pattern:

-func lazyUnmount(path string) (err error) {
-	lazyUnmountCmd := exec.Command("umount", "-l", path)
-	output, err := lazyUnmountCmd.CombinedOutput()
+func lazyUnmount(path string, exec utilexec.Interface) (err error) {
+	lazyUnmountCmd := exec.Command("umount", "-l", path)
+	output, err := lazyUnmountCmd.CombinedOutput()
 	if err != nil {
 		if isNotMountedError(err) {
 			return nil
 		}
 		return errors.Wrapf(err, "failed to lazy unmount a mount point %v, output: %s", path, output)
 	}
 	return nil
 }

Then update the caller in unmount to pass utilexec.New() or the appropriate executor.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 372da43 and f7bb24e.

📒 Files selected for processing (2)
  • csi/node_server.go (1 hunks)
  • csi/util.go (7 hunks)
🧰 Additional context used
🧠 Learnings (16)
📚 Learning: 2025-04-21T06:56:13.427Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 3648
File: upgrade/v18xto190/upgrade.go:67-86
Timestamp: 2025-04-21T06:56:13.427Z
Learning: In `upgrade/v18xto190/upgrade.go`, changes made to volumes in the `upgradeVolumes` function after calling `ListAndUpdateVolumesInProvidedCache` are expected to be persisted by the caller, not within the function itself.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T08:47:14.968Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:404-404
Timestamp: 2024-10-29T08:47:14.968Z
Learning: In `controller/backup_controller.go`, be aware that after setting the backup status to error when a volume is not found, the code already includes `return nil`, so no additional return statement is needed.

Applied to files:

  • csi/node_server.go
  • csi/util.go
📚 Learning: 2024-10-30T12:47:32.309Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3627-3636
Timestamp: 2024-10-30T12:47:32.309Z
Learning: In the file `controller/volume_controller.go`, avoid suggesting additional logging statements if they are unnecessary or redundant. Only add logging when it provides significant value.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-28T13:21:51.980Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: api/backup.go:123-128
Timestamp: 2024-10-28T13:21:51.980Z
Learning: In `api/backup.go`, the method `s.m.DeleteBackupTarget` already checks for the existence of the backup target and handles errors. Therefore, it's unnecessary to add an existence check before calling this method.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-24T12:04:44.187Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:325-326
Timestamp: 2024-10-24T12:04:44.187Z
Learning: In `controller/backup_controller.go`, `backupTargetClient` is guaranteed to be non-`nil` when there is no error, so additional `nil` checks before its usage are unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:31:25.782Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: upgrade/v17xto180/upgrade.go:68-110
Timestamp: 2024-10-30T12:31:25.782Z
Learning: In `upgrade/v17xto180/upgrade.go`, the `backupTargetName` can be trusted to be validated if it is not an empty string. Therefore, additional validation in `ensureBackupTargetSet` and `ensureLabel` is unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-28T01:25:02.898Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backing_image_data_source_controller.go:714-714
Timestamp: 2024-10-28T01:25:02.898Z
Learning: In `controller/backing_image_data_source_controller.go`, the method `GetBackupTargetRO` already handles error checking for `bids.Spec.Parameters[longhorn.DataSourceTypeRestoreParameterBackupTargetName]`, so explicit checks before calling it are unnecessary.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-30T12:42:20.034Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:4776-4778
Timestamp: 2024-10-30T12:42:20.034Z
Learning: In the `controller/volume_controller.go` file's `ReconcileBackupVolumeState` function, when using `errors.Wrapf(err, ...)`, avoid including `err` in the format string, since `errors.Wrapf` already includes the original error `err`. This helps to prevent redundancy in error messages.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T04:23:30.091Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_volume_controller.go:240-242
Timestamp: 2024-10-29T04:23:30.091Z
Learning: In `controller/backup_volume_controller.go`, the `BackupVolumeDelete` method of `backupTargetClient` returns `nil` if the backup volume is not found, so it's unnecessary to check for 'not found' errors after calling this method.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-10-29T12:51:34.772Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: manager/volume.go:1180-1182
Timestamp: 2024-10-29T12:51:34.772Z
Learning: Ensure that when reviewing the `restoreBackingImage` function in `manager/volume.go`, verify that variables referenced in error messages are actually undefined before suggesting corrections.

Applied to files:

  • csi/node_server.go
  • csi/util.go
📚 Learning: 2024-10-30T12:46:06.911Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3884-3894
Timestamp: 2024-10-30T12:46:06.911Z
Learning: In `controller/volume_controller.go`, within the `restoreVolumeRecurringJobs` function, avoid adding redundant checks for `v.Spec.BackupTargetName == ""`, as `GetBackupVolumeWithBackupTargetAndVolumeRO` already performs this validation, and `v.Spec.BackupTargetName` defaults to `default` when the custom resource is created.

Applied to files:

  • csi/node_server.go
📚 Learning: 2024-11-25T23:55:02.080Z
Learnt from: derekbit
Repo: longhorn/longhorn-manager PR: 3282
File: controller/monitor/node_upgrade_monitor.go:351-357
Timestamp: 2024-11-25T23:55:02.080Z
Learning: `GetVolumeRO` guarantees that `volume` is non-nil when `err == nil`, so explicit nil checks after error handling are not needed.

Applied to files:

  • csi/node_server.go
  • csi/util.go
📚 Learning: 2024-10-29T08:51:32.394Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/backup_controller.go:310-310
Timestamp: 2024-10-29T08:51:32.394Z
Learning: In `controller/backup_controller.go`, when retrieving `backupVolume` using `GetBackupVolumeWithBackupTargetAndVolume`, it can be `nil` without being an error, and the code should handle this case appropriately.

Applied to files:

  • csi/util.go
📚 Learning: 2024-10-30T12:51:18.255Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/volume_controller.go:3652-3669
Timestamp: 2024-10-30T12:51:18.255Z
Learning: In `volume_controller.go`, within the `getBackupVolumeInfo` method, avoid adding `nil` checks for the `volume` variable `v` because it is already guaranteed to be non-`nil` by an earlier check at the beginning of the `syncVolume` method.

Applied to files:

  • csi/util.go
📚 Learning: 2024-10-28T15:09:08.570Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: webhook/resources/volume/mutator.go:115-117
Timestamp: 2024-10-28T15:09:08.570Z
Learning: In `webhook/resources/volume/mutator.go`, within the `Create` method of `volumeMutator`, it's acceptable for `bv` to be `nil` after calling `v.ds.GetBackupVolumeWithBackupTargetAndVolumeRO`, as a `nil` `bv` is handled by the volume and engine controllers. Explicit `nil` checks in this context are not necessary.

Applied to files:

  • csi/util.go
📚 Learning: 2024-10-28T14:41:50.732Z
Learnt from: mantissahz
Repo: longhorn/longhorn-manager PR: 2182
File: controller/uninstall_controller.go:530-535
Timestamp: 2024-10-28T14:41:50.732Z
Learning: The duplicate backup target deletion code in the `UninstallController` is intentional, and the `deleteRecreatedCRs` method may be removed instead.

Applied to files:

  • csi/util.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build binaries
  • GitHub Check: Summary
🔇 Additional comments (6)
csi/util.go (3)

426-440: LGTM! Enhanced unmount with lazy fallback.

The addition of lazy unmount as a fallback for stuck mount points aligns well with the PR objectives. The use of isNotMountedError to treat "not mounted" conditions as success is appropriate, and the logging provides good visibility for troubleshooting.


442-465: LGTM! Improved cleanup with explicit symlink handling.

The refactored cleanup logic using os.Lstat correctly handles broken symlinks (which os.Stat would fail on), and the explicit os.Remove call with error checking ensures thorough cleanup. The logging of file mode information is helpful for troubleshooting.


575-581: String-based error detection is acceptable for mount errors.

The function uses string matching to detect "not mounted" conditions, which is somewhat fragile but reasonable given that mount utilities typically return string-based errors. The checked strings ("not mounted" and "no mount point specified") cover the common cases.

Note: If future maintenance reveals additional error patterns that should be treated as "not mounted", consider adding them to this helper.

csi/node_server.go (3)

367-367: LGTM! More precise error message.

The updated message "prepare mount point directory" correctly distinguishes the parent directory preparation from the bind mount point file preparation that follows.


370-373: LGTM! Clean integration of bind mount point preparation.

The call to ensureBindMountPoint properly prepares the target path for bind mounting, and the error message clearly distinguishes "mount point file" from the earlier "mount point directory" message, providing good clarity for troubleshooting.


376-384: Excellent error handling that preserves the mount failure.

The updated logic correctly:

  • Returns immediately on success (lines 377-379)
  • Attempts cleanup on failure without shadowing the original mount error (lines 380-383)
  • Returns the mount error (mountErr) with full context (line 384)

This perfectly aligns with the PR objective to "ensure the original mount error is not shadowed by subsequent cleanup logic." The error message provides comprehensive debugging information with both the device path and target path.

@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from f7bb24e to 5f20e98 Compare November 24, 2025 07:54
Copy link
Contributor

@WebberHuang1118 WebberHuang1118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a nit, thanks for the enhancement.

longhorn-12006

Signed-off-by: Raphanus Lo <yunchang.lo@suse.com>
@COLDTURNIP COLDTURNIP force-pushed the 12006-csi_node_publish_vol_err_handling branch from 5f20e98 to 1b80f32 Compare November 25, 2025 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants