Troubleshooting Common Issues with NFS Blue Globus

Below are common problems you may encounter with NFS Blue Globus and clear, actionable steps to diagnose and fix them.

1. Mount failures on client

  • Symptom: mount command fails or times out.
  • Checks:
    1. Network reachability: ping the server and verify DNS resolution.
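    2. Port availability: ensure TCP port 2049 (nfsd) is open between client and server. NFSv4 needs only this port; NFSv3 also needs rpcbind (port 111) plus the mountd, statd, and lockd ports, which are dynamic unless pinned on the server.
    3. Export visibility: run showmount -e <server> from the client to list exports. showmount queries mountd, so it can fail against an NFSv4-only server even when mounts work.
    4. Client mount options: confirm you’re using the correct NFS version and options (e.g., vers=4, rw, noresvport).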
  • Fixes:
    • Fix DNS or /etc/hosts entries if name resolution fails.
    • Open required ports on firewalls or adjust security groups.
    • Add or correct export entries on the server and run exportfs -r.
    • Try explicit options: sudo mount -t nfs -o vers=4 <server>:/export /mnt.
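
Check 2 can be tested from the client without extra tools by attempting a plain TCP connection. The Python 3 sketch below is illustrative and not part of Blue Globus: the port list (2049 for nfsd, 111 for rpcbind) is an assumed default, and a successful connect only shows that a port is reachable, not that the export will mount.

```python
import socket

# Assumed default ports: 2049 (nfsd) and 111 (rpcbind). NFSv3 also uses
# mountd/statd/lockd ports, which vary per server unless pinned.
DEFAULT_PORTS = (2049, 111)


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures
        # (socket.timeout and socket.gaierror are OSError subclasses).
        return False


def probe_nfs_server(host, ports=DEFAULT_PORTS):
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, probe_nfs_server("nfs1.example.com") (a hypothetical hostname) returns a dict keyed by port, so a False for 2049 points at a firewall or security-group rule rather than the export itself.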

2. Permission denied when accessing files

  • Symptom: EACCES or permission denied errors despite correct mount.
  • Checks:
    1. Server-side filesystem permissions: check owner, group, and mode (ls -l).
    2. Export mapping rules: inspect /etc/exports (or the Blue Globus export configuration) for root_squash, all_squash, anonuid, or anongid settings.
    3. User ID mapping: ensure UID/GID consistency between client and server or that id mapping (idmapd) is configured for NFSv4.
  • Fixes:
    • Adjust file/dir permissions or ownership on the server.
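    • Modify export options: disable root squashing (no_root_squash) only for trusted clients, or set anonuid/anongid so squashed requests map to a suitable account.
    • Configure NFSv4 ID mapping (rpc.idmapd on the server, nfsidmap on modern Linux clients) and make sure the Domain setting in /etc/idmapd.conf is identical on all systems.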
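
To see why root_squash produces EACCES on files that look readable, the Python sketch below models the server's decision. It assumes Linux nfsd defaults (root squashed unless no_root_squash is set, anonymous uid/gid 65534) and ignores ACLs; the helper names squash and can_access are hypothetical.

```python
import stat

# Assumption: Linux-style default anonymous uid/gid ("nobody").
NOBODY = 65534


def squash(uid, gids, root_squash=True, anonuid=NOBODY, anongid=NOBODY):
    """Apply root squashing: uid 0 -> anonuid, gid 0 -> anongid."""
    if not root_squash:
        return uid, set(gids)
    new_uid = anonuid if uid == 0 else uid
    new_gids = {anongid if g == 0 else g for g in gids}
    return new_uid, new_gids


def can_access(mode, owner, group, uid, gids, want="r"):
    """Simplified POSIX owner/group/other check for a regular file."""
    if uid == 0:
        # Unsquashed root bypasses read/write bits; execute still needs
        # at least one x bit on a regular file.
        if want != "x":
            return True
        return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    bits = {"r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
            "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
            "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH)}[want]
    if uid == owner:
        return bool(mode & bits[0])
    if group in gids:
        return bool(mode & bits[1])
    return bool(mode & bits[2])
```

With a file owned by 1000:1000 in mode 0640, root on the client is squashed to 65534 and falls through to the "other" bits, which grant nothing; the same request with root_squash disabled succeeds.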

3. Slow performance or high latency

  • Symptom: Read/write operations are slow or inconsistent.
  • Checks:
    1. Network latency and bandwidth: test with ping, mtr, iperf.
    2. I/O wait and server load: use iostat, vmstat, or top on the server.
    3. Mount options: check for synchronous mounts or small rsize/wsize values.
    4. Locking contention: look for heavy file locking or many small writes.
  • Fixes:
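    • Optimize the network (reduce latency, add bandwidth) or place clients closer to the server.
    • Tune mount options: increase rsize/wsize, avoid the sync mount option unless the workload requires it, and use TCP (proto=tcp) rather than UDP. Use the server-side async export option only where losing recently acknowledged writes after a server crash is acceptable.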
    • Adjust server tunables: cache sizes, NFS thread counts, and underlying storage performance.
    • Reduce locking by batching writes or redesigning access patterns.
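
The mount-option check in step 3 can be automated by reading /proc/mounts on a Linux client. This Python 3 sketch is illustrative; the 64 KiB threshold is an arbitrary cut-off chosen for the example, not a Blue Globus recommendation.

```python
# Hypothetical threshold: flag rsize/wsize below 64 KiB for a closer look.
MIN_IO_SIZE = 64 * 1024


def parse_nfs_mounts(text):
    """Return [(mountpoint, options)] for nfs/nfs4 entries in /proc/mounts text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            opts[key] = value if value else True  # bare flags such as "hard"
        mounts.append((fields[1], opts))
    return mounts


def tuning_warnings(opts):
    """List options that commonly explain slow NFS I/O."""
    warnings = []
    for key in ("rsize", "wsize"):
        if key in opts and int(opts[key]) < MIN_IO_SIZE:
            warnings.append(f"{key}={opts[key]} is small")
    if "sync" in opts:
        warnings.append("mounted with sync")
    if str(opts.get("proto", "tcp")).startswith("udp"):
        warnings.append("using UDP transport")
    return warnings
```

On a client, pass it the file contents: `with open("/proc/mounts") as f: print([(mp, tuning_warnings(o)) for mp, o in parse_nfs_mounts(f.read())])`.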

4. Stale file handles after server changes

  • Symptom: “Stale file handle” errors after server reboot, server export changes, or storage reconfiguration.
  • Checks:
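    1. Check whether the exported filesystem changed identity (e.g., a different device or a recreated filesystem is now mounted at the export path, with a new UUID or fsid), or whether the files were deleted or replaced on the server.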
    2. Verify server re-export behavior and whether export paths changed.
  • Fixes:
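    • On the client, unmount and remount the export: sudo umount /mnt && sudo mount /mnt (the short mount form assumes /mnt is listed in /etc/fstab).
    • If stale handles persist, stop processes that still hold references (lsof or fuser can identify them), or reboot the affected clients.
    • Keep the exported device stable on the server (for example, mount it by UUID and set a fixed fsid= export option) so export paths and file handles stay valid.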
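
Before remounting, it can help to confirm which paths actually return a stale handle. A minimal Python sketch, assuming a Linux or other POSIX client where the kernel reports this condition as errno.ESTALE; the function names are hypothetical.

```python
import errno
import os


def handle_is_stale(path):
    """Return True if stat() on path fails with ESTALE, False if it succeeds.

    Other errors (ENOENT, EACCES, ...) are re-raised so they are not
    mistaken for stale handles.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return True
        raise
    return False


def find_stale(paths):
    """Return the subset of paths whose file handles are stale."""
    return [p for p in paths if handle_is_stale(p)]
```

Typical inputs are the mount point itself and the working directories of long-running processes on that mount.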

5. Authentication or Kerberos failures (NFSv4 with sec=krb5)

  • Symptom: Permission or mount failures tied to Kerberos authentication.
  • Checks:
    1. Verify time synchronization (NTP) between client, server, and KDC.
    2. Confirm valid Kerberos tickets: klist.
    3. Check keytab entries and service principals on the server.
    4. Inspect /etc/krb5.conf and GSS/Kerberos-related logs (for example, from rpc.gssd or gssproxy).
  • Fixes:
    • Sync clocks using NTP/chrony.
    • Renew tickets (kinit) and ensure principal names match export configuration.
    • Regenerate or correct server keytabs, then restart the NFS and GSS services (rpc.gssd or gssproxy, whichever is in use).
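
Clock skew is the most common cause in this list. The sketch below compares Unix timestamps gathered from each machine at roughly the same moment (for example with date +%s); the host names are hypothetical, and the 300-second limit assumes MIT Kerberos's default clockskew setting in krb5.conf.

```python
# MIT Kerberos default for the krb5.conf "clockskew" setting, in seconds.
DEFAULT_MAX_SKEW = 300


def skewed_hosts(times, reference, max_skew=DEFAULT_MAX_SKEW):
    """Return {host: offset_seconds} for hosts whose clock differs from
    the reference host by more than max_skew.

    `times` maps host name -> Unix timestamp. Collection latency (e.g. SSH
    round trips) adds error, so treat small offsets with caution.
    """
    ref = times[reference]
    return {host: t - ref for host, t in times.items()
            if abs(t - ref) > max_skew}
```

With times of 1000000 on the KDC, 1000010 on the server, and 1000400 on the client, only the client (offset 400 s) exceeds the default limit.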

6. Locking and stale NLM locks

  • Symptom: Applications hang or report deadlocks due to NLM locks.
  • Checks:
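    1. Check that rpc.statd is running and that the lock manager is registered on both client and server (rpcinfo -p should list the nlockmgr and status programs).
    2. Use lslocks (or /proc/locks) or application-specific diagnostics to identify held locks.
  • Fixes:
    • Restart the lock services (systemctl restart rpc-statd on current systemd-based distributions, nfs-lock on older ones such as RHEL 7, or the equivalent) after confirming it is safe to do so.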
    • Coordinate application restarts if locks are held by orphaned processes.
    • Consider using NFSv4’s integrated locking which reduces reliance on external lock managers.
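
The lock checks above can be scripted against /proc/locks on Linux, the same file lslocks reads. In this Python 3 sketch (Linux format assumed), lines carrying a "->" marker are treated as requests blocked on another lock, and files that have waiters are grouped so contention stands out.

```python
def parse_proc_locks(text):
    """Parse Linux /proc/locks text into a list of dicts."""
    locks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        blocked = len(fields) > 1 and fields[1] == "->"
        if blocked:
            fields = [fields[0]] + fields[2:]  # drop the "->" marker
        if len(fields) < 8:
            continue
        ordinal, kind, mode, access, pid, dev_inode, start, end = fields[:8]
        locks.append({
            "id": int(ordinal.rstrip(":")),
            "type": kind,            # POSIX, FLOCK, OFDLCK, LEASE
            "mode": mode,            # e.g. ADVISORY
            "access": access,        # READ or WRITE
            "pid": int(pid),
            "dev_inode": dev_inode,  # major:minor:inode of the locked file
            "start": start,
            "end": end,
            "blocked": blocked,
        })
    return locks


def holders_and_waiters(locks):
    """Return {dev_inode: {"holders": [...], "waiters": [...]}} for contended files."""
    contention = {}
    for lock in locks:
        entry = contention.setdefault(lock["dev_inode"],
                                      {"holders": [], "waiters": []})
        entry["waiters" if lock["blocked"] else "holders"].append(lock["pid"])
    return {k: v for k, v in contention.items() if v["waiters"]}
```

The holder PIDs it reports are the processes to investigate (or coordinate restarts for) before restarting lock services.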

7. Export not visible to specific clients

  • Symptom: showmount or mount works from some clients but not others.
  • Checks:
    1. Confirm export allows the client’s IP or network range.
    2. Check TCP wrappers (/etc/hosts.allow, /etc/hosts.deny), if in use, and firewall rules.
    3. Verify SELinux or AppArmor policies aren’t blocking access.
  • Fixes:
    • Update export rules to include the client’s IP/subnet.
    • Adjust firewall and host-based access controls.
    • For testing, temporarily switch SELinux to permissive mode (setenforce 0) or put the AppArmor profile in complain mode, then write proper policy rules and restore enforcement.
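
When an export works for some clients only, a quick offline check is whether the failing client's IP falls inside any client specification on the export line. This Python sketch is simplified: it handles *, single addresses, and CIDR networks; it does not resolve hostnames, wildcards, or netgroups, and it does not model mountd's precedence rules when several specifications match. The function names are hypothetical.

```python
import ipaddress
import re

# Matches "client(options)" or a bare "client" token on an exports line.
_CLIENT_RE = re.compile(r"([^\s(]+)(?:\(([^)]*)\))?")


def parse_exports_line(line):
    """Split an /etc/exports line into (path, [(client, [options]), ...]).

    Simplified: ignores quoted paths, line continuations, and the
    "(options)" form with no client name.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    parts = line.split(None, 1)
    path = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    clients = [(m.group(1), m.group(2).split(",") if m.group(2) else [])
               for m in _CLIENT_RE.finditer(rest)]
    return path, clients


def client_matches(spec, ip):
    """True if spec is "*" or an IP/CIDR that contains ip."""
    if spec == "*":
        return True
    try:
        return ipaddress.ip_address(ip) in ipaddress.ip_network(spec, strict=False)
    except ValueError:
        return False  # hostname, wildcard or netgroup: not resolved here


def matching_clients(line, ip):
    """Return [(spec, options)] entries on the line whose spec covers ip."""
    parsed = parse_exports_line(line)
    if parsed is None:
        return []
    _path, clients = parsed
    return [(spec, opts) for spec, opts in clients if client_matches(spec, ip)]
```

An empty result for the failing client's address means the export rule itself is the first thing to fix.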

8. Logs and diagnostics to collect

  • Essential logs:
    • Server: /var/log/messages, /var/log/syslog, NFS-specific logs, dmesg.
    • Client: system logs, dmesg, mount command output.
  • Useful commands:
    • showmount -e
    • mount, umount, /proc/mounts
    • rpcinfo -p
    • ss -tulnp | grep 2049 (or netstat -tulnp | grep 2049 on older systems)
    • iostat, vmstat, top, ss, tcpdump for network captures
  • When to escalate: if errors persist after the basic fixes, or you suspect a kernel bug or a storage hardware fault, collect logs, reproduction steps, and timestamps before contacting vendor support.
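
The collection step can be scripted so each command's output is captured with a timestamp. This Python 3.7+ sketch uses a hypothetical default command list; some entries (dmesg in particular) may require root, and missing tools are recorded rather than aborting the run.

```python
import shutil
import subprocess
import time

# Hypothetical default command list; adjust per host (server vs. client).
DEFAULT_COMMANDS = [
    ["rpcinfo", "-p"],
    ["showmount", "-e"],
    ["cat", "/proc/mounts"],
    ["dmesg"],
]


def collect(commands=None, timeout=30):
    """Run each diagnostic command and return a list of timestamped results.

    Missing tools and timeouts are recorded in an "error" field instead of
    stopping the run, so a partial bundle is still available for escalation.
    """
    if commands is None:
        commands = DEFAULT_COMMANDS
    results = []
    for cmd in commands:
        entry = {"cmd": " ".join(cmd),
                 "time": time.strftime("%Y-%m-%dT%H:%M:%S%z")}
        if shutil.which(cmd[0]) is None:
            entry["error"] = "command not found"
        else:
            try:
                proc = subprocess.run(cmd, capture_output=True, text=True,
                                      timeout=timeout)
                entry["returncode"] = proc.returncode
                entry["output"] = proc.stdout + proc.stderr
            except subprocess.TimeoutExpired:
                entry["error"] = f"timed out after {timeout}s"
        results.append(entry)
    return results
```

Writing the result with json.dump(collect(), f, indent=2) produces a single file to attach to a support case.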

Quick checklist (one-pass)

  1. Verify network connectivity and DNS.
  2. Confirm export configuration and permissions on server.
  3. Check client mount options and NFS version.
  4. Inspect server load, storage performance, and tuning.
  5. Review authentication (Kerberos) and id mapping if used.
  6. Collect logs and use rpcinfo/showmount for diagnostics.
