Sometimes I get really pissed off with myself because I have taken a bigoted view and ended up with egg on my face. The past week was like that, and the problem gnawed at me all week, because I was determined to restore my equilibrium by finding the answer.
Early in the week, I was having a conversation with a potential customer. It revolved around 10 seconds or so of video footage going missing between users of a popular video editing software. The company had 70% of its users on Windows and 30% on the Mac, with both sides accessing the same NAS device. The issue was that editors on the Windows side would store the raw and edited files on the NAS, but when the Mac users read them, they would often find 10 seconds or so of the stored video missing.
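The likeliest culprit is the way SMB write I/O behaves on Windows versus macOS. By default, SMB on Windows writes asynchronously, while SMB on macOS writes synchronously. With an asynchronous write, the client carries on before the NAS has confirmed that the data reached persistent storage, so, in principle, a reader on another client can open a file whose tail has not landed yet.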
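The difference between the two write styles can be shown on a local file, away from SMB entirely. This is only a minimal Python sketch, not how either SMB client is implemented: write_async and write_sync are hypothetical helper names, and os.fsync stands in for the "wait until it is on stable storage" step.

```python
import os
import tempfile


def write_async(path: str, data: bytes) -> None:
    """Buffered write: returns once the OS has accepted the data into its cache."""
    with open(path, "wb") as f:
        f.write(data)
        # No commit to stable storage is requested; the OS decides when to flush.


def write_sync(path: str, data: bytes) -> None:
    """Write-through style: returns only after the OS reports the data committed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to commit the data to the device


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_async(os.path.join(tmp, "async.bin"), b"x" * 4096)
        write_sync(os.path.join(tmp, "sync.bin"), b"x" * 4096)
```

Both helpers leave the same bytes in the file. What differs is when the caller is told "done", which is exactly where the extra latency of a synchronous write comes from.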
I had a strong conviction that I had the answer to this issue, but this was not a TrueNAS®. It was another brand of NAS that I had no knowledge of, so I left the conversation feeling quite embarrassed, because I had the answer only on the TrueNAS® server side, not on the Windows client side. Bigotry blinded me. Hmmph!
TrueNAS® SLOG for synchronous writes
Those of us in the OpenZFS world know that the ZIL (ZFS Intent Log) on a SLOG (Separate Log Device) works well with synchronous writes. Write I/Os to TrueNAS® and other OpenZFS NAS systems with a SLOG (aka “the write cache”) are persistently stored and acknowledged very quickly to the requesting clients. NFS clients and SMB clients on macOS benefit greatly from a super fast SSD acting as the persistent storage layer, because they write synchronously by default.
The zfs set command has a sync property, with the values sync=standard/disabled/always (standard being the default), that alters how write I/Os to an OpenZFS dataset behave.
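To make the three values of the sync property concrete, here is a small Python sketch of the decision as I understand it from the OpenZFS documentation: standard honours what the client asked for, always treats every write as synchronous, and disabled ignores sync requests. The function name uses_intent_log is hypothetical and this is a model of the behaviour, not ZFS code.

```python
def uses_intent_log(sync_property: str, client_requested_sync: bool) -> bool:
    """Return True if a write is committed to the ZIL before it is acknowledged."""
    if sync_property == "always":
        return True                   # every write is handled as synchronous
    if sync_property == "disabled":
        return False                  # sync requests from the client are ignored
    if sync_property == "standard":
        return client_requested_sync  # honour whatever the client asked for
    raise ValueError(f"unknown sync value: {sync_property!r}")


if __name__ == "__main__":
    for prop in ("standard", "always", "disabled"):
        for requested in (False, True):
            print(prop, requested, uses_intent_log(prop, requested))
```

Seen this way, a Windows client writing asynchronously to a dataset left at the default only reaches the ZIL (and the SLOG) if either the dataset is switched to always or the client itself starts asking for synchronous writes.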
Why, Windows, why?
However, the Windows SMB client writes asynchronously and prefers not to wait for an I/O acknowledgement from the NAS server’s persistent storage medium. This speeds up SMB performance, because waiting for an ACK incurs additional latency, however minute those milliseconds are. So the nagging itch for me all week was to find out “How do I force the Windows client to write synchronously?”
As of SMBv2, there are definitely bits in the SMB packet that request a synchronous write. But I was never a strong Windows SMB guy; I know just enough not to drown. The weekend became an investigative excursion, and I found that the /writethrough flag of the net use command forces the SMB write to go all the way to a persistent storage medium.
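For the curious, here is a minimal Python sketch of what "bits" means here. As far as I can tell from the MS-SMB2 specification, the SMB2 WRITE request carries a Flags field in which 0x00000001 is the write-through flag and 0x00000002 is the unbuffered-write flag. The constant values are taken from my reading of that specification; the function describe_write_flags is a hypothetical helper for illustration, not part of any SMB library.

```python
from typing import List

# Flag values as I read them in the MS-SMB2 WRITE request definition.
SMB2_WRITEFLAG_WRITE_THROUGH = 0x00000001
SMB2_WRITEFLAG_WRITE_UNBUFFERED = 0x00000002


def describe_write_flags(flags: int) -> List[str]:
    """Name the write flags set in the Flags field of an SMB2 WRITE request."""
    names = []
    if flags & SMB2_WRITEFLAG_WRITE_THROUGH:
        names.append("WRITE_THROUGH")
    if flags & SMB2_WRITEFLAG_WRITE_UNBUFFERED:
        names.append("WRITE_UNBUFFERED")
    return names


if __name__ == "__main__":
    print(describe_write_flags(0x00000000))  # a default, asynchronous-style write
    print(describe_write_flags(0x00000001))  # a write-through request
```

In a packet capture of the two mapped drives, this Flags field is where I would expect the difference between the default mapping and the /writethrough mapping to show up.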
The experiment
Here is what I did:
- Created a test folder with 538 files of random sizes between 1 byte and 1 gigabyte. They were all in a single flat folder totalling 4.66GB. I used a Dummy File Creator application.
- Used PowerShell instead of the Cmd prompt.
- Mounted the SMB share from TrueNAS® (dataset at sync=standard, the default) on the Z: drive, copied all the files from my laptop to the Z: drive, and measured the time.
- Mounted the SMB share from TrueNAS® (dataset at sync=standard, the default) on the Y: drive with the /writethrough flag, copied all the files from my laptop to the Y: drive, and measured the time.
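The steps above can also be scripted. Here is a minimal Python sketch of the same idea: generate a flat folder of random-sized dummy files, then time a copy into a destination folder. It is not the tool I used (that was a Dummy File Creator application plus PowerShell). The names make_dummy_files and timed_copy are hypothetical, and the demo uses tiny sizes and a temporary folder as a stand-in for a mapped drive.

```python
import os
import random
import shutil
import tempfile
import time


def make_dummy_files(folder: str, count: int, max_bytes: int, seed: int = 0) -> int:
    """Create `count` files of random size (1..max_bytes) and return the total bytes."""
    rng = random.Random(seed)
    os.makedirs(folder, exist_ok=True)
    total = 0
    for i in range(count):
        size = rng.randint(1, max_bytes)
        # Each file is built in memory, so keep max_bytes modest in this sketch.
        with open(os.path.join(folder, f"dummy_{i:04d}.bin"), "wb") as f:
            f.write(os.urandom(size))
        total += size
    return total


def timed_copy(src: str, dst: str) -> float:
    """Copy the folder `src` to a new folder `dst` and return the elapsed seconds."""
    start = time.perf_counter()
    shutil.copytree(src, dst)  # dst must not exist yet
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "testdata")
        total = make_dummy_files(src, count=20, max_bytes=64 * 1024)
        # On the real setup, dst would be a folder on the mapped Z: or Y: drive.
        elapsed = timed_copy(src, os.path.join(tmp, "copy"))
        print(f"copied {total} bytes in {elapsed:.3f} s")
```

Whether a copy made this way is written through depends on how the drive was mapped, not on the script, so the same script can be pointed at each mapping in turn.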
Here are the screenshots of my experiment:
- In PowerShell, check the SMB version with the Get-SmbConnection cmdlet.
- [ Async write testing ] Map the Z: drive with the net use command. This is the default SMB write behaviour. The share from the TrueNAS® is called async.
- Measure the time taken to copy all the files to the Z: drive. In PowerShell, there is a cmdlet called Measure-Command.
The first part, copying the 538 files with SMB async write I/O, took 11 minutes 45 seconds. I repeated this part 5-6 times to validate my testing results.
Then I unmapped the Z: drive and went on to the second part, using Windows client SMB sync write I/O.
- [ Sync write testing ] Map the Y: drive with the net use command, with the /writethrough flag. The share from the TrueNAS® is called sync.
- Measure the time taken to copy all the files to the Y: drive, using the Measure-Command cmdlet.
This part, with SMB synchronous write I/O, took longer: 15 minutes 12 seconds. This was the expected result, and I tested it several times to validate the slower behaviour.
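To put the two timings side by side, here is a small Python sketch of the arithmetic. The durations and the 4.66GB total come from my test. Treating 4.66GB as 4.66 × 10^9 bytes is an assumption (Windows may well have been reporting binary gigabytes), so the throughput figures are approximate. The helper names are hypothetical.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to seconds."""
    return minutes * 60 + seconds


def slowdown_percent(baseline_s: float, slower_s: float) -> float:
    """How much longer the slower run took, as a percentage of the baseline."""
    return (slower_s - baseline_s) / baseline_s * 100


def throughput_mb_s(total_bytes: float, seconds: float) -> float:
    """Average throughput in decimal megabytes per second."""
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    async_s = to_seconds(11, 45)  # 705 s, default mapping
    sync_s = to_seconds(15, 12)   # 912 s, /writethrough mapping
    total_bytes = 4.66e9          # assumption: decimal gigabytes

    print(f"sync run took {slowdown_percent(async_s, sync_s):.1f}% longer")  # 29.4%
    print(f"async: {throughput_mb_s(total_bytes, async_s):.2f} MB/s")        # 6.61
    print(f"sync:  {throughput_mb_s(total_bytes, sync_s):.2f} MB/s")         # 5.11
```

So forcing write-through cost roughly 29% more time on my setup, which is the price of waiting for every acknowledgement.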
Scratching the itch
What I did here and what I documented is nothing extraordinary. I wanted to throw out the nagging nuisance. I definitely scratched that gnawing itch, because I couldn’t get over having had only a partial answer to the problem earlier in the week. Leaving the problem hanging was not what I wanted. This pairing of the SMB client and the server is important to me, as I am representing my company, iXsystems™, in helping to solve our customers’ problems. My colleagues and I hope to be trusted advisors in our field, and thus the weekend experiment was another learning mini-adventure for me.
Now I feel more at peace, and my universe feels more balanced. I also feel like a bit of a smart aleck. 😉
Actually the issue is that the client was using a general purpose NAS for a specialized job.
If they’d been using a proper specialized NAS for video editing not dependent on CIFS/SMB, they wouldn’t have had that issue.