{"id":65,"date":"2009-11-24T14:13:04","date_gmt":"2009-11-24T20:13:04","guid":{"rendered":"http:\/\/edplese.com\/blog\/?p=65"},"modified":"2009-11-24T14:14:24","modified_gmt":"2009-11-24T20:14:24","slug":"zfs-deduplication-with-ntfs","status":"publish","type":"post","link":"https:\/\/www.edplese.com\/blog\/2009\/11\/24\/zfs-deduplication-with-ntfs\/","title":{"rendered":"ZFS Deduplication with NTFS"},"content":{"rendered":"<p>ZFS deduplication was recently integrated into build 128 of OpenSolaris, and while others have tested it out with normal file operations, I was curious to see how effective it would be with zvol-backed NTFS volumes.\u00a0 Due to the structure of NTFS I suspected that it would work well, and the results confirmed that.<\/p>\n<p>NTFS allocates space in fixed sizes, called clusters.\u00a0 The <a title=\"Default cluster size\" href=\"http:\/\/support.microsoft.com\/kb\/140365\">default cluster size<\/a> for NTFS volumes under 16 TB is 4K, but this can be explicitly set to different values when the volume is created.\u00a0 For this test I stuck with the default 4K cluster size and matched the zvol block size to the cluster size to maximize the effectiveness of the deduplication.\u00a0 In reality, for this test the zvol block size most likely had a negligible effect, but for normal workloads it could be considerable.<\/p>\n<p>The OpenSolaris system was prepared by installing OpenSolaris build 127, installing the COMSTAR iSCSI Target, and then <a href=\"http:\/\/hub.opensolaris.org\/bin\/view\/Community+Group+on\/devref_1#H134UpgradingtotheLatestONBits\">BFU<\/a>&#8216;ing the system to build 128.<\/p>\n<p>The zpool was created with both dedup and compression enabled:<\/p>\n<pre># zpool create tank c4t1d0\r\n# zfs set dedup=on tank\r\n# zfs set compression=on tank\r\n# zpool list tank\r\nNAME\u00a0\u00a0 SIZE\u00a0 ALLOC\u00a0\u00a0 FREE\u00a0\u00a0\u00a0 CAP\u00a0 DEDUP\u00a0 HEALTH\u00a0 ALTROOT\r\ntank\u00a0 19.9G\u00a0\u00a0 148K\u00a0 
19.9G\u00a0\u00a0\u00a0\u00a0 0%\u00a0 1.00x\u00a0 ONLINE\u00a0 -<\/pre>\n<p>Next, the zvol block devices were created.\u00a0 Note that the <code>volblocksize<\/code> option was explicitly set to 4K:<\/p>\n<pre># zfs create tank\/zvols\r\n# zfs create -V 4G -o volblocksize=4K tank\/zvols\/vol1\r\n# zfs create -V 4G -o volblocksize=4K tank\/zvols\/vol2\r\n# zfs list -r tank\r\nNAME\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 USED\u00a0 AVAIL\u00a0 REFER\u00a0 MOUNTPOINT\r\ntank\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 8.00G\u00a0 11.6G\u00a0\u00a0\u00a0 23K\u00a0 \/tank\r\ntank\/zvols\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 8.00G\u00a0 11.6G\u00a0\u00a0\u00a0 21K\u00a0 \/tank\/zvols\r\ntank\/zvols\/vol1\u00a0\u00a0\u00a0\u00a0 4G\u00a0 15.6G\u00a0\u00a0\u00a0 20K\u00a0 -\r\ntank\/zvols\/vol2\u00a0\u00a0\u00a0\u00a0 4G\u00a0 15.6G\u00a0\u00a0\u00a0 20K\u00a0 -<\/pre>\n<p>After the zvols were created, they were shared with the COMSTAR iSCSI Target and then set up and formatted as NTFS from Windows.\u00a0 With only 4 MB of data on the volumes, the dedup ratio shot way up:<\/p>\n<pre># zpool list tank\r\nNAME\u00a0\u00a0 SIZE\u00a0 ALLOC\u00a0\u00a0 FREE\u00a0\u00a0\u00a0 CAP\u00a0 DEDUP\u00a0 HEALTH\u00a0 ALTROOT\r\ntank\u00a0 19.9G\u00a0 3.88M\u00a0 19.9G\u00a0\u00a0\u00a0\u00a0 0%\u00a0 121.97x\u00a0 ONLINE\u00a0 -<\/pre>\n<p>The NTFS volumes were configured in Windows as disks <code>D:<\/code> and <code>E:<\/code>.\u00a0 I started off by copying a 10 MB file and then a 134 MB file to <code>D:<\/code>.\u00a0 The 10 MB file was used to offset the larger file from the start of the disk so that it wouldn&#8217;t be in the same location on both volumes.\u00a0 As expected, the dedup ratio dropped down towards 1x as there was only a single copy of the files:<\/p>\n<pre># zpool list tank\r\nNAME\u00a0\u00a0 SIZE\u00a0 ALLOC\u00a0\u00a0 FREE\u00a0\u00a0\u00a0 CAP\u00a0 DEDUP\u00a0 HEALTH\u00a0 ALTROOT\r\ntank\u00a0 
19.9G\u00a0\u00a0 133M\u00a0 19.7G\u00a0\u00a0\u00a0\u00a0 0%\u00a0 1.39x\u00a0 ONLINE\u00a0 -<\/pre>\n<p>The 134 MB file was then copied to <code>E:<\/code>, and immediately the dedup ratio jumped up.\u00a0 So far, so good.\u00a0 Dedup works across multiple NTFS volumes:<\/p>\n<pre># zpool list tank\r\nNAME\u00a0\u00a0 SIZE\u00a0 ALLOC\u00a0\u00a0 FREE\u00a0\u00a0\u00a0 CAP\u00a0 DEDUP\u00a0 HEALTH\u00a0 ALTROOT\r\ntank\u00a0 19.9G\u00a0\u00a0 173M\u00a0 19.7G\u00a0\u00a0\u00a0\u00a0 0%\u00a0 2.26x\u00a0 ONLINE\u00a0 -<\/pre>\n<p>A second copy of the 134 MB file was then placed on <code>E:<\/code> to test dedup between files on the same NTFS volume.\u00a0 As expected, the dedup ratio rose again, to around 3x:<\/p>\n<pre># zpool list tank\r\nNAME\u00a0\u00a0 SIZE\u00a0 ALLOC\u00a0\u00a0 FREE\u00a0\u00a0\u00a0 CAP\u00a0 DEDUP\u00a0 HEALTH\u00a0 ALTROOT\r\ntank\u00a0 19.9G\u00a0\u00a0 184M\u00a0 19.7G\u00a0\u00a0\u00a0\u00a0 0%\u00a0 3.19x\u00a0 ONLINE\u00a0 -<\/pre>\n<p>Though simple, these tests showed that ZFS deduplication performed well: it conserved disk space both within a single NTFS volume and across multiple volumes in the same ZFS pool.\u00a0 The dedup ratios were even a bit higher than expected, which suggests that quite a bit of the NTFS metadata, at least initially, was deduplicated.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>ZFS deduplication was recently integrated into build 128 of OpenSolaris, and while others have tested it out with normal file operations, I was curious to see how effective it would be with zvol-backed NTFS volumes.\u00a0 Due to the structure of NTFS I suspected that it would work well, and the results confirmed that. 
NTFS allocates [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[7,6,3,8,5],"_links":{"self":[{"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/posts\/65"}],"collection":[{"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/comments?post=65"}],"version-history":[{"count":13,"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/posts\/65\/revisions"}],"predecessor-version":[{"id":78,"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/posts\/65\/revisions\/78"}],"wp:attachment":[{"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/media?parent=65"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/categories?post=65"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.edplese.com\/blog\/wp-json\/wp\/v2\/tags?post=65"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}