The ZFS deduplication feature removes redundant data from your ZFS file systems. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored and common components are shared between files. For a detailed description of dedup, see Jeff's blog entry.
SXCE, build 129, with dedup features and fixes, is available in December 2009.
Known dedup CRs and issues:
The SXCE build 129 releases provide the following deduplication features:
The above dedup features are available in ZFS pool version 22.
If you enable dedup on file systems with duplicate data, you should see the benefits of saving space and better performance because less data is written and stored. If you enable dedup on file systems with little duplicate data, you will add system overhead with little benefit.
Note: The zdb debugging command can be used to determine the in-core dedup table requirements, but it must be run on pools that are not in use.
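Before you enable dedup, review the following recommendations:
1. Determine whether your data would benefit from deduplication by using the zdb -S command to simulate the dedup ratio of an existing pool. Run this command on a quiet pool:
# zdb -S pool-name
2. If the estimated dedup ratio is greater than 2, then you might see dedup space savings.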
3. Make sure your system has enough memory to support dedup. Determine the memory requirements for deduplicating your data as follows:
A. Use the zdb -S output to determine the in-core dedup table requirements:
B. Additional memory considerations from Roch's excellent blog:
20 TB of unique data stored in 128K records or more than 1 TB of unique data in 8K records would require about 32 GB of physical memory. If you need to store more unique data than these ratios allow, strongly consider allocating a large, read-optimized SSD to hold the deduplication table (DDT). DDT lookups are small random I/Os that are well handled by current-generation SSDs.
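The checks above can be sketched as a few shell functions. This is a minimal sketch, not an Oracle tool. The function names are hypothetical, and the parser assumes that the zdb -S summary line has the form shown in the code comment. The bytes-per-entry figure is also an assumption. Working backward from Roch's numbers (treating TB and GB as powers of two), 32 GB for 20 TB of 128K records comes to about 205 bytes per DDT entry. 32 GB for 1 TB of 8K records comes to 256 bytes per entry.

```shell
# Sketch of the pre-dedup checks above; function names are hypothetical.

# Print the simulated dedup ratio from `zdb -S` output read on stdin.
# ASSUMPTION: the summary line has the form
#   dedup = 2.37, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.37
zdb_dedup_ratio() {
    awk '/^dedup = / { v = $3; sub(/,$/, "", v); print v; exit }'
}

# Exit 0 if the ratio given as $1 is greater than 2, otherwise exit 1.
dedup_worthwhile() {
    awk -v r="$1" 'BEGIN { exit !(r + 0 > 2) }'
}

# Estimate the in-core DDT size in bytes: one entry per unique block.
#   $1 = unique data (bytes), $2 = record size (bytes),
#   $3 = in-core bytes per DDT entry (ASSUMED; Roch's figures imply ~205-256)
ddt_mem_bytes() {
    echo $(( ($1 / $2) * $3 ))
}
```

For example, zdb -S export | zdb_dedup_ratio prints the simulated ratio for the export pool. The call ddt_mem_bytes 1099511627776 8192 256 prints 34359738368 bytes (32 GB), which matches the 1 TB / 8K case above.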
In general, dedup performance is best when the deduplication table fits into memory. If the dedup table does not fit and must be accessed on disk, performance decreases. For example, removing a large file system with dedup enabled will severely degrade system performance if the system doesn't meet the memory requirements described above.
Use zdb -DD to display the size of the DDT. This command must be run on a quiet pool.
# zdb -DD pool-name
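The zdb -DD output can be reduced to a single in-core total. This is a hedged sketch with a hypothetical function name. It assumes that each DDT object is summarized on a line that looks like the one in the code comment, and that the "in core" figure is the average number of bytes per entry. The function multiplies entries by in-core bytes per entry and sums the result across all DDT objects.

```shell
# Sketch: total in-core DDT size (bytes) from `zdb -DD pool-name` output on stdin.
# ASSUMPTION: each DDT object is summarized on a line such as
#   DDT-sha256-zap-unique: 2000 entries, size 280 on disk, 140 in core
# where field 2 is the entry count and field 8 is in-core bytes per entry.
ddt_incore_bytes() {
    awk '/DDT-.*entries, size .* in core/ { total += $2 * $8 }
         END { printf "%.0f\n", total }'
}
```

Usage: zdb -DD pool-name | ddt_incore_bytes. Like zdb -DD itself, this should be run against a quiet pool.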
The DDT is considered metadata. By default, up to 25% of the maximum ARC size (zfs_arc_meta_limit) can be used to store metadata. Monitor the size of the ZFS memory cache (ARC), in bytes:
# kstat zfs::arcstats:size
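To compare a DDT estimate with the metadata budget, read ARC statistics in kstat's parseable (-p) form. This is a minimal sketch with hypothetical function names. It assumes that the relevant ceiling is the ARC maximum (the c_max statistic) and that the default zfs_arc_meta_limit is one quarter of it. If you have tuned zfs_arc_meta_limit, use that value directly instead.

```shell
# Sketch: print one arcstats value from `kstat -p` output read on stdin.
# kstat -p prints "module:instance:name:statistic" followed by the value.
arcstat_value() {
    awk -v stat="$1" '$1 ~ (":arcstats:" stat "$") { print $2; exit }'
}

# Exit 0 if a DDT size estimate ($1, bytes) fits within one quarter of $2.
# ASSUMPTION: $2 is the ARC maximum (c_max), and the default
# zfs_arc_meta_limit is c_max / 4.
ddt_fits_meta_limit() {
    [ "$1" -le $(( $2 / 4 )) ]
}
```

For example, cmax=$(kstat -p zfs:0:arcstats:c_max | arcstat_value c_max) captures the ARC maximum. ddt_fits_meta_limit "$ddt_bytes" "$cmax" then reports whether the estimate fits.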
See Roch's blog that describes factors that might impact deduplication performance.
The dedup property can be enabled on a ZFS file system by using the following syntax:
# zfs set dedup=on export
Enabling the dedup property on an existing file system means that all newly written data is deduplicated. Existing file system data is not deduplicated.
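Enabling dedup can be paired with reading the property back to confirm the change. This is a small sketch with a hypothetical function name. It uses only zfs set and zfs get, where -H suppresses the header and -o value prints just the property value. The dataset name is whatever you pass, such as export above.

```shell
# Sketch: enable dedup on a dataset, then read the property back to confirm.
enable_dedup() {
    ds=$1
    zfs set dedup=on "$ds" || return 1
    # -H: scripted output without headers; -o value: print only the value
    val=$(zfs get -H -o value dedup "$ds") || return 1
    [ "$val" = "on" ]
}
```

Usage: enable_dedup export && echo "dedup is on for export". Only data written after this point is deduplicated.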
Deduplication has a pool-wide scope, so a read-only pool property, dedupratio, is provided to determine the deduplication ratio realized for your file systems. For example:
# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G   881G     5%  1.77x  ONLINE  -
rpool    928G  25.7G   902G     2%  1.40x  ONLINE  -
A DEDUP ratio of 1.00x generally means that the dedup property is disabled or has only just been enabled. As file system deduplication occurs, the DEDUP ratio will generally increase over time.
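The DEDUP column can be read programmatically and turned into an approximate saving. This sketch uses a hypothetical function name and locates the column by its header name rather than by position. It reads a ratio of R as "the deduplicated data would occupy R times as much space without dedup", so the saving on that data is roughly 1 - 1/R. With the example output above, this gives about 43.5% for export and 28.6% for rpool.

```shell
# Sketch: print "pool ratio saved%" for each pool in `zpool list` output on stdin.
# The DEDUP column is located by header name; the saving is computed as 1 - 1/R.
pool_dedup_savings() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "DEDUP") col = i; next }
         col && $col + 0 > 0 {
             r = $col; sub(/x$/, "", r)
             printf "%s %sx %.1f%%\n", $1, r, (1 - 1 / r) * 100
         }'
}
```

Usage: zpool list | pool_dedup_savings.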
The zpool list output has changed in this Solaris release. These changes are described in Why has the zpool command changed?
How do I send deduplicated data?
You must use the zfs send -D syntax to send a deduplicated send stream, even if the data is already deduplicated on disk. If your ZFS data is not deduplicated, you can still send a deduplicated send stream by using zfs send -D.
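A deduplicated stream is typically piped straight into zfs receive. This is a minimal sketch with a hypothetical function name. The snapshot and target names are placeholders, and receive -d is one reasonable choice for naming the received dataset, not the only one.

```shell
# Sketch: replicate a snapshot as a deduplicated send stream.
send_dedup() {
    snap=$1     # source snapshot, for example export/home@today
    target=$2   # receiving pool or dataset, for example backup
    # -D generates a deduplicated stream; receive -d derives the new
    # dataset name from the sent snapshot's path under $target.
    zfs send -D "$snap" | zfs receive -d "$target"
}
```

Usage: send_dedup export/home@today backup.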
What is the dedup checksum?
The default deduplication checksum is sha256. The following syntax is equivalent:
# zfs set dedup=on export
# zfs set dedup=sha256 export
After the dedup property is enabled on a ZFS file system, the default file system checksum is sha256 for newly created files. Any previously set file system checksum property value, such as the default checksum of fletcher4, is overridden by the dedup property checksum.
Can I verify deduplicated hash comparisons?
You can ask ZFS to verify the SHA256 hash comparisons of blocks to be deduplicated as described in Jeff's blog by using this syntax:
# zfs set dedup=verify export
Note that ZFS uses its own SHA256 implementation and doesn't currently use a crypto accelerator or the crypto framework.
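The equivalences described in the two answers above can be summarized as a lookup. This is a sketch with a hypothetical function name. It assumes that the accepted dedup values are on, off, verify, sha256, and sha256,verify. Check the zfs man page on your release before relying on that list.

```shell
# Sketch: describe what a dedup property value implies, per the text above.
# ASSUMPTION: accepted values are on, off, verify, sha256 and sha256,verify.
dedup_mode() {
    case "$1" in
        off)                  echo "dedup=off" ;;
        on|sha256)            echo "checksum=sha256 verify=no" ;;
        verify|sha256,verify) echo "checksum=sha256 verify=yes" ;;
        *)                    echo "unknown dedup value: $1" >&2; return 1 ;;
    esac
}
```

For example, dedup_mode on and dedup_mode sha256 print the same line, reflecting the equivalence shown earlier.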
How does the dedup property interact with the copies property?
A block with copies set to N will always have at least N copies on the system regardless of the number of deduplicated references.
You can use the dedupditto pool property to specify a threshold. If the reference count for a deduplicated block goes above the threshold, another ditto copy of the block is stored automatically. The default value of 0 disables this behavior, and the minimum nonzero value is 100.
Deduplicated space accounting is reported at the pool level. You must use the zpool list command rather than the zfs list command to identify disk space consumption when dedup is enabled. If you use the zfs list command to review deduplicated space, the file system might appear to grow, because more data can be stored on the same physical device. The zpool list command shows how much physical space is being consumed, and it also shows the dedup ratio.
The df command is not dedup-aware and will not provide accurate space accounting.
© 2012, Oracle Corporation and/or its affiliates.