Diskomizer is a program for testing and verifying storage subsystems. It uses multiple processes to perform, or simulate, asynchronous writes and reads to the objects you specify, and then verifies that the data read back from each block is the data that was written to that block. Every block of data has a unique header each time it is written, and the body of the data also changes every time the block is written. Diskomizer can also be used for read-only testing of devices, which provides a non-destructive method of testing.
Diskomizer will find broken devices and paths to devices, software bugs, and latent faults in hardware. It does not break these devices; it simply finds faults that are already there. It knows nothing about the underlying storage devices, and hence can be used just as well to generate load on NFS file systems as on any other device. Diskomizer can be run as an ordinary user as long as that user has permission to open the files and/or devices that are being used.
The Diskomizer package contains the complete documentation set. Please use that as the reference for the version you have installed.
Diskomizer consists of a main program that uses plugin libraries to perform, or simulate, asynchronous I/O to whatever the underlying storage is. Thanks to this architecture, Diskomizer can test many different types of I/O path and many different kinds of storage. It is also possible to write plugins that let Diskomizer test storage systems that are not disk based. As long as the medium can do random I/O and you can implement thread-safe routines that offer a pread(2) and pwrite(2) interface, it should be possible to use Diskomizer to test your storage system: everything from a traditional disk to a file system or even an RDBMS.
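To make the plugin requirement concrete, here is a minimal sketch of a storage backend that offers thread-safe positional read and write routines. It is not Diskomizer's plugin API, which is written against the C pread(2) and pwrite(2) interfaces; the class name `MemoryBackend` and its methods are hypothetical and only illustrate the contract a medium has to satisfy.

```python
import threading


class MemoryBackend:
    """Hypothetical backend: random-access storage with thread-safe
    pread/pwrite style routines, held in memory for illustration."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._lock = threading.Lock()

    def pwrite(self, data, offset):
        """Write data at offset and return the number of bytes written."""
        if offset < 0 or offset + len(data) > len(self._data):
            raise ValueError("write outside the backing store")
        with self._lock:
            self._data[offset:offset + len(data)] = data
        return len(data)

    def pread(self, length, offset):
        """Return up to length bytes starting at offset."""
        with self._lock:
            return bytes(self._data[offset:offset + length])
```

Any medium that can honour these two calls at arbitrary offsets, from several threads at once, fits the description above.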
The asynchronous I/O model that Diskomizer uses is loaded from a shared library at run time. Currently there are five different models available in the Diskomizer package. More could be written.
| I/O Model | Comments |
|---|---|
| SUNOS | This is the traditional SunOS asynchronous I/O model using aiowrite(), aioread() and aiowait(). |
| POSIX | This uses the POSIX asynchronous I/O model, using aio_write(), aio_read(), aio_error() and aio_return(). |
| PREAD | This uses POSIX threads to issue pread() and pwrite() system calls asynchronously to the main thread. |
| FS | This uses POSIX threads to get asynchronous behaviour but then stores the data in multiple files in a directory, or as attributes of the directory (see fsattr(5)). This overcomes the single per-file writer lock that many file systems have, as there are now many files. |
| USCSI | This uses POSIX threads to get asynchronous behaviour and then uses uscsi(7I) to issue the I/O. Since only root can issue uscsi commands, only root users can use this feature. |
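As an illustration of the thread-based approach used by the PREAD model, the following sketch issues positional writes and reads from a pool of worker threads and then verifies the data. It is an approximation in Python rather than Diskomizer's C implementation; the function names, the 2K block size and the worker count are all assumptions made for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 2048  # illustrative block size, not a Diskomizer default


def run_pread_model(fd, blocks, workers=4):
    """Write every block from worker threads, read each one back the same
    way, and return the block numbers whose data did not verify.
    `blocks` maps block number -> bytes of length BLOCK."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        writes = [pool.submit(os.pwrite, fd, data, n * BLOCK)
                  for n, data in blocks.items()]
        for w in writes:
            w.result()  # wait for completion and surface any I/O error
        reads = {n: pool.submit(os.pread, fd, BLOCK, n * BLOCK)
                 for n in blocks}
        return [n for n, r in reads.items() if r.result() != blocks[n]]


def demo_pread_model():
    """Exercise a temporary file with eight blocks of distinct data."""
    with tempfile.TemporaryFile() as f:
        data = {n: bytes([n]) * BLOCK for n in range(8)}
        return run_pread_model(f.fileno(), data)
```

The main thread only submits work and collects results, so the I/O is asynchronous with respect to it even though each individual call is synchronous.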
If you wish to exercise raw devices then the SUNOS model is the most efficient, with the POSIX model a close second. For file system testing the FS model has the greatest potential, as it has no limit on the number of threads that it will use; however, this can lead to very large numbers of threads running. Because the data is spread over many files, the FS model can also reduce the impact of any per-file write locks that the file system may have.
The IO model is selected using the AIO_ROUTINES option.
Diskomizer is very memory intensive. In addition to the memory required for the buffers to do I/O to and from, it also has to store some data about each block on the devices so that when a block is read back it knows what it wrote to that block and can check that the data is correct. The 32-bit Diskomizer keeps 28 bytes of data per Diskomizer disk block. The 64-bit version keeps 48 bytes. It is these per-block records that dominate Diskomizer's memory use. If you need to reduce the memory footprint of Diskomizer consider two options:
It should be clear that the 32-bit Diskomizer will only have enough address space to hold data for, at the very most, 292G of storage when doing 2K I/Os, and that assumes that there is nothing else in the address space, which there clearly is. Additionally, various resource limits that can be configured on the system will restrict the number and/or size of individual memory segments further, so that even when using the 64-bit Diskomizer it is not always possible to have all the required memory mapped at the same time.
Diskomizer works around these issues by allowing certain memory segments to be detached and attached on demand. However, in doing so you need to be careful that you do not just end up testing the ability of the system to page memory from swap devices.
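The 292G figure follows directly from the per-block overhead. The short calculation below reproduces it, assuming a 4 GiB (2^32 byte) address space that is entirely given over to the 28-byte per-block records, and reading "G" as GiB; the function name is only for illustration.

```python
def max_testable_bytes(address_space_bytes, bytes_per_block, io_size):
    """Upper bound on the storage that can be tracked when every byte of
    the address space holds per-block records and nothing else."""
    trackable_blocks = address_space_bytes // bytes_per_block
    return trackable_blocks * io_size


# 32-bit Diskomizer: 28 bytes of bookkeeping per block, 2K I/Os.
limit_32 = max_testable_bytes(2**32, 28, 2048)
print(limit_32 / 2**30)  # a little over 292 GiB
```

Larger I/O sizes raise the bound proportionally, because each 28-byte record then covers more storage.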
Here is a brief description of each of the memory allocators. All the shared memory used by Diskomizer is allocated at start up time but attached to at run time.
The SHM shared memory allocator uses System V shared memory obtained with shmget(2) and attached using shmat(2).
When Diskomizer needs to allocate a chunk of shared memory it searches the shared memory segments that are already allocated for a chunk of memory that is free, large enough and of the same type (there are two types: memory that can be detached and memory that cannot be detached). If it finds a large enough free chunk of memory then it uses as much memory from that chunk as it needs. If it cannot find a large enough chunk then it allocates a new block of shared memory using shmget(2) with the maximum size that it can (configured by the option SHMINFO_SHMMAX) and uses as much of that block of memory as it needs.
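The search described above can be sketched as follows. This is a simplified model, not Diskomizer's allocator: the `Segment` class and `allocate` function are hypothetical, each segment hands out space from its front only, and `shmmax` stands in for the value of SHMINFO_SHMMAX.

```python
class Segment:
    """Stand-in for one block of shared memory obtained with shmget(2)."""

    def __init__(self, size, detachable):
        self.size = size
        self.detachable = detachable
        self.used = 0  # bytes already handed out from this segment


def allocate(segments, nbytes, detachable, shmmax):
    """Return (segment, offset) for a chunk of nbytes of the given type."""
    if nbytes > shmmax:
        raise MemoryError("chunk is larger than the maximum segment size")
    # First look for room in an existing segment of the same type.
    for seg in segments:
        if seg.detachable == detachable and seg.size - seg.used >= nbytes:
            offset = seg.used
            seg.used += nbytes
            return seg, offset
    # Otherwise create a new segment of the maximum size and use part of it.
    seg = Segment(shmmax, detachable)
    seg.used = nbytes
    segments.append(seg)
    return seg, 0
```

Keeping the two types in separate segments is what allows whole segments of detachable memory to be dropped later without touching memory that must stay attached.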
At run time when it needs to access a chunk of shared memory, it finds which block of shared memory the chunk that it needs is in and if that block is not currently attached it attempts to attach the whole block. If the attach fails then it finds the least recently used block of shared memory that is not in use and detaches that and tries the attach again. This continues until either the memory is attached or all the free memory is detached. If after detaching all the free memory the new attach still does not succeed then Diskomizer will exit with an error.
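The attach-and-retry behaviour described above amounts to least-recently-used eviction. The sketch below models it with a hypothetical `AttachManager` class; the `try_attach` callback stands in for shmat(2) and simply reports whether an attach would succeed, so the policy can be exercised without real shared memory.

```python
from collections import OrderedDict


class AttachManager:
    """Hypothetical model of attach-on-demand with LRU detaching."""

    def __init__(self, try_attach):
        # try_attach(block, attached_blocks) -> bool stands in for shmat(2).
        self._try_attach = try_attach
        # block -> number of current users, least recently used first.
        self._attached = OrderedDict()

    def acquire(self, block):
        if block in self._attached:
            self._attached[block] += 1
            self._attached.move_to_end(block)  # now most recently used
            return
        while not self._try_attach(block, set(self._attached)):
            # Detach the least recently used block that nobody is using.
            victim = next((b for b, users in self._attached.items()
                           if users == 0), None)
            if victim is None:
                raise MemoryError("attach failed and nothing left to detach")
            del self._attached[victim]
        self._attached[block] = 1

    def release(self, block):
        self._attached[block] -= 1

    def attached_blocks(self):
        return list(self._attached)
```

For example, with a callback that allows at most two attached blocks, acquiring a third block detaches the oldest idle one, and the final `MemoryError` corresponds to Diskomizer exiting with an error.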
The ISM shared memory allocator is identical to the SHM memory allocator, except that when shmget(2) and shmat(2) are called the SHM_SHARE_MMU flag is passed to get "Intimate" shared memory.
The MMAP shared memory allocator uses mmap(2) from /dev/zero for memory that can not be detached and from a file that it creates in the directory given by the EXPERT_MMAP_FILE_DIRECTORY option for memory that can be detached. If not using /dev/zero the file is immediately unlinked so unless you know where to look you will never see it, but it will use up space.
When Diskomizer needs to allocate a chunk of shared memory it searches the mapped files for a chunk of memory that is large enough and, if there is enough space, it uses that. If there is not enough space in the existing files it will ftruncate(3C) the last file that was created to be (100 * 1024 * sysconf(_SC_PAGESIZE)) bytes bigger, and it continues doing this until there is enough space or the file reaches its maximum size.
At run time, when Diskomizer needs to attach a chunk of shared memory that is not currently mapped, it finds the file and offset for that memory and then uses mmap(2) to map the pages relating to that memory. Unlike the SHM and ISM memory allocators it only maps the memory that it needs and not the whole file. If the mmap fails then it unmaps the least recently used area of memory that is free and tries the mmap again. It repeats this until either all the memory mappings for free memory have been removed or the mmap of the new segment has succeeded.
The BEST_SHM allocator is a derived allocator that uses the ISM and SHM allocators to allocate and attach to memory. It tries the ISM allocator first and, if that fails with ENOMEM, tries the SHM allocator before attempting to detach any shared memory.
The BEST allocator is a derived allocator that uses the BEST_SHM and MMAP allocators to allocate and attach to shared memory. When it is unable to attach a chunk of memory it first tries detaching memory segments of the same type as the one that it is trying to attach, before detaching memory segments of the other type.
If all you want to do is exercise the disks then either leave the memory allocator at the default or use the MMAP allocator. If you wish to simulate the behaviour of an RDBMS then you should use the ISM allocator, bearing in mind that you will have to configure the system's shared memory parameters in /etc/system and also pass the value of SHMINFO_SHMMAX to Diskomizer, so that Diskomizer knows the maximum size of the shared memory segments that it creates.
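The MMAP allocator's use of an unlinked backing file and partial mappings can be illustrated with Python's `mmap` module on a POSIX system. This is a sketch under assumptions, not Diskomizer's code: the helper names are hypothetical and the file is kept far smaller than the 100 * 1024 page increments mentioned above.

```python
import mmap
import os
import tempfile


def create_backing_file(directory, size):
    """Create a file, unlink it at once, and extend it to `size` bytes.
    The space stays in use until the descriptor is closed, but the file
    no longer appears in the directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    os.ftruncate(fd, size)
    return fd


def map_window(fd, offset, length):
    """Map only the pages covering [offset, offset + length) of the file.
    Returns the mapping and the position of `offset` inside it."""
    gran = mmap.ALLOCATIONGRANULARITY
    start = offset - (offset % gran)  # mmap offsets must be aligned
    skew = offset - start
    window = mmap.mmap(fd, skew + length, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=start)
    return window, skew
```

A caller would write through `window[skew:skew + length]` and close the window when the memory is no longer needed, which corresponds to unmapping a free area in the description above.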
Every buffer that is written to the device or file has a unique buffer header that contains information required for Diskomizer to track errors. The information stored in the header is as follows:
So that the same data is not written to the same part of the disk over and over again there are actually two types of buffer header, type 'A' and type 'B'. Type 'A' headers have the 64-bit value 0xAAAAAAAAAAAAAAAA as the first 8 bytes before and after the header. Type 'B' headers have the 64-bit value 0x5555555555555555. The definitions, offsets and sizes of the various elements are printed out when Diskomizer starts and also above each entry that is written to a diffs file.
Diskomizer's sequential data is a sequence of 251 elements starting from 0, 1, 2, 3 or 4. Each sequence is used in turn, so there are five sequences (0-250, 1-251, 2-252, 3-253 and 4-254), and each of these sequences will start at a different offset within a 256 byte block. The first sequence will start at offset 0, the next at 251, the next at 502, and so on. So even though the data is sequential it repeats very rarely and from different byte offsets each time. The whole pattern only repeats every 305005 bytes rather than every 256 bytes that you would get with the simpler pattern.
The reverse sequential pattern is the same but in reverse; the sequence counts down from 255, 254, 253, 252, or 251.
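One pass through the five sequences can be generated as shown below. This sketch only reproduces the part of the description that is unambiguous, namely the five 251-element runs laid end to end so that they start at offsets 0, 251, 502 and so on; how Diskomizer carries the pattern across buffers, and therefore the overall repeat length, is not modelled here, and the function names are hypothetical.

```python
def sequential_pattern():
    """Return the five 251-byte sequences 0-250, 1-251, ... 4-254
    concatenated in order."""
    out = bytearray()
    for start in range(5):
        out.extend(range(start, start + 251))
    return bytes(out)


def fill_buffer(size):
    """Fill a buffer of `size` bytes by repeating the pattern from its
    beginning (an assumption about how buffers are filled)."""
    pattern = sequential_pattern()
    repeats = size // len(pattern) + 1
    return (pattern * repeats)[:size]
```

Because each run is 251 bytes long rather than 256, the start of every run falls at a different position relative to a 256 byte boundary.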
The random pattern is as random as lrand() can give.
Diskomizer also supports the ISI and CJT killer patterns, which are designed to cause Fibre Channel communications to have difficulties.
You can also supply a binary file containing a pattern, which will be loaded as many times as needed to fill the buffer, by using the USERPAT option together with the EXPERT_USERPAT_FILE option, which specifies the file.
Prior to reads being submitted, Diskomizer initializes the buffer into which it is reading, to make any failure to copy data easier to detect. The pattern used is controlled by the options READ_BUFFER_INIT and READ_BUFFER_SUPPLIED_VALUE. The default is to repeat the 32-bit pattern 0xfeedbede over the whole read buffer.
During start up, Diskomizer writes a unique identifier to the same known block on each device after first zeroing that block. It then reads the block back via all the paths to each device and verifies that they match. This will find errors in the configuration where two paths to the same device are specified as separate devices. It will not, however, find situations where you have overlapping partitions.
Once it begins to do "random" I/O, it can cluster the blocks being written to simulate read ahead and writes of sequential disk blocks. The size of these I/O clusters is controlled by the options EXPERT_READ_CLUSTER_LENGTH and EXPERT_WRITE_CLUSTER_LENGTH.
Diskomizer can idle devices for periods of time. This can allow devices that do "housekeeping" when idle to start doing this. These delays are controlled by the options EXPERT_MAX_ACTIVE_TIME, EXPERT_MIN_ACTIVE_TIME, EXPERT_MAX_IDLE_TIME and EXPERT_MIN_IDLE_TIME. These same options can also be used to make Diskomizer only load drives during off-peak times.
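The idea behind pre-filling the read buffer can be shown in a few lines. In this sketch the helper names are hypothetical and the byte order of the 32-bit value is an assumption (big-endian by default); the point is only that a buffer still holding the fill pattern after a "successful" read indicates that no data was copied into it.

```python
READ_INIT = 0xfeedbede  # default fill value described above


def init_read_buffer(size, value=READ_INIT, byteorder="big"):
    """Return a buffer of `size` bytes filled with the repeated 32-bit value."""
    word = value.to_bytes(4, byteorder)
    return bytearray((word * (size // 4 + 1))[:size])


def looks_untouched(buf, value=READ_INIT, byteorder="big"):
    """True if the buffer still contains only the fill pattern."""
    return bytes(buf) == bytes(init_read_buffer(len(buf), value, byteorder))
```

A distinctive value such as 0xfeedbede is also easy to spot in a hex dump when only part of the buffer was filled by the read.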
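A simplified version of this start-up check is sketched below using ordinary files in place of devices. It is an illustration under assumptions rather than Diskomizer's procedure: the function names, the 512 byte block and the use of a UUID as the identifier are all hypothetical, and a symbolic link plays the part of a second path to the same device.

```python
import os
import tempfile
import uuid

BLOCK = 512  # illustrative size of the known block


def find_clobbered_paths(paths, offset=0):
    """Zero the known block on every path, write a fresh identifier to it,
    then read every path back. A path whose identifier has changed shares
    its storage with a path that was written later."""
    fds = {p: os.open(p, os.O_RDWR) for p in paths}
    written = {}
    try:
        for p, fd in fds.items():
            os.pwrite(fd, bytes(BLOCK), offset)   # zero the block first
            written[p] = uuid.uuid4().bytes       # 16-byte unique identifier
            os.pwrite(fd, written[p], offset)
        return [p for p, fd in fds.items()
                if os.pread(fd, 16, offset) != written[p]]
    finally:
        for fd in fds.values():
            os.close(fd)


def demo_alias_check():
    """Two real files plus an alias of the first; returns the clobbered names."""
    with tempfile.TemporaryDirectory() as d:
        a, b, alias = (os.path.join(d, n) for n in ("a", "b", "alias"))
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(bytes(BLOCK))
        os.symlink(a, alias)
        return [os.path.basename(p)
                for p in find_clobbered_paths([a, b, alias])]
```

In the demonstration the file "a" is reported, because the identifier written through "alias" landed on the same block.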
The implementation of this feature is a state machine with four states:
If the option O_RDONLY is set, then Diskomizer will open all the devices and files read-only. In this mode no data is ever written to the devices, so the test is non-destructive. Rather confusingly, in this mode the write threads do not write any data; instead they read blocks and note their checksums so that subsequent reads can verify that the checksums are unchanged. If there are differences the error is reported, but no diff file is created as the old data is not available to produce the diff.
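The read-only mode can be pictured as a "note, then verify" pass over the blocks. The sketch below uses CRC-32 from `zlib` as the checksum, which is an assumption since the source does not say which checksum Diskomizer uses; the function names and block size are likewise hypothetical.

```python
import os
import tempfile
import zlib

BLOCK_SIZE = 2048  # illustrative block size


def note_checksums(fd, block_numbers, block_size):
    """What the 'write' threads do in read-only mode: read and remember."""
    return {n: zlib.crc32(os.pread(fd, block_size, n * block_size))
            for n in block_numbers}


def verify_checksums(fd, noted, block_size):
    """Re-read every noted block and return those whose checksum changed."""
    return [n for n, crc in noted.items()
            if zlib.crc32(os.pread(fd, block_size, n * block_size)) != crc]


def demo_read_only_check():
    """Returns the changed blocks before and after block 2 is modified."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(4 * BLOCK_SIZE))
        f.flush()
        fd = os.open(f.name, os.O_RDONLY)
        try:
            noted = note_checksums(fd, range(4), BLOCK_SIZE)
            clean = verify_checksums(fd, noted, BLOCK_SIZE)
            # Simulate something else changing block 2 behind our back.
            first = os.pread(fd, 1, 2 * BLOCK_SIZE)
            f.seek(2 * BLOCK_SIZE)
            f.write(bytes([first[0] ^ 0xFF]))
            f.flush()
            changed = verify_checksums(fd, noted, BLOCK_SIZE)
        finally:
            os.close(fd)
        return clean, changed
```

Only the checksum is kept, which is why a difference can be reported but the original contents cannot be shown in a diff.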
Diskomizer has a large number of different options, most of which need never be changed. To help the user the options are grouped into four different types:
If an option is supplied that is not understood by Diskomizer or one of the shared objects Diskomizer is using then this is treated as a fatal error.
There are two things that limit the number of I/Os that Diskomizer can keep in the kernel: CPU and memory.
With current hardware (as of 2009), unless you are testing a lot of short-stroked drives, you will run out of memory before you run out of CPU.
Diskomizer knows how to read EFI labels and the new “plus 1tb vtoc” labels on devices, and therefore can be used on devices greater than 1TB in size.
By default if your domainname as returned by the domainname(1M) command ends with “.sun.com” Diskomizer will send usage tracking data back via email. The data is also written into a file in the current working directory called: “usage_tracking.xml”. No personal data is sent.
© 2012, Oracle Corporation and/or its affiliates.