Posted on Thu, 03 May 2012 14:49:00 GMT
Author: Alexei Khalyako
Reviewers: Thomas Kejser, Marcel Franke (PM One); Kevin Cox; Erik Kraemer; Kun Cheng; Murshed Zaman; Chuck Heinzelman
During a recent proof of concept we tested a mid-range Fast Track configuration: a server with 32 cores, more than 200 GB of RAM, and about 40 TB of storage capacity.
Before looking at SQL Server database performance, we wanted to measure the maximum throughput of the storage and use it as the theoretical maximum IO the system could deliver. According to the hardware vendor's documentation, the Maximum Consumption Rate is around 7.5 GB/sec, so in real life we expected to see IO in the range of 6 to 6.5 GB/sec.
The very first SQLIO runs produced quite surprising numbers: we couldn't exceed 4.7 GB/sec.
Comment: the following SQLIO command was used for the analysis:

    ECHO ------ Sequential read block size 256 thread on Concurrent --------------------
    sqlio -kR -t10 -s%1 -fsequential -o1 -b512 -LS -FCreate_big_test_file_Param.txt >> .\TestCycle3\sqliotest_sr_t10_b256_all.txt
    timeout /T 60

where:
-kR - a READ workload is used
-fsequential - the workload is sequential
-b512 - the IO block size in KB. The ECHO label and the output file name refer to a 256 KB block, which is quite typical for a data warehouse type of workload.

Looking at Performance Monitor, we also observed that instead of a nice flat line across all LUNs we got very uneven performance, with a fluctuation of up to 100 MB/sec between the slowest and the fastest LUN.
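The unevenness across LUNs can be quantified with a few lines of Python. This is only a sketch: the LUN names and throughput values below are made up for illustration and are not measurements from this test, and the per-LUN averages are assumed to have been exported from Performance Monitor by hand.

```python
def lun_spread(throughput_mb_s):
    """Return (slowest LUN, fastest LUN, spread in MB/sec).

    throughput_mb_s maps a LUN name to its average read throughput.
    """
    slowest = min(throughput_mb_s, key=throughput_mb_s.get)
    fastest = max(throughput_mb_s, key=throughput_mb_s.get)
    spread = throughput_mb_s[fastest] - throughput_mb_s[slowest]
    return slowest, fastest, spread


# Hypothetical per-LUN averages in MB/sec (illustrative values only).
samples = {"LUN01": 310.0, "LUN02": 295.0, "LUN03": 240.0, "LUN04": 335.0}

slowest, fastest, spread = lun_spread(samples)
print(f"slowest={slowest} fastest={fastest} spread={spread:.0f} MB/sec")
```

A spread close to zero corresponds to the flat line one would hope to see; a large spread is a hint to look at the path between the server and the storage.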

All these observations indicated that we had a problem, but what exactly was the reason for the underperformance? Since we tested only the throughput of the storage using SQLIO, we could conclude that the problem was on the hardware or hardware configuration side.
What should we examine then? The following components typically affect data throughput:

PCIe slots -> HBA -> network to the switch -> switch -> network to the storage controller -> storage controller.
First we checked whether the HBAs were installed in the right PCIe slots. In the past we have seen a couple of cases where HBAs were installed in slots that could not deliver the throughput the HBA can consume.
Note: PCIe x1 v1.0 can deliver about 250 MB/sec, so a PCIe x4 v1.0 slot can deliver an aggregate throughput of around 1 GB/sec. PCIe x1 v2.0 doubles the performance of v1.0 to about 500 MB/sec, so PCIe x4 v2.0 can deliver up to 2 GB/sec. Therefore, to get the anticipated throughput, it is recommended to install an 8Gb dual-port HBA in a PCIe x4 v2.0 slot. However, not all motherboard PCIe slots are equal. You need to read the fine print of the motherboard spec for full details.
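The arithmetic in the note can be written down as a small check. This is a rough sketch, not a sizing tool: the per-lane figures are the approximate ones from the note, and the assumption that a Fibre Channel port carries roughly 100 MB/sec per Gbit (about 800 MB/sec for an 8Gb port) is a rule of thumb added here, not a figure from the test.

```python
# Approximate bandwidth per PCIe lane in MB/sec, taken from the note above.
PCIE_MB_PER_LANE = {"1.0": 250, "2.0": 500}


def slot_bandwidth_mb(version, lanes):
    """Approximate aggregate slot bandwidth in MB/sec."""
    return PCIE_MB_PER_LANE[version] * lanes


def hba_bandwidth_mb(ports, gbit_per_port):
    """Approximate HBA bandwidth in MB/sec.

    Assumption: about 100 MB/sec per Gbit of Fibre Channel link speed.
    """
    return ports * gbit_per_port * 100


def slot_is_sufficient(version, lanes, ports, gbit_per_port):
    """True if the slot can feed all ports of the HBA at full speed."""
    return slot_bandwidth_mb(version, lanes) >= hba_bandwidth_mb(ports, gbit_per_port)


for version in ("1.0", "2.0"):
    ok = slot_is_sufficient(version, lanes=4, ports=2, gbit_per_port=8)
    print(f"PCIe x4 v{version} for a dual-port 8Gb HBA: {'ok' if ok else 'too slow'}")
```

Under these assumptions a dual-port 8Gb HBA needs about 1.6 GB/sec, which an x4 v1.0 slot cannot supply and an x4 v2.0 slot can.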
Even though current hardware typically has enough high-throughput PCIe slots, there are still a couple of 'slow' slots on most motherboards, and it is quite understandable that whoever assembles the box may overlook which slot the HBA goes into. Checking this first is an easy step, and it is very quick to fix if it turns out to be the reason for the low throughput.
However, in our case the HBAs were sitting in the right slots. What next? Since the cabling looked correct, we had to check the mapping of the HBAs to the LUNs.
Checking the mapping in our configuration, we observed that port A of each HBA was mapped to port A of every available storage enclosure. The following picture may help to illustrate the configuration.

However, according to the vendor, explicit mappings must be created that match a single storage port with a single HBA port on the server. So one HBA port must be connected to only one storage enclosure, as illustrated in the following picture:

The diagram shows only how the 'Active' MPIO paths were mapped. For failover you can set up two paths per HBA with the MPIO policy 'Failover Only'. This directs MPIO to use a single path only and to fail over to the secondary path when the first one fails.
With the configuration described above, where we fixed the mapping issue, throughput increased by about 20%, from 4.7 GB/sec to 5.7 GB/sec. During the SQLIO runs all LUNs now showed a much more attractive picture.
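The one-to-one rule for active paths is easy to check mechanically once the mapping is written down as pairs. The sketch below assumes the active paths have been listed by hand as (HBA port, storage port) tuples; the port names are hypothetical and do not come from the tested system.

```python
from collections import defaultdict


def mapping_violations(active_paths):
    """Return ports that take part in more than one active path.

    active_paths is an iterable of (hba_port, storage_port) pairs.
    The result maps each offending port to the sorted list of its peers.
    An empty result means the active mapping is one-to-one.
    """
    by_hba = defaultdict(set)
    by_storage = defaultdict(set)
    for hba_port, storage_port in active_paths:
        by_hba[hba_port].add(storage_port)
        by_storage[storage_port].add(hba_port)

    violations = {}
    for port, peers in list(by_hba.items()) + list(by_storage.items()):
        if len(peers) > 1:
            violations[port] = sorted(peers)
    return violations


# Hypothetical names: every HBA port A mapped to every enclosure port A.
before = [("HBA1-A", "ENC1-A"), ("HBA1-A", "ENC2-A"),
          ("HBA2-A", "ENC1-A"), ("HBA2-A", "ENC2-A")]
# Hypothetical names: one HBA port per storage port.
after = [("HBA1-A", "ENC1-A"), ("HBA2-A", "ENC2-A")]

print("before:", mapping_violations(before))
print("after: ", mapping_violations(after))
```

Standby paths configured for 'Failover Only' would be left out of the input, since the rule applies to the active paths.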

An additional configuration change, setting the Read Ahead option on the storage from 'Default' to '32MB', raised performance to ~6.7 GB/sec. That was an additional gain of about 14%, comparable to what we got from re-mapping the HBAs to the storage enclosures.
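To put the three measurements in perspective, they can be expressed as a share of the vendor's Maximum Consumption Rate. The snippet is a small sketch using only the figures quoted in this post; the stage labels are shorthand chosen here.

```python
VENDOR_MCR_GB_S = 7.5  # Maximum Consumption Rate from the vendor documentation

# (stage label, measured SQLIO throughput in GB/sec) as reported in the post.
measurements = [
    ("initial mapping", 4.7),
    ("one-to-one mapping", 5.7),
    ("read ahead 32MB", 6.7),
]


def percent_of_mcr(gb_s, mcr=VENDOR_MCR_GB_S):
    """Measured throughput as a whole-number percentage of the MCR."""
    return round(100 * gb_s / mcr)


for label, gb_s in measurements:
    print(f"{label:<20} {gb_s:.1f} GB/sec  {percent_of_mcr(gb_s)}% of MCR")
```

This gives roughly 63%, 76% and 89% of the vendor figure for the three stages, with the last one slightly above the 6 to 6.5 GB/sec we originally expected.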

Conclusion: Reference architectures and appliances give us well-balanced configurations that help to speed up Data Warehouse deployments, and vendors give very clear guidance on how to set them up for better performance. However, the old wisdom "Trust but verify" still holds and may help your setup perform much better.
Details:
http://sqlcat.com/sqlCat/b/msdnmirror/archive/2012/05/03/fast-track-improving-performance-through-correct-lun-mapping-and-storage-enclosure-configuration.aspx