On Fri, Apr 15, 2011 at 11:43:00AM +0800, Wu Fengguang wrote:
> On Fri, Apr 15, 2011 at 02:16:09AM +0800, Jan Kara wrote:
> > On Thu 14-04-11 23:14:25, Wu Fengguang wrote:
> > > On Thu, Apr 14, 2011 at 08:23:02AM +0800, Wu Fengguang wrote:
> > > > On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > > > > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > > > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > > > > Reduce the dampening for the control system, yielding faster
> > > > > > > > convergence. The change is a bit conservative, as smaller values
> > > > > > > > may lead to noticeable bdi threshold fluctuations in low-memory
> > > > > > > > JBOD setups.
> > > > > > > >
> > > > > > > > CC: Peter Zijlstra
> > > > > > > > CC: Richard Kennedy
> > > > > > > > Signed-off-by: Wu Fengguang
> > > > > > >
> > > > > > > Well, I have nothing against this change as such, but what I don't
> > > > > > > like is that it just swaps a magical +2 for a similarly magical +0.
> > > > > > > It's clear that
> > > > > >
> > > > > > The patch tends to make the ramp-up time a bit more reasonable for
> > > > > > common desktops: from 100s down to 25s (see below).
> > > > > >
> > > > > > > this will lead to more rapid updates of the proportions of each
> > > > > > > bdi's share of writeback and each thread's share of dirtying, but
> > > > > > > why +0? Why not +1 or -1? So
> > > > > >
> > > > > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > > > > Richard actually asked for a much more radical change (a decrease
> > > > > > by 6), but that looks like too much.
> > > > > >
> > > > > > My team has a 12-disk JBOD with only 6G of memory. That is pretty
> > > > > > small for a server, but it's a real setup and serves well as the
> > > > > > reference minimal setup that Linux should be able to run well on.
> > > > >
> > > > > FWIW, Linux runs on a lot of low-power NAS boxes with JBOD and/or
> > > > > RAID setups that have <= 1GB of RAM (many of them run XFS), so even
> > > > > your setup could be considered large by a significant fraction of
> > > > > the storage world. Hence you need to be careful of optimising for
> > > > > what you think is a "normal" server, because there simply isn't such
> > > > > a thing....
> > > >
> > > > Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS
> > > > box... I'll test that setup.
> > >
> > > I just did a comparison of the IO-less patches' performance with and
> > > without this patch. I hardly noticed any differences besides some more
> > > bdi goal fluctuations in the attached graphs. The write throughput is
> > > a bit higher with this patch (80MB/s vs 76MB/s); however, the delta is
> > > within the even larger stddev range (20MB/s).
> >
> > Thanks for the test, but I cannot tell from the numbers you provided
> > how much the per-bdi thresholds fluctuated in this low-memory NAS case.
> > You can gather the current bdi threshold from
> > /sys/kernel/debug/bdi/<bdi>/stats, so it shouldn't be hard to get the
> > numbers...
>
> Hi Jan, attached are your results w/o this patch. The "bdi goal" (gray
> line) is calculated as (bdi_thresh - bdi_thresh/8) and is fluctuating
> all over the place... and the average wkB/s is only 49MB/s.

I got the numbers for the vanilla kernel: XFS can do 57MB/s and 63MB/s
in the two runs. There are large fluctuations in the attached graphs,
too.
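(Aside for readers jumping into the thread: below is a minimal userspace
sketch of what the "+2 vs +0" debate above is about. It assumes the
constant in question is the one added in calc_period_shift() in
mm/page-writeback.c; that file/function name is an assumption on my part,
only the +2 -> +0 delta is quoted above. The sketch just illustrates why
dropping the +2 shortens the floating-proportion period, and hence the
ramp-up time, by about 4x.)

#include <stdio.h>

/* userspace stand-in for the kernel's ilog2() */
static int ilog2_ul(unsigned long v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* dirty_total: global dirty threshold in pages; extra: 2 (old) or 0 (new) */
static int calc_period_shift(unsigned long dirty_total, int extra)
{
	return extra + ilog2_ul(dirty_total - 1);
}

int main(void)
{
	/* illustrative numbers only: ~6G of 4k pages at a 20% dirty ratio */
	unsigned long dirty_total = 6UL * 262144 / 5;
	int old_shift = calc_period_shift(dirty_total, 2);
	int new_shift = calc_period_shift(dirty_total, 0);

	printf("old period (+2): 2^%d = %lu pages\n", old_shift, 1UL << old_shift);
	printf("new period (+0): 2^%d = %lu pages\n", new_shift, 1UL << new_shift);
	return 0;
}

(With those illustrative numbers the period drops from about 2^20 to 2^18
page events, so roughly 4x less data has to pass through before the per-bdi
proportions adapt, which lines up with the 100s -> 25s ramp-up figure
quoted above.)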
To sum it up, for a 1GB-memory, 4-disk JBOD setup, running 1 dd per disk:

    vanilla:  57MB/s,  63MB/s
    Jan:      49MB/s, 103MB/s
    Wu:       76MB/s,  80MB/s

The balance_dirty_pages-task-bw-jan.png and balance_dirty_pages-pages-jan.png
graphs show very unfair allocation of dirty pages and throughput among the
disks...

Thanks,
Fengguang
---
wfg ~/bee% cat xfs-1dd-1M-16p-5907M-3:2-2.6.39-rc3+-2011-04-15.19:21/iostat-avg
avg-cpu:     %user   %nice  %system   %iowait  %steal     %idle
sum         13.160   0.000  541.130  3124.560   0.000  9521.180
avg          0.100   0.000    4.099    23.671   0.000    72.130
stddev       0.042   0.000    0.846     4.861   0.000     5.333

Device:  rrqm/s   wrqm/s    r/s        w/s  rkB/s         wkB/s    avgrq-sz  avgqu-sz      await     svctm      %util
sum       0.000  313.900  0.000  16712.530  0.000   7638856.310  120985.110  3810.910  28021.330  1160.500  11176.200
avg       0.000    2.378  0.000    126.610  0.000     57870.124     916.554    28.871    212.283     8.792     84.668
stddev    0.000    9.024  0.000     67.243  0.000     30510.769      13.233    23.185     81.820     4.733     14.401

wfg ~/bee% cat xfs-1dd-1M-16p-5907M-3:2-2.6.39-rc3+-2011-04-15.19:37/iostat-avg
avg-cpu:     %user   %nice  %system   %iowait  %steal     %idle
sum         11.790   0.000  542.390  3083.790   0.000  9662.000
avg          0.089   0.000    4.078    23.186   0.000    72.647
stddev       0.039   0.000    0.841     4.519   0.000     4.941

Device:  rrqm/s   wrqm/s    r/s        w/s  rkB/s         wkB/s    avgrq-sz  avgqu-sz      await     svctm      %util
sum       0.000  761.000  0.000  18539.730  0.000   8472202.900  121988.670  4603.610  30292.830  1069.430  11576.810
avg       0.000    5.722  0.000    139.396  0.000     63700.774     917.208    34.614    227.766     8.041     87.044
stddev    0.000   20.908  0.000     69.502  0.000     31489.429      11.816    24.401     89.685     4.888     14.403

wfg ~/bee% cat xfs-1dd-1M-16p-5907M-3:2-2.6.39-rc3-jan-bdp+-2011-04-15.22:13/iostat-avg
avg-cpu:     %user   %nice  %system   %iowait  %steal     %idle
sum          1.850   0.000  191.500  3328.520   0.000  8878.190
avg          0.015   0.000    1.544    26.843   0.000    71.598
stddev       0.029   0.000    0.453     6.259   0.000     6.594

Device:  rrqm/s   wrqm/s    r/s        w/s  rkB/s         wkB/s    avgrq-sz   avgqu-sz      await    svctm      %util
sum       0.000    6.100  0.000  28236.660  0.000  12856161.510  112916.910  15936.450  69787.540  545.460  12377.740
avg       0.000    0.049  0.000    227.715  0.000    103678.722     910.620    128.520    562.803    4.399     99.820
stddev    0.000    0.215  0.000     13.069  0.000      5923.547       2.644     33.910    158.911    0.275      1.385
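PS: for anyone who wants to reproduce the per-bdi threshold traces behind
those graphs, here is a rough sampling sketch (it was not part of the test
scripts above). It polls the debugfs file Jan pointed at and derives the
gray "bdi goal" line as bdi_thresh - bdi_thresh/8. The "BdiDirtyThresh"
field name and the kB units are what I would expect from 2.6.39-era
kernels; treat them as assumptions and adjust to whatever your
/sys/kernel/debug/bdi/<bdi>/stats actually prints.

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char line[256];
	unsigned long thresh_kb;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /sys/kernel/debug/bdi/<bdi>/stats\n",
			argv[0]);
		return 1;
	}

	/* sample the per-bdi dirty threshold once per second */
	for (;;) {
		FILE *f = fopen(argv[1], "r");

		if (!f) {
			perror("fopen");
			return 1;
		}
		while (fgets(line, sizeof(line), f)) {
			/* assumed field name/units: "BdiDirtyThresh:  NNN kB" */
			if (sscanf(line, "BdiDirtyThresh: %lu", &thresh_kb) == 1) {
				printf("bdi_thresh %8lu kB   bdi goal %8lu kB\n",
				       thresh_kb, thresh_kb - thresh_kb / 8);
				fflush(stdout);
				break;
			}
		}
		fclose(f);
		sleep(1);
	}
}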