Message-ID: <54678020.9090402@suse.cz>
Date: Sat, 15 Nov 2014 17:32:32 +0100
From: Vlastimil Babka
Subject: Re: isolate_freepages_block and excessive CPU usage by OSD process
To: Andrey Korolyov, ceph-users@lists.ceph.com
Cc: riel@redhat.com, Mark Nelson, linux-mm@kvack.org, David Rientjes,
 Joonsoo Kim

On 11/15/2014 12:48 PM, Andrey Korolyov wrote:
> Hello,
>
> I recently found that the OSD daemons, under certain conditions
> (moderate VM pressure, moderate I/O, slightly altered VM settings),
> can go into a loop involving isolate_freepages and effectively hurt
> Ceph cluster performance. I found this thread

Do you feel it is a regression compared to some older kernel version,
or something like that?

> https://lkml.org/lkml/2012/6/27/545, but it looks like the
> significant decrease of bdi max_ratio did not help even a bit.
> Although I have approximately half of physical memory available for
> cache-like stuff, the problem with mm persists, so I would like to
> try suggestions from other people. In the current testing iteration
> I decreased vfs_cache_pressure to 10 and raised vm_dirty_ratio and
> the background ratio to 15 and 10 respectively (because the default
> values are too spiky for my workloads). The host kernel is a
> linux-stable 3.10.

Well, I'm glad to hear it's not 3.18-rc3 this time. But I would
recommend trying it, or at least 3.17. A lot of patches have gone in
since 3.10 to reduce compaction overhead (especially for transparent
hugepages).

> Non-default VM settings are:
> vm.swappiness = 5
> vm.dirty_ratio = 10
> vm.dirty_background_ratio = 5
>
> bdi max_ratio was 100%, right now it is 20%; at a glance it looks
> like the situation has worsened, because an unstable OSD host causes
> a domino-like effect on other hosts, which start to flap too, and
> only a cache flush via drop_caches helps.
>
> Unfortunately there is no slab info from the "exhausted" state due
> to the sporadic nature of this bug; I will try to catch it next time.
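
When you do catch it, the compaction counters in /proc/vmstat should
tell you whether the OSDs really are spinning in compaction. A minimal
sketch (the 10s interval is arbitrary; any two samples will do):

  # sample compaction activity twice and compare the deltas
  grep -E 'compact_(stall|fail|success|migrate_scanned|free_scanned)' /proc/vmstat
  sleep 10
  grep -E 'compact_(stall|fail|success|migrate_scanned|free_scanned)' /proc/vmstat

A rapidly growing compact_free_scanned with few compact_success
increments would match isolate_freepages_block scanning a lot of
memory for little gain.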
> slabtop (normal state):
>
>  Active / Total Objects (% used)    : 8675843 / 8965833 (96.8%)
>  Active / Total Slabs (% used)      : 224858 / 224858 (100.0%)
>  Active / Total Caches (% used)     : 86 / 132 (65.2%)
>  Active / Total Size (% used)       : 1152171.37K / 1253116.37K (91.9%)
>  Minimum / Average / Maximum Object : 0.01K / 0.14K / 15.75K
>
>     OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>  6890130 6889185  99%    0.10K 176670       39    706680K buffer_head
>   751232  721707  96%    0.06K  11738       64     46952K kmalloc-64
>   251636  226228  89%    0.55K   8987       28    143792K radix_tree_node
>   121696   45710  37%    0.25K   3803       32     30424K kmalloc-256
>   113022   80618  71%    0.19K   2691       42     21528K dentry
>   112672   35160  31%    0.50K   3521       32     56336K kmalloc-512
>    73136   72800  99%    0.07K   1306       56      5224K Acpi-ParseExt
>    61696   58644  95%    0.02K    241      256       964K kmalloc-16
>    54348   36649  67%    0.38K   1294       42     20704K ip6_dst_cache
>    53136   51787  97%    0.11K   1476       36      5904K sysfs_dir_cache
>    51200   50724  99%    0.03K    400      128      1600K kmalloc-32
>    49120   46105  93%    1.00K   1535       32     49120K xfs_inode
>    30702   30702 100%    0.04K    301      102      1204K Acpi-Namespace
>    28224   25742  91%    0.12K    882       32      3528K kmalloc-128
>    28028   22691  80%    0.18K    637       44      5096K vm_area_struct
>    28008   28008 100%    0.22K    778       36      6224K xfs_ili
>    18944   18944 100%    0.01K     37      512       148K kmalloc-8
>    16576   15154  91%    0.06K    259       64      1036K anon_vma
>    16475   14200  86%    0.16K    659       25      2636K sigqueue
>
> zoneinfo (normal state, attached)
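
One more note, since the thread mixes sysctls with sysfs knobs: the
vm.* values go through sysctl, while max_ratio is a per-device bdi
attribute. A minimal sketch of the tuning described above (assuming
the data disk is sda; adjust for your devices):

  # VM sysctls mentioned in this thread (persist them in /etc/sysctl.conf)
  sysctl -w vm.swappiness=5
  sysctl -w vm.vfs_cache_pressure=10
  sysctl -w vm.dirty_ratio=15
  sysctl -w vm.dirty_background_ratio=10

  # per-device writeback limit, not a sysctl
  echo 20 > /sys/block/sda/bdi/max_ratio

  # the last-resort flush mentioned above: drops page cache plus
  # reclaimable slab (dentries and inodes)
  echo 3 > /proc/sys/vm/drop_caches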