From: Nico Pache
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Cc: hannes@cmpxchg.org, npache@redhat.com, aquini@redhat.com, shakeelb@google.com, llong@redhat.com, mhocko@suse.com, hakavlad@inbox.lv
Subject: [PATCH v3] vm_swappiness=0 should still try to avoid swapping anon memory
Date: Mon, 9 Aug 2021 18:37:40 -0400
Message-Id: <20210809223740.59009-1-npache@redhat.com>

Since commit 170b04b7ae49 ("mm/workingset: prepare the workingset detection
infrastructure for anon LRU") and commit b91ac374346b ("mm: vmscan: enforce
inactive:active ratio at the reclaim root"), anon memory can be swapped
prematurely even when swappiness=0. This is due to the assumption that
refaulting anon should always allow the shrinker to target anon memory. Add
a check for swappiness being >0 before indiscriminately targeting anon.
Before these commits, when a user had swappiness=0, anon memory would rarely
get swapped; this behavior has remained constant since RHEL5. This commit
keeps that behavior intact and prevents the new workingset refaulting from
challenging the anon memory when swappiness=0.

Anon can still be swapped to prevent OOM. This does not completely disable
swapping, but rather tames the refaulting aspect of the code that allows for
the deactivating of anon memory.

We have two customer workloads that discovered this issue:
1) A VM claiming 95% of the host's memory followed by file reads (never
   dirty) which begin to challenge the anon. Refaulting the anon working set
   will then cause the indiscriminate swapping of the anon.

2) A VM running an in-memory DB is being populated from file reads.
   Swappiness is set to 0 or 1 to defer write I/O as much as possible. Once
   the customer experiences low memory, swapping of anon starts, with
   little-to-no PageCache being reclaimed.

Previously the file cache would account for almost all of the memory
reclaimed and reads would throttle.
Although the two LRU changes mentioned allow for less thrashing of file
cache, customers would like to be able to keep the swappiness=0 behavior
that has been present in the kernel for a long time.

A similar solution may be possible in get_scan_count(), which determines the
reclaim pressure for each LRU; however, I believe that kind of solution may
be too aggressive, and would prevent other parts of the code (like direct
reclaim) from targeting the active_anon list. This way we stop the problem
at the heart of what is causing the issue, with the least amount of
interference in other code paths. Furthermore, shrink_lruvec() can modify
the reclaim pressure of each LRU, which may make the get_scan_count()
solution even trickier.

Changelog:
 -V3:
    * Blame the right commit and be more descriptive in my log message.
    * inactive_is_low should remain independent from the new swappiness
      check.
    * Change how we get the swappiness value. shrink_node() can be called
      with a null target_mem_cgroup, so we should depend on the
      target_lruvec to do the null check on memcg.

 -V2:
    * Made this mem_cgroup specific so now it will work with v1, v2, and
      no cgroups.
    * I've also touched up my commit log.

Signed-off-by: Nico Pache
---
 mm/vmscan.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4620df62f0ff..9f2420da4037 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2883,8 +2883,12 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	struct lruvec *target_lruvec;
 	bool reclaimable = false;
 	unsigned long file;
+	struct mem_cgroup *memcg;
+	int swappiness;
 
 	target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
+	memcg = lruvec_memcg(target_lruvec);
+	swappiness = mem_cgroup_swappiness(memcg);
 
 again:
 	memset(&sc->nr, 0, sizeof(sc->nr));
@@ -2909,7 +2913,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 
 		refaults = lruvec_page_state(target_lruvec,
 				WORKINGSET_ACTIVATE_ANON);
-		if (refaults != target_lruvec->refaults[0] ||
+		if ((swappiness && refaults != target_lruvec->refaults[0]) ||
 			inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
 			sc->may_deactivate |= DEACTIVATE_ANON;
 		else
-- 
2.31.1
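
For anyone who wants to reason about the changed condition outside the
kernel, here is a minimal userspace sketch of the DEACTIVATE_ANON decision
before and after this patch. It is not kernel code: the function names
may_deactivate_anon_before() and may_deactivate_anon_after() and their
plain-value parameters are hypothetical stand-ins for what shrink_node()
reads from target_lruvec and mem_cgroup_swappiness().

```c
#include <stdbool.h>

/*
 * Userspace model of the anon-deactivation check in shrink_node().
 * Every name below is an illustrative assumption, not a kernel symbol:
 *   refaults      - current WORKINGSET_ACTIVATE_ANON count
 *   prev_refaults - snapshot kept in target_lruvec->refaults[0]
 *   inactive_low  - result of inactive_is_low(..., LRU_INACTIVE_ANON)
 *   swappiness    - value returned by mem_cgroup_swappiness()
 */

/* Before the patch: any change in anon refaults permits deactivation. */
bool may_deactivate_anon_before(unsigned long refaults,
				unsigned long prev_refaults,
				bool inactive_low)
{
	return refaults != prev_refaults || inactive_low;
}

/* After the patch: a refault change only counts when swappiness > 0. */
bool may_deactivate_anon_after(int swappiness,
			       unsigned long refaults,
			       unsigned long prev_refaults,
			       bool inactive_low)
{
	return (swappiness && refaults != prev_refaults) || inactive_low;
}
```

With swappiness=0, a refault delta alone no longer permits anon
deactivation in this model, while a low inactive anon list still does.
That matches the V3 changelog note that inactive_is_low stays independent
of the new swappiness check.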