Date: Wed, 11 May 2022 15:57:09 -0700
From: Minchan Kim
To: Andrew Morton
Cc: LKML, linux-mm, Suren Baghdasaryan, Michal Hocko, John Dias,
    Tim Murray, Matthew Wilcox, Vladimir Davydov, Martin Liu,
    Johannes Weiner
Subject: Re: [PATCH v4] mm: don't be stuck to rmap lock on reclaim path
References: <20220510215423.164547-1-minchan@kernel.org>
    <20220511153349.045ab3865f25920dce11ca16@linux-foundation.org>
In-Reply-To: <20220511153349.045ab3865f25920dce11ca16@linux-foundation.org>

On Wed, May 11, 2022 at 03:33:49PM -0700, Andrew Morton wrote:
> On Tue, 10 May 2022 14:54:23 -0700 Minchan Kim wrote:
>
> > The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be
> > contended under memory pressure if processes keep working on
> > their vmas(e.g., fork, mmap, munmap). It makes reclaim path
> > stuck. In our real workload traces, we see kswapd is waiting the
> > lock for 300ms+(worst case, a sec) and it makes other processes
> > entering direct reclaim, which were also stuck on the lock.
> >
> > This patch makes lru aging path try_lock mode like shink_page_list
> > so the reclaim context will keep working with next lru pages
> > without being stuck. if it found the rmap lock contended, it rotates
> > the page back to head of lru in both active/inactive lrus to make
> > them consistent behavior, which is basic starting point rather than
> > adding more heristic.
> >
> > Since this patch introduces a new "contended" field as out-param
> > along with try_lock in-param in rmap_walk_control, it's not
> > immutable any longer if the try_lock is set so remove const
> > keywords on rmap related functions. Since rmap walking is already
> > expensive operation, I doubt the const would help sizable benefit(
> > And we didn't have it until 5.17).
> >
> > In a heavy app workload in Android, trace shows following statistics.
> > It almost removes rmap lock contention from reclaim path.
>
> What might be the worst-case failure modes using this approach?
>
> Could we burn much CPU time pointlessly churning though the LRU?  Could
> it mess up aging decisions enough to be performance-affecting in any
> workload?

Yes, correct. However, we are already churning the LRUs in several
ways. For example, pages are isolated from and put back onto the LRU
list for page migration from several sources (compaction is the typical
example), and shrink_page_list puts pages back when trylock_page fails
or when sc->gfp_mask does not allow the page to be reclaimed.

>
> Something else?

One thing I am worried about is the granularity of the churning. The
examples above churn at page granularity, so they might be excusable,
but this one churns at address-space granularity, especially for the
file LRU (i_mmap_rwsem), which might cause too much rotation and end in
a live-lock (we keep rotating in a small LRU under heavy memory
pressure).

If that could be a problem, maybe we could use sc->priority to stop the
skipping at a certain level of memory pressure.

Any thoughts? Do we really need it?
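
To make the sc->priority idea a bit more concrete, here is a rough
user-space sketch, not part of the patch. It only assumes what vmscan
already does: the scan priority starts at DEF_PRIORITY (12) and drops
toward 0 as memory pressure grows. The struct scan_control_sketch type,
the rmap_use_trylock() helper and the RMAP_TRYLOCK_MIN_PRIORITY cutoff
are all hypothetical names picked for illustration, and the cutoff
value itself is a guess that would need tuning.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel's DEF_PRIORITY: reclaim starts scanning at
 * this priority and counts down toward 0 as pressure increases. */
#define DEF_PRIORITY 12

/* Hypothetical cutoff (an assumption, not from the patch): below this
 * priority we stop skipping contended rmap locks. */
#define RMAP_TRYLOCK_MIN_PRIORITY (DEF_PRIORITY - 2)

/* Hypothetical stand-in for the one field of struct scan_control that
 * this sketch needs. */
struct scan_control_sketch {
	int priority;
};

/*
 * Return true if reclaim should only trylock the rmap lock and rotate
 * the page on contention; return false if pressure is high enough that
 * it should block on the lock instead of rotating again.
 */
static bool rmap_use_trylock(const struct scan_control_sketch *sc)
{
	assert(sc->priority >= 0 && sc->priority <= DEF_PRIORITY);
	return sc->priority >= RMAP_TRYLOCK_MIN_PRIORITY;
}
```

With something like this, a caller would set try_lock in
rmap_walk_control only while rmap_use_trylock() returns true, so a
small LRU under heavy pressure eventually waits for the lock rather
than rotating forever. Whether the extra knob is worth it is exactly
the open question above.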