Date: Fri, 21 Apr 2023 10:17:15 -0400
From: Johannes Weiner
To: Mel Gorman
Cc: linux-mm@kvack.org, Kaiyang Zhao, Vlastimil Babka, David Rientjes,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH 02/26] mm: compaction: avoid GFP_NOFS deadlocks
Message-ID: <20230421141715.GA320347@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>
	<20230418191313.268131-3-hannes@cmpxchg.org>
	<20230421122743.d7xfvzyhiunbphh3@techsingularity.net>
In-Reply-To: <20230421122743.d7xfvzyhiunbphh3@techsingularity.net>

On Fri, Apr 21, 2023 at 01:27:43PM +0100, Mel Gorman wrote:
> On Tue, Apr 18, 2023 at 03:12:49PM -0400, Johannes Weiner wrote:
> > During stress testing, two deadlock scenarios were observed:
> >
> > 1. One GFP_NOFS allocation was sleeping on too_many_isolated(), and
> >    all CPUs were busy with compactors that appeared to be spinning on
> >    buffer locks.
> >
> >    Give GFP_NOFS compactors additional isolation headroom, the same
> >    way we do during reclaim, to eliminate this deadlock scenario.
> >
> > 2. In a more pernicious scenario, the GFP_NOFS allocation was
> >    busy-spinning in compaction, but seemingly never making
> >    progress. Upon closer inspection, memory was dominated by file
> >    pages, which the fs compactor isn't allowed to touch. The remaining
> >    anon pages didn't have the contiguity to satisfy the request.
> >
> >    Allow GFP_NOFS allocations to bypass watermarks when compaction
> >    failed at the highest priority.
> >
> > While these deadlocks were encountered only in tests with the
> > subsequent patches (which put a lot more demand on compaction), in
> > theory these problems already exist in the code today. Fix them now.
> >
> > Signed-off-by: Johannes Weiner
>
> Definitely needs to be split out.

Will do.
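
To make the headroom in scenario 1 concrete, here is a minimal
user-space sketch of the throttling check. It is a model, not the
kernel function: struct node_counts and model_too_many_isolated() are
hypothetical names, the allows_fs flag stands in for
(cc->gfp_mask & __GFP_FS), and the final comparison against half of
(inactive + active) is assumed from the rest of too_many_isolated(),
which the quoted hunk does not show.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the counters read via node_page_state(). */
struct node_counts {
	unsigned long active;   /* active anon + file pages */
	unsigned long inactive; /* inactive anon + file pages */
	unsigned long isolated; /* pages currently isolated */
};

/*
 * Model of the throttling decision. Callers that may enter the
 * filesystem have their LRU counts scaled down by 8, which lowers
 * their isolation limit. GFP_NOFS callers keep the full counts and
 * therefore get roughly 8x the headroom.
 *
 * Assumption: the limit is half of (inactive + active), as in the
 * surrounding kernel function; that line is not in the quoted diff.
 */
static bool model_too_many_isolated(struct node_counts c, bool allows_fs)
{
	if (allows_fs) {
		c.inactive >>= 3;
		c.active >>= 3;
	}

	return c.isolated > (c.inactive + c.active) / 2;
}
```

With active = inactive = 8000 and isolated = 1500, a caller that may
enter the filesystem sees a limit of 1000 and is throttled, while a
GFP_NOFS caller sees a limit of 8000 and proceeds.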
> >  mm/compaction.c | 15 +++++++++++++--
> >  mm/page_alloc.c | 10 +++++++++-
> >  2 files changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 8238e83385a7..84db84e8fd3a 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
> >  }
> >
> >  /* Similar to reclaim, but different enough that they don't share logic */
> > -static bool too_many_isolated(pg_data_t *pgdat)
> > +static bool too_many_isolated(struct compact_control *cc)
> >  {
> > +	pg_data_t *pgdat = cc->zone->zone_pgdat;
> >  	bool too_many;
> >
> >  	unsigned long active, inactive, isolated;
> > @@ -758,6 +759,16 @@ static bool too_many_isolated(pg_data_t *pgdat)
> >  	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
> >  			node_page_state(pgdat, NR_ISOLATED_ANON);
> >
> > +	/*
> > +	 * GFP_NOFS callers are allowed to isolate more pages, so they
> > +	 * won't get blocked by normal direct-reclaimers, forming a
> > +	 * circular deadlock. GFP_NOIO won't get here.
> > +	 */
> > +	if (cc->gfp_mask & __GFP_FS) {
> > +		inactive >>= 3;
> > +		active >>= 3;
> > +	}
> > +
>
> This comment needs to explain why GFP_NOFS gets special treatment
> explaining that a GFP_NOFS context may not be able to migrate pages and
> why.

Fair point, I'll expand on that.

> As a follow-up, if GFP_NOFS cannot deal with the majority of the
> migration contexts then it should bail out of compaction entirely. The
> changelog doesn't say why but maybe SYNC_LIGHT is the issue?

It's this condition in isolate_migratepages_block():

			/*
			 * Only allow to migrate anonymous pages in GFP_NOFS context
			 * because those do not depend on fs locks.
			 */
			if (!(cc->gfp_mask & __GFP_FS) && mapping)
				goto isolate_fail_put;

In terms of bailing even earlier: We do have per-zone file and anon
counts that could be consulted. However, the real problem is
interleaving of anon and file.
Even if only 10% of the zone is anon, it could still be worth
trying to compact if the anon pages are relatively contiguous. OTOH
50% anon could be uncompactable if every block also contains at least
one file page. We don't know until we actually scan.

I'm hesitant to give allocations premature access to the last
reserves. What might work is for NOFS contexts to test if anon is low
up front and shortcut directly to the highest priority (SYNC_FULL).
One good faith scan attempt at least before touching the reserves.
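
To illustrate why the anon/file ratio alone does not predict
compactability, here is a small user-space model. Everything in it is
hypothetical (enum page_kind, count_pages(), nofs_compactable_blocks());
it treats a zone as a flat array of pages and counts the blocks that
contain no file page, as a rough proxy for what a GFP_NOFS compactor
could evacuate. It ignores free-page targets, pinned pages and
everything else the real scanner considers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page classification for the model. */
enum page_kind { PAGE_FREE, PAGE_ANON, PAGE_FILE };

/* Count pages of one kind, e.g. to compute the anon share of a zone. */
static size_t count_pages(const enum page_kind *pages, size_t nr_pages,
			  enum page_kind kind)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_pages; i++)
		if (pages[i] == kind)
			n++;
	return n;
}

/*
 * Count blocks a NOFS compactor could empty in this model: blocks
 * without a single file page. One file page is enough to make a
 * block unusable, because NOFS context may only migrate anon pages.
 */
static size_t nofs_compactable_blocks(const enum page_kind *pages,
				      size_t nr_pages, size_t block_pages)
{
	size_t blocks = 0;

	for (size_t start = 0; start + block_pages <= nr_pages;
	     start += block_pages) {
		bool has_file = false;

		for (size_t i = 0; i < block_pages; i++) {
			if (pages[start + i] == PAGE_FILE) {
				has_file = true;
				break;
			}
		}
		if (!has_file)
			blocks++;
	}
	return blocks;
}
```

In a 40-page zone with 4-page blocks, 4 contiguous anon pages (10%
anon) yield one compactable block, while strictly alternating anon and
file pages (50% anon) yield none. The counts alone cannot tell these
cases apart, which is why at least one scan is needed.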