From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 998FDC47DD9 for ; Wed, 28 Feb 2024 19:33:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2B9F06B0085; Wed, 28 Feb 2024 14:33:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 269BF6B0098; Wed, 28 Feb 2024 14:33:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0E3BB6B009A; Wed, 28 Feb 2024 14:33:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id EE7FC6B0085 for ; Wed, 28 Feb 2024 14:33:54 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id A25B3C115E for ; Wed, 28 Feb 2024 19:33:54 +0000 (UTC) X-FDA: 81842212788.01.68F910A Received: from mail-wr1-f48.google.com (mail-wr1-f48.google.com [209.85.221.48]) by imf07.hostedemail.com (Postfix) with ESMTP id D40D44000B for ; Wed, 28 Feb 2024 19:33:52 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=chromium.org header.s=google header.b=WAuSKUDz; dmarc=pass (policy=none) header.from=chromium.org; spf=pass (imf07.hostedemail.com: domain of kramasub@chromium.org designates 209.85.221.48 as permitted sender) smtp.mailfrom=kramasub@chromium.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1709148833; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=lT04GCfSayZ/NG1YfdCVrXwolvgh0gSX9xKT3OophFQ=; b=RLyc+N6A0/EwBcxOZBanWY2FntJqeyexX37w136/6AsTPbknrQWyEx9ec23/rCjBb9Gp/R BDRbfIYHp/895RNAwhUFcNXlZV1nIZwBmIM8RkLoUkH7n3xjjdstTPKIjH4DJTN7n4U4fn HVVXvsHMb9ffdebYsOoljbEhOwurq3w= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=chromium.org header.s=google header.b=WAuSKUDz; dmarc=pass (policy=none) header.from=chromium.org; spf=pass (imf07.hostedemail.com: domain of kramasub@chromium.org designates 209.85.221.48 as permitted sender) smtp.mailfrom=kramasub@chromium.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709148833; a=rsa-sha256; cv=none; b=2gnsmZcDqxAzm2UIbmSvGMdq8o7ZQJHnrhGugFiBPTMcde5zi9m4jAtA4eaMwmLjOEKhj3 mnC4VRUUmJ0eEGFjedDiabXdrrTMSgAX8coG4pPVp8RrApACUtj/dpE861QKJ1oo5g2FTn x8KA18ACWkz7Romeni1POyiImi4bDFI= Received: by mail-wr1-f48.google.com with SMTP id ffacd0b85a97d-33dcad9e3a2so82290f8f.3 for ; Wed, 28 Feb 2024 11:33:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1709148830; x=1709753630; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lT04GCfSayZ/NG1YfdCVrXwolvgh0gSX9xKT3OophFQ=; b=WAuSKUDz8cdzSOer9qcvbv66EH7neL0nv6DhPq9gfzAcqI5CPL40+u4QP7RFqrUFR4 8lCcqZZGZ6Wq07DwARZCu8k/iZGywJ7vnzcRGRoaPVv3sgVgOsp3fcW2Gtt1NNQ75yZA gKFVp8CR7mICwZdJhCjHatsj8Bg5uajrhPgTw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709148830; x=1709753630; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lT04GCfSayZ/NG1YfdCVrXwolvgh0gSX9xKT3OophFQ=; b=QRhWRVLjI3EYAdX5pr+uNkWw5iYeBAAPf6orq3lBeYJa43GmdmJxJuysM4HDuh+0Vw eyuulJcFZFTTFP3KhL/karBpPyGTyMNtxwnb0Q2dfQDXtm6CN3iY9wW8/UYZd8WsKfCM DTq2e5ebDJhrB36Uv5XJ5wmgEgs0UmeX35ar3nyHpxnWHb3BHl8P+2cyZK/p6szmud6N aTv1h3l3dih+S7w6VKgnLZSiA6W9nljtQY/q+AmtrS1cJpTo/611sX42XoIAOPzp210y aVjyfzTYr3vhNgXhhQ7x3DLXgEgN+B8TAGiVkIdLOyFV+oaJwAJMyJ84IV6Ks7+OEkUZ LftQ== X-Forwarded-Encrypted: i=1; AJvYcCV1Ig8bQiMPHNulOXt6Gto37k8b3rBpy9V4zCXcaVm3YLsSeOTUocTEMDFJaw7vyu1x0jNORRLnvyDMMy2nmhm3TS8= X-Gm-Message-State: AOJu0Yz9G0ui4S1K2UUhex8XPypf7IYzNAx66xPSvFWgl3H0Pf5iPR4r kt4+GtzCMKYowCb74GhcXZIQzdJT3rmeQ1Tn1weLY2eNxBtt6KgqCX4VAvYQL201JGvUe/7IOfM dL2jcsPcyXwNO0HGKf7T+zYw7OqmQQDo0mhoB X-Google-Smtp-Source: AGHT+IHi0a9VVTiFwIdkF5FnWuaSceqJNfODPnjKL5knq0jzCOUubnV9I4KXVitnKYMcQsXNcnfJC9GeIga63VcV/x8= X-Received: by 2002:adf:f848:0:b0:33d:88c1:31b8 with SMTP id d8-20020adff848000000b0033d88c131b8mr399831wrq.60.1709148829737; Wed, 28 Feb 2024 11:33:49 -0800 (PST) MIME-Version: 1.0 References: <20240221114357.13655-2-vbabka@suse.cz> In-Reply-To: <20240221114357.13655-2-vbabka@suse.cz> From: Karthikeyan Ramasubramanian Date: Wed, 28 Feb 2024 12:33:38 -0700 Message-ID: Subject: Re: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations To: Vlastimil Babka Cc: Andrew Morton , svenva@chromium.org, bgeffon@google.com, cujomalainey@chromium.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-sound@vger.kernel.org, perex@perex.cz, stable@vger.kernel.org, tiwai@suse.com, tiwai@suse.de, Michal Hocko , Mel Gorman Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Stat-Signature: 5x7kdm3j4sqhu5zb5eo8154zhzpxggig X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: D40D44000B X-HE-Tag: 1709148832-219178 X-HE-Meta: U2FsdGVkX197SvPq1bltLyWpNPYSwobivPrwL7pf0/x7z9vchO77H+CagIt6sL3FQ54GOrNYmkSYiYuFwfdpBnMOW7qIs7Dv6Oj+fLfx75tzrxdPc030u3HW07HkxDUc5AdZPhF6jbAzMEe0ISnm9Bp05y7Ukv7raVLP9iCzTOc3cEXW/pkmDjT4dVWbZ0jMfGN/J3e3hkctsEIJhSOLSuXlc5nJm+bzfLZlRbb4y0A8tg7Uaj9YHqT46aBOC0ylfHp5MjT2klTDfgSdJaPF2linG6ndmSV0AZoJvFAiseD5HfPDgHXne55Xmyl72H0+zwESq5q4UK1ov9/9rllsM4UW6YjWjK9UYskp56BqXL4CRtt3QXJIzEO2UbrJsN3IagDCeCblvl0uFYthZDzOL9nNHrnDNRs8jWT0cdgxN9cRtAD/hHdvZprwuwzneDIdhtGyxGWcKRLweH8+o786j8tdudtce8YbA02Xy2CYGbYWAahDy9d920cKTG3itRFqVe8i1hFKt/GYc7qJDJTCQfmRww6JgUxdzTCftrjcVHCjLDW5qb96qhVCtIV9EA1Uq9zOa8UqiigIS+YxIBzvCNd5y6JrDSmWs7sSTRMIFZNgXhfLV5b4iT6Iv1W6cqdyuF8Dx2Zqv28FaOP5fg66KrbKoXM+6Qaj1E51Wt6MDlizYxgLRkAqRxoORxBpx/hKxbTXq6QWDZXFEJehG94OACn+2ifnmnVT9TprF4NZWV9hZcT1RNtm8iSqAd7x4CvaCjO9jo7mqkOazN7AiMhiS0M8iU1hRGZG8XpsrA201nFQBnipm+kFiYt7bRivBnQpc6YsJaU44/aIz0Y+dQrwZQy2wSl8ppSvHCe63+imW59i6GxylEa9WkPXIuFprTbTQmQpiVAHCX47dGjsI7gfcA80SOnRoNIqgyHReCjq30jyK0qMaaJUqvw/a83x5ay7iX8w+Z/GoekcbWIeka+ rz7r1s15 3GpQlref26urdz4BeL3mZMKF+0U3BqyQmWRguqYrMFFhBlrs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Feb 21, 2024 at 4:44=E2=80=AFAM Vlastimil Babka wr= ote: > > Sven reports an infinite loop in __alloc_pages_slowpath() for costly > order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such > combination can happen in a suspend/resume context where a GFP_KERNEL > allocation can have __GFP_IO masked out via gfp_allowed_mask. > > Quoting Sven: > > 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER) > with __GFP_RETRY_MAYFAIL set. > > 2. page alloc's __alloc_pages_slowpath tries to get a page from the > freelist. This fails because there is nothing free of that costly > order. > > 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim, > which bails out because a zone is ready to be compacted; it pretends > to have made a single page of progress. > > 4. page alloc tries to compact, but this always bails out early because > __GFP_IO is not set (it's not passed by the snd allocator, and even > if it were, we are suspending so the __GFP_IO flag would be cleared > anyway). > > 5. page alloc believes reclaim progress was made (because of the > pretense in item 3) and so it checks whether it should retry > compaction. The compaction retry logic thinks it should try again, > because: > a) reclaim is needed because of the early bail-out in item 4 > b) a zonelist is suitable for compaction > > 6. goto 2. indefinite stall. > > (end quote) > > The immediate root cause is confusing the COMPACT_SKIPPED returned from > __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be > indicating a lack of order-0 pages, and in step 5 evaluating that in > should_compact_retry() as a reason to retry, before incrementing and > limiting the number of retries. There are however other places that > wrongly assume that compaction can happen while we lack __GFP_IO. > > To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO > evaluation and switch the open-coded test in try_to_compact_pages() to > use it. > > Also use the new helper in: > - compaction_ready(), which will make reclaim not bail out in step 3, so > there's at least one attempt to actually reclaim, even if chances are > small for a costly order > - in_reclaim_compaction() which will make should_continue_reclaim() > return false and we don't over-reclaim unnecessarily > - in __alloc_pages_slowpath() to set a local variable can_compact, > which is then used to avoid retrying reclaim/compaction for costly > allocations (step 5) if we can't compact and also to skip the early > compaction attempt that we do in some cases > > Reported-by: Sven van Ashbrook > Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1= pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/ > Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invoc= ation for high order request"") > Cc: > Signed-off-by: Vlastimil Babka Tested-by: Karthikeyan Ramasubramanian > --- > include/linux/gfp.h | 9 +++++++++ > mm/compaction.c | 7 +------ > mm/page_alloc.c | 10 ++++++---- > mm/vmscan.c | 5 ++++- > 4 files changed, 20 insertions(+), 11 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index de292a007138..e2a916cf29c4 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp) > return (gfp & (__GFP_IO | __GFP_FS)) =3D=3D (__GFP_IO | __GFP_FS)= ; > } > > +/* > + * Check if the gfp flags allow compaction - GFP_NOIO is a really > + * tricky context because the migration might require IO. > + */ > +static inline bool gfp_compaction_allowed(gfp_t gfp_mask) > +{ > + return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO); > +} > + > extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma); > > #ifdef CONFIG_CONTIG_ALLOC > diff --git a/mm/compaction.c b/mm/compaction.c > index 4add68d40e8d..b961db601df4 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gf= p_mask, unsigned int order, > unsigned int alloc_flags, const struct alloc_context *ac, > enum compact_priority prio, struct page **capture) > { > - int may_perform_io =3D (__force int)(gfp_mask & __GFP_IO); > struct zoneref *z; > struct zone *zone; > enum compact_result rc =3D COMPACT_SKIPPED; > > - /* > - * Check if the GFP flags allow compaction - GFP_NOIO is really > - * tricky context because the migration might require IO > - */ > - if (!may_perform_io) > + if (!gfp_compaction_allowed(gfp_mask)) > return COMPACT_SKIPPED; > > trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio); > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 150d4f23b010..a663202045dc 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int= order, > struct alloc_context *ac) > { > bool can_direct_reclaim =3D gfp_mask & __GFP_DIRECT_RECLAIM; > + bool can_compact =3D gfp_compaction_allowed(gfp_mask); > const bool costly_order =3D order > PAGE_ALLOC_COSTLY_ORDER; > struct page *page =3D NULL; > unsigned int alloc_flags; > @@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int= order, > * Don't try this for allocations that are allowed to ignore > * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happ= en. > */ > - if (can_direct_reclaim && > + if (can_direct_reclaim && can_compact && > (costly_order || > (order > 0 && ac->migratetype !=3D MIGRATE_MOV= ABLE)) > && !gfp_pfmemalloc_allowed(gfp_mask)) { > @@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned in= t order, > > /* > * Do not retry costly high order allocations unless they are > - * __GFP_RETRY_MAYFAIL > + * __GFP_RETRY_MAYFAIL and we can compact > */ > - if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL)) > + if (costly_order && (!can_compact || > + !(gfp_mask & __GFP_RETRY_MAYFAIL))) > goto nopage; > > if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, > @@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int= order, > * implementation of the compaction depends on the sufficient amo= unt > * of free memory (see __compaction_suitable) > */ > - if (did_some_progress > 0 && > + if (did_some_progress > 0 && can_compact && > should_compact_retry(ac, order, alloc_flags, > compact_result, &compact_priority, > &compaction_retries)) > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 4f9c854ce6cc..4255619a1a31 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, st= ruct scan_control *sc) > /* Use reclaim/compaction for costly allocs or under memory pressure */ > static bool in_reclaim_compaction(struct scan_control *sc) > { > - if (IS_ENABLED(CONFIG_COMPACTION) && sc->order && > + if (gfp_compaction_allowed(sc->gfp_mask) && sc->order && > (sc->order > PAGE_ALLOC_COSTLY_ORDER || > sc->priority < DEF_PRIORITY - 2)) > return true; > @@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zo= ne, struct scan_control *sc) > { > unsigned long watermark; > > + if (!gfp_compaction_allowed(sc->gfp_mask)) > + return false; > + > /* Allocation can already succeed, nothing to do */ > if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone), > sc->reclaim_idx, 0)) > -- > 2.43.1 >