Re: [PATCH] mm: page_alloc: unreserve highatomic page blocks before oom

linux-mm.kvack.org archive mirror

From: Michal Hocko <mhocko@suse.com>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net,
	 david@redhat.com, vbabka@suse.cz, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: page_alloc: unreserve highatomic page blocks before oom
Date: Tue, 31 Oct 2023 14:43:44 +0100
Message-ID: <b4tb2mj3jr5aiiwtyim6jl4occgicv4xzphhqxk2cxijsw5l2w@gjsp4vn7vw7p>
In-Reply-To: <2a0d2dd8-562c-fec7-e3ac-0bd955643e16@quicinc.com>

On Tue 31-10-23 18:43:55, Charan Teja Kalla wrote:
> Thanks Michal/Pavan!!
> 
> On 10/31/2023 1:44 PM, Michal Hocko wrote:
> > On Mon 30-10-23 18:09:50, Charan Teja Kalla wrote:
> >> __alloc_pages_direct_reclaim() is called from slowpath allocation where
> >> high atomic reserves can be unreserved after there is a progress in
> >> reclaim and yet no suitable page is found. Later should_reclaim_retry()
> >> gets called from slow path allocation to decide if the reclaim needs to
> >> be retried before OOM kill path is taken.
> >>
> >> should_reclaim_retry() checks the available (reclaimable + free pages)
> >> memory against the min wmark level of a zone and returns:
> >> a) true, if it is above the min wmark, so that the slow path allocation
> >> retries reclaim.
> >> b) false otherwise, so that the slow path allocation takes the oom kill path.
> >>
> >> should_reclaim_retry() can also unreserve the high atomic reserves,
> >> **but only after all the reclaim retries are exhausted.**
> >>
> >> In a case where there is almost no reclaimable memory and the free pages
> >> are mostly high atomic reserves that the allocation context cannot use,
> >> the available memory falls below the min wmark level.
> >> should_reclaim_retry() therefore returns false, and the allocation
> >> request takes the OOM kill path. This is an early oom kill because the
> >> high atomic reserves are holding a lot of free memory and unreserving
> >> them is not attempted.
> > 
> > OK, I see. So we do not release those reserved pages because OOM hits
> > too early. 
> > 
> >> (early)OOM is encountered on a machine in the below state(excerpt from
> >> the oom kill logs):
> >> [  295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
> >> high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
> >> active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
> >> present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
> >> local_pcp:492kB free_cma:0kB
> >> [  295.998656] lowmem_reserve[]: 0 32
> >> [  295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
> >> 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> >> 0*4096kB = 7752kB
> > 
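
A minimal userspace sketch of why the retry check fails with the numbers in
this log. It is not the kernel code: zone_snapshot, oom_log_snapshot() and
retry_reclaim_model() are made-up names, lowmem_reserve is ignored, and
reclaimable memory is assumed to be just active_file + inactive_file (i.e.
no swap). It only illustrates that once the highatomic reserve is
subtracted, nothing is left to compare against the min wmark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Figures in kB, copied from the "Normal" zone line of the oom log. */
struct zone_snapshot {
	long free_kb;
	long reclaimable_kb;         /* assumption: active_file + inactive_file */
	long reserved_highatomic_kb;
	long min_wmark_kb;
};

/* Hypothetical helper: fill in the values reported in the log excerpt. */
static struct zone_snapshot oom_log_snapshot(void)
{
	struct zone_snapshot z = {
		.free_kb = 7728,
		.reclaimable_kb = 24 + 24,
		.reserved_highatomic_kb = 8192,
		.min_wmark_kb = 804,
	};
	return z;
}

/*
 * Simplified model of the retry decision: a context that may not use the
 * highatomic reserves sees them subtracted from the available memory
 * before the comparison with the min watermark.
 */
static bool retry_reclaim_model(const struct zone_snapshot *z,
				bool can_use_highatomic)
{
	long available;

	assert(z != NULL);
	available = z->free_kb + z->reclaimable_kb;
	if (!can_use_highatomic)
		available -= z->reserved_highatomic_kb;

	return available > z->min_wmark_kb;
}
```

With the log values, retry_reclaim_model(&z, false) is false (7728 + 48 -
8192 is negative, so it is below 804), while a context that is allowed to
use the reserves would pass the same check.
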
> > OK, this is quite interesting as well. The system is really tiny and 8MB
> > of reserved memory is indeed really high. How come those reservations
> > have grown that high?
> 
> Actually it is a VM running on the Linux kernel.
> 
> Regarding the reservations, I think it is because of the 'max_managed '
> calculations in the below:
> static void reserve_highatomic_pageblock(struct page *page, ....) {
>     ....
>     /*
>      * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
>      * Check is race-prone but harmless.
>      */
>     max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
> 
>     if (zone->nr_reserved_highatomic >= max_managed)
>             goto out;
> 
>     zone->nr_reserved_highatomic += pageblock_nr_pages;
>     set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
>     move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
> out:
> }
> 
> Since we always add 1% of the zone's managed pages count to
> pageblock_nr_pages, the minimum limit turns out to be 2 pageblocks, as
> 'nr_reserved_highatomic' is incremented/decremented in pageblock-size
> granules.
> 
> And in my case, 8M out of ~50M turns out to be 16%, which is high.
> 
> If the below looks fine to you, I can raise this as a separate change:

Yes, please. Having a full page block (4MB) still sounds like too much
for such a tiny system. Maybe there shouldn't be any reservation. But
this is definitely worth a separate patch.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2a2536d..41441ced 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1886,7 +1886,9 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone)
>          * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
>          * Check is race-prone but harmless.
>          */
> -       max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
> +       max_managed = max_t(unsigned long,
> +                       ALIGN(zone_managed_pages(zone) / 100, pageblock_nr_pages),
> +                       pageblock_nr_pages);
>         if (zone->nr_reserved_highatomic >= max_managed)
>                 return;
> 
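
The arithmetic behind the 8MB figure, and what the proposed cap would do
for this zone, can be sketched in userspace as follows. This is only a
model under stated assumptions: 4kB pages and a 4MB pageblock (so 1024
pages per pageblock), and the helper names align_up(), cap_current(),
cap_proposed() and reserved_at_limit() are made up for illustration. Only
the two max_managed formulas come from the code quoted in this thread.

```c
#include <assert.h>

/* Round x up to a multiple of a; a must be a power of two (as with ALIGN()). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Existing limit: roughly 1% of the zone plus one pageblock. */
static unsigned long cap_current(unsigned long managed, unsigned long pageblock)
{
	return managed / 100 + pageblock;
}

/* Proposed limit: 1% rounded up to pageblocks, but at least one pageblock. */
static unsigned long cap_proposed(unsigned long managed, unsigned long pageblock)
{
	unsigned long aligned = align_up(managed / 100, pageblock);

	return aligned > pageblock ? aligned : pageblock;
}

/*
 * Reservations grow one pageblock at a time and stop once the counter
 * reaches the cap, mirroring "nr_reserved_highatomic >= max_managed".
 */
static unsigned long reserved_at_limit(unsigned long cap, unsigned long pageblock)
{
	unsigned long reserved = 0;

	while (reserved < cap)
		reserved += pageblock;
	return reserved;
}
```

For managed:49224kB (12306 pages), the existing formula gives a cap of
123 + 1024 = 1147 pages, so a second pageblock can still be reserved and
the counter ends at 2048 pages (8MB). The proposed formula gives a cap of
1024 pages, so the reservation stops at one pageblock (4MB).
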
> >>
> >> Per the above log, the ~7MB of free memory held in the high atomic
> >> reserves is not freed up before falling back to the oom kill path.
> >>
> >> This fix includes unreserving these atomic reserves in the OOM path
> >> before going for a kill. The side effect of unreserving in oom kill path
> >> is that these free pages are checked against the high wmark. If
> >> unreserved from should_reclaim_retry()/__alloc_pages_direct_reclaim(),
> >> they are checked against the min wmark levels.
> > 
> > I do not like the fix much TBH. I think the logic should live in
> 
> Yeah, this code looks much cleaner to me. Let me know if I can raise a
> V2 with the below, with you as Suggested-by.

Sure, go ahead.
 
> I think another thing the system is missing here is draining the pcp lists.
> min:804kB low:1004kB high:1204kB free_pcp:688kB

Yes, but this seems negligible even on a small system like that.
Does it actually help to keep the system in balance? I would expect that
the OOM is just imminent no matter the draining. Anyway, if this makes any
difference then just make it a separate patch please.
-- 
Michal Hocko
SUSE Labs
