From: Yu Zhao <yuzhao@google.com>
Date: Thu, 24 Oct 2024 15:15:39 -0600
Subject: Re: [PATCH mm-unstable v1] mm/page_alloc: try not to overestimate free highatomic
To: Vlastimil Babka
Cc: Mel Gorman, Michal Hocko, Andrew Morton, David Rientjes, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Link Lin, Matt Fleming

On Thu, Oct 24, 2024 at 2:16 AM Vlastimil Babka wrote:
>
> On 10/24/24 06:35, Yu Zhao wrote:
> > On Wed, Oct 23, 2024 at 1:35 AM Vlastimil Babka wrote:
> >>
> >> On 10/23/24 08:36, Yu Zhao wrote:
> >> > On Tue, Oct 22, 2024 at 4:53 AM Vlastimil Babka wrote:
> >> >>
> >> >> +Cc Mel and Matt
> >> >>
> >> >> On 10/21/24 19:25, Michal Hocko wrote:
> >> >>
> >> >> Hm I don't think it's completely WAI. The intention is that we should be
> >> >> able to unreserve the highatomic pageblocks before going OOM, and there
> >> >> seems to be an unintended corner case that if the pageblocks are fully
> >> >> exhausted, they are not reachable for unreserving.
> >> >
> >> > I still think unreserving should only apply to highatomic PBs that
> >> > contain free pages. Otherwise, it seems to me that it'd be
> >> > self-defeating because:
> >> > 1. Unreserving fully used highatomic PBs can't fulfill the alloc
> >> > demand immediately.
> >>
> >> I thought the alloc demand is only blocked on the pessimistic watermark
> >> calculation. Usable free pages exist, but the allocation is not allowed to
> >> use them.
> >
> > I think we are talking about two different problems here:
> > 1. The estimation problem.
> > 2. The unreserving policy problem.
> >
> > What you said here is correct w.r.t. the first problem, and I was
> > talking about the second problem.
>
> OK but the problem with unreserving currently makes the problem of
> estimation worse and unfixable.
>
> >> > 2. More importantly, it only takes one alloc failure in
> >> > __alloc_pages_direct_reclaim() to reset nr_reserved_highatomic to 2MB,
> >> > from as high as 1% of a zone (in this case 1GB). IOW, it makes more
> >> > sense to me that highatomic only unreserves what it doesn't fully use
> >> > each time unreserve_highatomic_pageblock() is called, not everything
> >> > it got (except the last PB).
> >>
> >> But if the highatomic pageblocks are already full, we are not really
> >> removing any actual highatomic reserves just by changing the migratetype and
> >> decreasing nr_reserved_highatomic?
> >
> > If we change the MT, they can be fragmented a lot faster, i.e., from
> > the next near OOM condition to upon becoming free. Trying to persist
> > over time is what actually makes those PBs more fragmentation
> > resistant.
>
> If we assume the allocations there have similar sizes and lifetimes, then I
> guess yeah.
>
> >> In fact that would allow the reserves
> >> grow with some actual free pages in the future.
> >
> > Good point. I think I can explain it better along this line.
> >
> > If highatomic is under the limit, both your proposal and the current
> > implementation would try to grow, making not much difference. However,
> > the current implementation can also reuse previously full PBs when
> > they become available. So there is a clear winner here: the current
> > implementation.
>
> I'd say it depends on the user of the highatomic blocks (the workload),
> which way ends up better.
>
> > If highatomic has reached the limit, with your proposal, the growth
> > can only happen after unreserve, and unreserve only happens under
> > memory pressure. This means it's likely that it tries to grow under
> > memory pressure, which is more difficult than the condition where
> > there is plenty of memory. For the current implementation, it doesn't
> > try to grow, rather, it keeps what it already has, betting those full
> > PBs becoming available for reuse.
> > So I don't see a clear winner
> > between trying to grow under memory pressure and betting on becoming
> > available for reuse.
>
> Understood. But also note there are many conditions where the current
> implementation and my proposal behave the same. If highatomic pageblocks
> become full and then only one or few pages from each is freed, it suddenly
> becomes possible to unreserve them due to memory pressure, and there is no
> reuse for those highatomic allocations anymore. This very different outcome
> only depends on whether a single page is free for the unreserve to work, but
> from the efficiency of pageblock reusal you describe above a single page is
> only a minor difference. My proposal would at least remove the sudden change
> of behavior when going from a single free page to no free page.
>
> >> Hm that assumes we're adding some checks in free fastpath, and for that to
> >> work also that there will be a freed page in highatomic PC in near enough
> >> future from the decision we need to unreserve something. Which is not so
> >> much different from the current assumption we'll find such a free page
> >> already in the free list immediately.
> >>
> >> > To summarize, I think this is an estimation problem, which I would
> >> > categorize as a lesser problem than accounting problems. But it sounds
> >> > to me that you think it's a policy problem, i.e., the highatomic
> >> > unreserving policy is wrong or not properly implemented?
> >>
> >> Yeah I'd say not properly implemented, but that sounds like a mechanism, not
> >> policy problem to me :)
> >
> > What about adding a new counter to keep track of the size of free
> > pages reserved for highatomic?
>
> That's doable but not so trivial and means starting to handle the highatomic
> pageblocks much more carefully, like we do with CMA pageblocks and
> NR_FREE_CMA_PAGES counter, otherwise we risk drifting the counter unrecoverably.

The counter would be protected by the zone lock:

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 17506e4a2835..86c63d48c08e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -824,6 +824,7 @@ struct zone {
 	unsigned long watermark_boost;
 
 	unsigned long nr_reserved_highatomic;
+	unsigned long nr_free_highatomic;
 
 	/*
 	 * We don't know if the memory that we're going to allocate will be
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8afab64814dc..4d8031817c59 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -644,6 +644,17 @@ static inline void account_freepages(struct zone *zone, int nr_pages,
 		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
 }
 
+static void account_highatomic_freepages(struct zone *zone, unsigned int order, int old_mt, int new_mt)
+{
+	int nr_pages = 1 << order;
+
+	if (is_migrate_highatomic(old_mt))
+		zone->nr_free_highatomic -= nr_pages;
+
+	if (is_migrate_highatomic(new_mt))
+		zone->nr_free_highatomic += nr_pages;
+}
+
 /* Used for pages not on another list */
 static inline void __add_to_free_list(struct page *page, struct zone *zone,
 				      unsigned int order, int migratetype,
@@ -660,6 +671,8 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone,
 	else
 		list_add(&page->buddy_list, &area->free_list[migratetype]);
 	area->nr_free++;
+
+	account_highatomic_freepages(zone, order, -1, migratetype);
 }
 
 /*
@@ -681,6 +694,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
 
 	account_freepages(zone, -(1 << order), old_mt);
 	account_freepages(zone, 1 << order, new_mt);
+
+	account_highatomic_freepages(zone, order, old_mt, new_mt);
 }
 
 static inline void __del_page_from_free_list(struct page *page, struct zone *zone,
@@ -698,6 +713,8 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon
 	__ClearPageBuddy(page);
 	set_page_private(page, 0);
 	zone->free_area[order].nr_free--;
+
+	account_highatomic_freepages(zone, order, migratetype, -1);
 }
 
 static inline void del_page_from_free_list(struct page *page, struct zone *zone,
@@ -3085,7 +3102,7 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
 	 * over-estimate the size of the atomic reserve but it avoids a search.
 	 */
 	if (likely(!(alloc_flags & ALLOC_RESERVES)))
-		unusable_free += z->nr_reserved_highatomic;
+		unusable_free += z->nr_free_highatomic;
 
 #ifdef CONFIG_CMA
 	/* If allocation can't use CMA areas don't use free CMA pages */
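
To make the estimation gap concrete, here is a small self-contained user-space sketch (not part of the patch above; the struct, helper and numbers are hypothetical stand-ins for the zone fields discussed in this thread). It shows how subtracting the whole reservation can make a zone look short of usable memory even though most of the reserved pageblocks are already allocated and hence not on the free lists at all:

```c
/*
 * Illustrative user-space sketch (not kernel code): why subtracting
 * nr_reserved_highatomic overestimates the unusable part of the free
 * list, while a counter of actually-free highatomic pages does not.
 * Field names mirror the discussion above; the numbers are made up.
 */
#include <stdio.h>

struct zone_estimate {
	unsigned long free_pages;             /* all free pages in the zone */
	unsigned long nr_reserved_highatomic; /* reserved highatomic, free or not */
	unsigned long nr_free_highatomic;     /* actually free highatomic pages */
};

/* Rough analogue of the __zone_watermark_unusable_free() adjustment. */
static unsigned long usable_free(const struct zone_estimate *z, int use_free_counter)
{
	unsigned long unusable = use_free_counter ? z->nr_free_highatomic
						  : z->nr_reserved_highatomic;

	return z->free_pages > unusable ? z->free_pages - unusable : 0;
}

int main(void)
{
	/* Hypothetical zone: 1 GB reserved as highatomic, ~90% of it in use. */
	struct zone_estimate z = {
		.free_pages             = 300000,
		.nr_reserved_highatomic = 262144, /* 1 GB in 4 KB pages */
		.nr_free_highatomic     = 26214,  /* only ~10% still free */
	};

	printf("usable free, old estimate: %lu pages\n", usable_free(&z, 0)); /* 37856 */
	printf("usable free, new estimate: %lu pages\n", usable_free(&z, 1)); /* 273786 */
	return 0;
}
```

With these made-up numbers the old estimate reports roughly 38 thousand usable pages while the new one reports about 274 thousand, which is the difference between tripping the watermark check (and potentially declaring OOM prematurely) and passing it comfortably.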
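
On the drift concern, a toy user-space model may help (again hypothetical names, ignoring per-cpu lists, locking and every other real-world complication): it mimics the three hook points in the patch above, add, move and delete on the free list, and checks that the incrementally maintained counter matches a full recount as long as every transition funnels through the same helper.

```c
/*
 * Toy model (not kernel code). Every free-list mutation goes through
 * account(), so nr_free_highatomic stays equal to a full recount.
 */
#include <assert.h>
#include <stdio.h>

#define MIGRATE_MOVABLE    0
#define MIGRATE_HIGHATOMIC 1
#define NBLOCKS            8

static int free_order[NBLOCKS];  /* -1 = not on the free list */
static int block_mt[NBLOCKS];    /* migratetype while on the free list */
static long nr_free_highatomic;  /* the counter being maintained */

/* Same shape as account_highatomic_freepages(): -1 means "no list". */
static void account(int order, int old_mt, int new_mt)
{
	long nr_pages = 1L << order;

	if (old_mt == MIGRATE_HIGHATOMIC)
		nr_free_highatomic -= nr_pages;
	if (new_mt == MIGRATE_HIGHATOMIC)
		nr_free_highatomic += nr_pages;
}

static void add_to_free(int i, int order, int mt)
{
	free_order[i] = order;
	block_mt[i] = mt;
	account(order, -1, mt);
}

static void move_free(int i, int new_mt)
{
	account(free_order[i], block_mt[i], new_mt);
	block_mt[i] = new_mt;
}

static void del_from_free(int i)
{
	account(free_order[i], block_mt[i], -1);
	free_order[i] = -1;
}

/* The expensive alternative the counter is meant to avoid. */
static long recount(void)
{
	long total = 0;

	for (int i = 0; i < NBLOCKS; i++)
		if (free_order[i] >= 0 && block_mt[i] == MIGRATE_HIGHATOMIC)
			total += 1L << free_order[i];
	return total;
}

int main(void)
{
	for (int i = 0; i < NBLOCKS; i++)
		free_order[i] = -1;

	add_to_free(0, 9, MIGRATE_MOVABLE);     /* order-9 block freed */
	add_to_free(1, 9, MIGRATE_HIGHATOMIC);
	move_free(0, MIGRATE_HIGHATOMIC);       /* reserve it */
	del_from_free(1);                       /* highatomic allocation */
	move_free(0, MIGRATE_MOVABLE);          /* unreserve */

	assert(nr_free_highatomic == recount());
	printf("counter=%ld recount=%ld\n", nr_free_highatomic, recount());
	return 0;
}
```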