From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 17 Dec 2024 20:58:43 -0800
Subject: Re: [PATCH bpf-next v3 2/6] mm, bpf: Introduce free_pages_nolock()
To: alexei.starovoitov@gmail.com
Cc: bpf@vger.kernel.org, andrii@kernel.org, memxor@gmail.com,
    akpm@linux-foundation.org, peterz@infradead.org, vbabka@suse.cz,
    bigeasy@linutronix.de, rostedt@goodmis.org, houtao1@huawei.com,
    hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@suse.com,
    willy@infradead.org, tglx@linutronix.de, jannh@google.com,
    tj@kernel.org, linux-mm@kvack.org, kernel-team@fb.com
In-Reply-To: <20241218030720.1602449-3-alexei.starovoitov@gmail.com>
References: <20241218030720.1602449-1-alexei.starovoitov@gmail.com>
 <20241218030720.1602449-3-alexei.starovoitov@gmail.com>
On Tue, Dec 17, 2024 at 7:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov
>
> Introduce free_pages_nolock() that can free pages without taking locks.
> It relies on trylock and can be called from any context.
> Since spin_trylock() cannot be used in RT from hard IRQ or NMI,
> it uses a lockless linked list to stash the pages, which will be freed
> by a subsequent free_pages() from a good context.
>
> Signed-off-by: Alexei Starovoitov
> ---
>  include/linux/gfp.h      |  1 +
>  include/linux/mm_types.h |  4 ++
>  include/linux/mmzone.h   |  3 ++
>  mm/page_alloc.c          | 79 ++++++++++++++++++++++++++++++++++++----
>  4 files changed, 79 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 65b8df1db26a..ff9060af6295 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -372,6 +372,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas
>                 __get_free_pages((gfp_mask) | GFP_DMA, (order))
>
>  extern void __free_pages(struct page *page, unsigned int order);
> +extern void free_pages_nolock(struct page *page, unsigned int order);
>  extern void free_pages(unsigned long addr, unsigned int order);
>
>  #define __free_page(page) __free_pages((page), 0)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 7361a8f3ab68..52547b3e5fd8 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -99,6 +99,10 @@ struct page {
>                         /* Or, free page */
>                         struct list_head buddy_list;
>                         struct list_head pcp_list;
> +                       struct {
> +                               struct llist_node pcp_llist;
> +                               unsigned int order;
> +                       };
>                 };
>                 /* See page-flags.h for PAGE_MAPPING_FLAGS */
>                 struct address_space *mapping;
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b36124145a16..1a854e0a9e3b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -953,6 +953,9 @@ struct zone {
>         /* Primarily protects free_area */
>         spinlock_t lock;
>
> +       /* Pages to be freed when next trylock succeeds */
> +       struct llist_head trylock_free_pages;
> +
>         /* Write-intensive fields used by compaction and vmstats. */
>         CACHELINE_PADDING(_pad2_);
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d23545057b6e..10918bfc6734 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -88,6 +88,9 @@ typedef int __bitwise fpi_t;
>   */
>  #define FPI_TO_TAIL            ((__force fpi_t)BIT(1))
>
> +/* Free the page without taking locks. Rely on trylock only. */
> +#define FPI_TRYLOCK            ((__force fpi_t)BIT(2))
> +

The comment above the definition of fpi_t mentions that it's for
non-pcp variants of free_pages(), so I guess that needs to be updated
in this patch.

More importantly, I think the comment states this mainly because the
existing flags won't be properly handled when freeing pages to the
pcplist. The flags will be lost once the pages are added to the
pcplist, and won't be propagated when the pages are eventually freed
to the buddy allocator (e.g. through free_pcppages_bulk()).

So I think we need to at least explicitly check which flags are
allowed when freeing pages to the pcplists, or something similar.
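
Something along these lines is roughly what I have in mind (just a
sketch; FPI_PCP_ALLOWED is a name I made up, and the exact placement
is debatable):

        /* FPI flags that remain meaningful for a page parked on a pcplist */
        #define FPI_PCP_ALLOWED         (FPI_TRYLOCK)

        static void free_unref_page_commit(struct zone *zone,
                                           struct per_cpu_pages *pcp,
                                           struct page *page, int migratetype,
                                           unsigned int order, fpi_t fpi_flags)
        {
                /*
                 * Any other flag is silently lost once the page sits on
                 * the pcplist, so catch callers that pass one here.
                 */
                VM_WARN_ON_ONCE(fpi_flags & ~FPI_PCP_ALLOWED);
                ...
        }

That way a future FPI flag cannot silently leak into the pcp path.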

>  /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
>  static DEFINE_MUTEX(pcp_batch_high_lock);
>  #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
> @@ -1247,13 +1250,44 @@ static void split_large_buddy(struct zone *zone, struct page *page,
>         }
>  }
>
> +static void add_page_to_zone_llist(struct zone *zone, struct page *page,
> +                                  unsigned int order)
> +{
> +       /* Remember the order */
> +       page->order = order;
> +       /* Add the page to the free list */
> +       llist_add(&page->pcp_llist, &zone->trylock_free_pages);
> +}
> +
>  static void free_one_page(struct zone *zone, struct page *page,
>                           unsigned long pfn, unsigned int order,
>                           fpi_t fpi_flags)
>  {
> +       struct llist_head *llhead;
>         unsigned long flags;
>
> -       spin_lock_irqsave(&zone->lock, flags);
> +       if (!spin_trylock_irqsave(&zone->lock, flags)) {
> +               if (unlikely(fpi_flags & FPI_TRYLOCK)) {
> +                       add_page_to_zone_llist(zone, page, order);
> +                       return;
> +               }
> +               spin_lock_irqsave(&zone->lock, flags);
> +       }
> +
> +       /* The lock succeeded. Process deferred pages. */
> +       llhead = &zone->trylock_free_pages;
> +       if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
> +               struct llist_node *llnode;
> +               struct page *p, *tmp;
> +
> +               llnode = llist_del_all(llhead);
> +               llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
> +                       unsigned int p_order = p->order;
> +
> +                       split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
> +                       __count_vm_events(PGFREE, 1 << p_order);
> +               }
> +       }
>         split_large_buddy(zone, page, pfn, order, fpi_flags);
>         spin_unlock_irqrestore(&zone->lock, flags);
>
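For my own understanding, the flow in free_one_page() is now
essentially (condensed from the hunk above, just restating it to make
sure I read it right):

        if (!spin_trylock_irqsave(&zone->lock, flags)) {
                if (fpi_flags & FPI_TRYLOCK) {
                        /* cannot spin in this context: park the page */
                        add_page_to_zone_llist(zone, page, order);
                        return;
                }
                /* ordinary free: fine to spin on the lock */
                spin_lock_irqsave(&zone->lock, flags);
        }
        /* lock held: drain previously parked pages, then free this one */

IOW, a parked page is only freed for real once a later non-FPI_TRYLOCK
free manages to take zone->lock for the same zone.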

> @@ -2596,7 +2630,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
>
>  static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
>                                    struct page *page, int migratetype,
> -                                  unsigned int order)
> +                                  unsigned int order, fpi_t fpi_flags)
>  {
>         int high, batch;
>         int pindex;
> @@ -2631,6 +2665,14 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
>         }
>         if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
>                 pcp->free_count += (1 << order);
> +
> +       if (unlikely(fpi_flags & FPI_TRYLOCK)) {
> +               /*
> +                * Do not attempt to take a zone lock. Let pcp->count get
> +                * over high mark temporarily.
> +                */
> +               return;
> +       }
>         high = nr_pcp_high(pcp, zone, batch, free_high);
>         if (pcp->count >= high) {
>                 free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
> @@ -2645,7 +2687,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
>  /*
>   * Free a pcp page
>   */
> -void free_unref_page(struct page *page, unsigned int order)
> +static void __free_unref_page(struct page *page, unsigned int order,
> +                             fpi_t fpi_flags)
>  {
>         unsigned long __maybe_unused UP_flags;
>         struct per_cpu_pages *pcp;
> @@ -2654,7 +2697,7 @@ void free_unref_page(struct page *page, unsigned int order)
>         int migratetype;
>
>         if (!pcp_allowed_order(order)) {
> -               __free_pages_ok(page, order, FPI_NONE);
> +               __free_pages_ok(page, order, fpi_flags);
>                 return;
>         }
>
> @@ -2671,24 +2714,33 @@ void free_unref_page(struct page *page, unsigned int order)
>         migratetype = get_pfnblock_migratetype(page, pfn);
>         if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
>                 if (unlikely(is_migrate_isolate(migratetype))) {
> -                       free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
> +                       free_one_page(page_zone(page), page, pfn, order, fpi_flags);
>                         return;
>                 }
>                 migratetype = MIGRATE_MOVABLE;
>         }
>
>         zone = page_zone(page);
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) {
> +               add_page_to_zone_llist(zone, page, order);
> +               return;
> +       }
>         pcp_trylock_prepare(UP_flags);
>         pcp = pcp_spin_trylock(zone->per_cpu_pageset);
>         if (pcp) {
> -               free_unref_page_commit(zone, pcp, page, migratetype, order);
> +               free_unref_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
>                 pcp_spin_unlock(pcp);
>         } else {
> -               free_one_page(zone, page, pfn, order, FPI_NONE);
> +               free_one_page(zone, page, pfn, order, fpi_flags);
>         }
>         pcp_trylock_finish(UP_flags);
>  }
>
> +void free_unref_page(struct page *page, unsigned int order)
> +{
> +       __free_unref_page(page, order, FPI_NONE);
> +}
> +
>  /*
>   * Free a batch of folios
>   */
> @@ -2777,7 +2829,7 @@ void free_unref_folios(struct folio_batch *folios)
>
>                 trace_mm_page_free_batched(&folio->page);
>                 free_unref_page_commit(zone, pcp, &folio->page, migratetype,
> -                                      order);
> +                                      order, FPI_NONE);
>         }
>
>         if (pcp) {
> @@ -4854,6 +4906,17 @@ void __free_pages(struct page *page, unsigned int order)
>  }
>  EXPORT_SYMBOL(__free_pages);
>
> +/*
> + * Can be called while holding raw_spin_lock or from IRQ and NMI,
> + * but only for pages that came from try_alloc_pages():
> + * order <= 3, !folio, etc
> + */
> +void free_pages_nolock(struct page *page, unsigned int order)
> +{
> +       if (put_page_testzero(page))
> +               __free_unref_page(page, order, FPI_TRYLOCK);
> +}
> +
>  void free_pages(unsigned long addr, unsigned int order)
>  {
>         if (addr != 0) {
> --
> 2.43.5
>
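
FWIW, my mental model of the intended usage is pairing this with
try_alloc_pages() from earlier in the series. A rough caller sketch
(the try_alloc_pages() signature here is my assumption, and the
surrounding code is made up for illustration):

        struct page *page;

        /* trylock-only allocation: may fail, never sleeps or spins */
        page = try_alloc_pages(NUMA_NO_NODE, 0);
        if (!page)
                return -ENOMEM;

        /* ... use the page from NMI/IRQ or under a raw_spin_lock ... */

        /*
         * Trylock-only free: parks the page on the zone llist if the
         * pcp lock / zone->lock cannot be taken right now.
         */
        free_pages_nolock(page, 0);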