From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <0a8bd481-0b97-416d-935e-84828016445d@redhat.com>
Date: Thu, 20 Feb 2025 15:36:46 -0500
From: Luiz Capitulino <luizcap@redhat.com>
To: David Hildenbrand, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 yuzhao@google.com, pasha.tatashin@soleen.com
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, muchun.song@linux.dev
Subject: Re: [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
In-Reply-To: <4cb93166-29fd-4aea-965b-5dfb62d4dc8c@redhat.com>
References: <3f0e058aef3951b39cf6bb4259c247352d4fe736.1739931468.git.luizcap@redhat.com>
 <4cb93166-29fd-4aea-965b-5dfb62d4dc8c@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2025-02-20 05:59, David Hildenbrand wrote:
> On 19.02.25 03:17, Luiz Capitulino wrote:
>> The page extension implementation assumes that all page extensions of
>> a given page order are stored in the same memory section. The function
>> page_ext_next() relies on this assumption by adding an offset to the
>> current object to return the next adjacent page extension.
>>
>> This behavior works as expected for flatmem but fails for sparsemem when
>> using 1G pages. The commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for
>> gigantic folios") exposes this issue, making it possible to trigger a
>> crash when using the page_owner or page_table_check page extensions.
>>
>> The problem is that for 1G pages, the page extensions may span memory
>> section boundaries and be stored in different memory sections. This issue
>> was not visible before commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP
>> for gigantic folios") because alloc_contig_pages() never passed more than
>> MAX_PAGE_ORDER to post_alloc_hook(). However, the series introducing that
>> commit changed this behavior, allowing the full 1G page order to be
>> passed.
>>
>> Reproducer:
>>
>>   1. Build the kernel with CONFIG_SPARSEMEM=y and page extensions
>>      support
>>   2. Pass 'default_hugepagesz=1G page_owner=on' on the kernel command-line
>>   3. Reserve one 1G page at run-time; this should crash (backtrace below)
>>
>> To address this issue, this commit introduces a new API for iterating
>> through page extensions. The main iteration loops are for_each_page_ext()
>> and for_each_page_ext_order(). Both must be called with the RCU read
>> lock taken. Here's a usage example:
>>
>> """
>> struct page_ext_iter iter;
>> struct page_ext *page_ext;
>>
>> ...
>>
>> rcu_read_lock();
>> for_each_page_ext_order(page, order, page_ext, iter) {
>>     struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
>>     ...
>> }
>> rcu_read_unlock();
>> """
>>
>
> [...]
>
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
>> +
>> +/**
>> + * page_ext_iter_get() - Get current page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Return: NULL if no page_ext exists for this iterator.
>> + */
>> +static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
>> +{
>> +    return iter->page_ext;
>> +}
>> +
>> +/**
>> + * for_each_page_ext(): iterate through page_ext objects.
>> + * @__page: the page we're interested in
>> + * @__pgcount: how many pages to iterate through
>> + * @__page_ext: struct page_ext pointer where the current page_ext
>> + *              object is returned
>> + * @__iter: struct page_ext_iter object (defined in the stack)
>> + *
>> + * IMPORTANT: must be called with RCU read lock taken.
>> + */
>> +#define for_each_page_ext(__page, __pgcount, __page_ext, __iter) \
>> +    __page_ext = page_ext_iter_begin(&__iter, __page);     \
>
> Doing stuff out of the loop is dangerous. Assume something does
>
> if (xxx)
>     for_each_page_ext()
>
> Just move that inside the for().
>
> for (__page_ext = page_ext_iter_begin(&__iter, __page), __iter.index = 0)

Oh, good point. Will do the change.
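Something like this, I think (rough, untested sketch of the reworked macro;
all names are the ones from this patch, the only change is folding the setup
into the for() initializer):

/* Untested sketch, not the final patch: the iterator setup and the index
 * reset both happen inside the for() initializer, so the macro expands to a
 * single statement and stays safe under an unbraced if/else. */
#define for_each_page_ext(__page, __pgcount, __page_ext, __iter)   \
    for (__page_ext = page_ext_iter_begin(&__iter, __page),        \
         __iter.index = 0;                                          \
         __page_ext && __iter.index < __pgcount;                    \
         __page_ext = page_ext_iter_next(&__iter),                  \
         __iter.index++)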
>> +    for (__iter.index = 0;                                 \
>> +        __page_ext && __iter.index < __pgcount;        \
>> +        __page_ext = page_ext_iter_next(&__iter),      \
>> +        __iter.index++)
>
> Hm, if we now have an index, why not turn iter.pfn -> iter.start_pfn, and
> only adjust the index in page_ext_iter_next?
>
> Then you can set the index to 0 in page_ext_iter_begin() and have here
>
> for (__page_ext = page_ext_iter_begin(&__iter, __page),
>      __page_ext && __iter.index < __pgcount,
>      __page_ext = page_ext_iter_next(&__iter);)

I can do this if you feel strongly about it, but I prefer explicit over
implicit. I moved the index into the iter object just to avoid having to
define it in the macro's body. Also, the way I did it allows for using
page_ext_iter_begin()/page_ext_iter_next() on their own if the need arises.

> A page_ext_iter_reset() could then simply reset the index=0 and
> lookup the page_ext(start_pfn + index) == page_ext(start_pfn)

Just note we don't have page_ext_iter_reset() today (and I guess it's not
needed).

>> +
>> +/**
>> + * for_each_page_ext_order(): iterate through page_ext objects
>> + *                            for a given page order
>> + * @__page: the page we're interested in
>> + * @__order: page order to iterate through
>> + * @__page_ext: struct page_ext pointer where the current page_ext
>> + *              object is returned
>> + * @__iter: struct page_ext_iter object (defined in the stack)
>> + *
>> + * IMPORTANT: must be called with RCU read lock taken.
>> + */
>> +#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
>> +    for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
>> +
>>   #else /* !CONFIG_PAGE_EXTENSION */
>>   struct page_ext;
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 641d93f6af4c1..508deb04d5ead 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
>>       rcu_read_unlock();
>>   }
>> +
>> +/**
>> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
>> + * @iter: page extension iterator.
>> + * @page: The page we're interested in.
>> + *
>> + * Must be called with RCU read lock taken.
>> + *
>> + * Return: NULL if no page_ext exists for this page.
>> + */
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
>> +{
>> +    iter->pfn = page_to_pfn(page);
>> +    iter->page_ext = lookup_page_ext(page);
>> +
>> +    return iter->page_ext;
>> +}
>> +
>> +/**
>> + * page_ext_iter_next() - Get next page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Must be called with RCU read lock taken.
>> + *
>> + * Return: NULL if no next page_ext exists.
>> + */
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
>> +{
>> +    if (WARN_ON_ONCE(!iter->page_ext))
>> +        return NULL;
>> +
>> +    iter->pfn++;
>> +
>> +    if (page_ext_iter_next_fast_possible(iter->pfn)) {
>> +        iter->page_ext = page_ext_next(iter->page_ext);
>> +    } else {
>> +        iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
>> +    }
>> +
>> +    return iter->page_ext;
>> +}
>
> We now always have a function call when calling into page_ext_iter_next().
> Could we move that to the header and rather expose lookup_page_ext()?

I personally don't like over-using inline functions, and I don't think this
code needs optimization, since the current clients make the affected code
paths slow anyway (this also applies to the likely/unlikely use in page_owner
and page_table_check; I'd drop all of them if you ask me). But again, I can
change it if this would prevent you from giving your ACK :)
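For context, this is roughly what the conversion looks like from a client's
point of view. It's a made-up client (__set_my_flag() is a placeholder, not a
real symbol, and declarations are omitted), not an actual hunk from this
series:

/*
 * Old pattern: page_ext_next() just adds an offset, so it only works while
 * all page_ext objects live in the same memory section. A 1G range that
 * crosses a section boundary walks into the wrong (or unmapped) storage.
 */
page_ext = lookup_page_ext(page);
for (i = 0; i < (1 << order); i++) {
    __set_my_flag(page_ext);
    page_ext = page_ext_next(page_ext);
}

/*
 * New pattern: page_ext_iter_next() re-does the lookup whenever it can't
 * take the fast path, so the same loop is safe for any order, including
 * 1G pages.
 */
rcu_read_lock();
for_each_page_ext_order(page, order, page_ext, iter)
    __set_my_flag(page_ext);
rcu_read_unlock();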