From: SeongJae Park
To: Lorenzo Stoakes
Cc: SeongJae Park, Andrew Morton, "Liam R. Howlett", David Hildenbrand,
 Shakeel Butt, Vlastimil Babka, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Rik van Riel
Subject: Re: [PATCH 6/9] mm/memory: split non-tlb flushing part from zap_page_range_single()
Date: Tue, 11 Mar 2025 13:58:01 -0700
Message-Id: <20250311205801.85356-1-sj@kernel.org>
In-Reply-To: <6250fc68-2ce8-43a8-a064-e24877033ce1@lucifer.local>

On Tue, 11 Mar 2025 12:45:44 +0000 Lorenzo Stoakes wrote:

> On Mon, Mar 10, 2025 at 10:23:15AM -0700, SeongJae Park wrote:
> > Some of zap_page_range_single() callers such as [process_]madvise() with
> > MADV_DONTNEED[_LOCKED] cannot batch tlb flushes because
> > zap_page_range_single() does tlb flushing for each invocation.  Split
> > out the body of zap_page_range_single() except mmu_gather object
> > initialization and gathered tlb entries flushing parts for such batched
> > tlb flushing usage.
> >
> > Signed-off-by: SeongJae Park
> > ---
> >  mm/memory.c | 36 ++++++++++++++++++++++--------------
> >  1 file changed, 22 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 78c7ee62795e..88c478e2ed1a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1995,38 +1995,46 @@ void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas,
> >  	mmu_notifier_invalidate_range_end(&range);
> >  }
> >
> > -/**
> > - * zap_page_range_single - remove user pages in a given range
> > - * @vma: vm_area_struct holding the applicable pages
> > - * @address: starting address of pages to zap
> > - * @size: number of bytes to zap
> > - * @details: details of shared cache invalidation
> > - *
> > - * The range must fit into one VMA.
> > - */
> > -void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
> > +static void unmap_vma_single(struct mmu_gather *tlb,
> > +		struct vm_area_struct *vma, unsigned long address,
> >  		unsigned long size, struct zap_details *details)
> >  {
> >  	const unsigned long end = address + size;
> >  	struct mmu_notifier_range range;
> > -	struct mmu_gather tlb;
> >
> >  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
> >  				address, end);
> >  	hugetlb_zap_begin(vma, &range.start, &range.end);
> > -	tlb_gather_mmu(&tlb, vma->vm_mm);
> >  	update_hiwater_rss(vma->vm_mm);
> >  	mmu_notifier_invalidate_range_start(&range);
> >  	/*
> >  	 * unmap 'address-end' not 'range.start-range.end' as range
> >  	 * could have been expanded for hugetlb pmd sharing.
> >  	 */
> > -	unmap_single_vma(&tlb, vma, address, end, details, false);
> > +	unmap_single_vma(tlb, vma, address, end, details, false);
> >  	mmu_notifier_invalidate_range_end(&range);
> > -	tlb_finish_mmu(&tlb);
> >  	hugetlb_zap_end(vma, details);
>
> Previously hugetlb_zap_end() would happen after tlb_finish_mmu(), now it happens
> before?
>
> This seems like a major problem with this change.

Oh, you're right.  This could re-introduce the racy hugetlb allocation
failure problem that was fixed by commit 2820b0f09be9 ("hugetlbfs: close
race between MADV_DONTNEED and page fault").  That is, this patch could
make hugetlb allocation failures increase while MADV_DONTNEED is going on.

Maybe a straightforward fix is doing hugetlb_zap_end() for all vmas in a
batched manner, similar to that for the tlb flush.  For example, add a
list or an array for the vmas to 'struct madvise_behavior', let
unmap_vma_single() add each vma there, and call hugetlb_zap_end() for the
gathered vmas in vector_madvise() or do_madvise().  Does that make sense?
(A rough sketch of what I have in mind is appended at the end of this
mail.)

Also Cc-ing Rik, the author of commit 2820b0f09be9 ("hugetlbfs: close race
between MADV_DONTNEED and page fault"), in case I'm missing something
important.

> If not you need to explain why
> not in the commit message.

I now think it is a problem.  If it turns out I'm wrong, I will of course
add the reason to the commit message.


Thanks,
SJ

[...]
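
P.S. In case a concrete shape helps, below is a rough, untested sketch of
the batching idea above.  The 'zap_vmas' array, 'MADV_MAX_ZAP_VMAS', and
the two helper names are made up for illustration only; 'struct
madvise_behavior', hugetlb_zap_end(), and the tlb_finish_mmu() ordering
are the only parts taken from the actual code under discussion.

/*
 * Sketch only, not compile-tested.  Handling for the case where the
 * array fills up (e.g., flushing the tlb and draining early) is
 * omitted for brevity.
 */
#define MADV_MAX_ZAP_VMAS	16	/* hypothetical batch size */

struct madvise_behavior {
	int behavior;
	struct mmu_gather *tlb;
	/* vmas whose hugetlb_zap_end() is deferred until after the tlb flush */
	struct vm_area_struct *zap_vmas[MADV_MAX_ZAP_VMAS];
	int nr_zap_vmas;
};

/*
 * Called from unmap_vma_single() in place of the direct
 * hugetlb_zap_end() call, to record the vma for later processing.
 */
static void madv_defer_hugetlb_zap_end(struct madvise_behavior *mb,
		struct vm_area_struct *vma)
{
	if (mb->nr_zap_vmas < MADV_MAX_ZAP_VMAS)
		mb->zap_vmas[mb->nr_zap_vmas++] = vma;
}

/*
 * Called from do_madvise()/vector_madvise() after tlb_finish_mmu(), so
 * that hugetlb_zap_end() still runs after the tlb flush, preserving the
 * ordering that commit 2820b0f09be9 depends on.
 */
static void madv_hugetlb_zap_end_batched(struct madvise_behavior *mb,
		struct zap_details *details)
{
	int i;

	for (i = 0; i < mb->nr_zap_vmas; i++)
		hugetlb_zap_end(mb->zap_vmas[i], details);
	mb->nr_zap_vmas = 0;
}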