From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CCB9EC3DA6E for ; Fri, 5 Jan 2024 10:25:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 404AF6B0092; Fri, 5 Jan 2024 05:25:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3B45D6B0098; Fri, 5 Jan 2024 05:25:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 27C176B0095; Fri, 5 Jan 2024 05:25:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 195F76B0205 for ; Fri, 5 Jan 2024 05:25:46 -0500 (EST) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id D8F771C0CEC for ; Fri, 5 Jan 2024 10:25:45 +0000 (UTC) X-FDA: 81644876250.24.A8F37CE Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf12.hostedemail.com (Postfix) with ESMTP id 2327B4000D for ; Fri, 5 Jan 2024 10:25:43 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=none; spf=pass (imf12.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1704450344; a=rsa-sha256; cv=none; b=f1d4wUp9Awomkh7VOlf4B8PIY/x0TCa6NFoHIIF6tDsBhH5exQtyOnhngFh2LVa0tOs/km I1voZPWcpNMRaSXYPEi09mDSXSUA3WYqBBK9Ogkj77M2YiE07DsY88yn8qediRQaV6kWSV acVbKd0M0nHAVREWWYQodzjeQhw0ork= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none; spf=pass (imf12.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1704450344; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=14ks9XMg5r6Kpa4r2VuBVncav/FTs0YrtYsR0DUx3yQ=; b=zfQNeI7Lgy1zqBTUR+BDocggNgr7mn5FFhVuc67vTf4rCDxDpHLtA1Hmaa1fSiBdnF3xQZ fBU4IulSG/QIMIgogpspk36ULt1FLocf6rBEt4m0HVTGCb/9Waoa45t9wHiRXZu9wEskEN gW+uGnyCzvhuFGr73Gb+3PwSE/ueCv4= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7519DC15; Fri, 5 Jan 2024 02:26:29 -0800 (PST) Received: from [10.57.76.44] (unknown [10.57.76.44]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D652A3F64C; Fri, 5 Jan 2024 02:25:41 -0800 (PST) Message-ID: <63eda2bc-6791-454d-b43e-20d73c7e9843@arm.com> Date: Fri, 5 Jan 2024 10:25:40 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [syzbot] [mm?] WARNING in __folio_rmap_sanity_checks Content-Language: en-GB To: David Hildenbrand , Yin Fengwei , syzbot , akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzkaller-bugs@googlegroups.com, Matthew Wilcox References: <000000000000014174060e09316e@google.com> <3feecbd6-b3bd-440c-a4f9-2a7dba3ff8f1@intel.com> <36ace74a-1de7-4224-8bc1-7f487764f6e2@redhat.com> <8bc02927-a0f0-490a-a014-0e100d30ffe4@intel.com> <1eb61435-c89c-4ca1-b1b6-aa00b3478cd2@arm.com> <556f8a4f-c739-41e0-85ec-643a0b32a2ce@redhat.com> From: Ryan Roberts In-Reply-To: <556f8a4f-c739-41e0-85ec-643a0b32a2ce@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 2327B4000D X-Stat-Signature: o4rb4nxjmk6hpqpmdchdgiig76ymwu9b X-HE-Tag: 1704450343-970097 X-HE-Meta: U2FsdGVkX19CExJ9Qk+1G2OLrTAFzjGdVKJ/5dlHGxFHwAqTW1hMwUTk4pGVvVoGtwOK1xHJt1DrdXrX/U4XRky6ey2N0azfnOgga/yoG4Y51Fn0tgy42iX21WjlQabuQe/YsXQTUCEpw5+MrckuCqvfQyKWpV3CZyP2iDvY31yUH73g3JU6kVMbQzCNdwTSDxrZArBLGtl9PC4YexC5l2BlNDvwDYXWIEGdRqd6HbenaEaSWOCVGSngMo1OI5sg2mK2D7is7NPv9g3ckCw4MDTds/PbpGMo/vyjI8Ck10kn+05mI68nOeU1iwiaplMbxs9luPi5V7PVvzrYYVT9SAZA4HXpAz79t7kmVBUkMk+/iYQDpU8nn51kzR0efAYPWGlU/H6CLbDwpqM7jJM5kHZ9oT9DiD76zZTQwmZe6mte2nJIMthFFVtsI0KnuaS9h+hXB0mLoamReMgJ5PaLYsX/LnEyypv4Cr4G0IfJRJGcDzB74rwpI/g+AAFmx/iLr+Ilxz3ZNxlF4hDgIy/qQUwWGPe35gYvj5s9u3ZIRjV9Xxf1M49CSb83RcZMX16VM+/iiBK+VZJwL04Dz9IAToSl/5en4cgYMRDbemve6q3k31o25/WvdZFBhw0sEg3kucQcF8nXQOMbhn2uW4zQpMnqlTrIF1zldTo3uGXPx3IpbCR4UV9G5zhsWCqZaDlYgrjTiLlX0dN3mSD2DFQHhFeIP9qw3vFvfVFtwZPm6I/gfdXQGU/anT3lfZqbciHEsQL4Q/yf8D35PjddagiAQAzMNKc81CUuWWCG9svguzVkUoBcgj6U4ptra3y1jdfEDpw1zCvyuIUGruEeCJRwJIap0s42dvHr6LosWaHrZyVNLKqACaWHaJfX5sDqfflG+Z4A+JZM7OwR68/QKrxKzsuZhafkIP67j+wB0AnjyX/Wuto4YMVpd1IhxYAmBZDqMnN2PdnTdVGI7xhqwkX jpwC2YfY 8Gxz2FwsH1nV2SOkBuUDfI7WcSEn4srn/CSZvYF/CIt1/IUiNUQlwlgIyKWYoamqCCA5mcyfG/5lKcfahdJ8jbBGssPW/R/7JaurXdMLTzGnp/SigwR9v5TbB3Ci3aJt4dg5RAWAzbvBylfvF3VEUi+o1oJAoHe6SbM3mtj0qfg678OpY0Seg9ImAX48zxmroAoGMGfqM7gUx1p/JLZkNPrYGXzotQoXlkvaFQDF2QLpoZa5BjveN7S+9ubyEEEcTPx5Xov38U+7AktVqMdyxqvBK9xy9WajgjW7f+usifBIrslMEGFckfIC5v5XxvbNQxcxTd5b7BH5JBO7010z+aBfuTA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 05/01/2024 08:56, David Hildenbrand wrote: >>>>>> If I am not wrong, that triggers: >>>>>> >>>>>> VM_WARN_ON_FOLIO(folio_test_large(folio) && >>>>>>             !folio_test_large_rmappable(folio), folio); >>>>>> >>>>>> So we are trying to rmap a large folio that did not go through >>>>>> folio_prep_large_rmappable(). >> >> Would someone mind explaining the rules to me for this? As far as I can see, >> folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we >> remember to deinit the list on destruction. Why can't we just init that list for >> all folios order-2 or greater? Then everything is rmappable? > > I think we much rather want to look into moving all mapcount-related stuff into > folio_prep_large_rmappable(). It doesn't make any sense to initialize that for > any compound pages, especially the ones that will never get mapped to user space. > >> >>>>>> >>>>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios stoed >>>>>> in the "struct packet_ring_buffer". No idea where that comes from, but I >>>>>> suspect it's simply some compound allocation. >>>>> Looks like: >>>>>     alloc_pg_vec >>>>>       alloc_one_pg_vec_page >>>>>            gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP | >>>>>                              __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; >>>>> >>>>>            buffer = (char *) __get_free_pages(gfp_flags, order); >>>>> So you are right here... :). >>>> >>>> Hm, but I wonder if this something that's supposed to work or is this one of >>>> the cases where we should actually use a VM_PFN mapping? >>>> >>>> It's not a pagecache(file/shmem) page after all. >>>> >>>> We could relax that check and document why we expect something that is not >>>> marked rmappable. But it fells wrong. I suspect this should be a VM_PFNMAP >>>> instead (like recent udmabuf changes). >>> >>> VM_PFNMAP looks correct. >> >> And why is making the folio rmappable and mapping it the normal way not the >> right solution here? Because the folio could be order-1? Or something more >> profound? >> > > Think about it: we are adding/removing a page from rmap handling that can > *never* be looked up using the rmap because there is no rmap for these pages, > and folio->index is just completely unexpressive. VM_MIXEDMAP doesn't have any > linearity constraints. I guess I was assuming treating it the same way as anon folios. But I guess that would be VeryBad (TM) because these aren't anon pages and we don't want to swap, etc? OK got it. > > Logically, it doesn't make any sense to involve rmap code although it currently > might work. validate_page_before_insert() blocks off most pages where the > order-0 mapcount would be used for other purposes and everything would blow up. > > Looking at vm_insert_page(), this interface is only for pages the caller > allocated. Maybe we should just not do any rmap accounting when > mapping/unmapping these pages: not involve any rmap code, including mapcounts? > > vm_normal_page() works on these mappings, so we'd also have to skip rmap code > when unmapping these pages etc. Maybe that's the whole reason we have the rmap > handling here: to not special-case the unmap path. Right. I guess it depends what vm_insert_page() is spec'ed to expect; is the bug in the implementation or is the caller providing the wrong type of folio? I guess there will be many callers providing non-rmappable folios (inc out of tree?). > > Alternatively, we can: > > (1) Require the caller to make sure large folios are rmappable. We >     already require allocations to be compound. Should be easy to add. I'm not sure this is practical given vm_insert_page() is exported? > (2) Allow non-rmappable folios in rmap code just for mapcount tracking. >     Confusing but possible. > >>> >>> I do have another question: why do we just check the large folio >>> rmappable? Does that mean order0 folio is always rmappable? >>> > > We didn't really have a check for that I believe. We simply reject all pages in > vm_insert_page() that are problematic because the pagecount is overloaded. > >>> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c >>> and drivers call vm_insert_page. I suppose they all need be VM_PFNMAP. > > Right, similar problem. >