From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4e298f68-36ff-496a-81d2-7124f792180d@bytedance.com>
Date: Tue, 11 Feb 2025 17:43:07 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during compaction
Content-Language: en-US
To: David Hildenbrand
Cc: "Russell King (Oracle)", Ezra Buehler, linux-mm@kvack.org,
 Andrew Morton, "Mike Rapoport (Microsoft)", Muchun Song,
 Vlastimil Babka, Ryan Roberts, "Vishal Moola (Oracle)", Hugh Dickins,
 Matthew Wilcox, Peter Xu, Nicolas Ferre, Alexandre Belloni,
 Claudiu Beznea, open list, linux-arm-kernel@lists.infradead.org
References: <5d50d714-197f-44c0-94e0-ff70ee51e866@bytedance.com>
 <34bcf011-b4ac-479c-92ce-852623e73039@redhat.com>
 <3f7babee-b232-4e6b-a896-947150dcd1ef@bytedance.com>
 <2b0bb476-5bd6-489a-9b9e-7aa20964abfa@redhat.com>
From: Qi Zheng
In-Reply-To: <2b0bb476-5bd6-489a-9b9e-7aa20964abfa@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2025/2/11 17:37, David Hildenbrand wrote:
> On 11.02.25 10:29, Qi Zheng wrote:
>>
>>
>> On 2025/2/11 17:14, David Hildenbrand wrote:
>>> On 11.02.25 04:45, Qi Zheng wrote:
>>>> Hi Russell,
>>>>
>>>> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>>>>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>>>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>>>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL pointer
>>>>>> dereference resulting in a kernel panic. The culprit seems to be
>>>>>> commit fc9c45b71f43
>>>>>> ("arm: adjust_pte() use pte_offset_map_rw_nolock()").
>>>>>> Reverting the commit apparently fixes the issue.
>>>>>
>>>>> The blamed commit is buggy:
>>>>>
>>>>> arch/arm/include/asm/tlbflush.h:
>>>>> #define update_mmu_cache(vma, addr, ptep) \
>>>>>            update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>>>>
>>>>> So vmf can be NULL. This didn't used to matter before this commit,
>>>>> because vmf was not used by ARM's update_mmu_cache_range(). However,
>>>>> the commit introduced a dereference of it, which now causes a NULL
>>>>> pointer dereference.
>>>>>
>>>>> Not sure what the correct solution is, but at a guess, both:
>>>>>
>>>>>      if (ptl != vmf->ptl)
>>>>>
>>>>> need to become:
>>>>>
>>>>>      if (!vmf || ptl != vmf->ptl)
>>>>
>>>> No, we can't do that, because without using split PTE locks, we would
>>>> use shared mm->page_table_lock, which would create a deadlock.
>>>
>>> Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
>>>
>>> if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
>>
>> In this case, if two vmas map the same PTE page, then the same PTE lock
>> will be held repeatedly. Right?
>
> Hmm, the comment says:
>
>         /*
>          * This is called while another page table is mapped, so we
>          * must use the nested version.  This also means we need to
>          * open-code the spin-locking.
>          */
>
> "another page table" implies that it cannot be the same. But maybe that
> comment was also wrong?

I don't see make_coherent() ensuring this when it traverses the VMAs, so
I propose the following changes:

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2bec87c3327d2..dddbca9a2597e 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -61,8 +61,41 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	return ret;
 }
 
+#if defined(CONFIG_SPLIT_PTE_PTLOCKS)
+/*
+ * If we are using split PTE locks, then we need to take the pte
+ * lock here.  Otherwise we are using shared mm->page_table_lock
+ * which is already locked, thus cannot take it.
+ */
+static inline bool do_pte_lock(spinlock_t *ptl, pmd_t pmdval, pmd_t *pmd)
+{
+	/*
+	 * Use nested version here to indicate that we are already
+	 * holding one similar spinlock.
+	 */
+	spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		spin_unlock(ptl);
+		return false;
+	}
+
+	return true;
+}
+
+static inline void do_pte_unlock(spinlock_t *ptl)
+{
+	spin_unlock(ptl);
+}
+#else /* !defined(CONFIG_SPLIT_PTE_PTLOCKS) */
+static inline bool do_pte_lock(spinlock_t *ptl, pmd_t pmdval, pmd_t *pmd)
+{
+	return true;
+}
+static inline void do_pte_unlock(spinlock_t *ptl) {}
+#endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
+
 static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
-	unsigned long pfn, struct vm_fault *vmf)
+	unsigned long pfn)
 {
 	spinlock_t *ptl;
 	pgd_t *pgd;
@@ -99,23 +132,14 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	if (!pte)
 		return 0;
 
-	/*
-	 * If we are using split PTE locks, then we need to take the page
-	 * lock here.  Otherwise we are using shared mm->page_table_lock
-	 * which is already locked, thus cannot take it.
-	 */
-	if (ptl != vmf->ptl) {
-		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
-		if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
-			pte_unmap_unlock(pte, ptl);
-			goto again;
-		}
+	if (!do_pte_lock(ptl, pmdval, pmd)) {
+		pte_unmap(pte);
+		goto again;
 	}
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
 
-	if (ptl != vmf->ptl)
-		spin_unlock(ptl);
+	do_pte_unlock(ptl);
 	pte_unmap(pte);
 
 	return ret;
@@ -123,16 +147,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
 static void
 make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
-	      unsigned long addr, pte_t *ptep, unsigned long pfn,
-	      struct vm_fault *vmf)
+	      unsigned long addr, pte_t *ptep, unsigned long pfn)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *mpnt;
 	unsigned long offset;
+	unsigned long start;
 	pgoff_t pgoff;
 	int aliases = 0;
 
 	pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+	start = ALIGN_DOWN(addr, PMD_SIZE);
 
 	/*
 	 * If we have any shared mappings that are in the same mm
@@ -141,6 +166,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 	 */
 	flush_dcache_mmap_lock(mapping);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long mpnt_addr;
+
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 * Note that we intentionally mask out the VMA
@@ -151,7 +178,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 		if (!(mpnt->vm_flags & VM_MAYSHARE))
 			continue;
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+		mpnt_addr = mpnt->vm_start + offset;
+		/*
+		 * If mpnt_addr and addr are mapped to the same PTE page,
+		 * also skip this vma.
+		 */
+		if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
+			continue;
+		aliases += adjust_pte(mpnt, mpnt_addr, pfn);
 	}
 	flush_dcache_mmap_unlock(mapping);
 	if (aliases)
@@ -194,7 +228,7 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
 		__flush_dcache_folio(mapping, folio);
 	if (mapping) {
 		if (cache_is_vivt())
-			make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+			make_coherent(mapping, vma, addr, ptep, pfn);
 		else if (vma->vm_flags & VM_EXEC)
 			__flush_icache_all();
 	}

Make sense?

>
>
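
To make the "same PTE page" check in make_coherent() easier to reason about, here is a minimal userspace sketch of the same arithmetic. It is not kernel code: DEMO_PMD_SIZE, demo_align_down() and demo_same_pte_page() are hypothetical names used only for illustration, and the 2 MiB window is an assumption about how much address space one PTE page covers on this configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration: one PTE page covers 2 MiB of address space. */
#define DEMO_PMD_SIZE (1UL << 21)

/* Userspace stand-in for ALIGN_DOWN(); size must be a power of two. */
static unsigned long demo_align_down(unsigned long addr, unsigned long size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/*
 * Return true if 'other' lies in the DEMO_PMD_SIZE-aligned window that
 * contains 'addr', i.e. both addresses would be mapped by the same PTE
 * page and would therefore share one split PTE lock.
 */
static bool demo_same_pte_page(unsigned long addr, unsigned long other)
{
	unsigned long start = demo_align_down(addr, DEMO_PMD_SIZE);

	return other >= start && other - start < DEMO_PMD_SIZE;
}
```

With addr = 0x00201000 the window is [0x00200000, 0x00400000), so an alias at 0x003ff000 would be skipped, while one at 0x00400000 or 0x001ff000 would still go through adjust_pte(). Testing `other >= start` first keeps the unsigned subtraction from wrapping.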