Subject: Re: [PATCH v3 1/5] Xarray: Do not return sibling entries from xas_find_marked()
To: Baolin Wang, akpm@linux-foundation.org, willy@infradead.org
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-nfs@vger.kernel.org
References: <20241213122523.12764-1-shikemeng@huaweicloud.com> <20241213122523.12764-2-shikemeng@huaweicloud.com> <1f8b523e-d68f-4382-8b1e-2475eb47ae81@linux.alibaba.com>
From: Kemeng Shi
Message-ID: <5d89f26a-8ac9-9768-5fc7-af155473f396@huaweicloud.com>
Date: Mon, 16 Dec 2024 15:05:26 +0800
In-Reply-To: <1f8b523e-d68f-4382-8b1e-2475eb47ae81@linux.alibaba.com>

on 12/13/2024 2:12 PM, Baolin Wang wrote:
>
>
> On 2024/12/13 20:25, Kemeng Shi wrote:
>> Similar to issue fixed in commit cbc02854331ed ("XArray: Do not return
>> sibling entries from xa_load()"), we may return sibling entries from
>> xas_find_marked as following:
>>     Thread A:               Thread B:
>>                             xa_store_range(xa, entry, 6, 7, gfp);
>>                             xa_set_mark(xa, 6, mark)
>>     XA_STATE(xas, xa, 6);
>>     xas_find_marked(&xas, 7, mark);
>>     offset = xas_find_chunk(xas, advance, mark);
>>     [offset is 6 which points to a valid entry]
>>                             xa_store_range(xa, entry, 4, 7, gfp);
>>     entry = xa_entry(xa, node, 6);
>>     [entry is a sibling of 4]
>>     if (!xa_is_node(entry))
>>         return entry;
>>
>> Skip sibling entry like xas_find() does to protect caller from seeing
>> sibling entry from xas_find_marked() or caller may use sibling entry
>> as a valid entry and crash the kernel.
>>
>> Besides, load_race() test is modified to catch mentioned issue and modified
>> load_race() only passes after this fix is merged.
>>
>> Here is an example how this bug could be triggerred in tmpfs which
>> enables large folio in mapping:
>> Let's take a look at involved racer:
>> 1. How pages could be created and dirtied in shmem file.
>> write
>>   ksys_write
>>    vfs_write
>>     new_sync_write
>>      shmem_file_write_iter
>>       generic_perform_write
>>        shmem_write_begin
>>         shmem_get_folio
>>          shmem_allowable_huge_orders
>>          shmem_alloc_and_add_folios
>>          shmem_alloc_folio
>>          __folio_set_locked
>>          shmem_add_to_page_cache
>>           XA_STATE_ORDER(..., index, order)
>>           xax_store()
>>        shmem_write_end
>>         folio_mark_dirty()
>>
>> 2. How dirty pages could be deleted in shmem file.
>> ioctl
>>   do_vfs_ioctl
>>    file_ioctl
>>     ioctl_preallocate
>>      vfs_fallocate
>>       shmem_fallocate
>>        shmem_truncate_range
>>         shmem_undo_range
>>          truncate_inode_folio
>>           filemap_remove_folio
>>            page_cache_delete
>>             xas_store(&xas, NULL);
>>
>> 3. How dirty pages could be lockless searched
>> sync_file_range
>>   ksys_sync_file_range
>>    __filemap_fdatawrite_range
>>     filemap_fdatawrite_wbc
>
> Seems not a good example, IIUC, tmpfs doesn't support writeback (mapping_can_writeback() will return false), right?
>
Ahhh, right. Thank you for correcting me.
Then I would like to use NFS as the low-level filesystem in the example
instead; the potential crash can be triggered by the same steps.

Involved racers:
1. How pages could be created and dirtied in nfs.
write
 ksys_write
  vfs_write
   new_sync_write
    nfs_file_write
     generic_perform_write
      nfs_write_begin
       fgf_set_order
       __filemap_get_folio
      nfs_write_end
       nfs_update_folio
        nfs_writepage_setup
         nfs_mark_request_dirty
          filemap_dirty_folio
           __folio_mark_dirty
            __xa_set_mark

2. How dirty pages could be deleted in nfs.
ioctl
 do_vfs_ioctl
  file_ioctl
   ioctl_preallocate
    vfs_fallocate
     nfs42_fallocate
      nfs42_proc_deallocate
       truncate_pagecache_range
        truncate_inode_pages_range
         truncate_inode_folio
          filemap_remove_folio
           page_cache_delete
            xas_store(&xas, NULL);

3. How dirty pages could be searched locklessly.
sync_file_range
 ksys_sync_file_range
  __filemap_fdatawrite_range
   filemap_fdatawrite_wbc
    do_writepages
     writeback_use_writepage
      writeback_iter
       writeback_get_folio
        filemap_get_folios_tag
         find_get_entry
          folio = xas_find_marked()
          folio_try_get(folio)

Steps to crash the kernel:
1.Create                    2.Search                    3.Delete
/* write page 2,3 */
write
 ...
  nfs_write_begin
   fgf_set_order
   __filemap_get_folio
    ...
     xas_store(&xas, folio)
  nfs_write_end
   ...
    __folio_mark_dirty

                            /* sync page 2 and page 3 */
                            sync_file_range
                             ...
                              find_get_entry
                               folio = xas_find_marked()
                                /* offset will be 2 */
                                offset = xas_find_chunk()

                                                        /* delete page 2 and page 3 */
                                                        ioctl
                                                         ...
                                                          xas_store(&xas, NULL);

/* write page 0-3 */
write
 ...
  nfs_write_begin
   fgf_set_order
   __filemap_get_folio
    ...
     xas_store(&xas, folio)
  nfs_write_end
   ...
    __folio_mark_dirty

                            /* get sibling entry from offset 2 */
                            entry = xa_entry(.., 2)
                            /* use sibling entry as folio and crash kernel */
                            folio_try_get(folio)
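
To make the interleaving above easier to poke at without a kernel build, here is a minimal userspace model of it in C. It is only a sketch under assumptions, not kernel code: struct node, make_sibling(), is_sibling(), store_range(), erase_range(), find_chunk(), finish_find_marked() and replay_race() are hypothetical names made up for this illustration, the low-bit tagging is not the real XArray entry encoding, a single 8-slot node stands in for the tree, and the three racers are replayed sequentially in the order of the steps above instead of running as real threads. The skip_siblings flag models the idea of the fix (skip a sibling entry instead of returning it); it is not a copy of the actual patch.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 8

/* Model-only encoding: low bit set means "sibling", the other bits hold
 * the offset of the canonical slot. Real entries keep the low bit clear. */
typedef uintptr_t slot_t;

struct node {
	slot_t slot[SLOTS];
	unsigned marks;		/* bit i set => slot i carries the mark */
};

/* Stand-ins for the small (pages 2-3) and large (pages 0-3) folios. */
static long folio_small, folio_large;

slot_t make_sibling(unsigned canon) { return ((slot_t)canon << 1) | 1u; }
int is_sibling(slot_t e) { return (e & 1u) != 0; }

/* Multi-index store: canonical entry in 'first', siblings in the rest. */
void store_range(struct node *n, unsigned first, unsigned last, slot_t entry)
{
	n->slot[first] = entry;
	n->marks &= ~(1u << first);
	for (unsigned i = first + 1; i <= last; i++) {
		n->slot[i] = make_sibling(first);
		n->marks &= ~(1u << i);
	}
}

void erase_range(struct node *n, unsigned first, unsigned last)
{
	for (unsigned i = first; i <= last; i++) {
		n->slot[i] = 0;
		n->marks &= ~(1u << i);
	}
}

/* Reader step 1: first marked offset at or after 'start', else SLOTS. */
unsigned find_chunk(const struct node *n, unsigned start)
{
	for (unsigned i = start; i < SLOTS; i++)
		if (n->marks & (1u << i))
			return i;
	return SLOTS;
}

/* Reader step 2: load the slot chosen in step 1. With skip_siblings set,
 * a sibling is never returned; the search moves on to the next mark. */
slot_t finish_find_marked(const struct node *n, unsigned offset,
			  int skip_siblings)
{
	while (offset < SLOTS) {
		slot_t entry = n->slot[offset];

		if (!skip_siblings || !is_sibling(entry))
			return entry;
		offset = find_chunk(n, offset + 1);
	}
	return 0;
}

/* Replays Create / Search / Delete / Create / Search in that order. */
slot_t replay_race(int skip_siblings)
{
	struct node n = { .marks = 0 };
	unsigned offset;

	store_range(&n, 2, 3, (slot_t)&folio_small);	/* write page 2,3 */
	n.marks |= 1u << 2;				/* mark dirty */

	offset = find_chunk(&n, 2);			/* search, step 1 */
	assert(offset == 2);

	erase_range(&n, 2, 3);				/* delete page 2,3 */
	store_range(&n, 0, 3, (slot_t)&folio_large);	/* write page 0-3 */
	n.marks |= 1u << 0;				/* mark dirty */

	return finish_find_marked(&n, offset, skip_siblings); /* step 2 */
}
```

In this model replay_race(0) hands back the tagged sibling value stored in slot 2, which corresponds to the value that would be passed to folio_try_get() in the steps above. replay_race(1) returns 0 (nothing found), because the only mark left sits on slot 0, below the index where the reader started.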