From: Luis Henriques
To: "Theodore Ts'o"
Cc: Luis Henriques, Zhang Yi, linux-ext4@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, adilger.kernel@dilger.ca, jack@suse.cz,
 ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org,
 willy@infradead.org, zokeefe@google.com, yi.zhang@huawei.com,
 chengzhihao1@huawei.com, yukuai3@huawei.com, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v3 03/26] ext4: correct the hole length returned by
 ext4_map_blocks()
In-Reply-To: <20240509163953.GI3620298@mit.edu> (Theodore Ts'o's message of
 "Thu, 9 May 2024 12:39:53 -0400")
References:
 <20240127015825.1608160-1-yi.zhang@huaweicloud.com>
 <20240127015825.1608160-4-yi.zhang@huaweicloud.com>
 <87zfszuib1.fsf@brahms.olymp>
 <20240509163953.GI3620298@mit.edu>
Date: Thu, 09 May 2024 18:23:44 +0100
Message-ID: <87h6f6vqzj.fsf@brahms.olymp>
MIME-Version: 1.0
Content-Type: text/plain

On Thu 09 May 2024 12:39:53 PM -04, Theodore Ts'o wrote:
> On Thu, May 09, 2024 at 04:16:34PM +0100, Luis Henriques wrote:
>>
>> It's looks like it's easy to trigger an infinite loop here using fstest
>> generic/039. If I understand it correctly (which doesn't happen as often
>> as I'd like), this is due to an integer overflow in the 'if' condition,
>> and should be fixed with the patch below.
>
> Thanks for the report. However, I can't reproduce the failure, and
> looking at generic/039, I don't see how it could be relevant to the
> code path in question. Generic/039 creates a test symlink with two
> hard links in the same directory, syncs the file system, and then
> removes one of the hard links, and then drops access to the block
> device using dmflakey. So I don't see how the extent code would be
> involved at all. Are you sure that you have the correct test listed?

Yep, I just retested and it's definitely generic/039. I'm using a simple
test environment, with virtme-ng.

> Looking at the code in question in fs/ext4/extents.c:
>
> again:
>     ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start,
>                               hole_start + len - 1, &es);
>     if (!es.es_len)
>         goto insert_hole;
>
>      * There's a delalloc extent in the hole, handle it if the delalloc
>      * extent is in front of, behind and straddle the queried range.
>      */
> -    if (lblk >= es.es_lblk + es.es_len) {
> +    if (lblk >= ((__u64) es.es_lblk) + es.es_len) {
>         /*
>          * The delalloc extent is in front of the queried range,
>          * find again from the queried start block.
>         len -= lblk - hole_start;
>         hole_start = lblk;
>         goto again;
>
> lblk and es.es_lblk are both __u32. So the infinite loop is
> presumably because es.es_lblk + es.es_len has overflowed. This should
> never happen(tm), and in fact we have a test for this case which

If I instrument the code, I can see that es.es_len is definitely set to
EXT_MAX_BLOCKS, which will overflow.
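
To make the overflow concrete, here is a minimal userspace sketch of the two
comparisons from the diff above. It is not the kernel code: uint32_t stands
in for __u32/ext4_lblk_t, the helper names are made up for illustration, and
the values lblk = 0 and es_lblk = 1 are hypothetical (only es_len ==
EXT_MAX_BLOCKS was actually observed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value the kernel uses for EXT_MAX_BLOCKS. */
#define EXT_MAX_BLOCKS 0xffffffffU

/* Original check: the sum is computed in 32 bits, so it can wrap to 0. */
static inline bool in_front_u32(uint32_t lblk, uint32_t es_lblk,
				uint32_t es_len)
{
	return lblk >= (uint32_t)(es_lblk + es_len);
}

/* Patched check: one operand is widened, so the sum is done in 64 bits. */
static inline bool in_front_u64(uint32_t lblk, uint32_t es_lblk,
				uint32_t es_len)
{
	return lblk >= (uint64_t)es_lblk + es_len;
}

/* Hypothetical values: es_lblk = 1, lblk = 0; es_len as instrumented. */
void check_overflow_example(void)
{
	/* 1 + 0xffffffff wraps to 0, so any lblk passes the 32-bit test. */
	assert(in_front_u32(0, 1, EXT_MAX_BLOCKS));
	/* In 64 bits the sum is 0x100000000, which no 32-bit lblk reaches. */
	assert(!in_front_u64(0, 1, EXT_MAX_BLOCKS));
}
```

With the 32-bit version the branch is taken and the code jumps back to
'again', which would be consistent with the loop never terminating; with the
64-bit version the branch is skipped.
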
> *should* have gotten tripped when ext4_es_find_extent_range() calls
> __es_tree_search() in fs/ext4/extents_status.c:
>
> static inline ext4_lblk_t ext4_es_end(struct extent_status *es)
> {
>     BUG_ON(es->es_lblk + es->es_len < es->es_lblk);
>     return es->es_lblk + es->es_len - 1;
> }
>
> So the patch is harmless, and I can see how it might fix what you were
> seeing --- but I'm a bit nervous that I can't reproduce it and the
> commit description claims that it reproduces easily; and we should
> have never allowed the entry to have gotten introduced into the
> extents status tree in the first place, and if it had been introduced,
> it should have been caught before it was returned by
> ext4_es_find_extent_range().
>
> Can you give more details about the reproducer; can you double check
> the test id, and how easily you can trigger the failure, and what is
> the hardware you used to run the test?

So, here are a few more details that may clarify things, and that I should
have added to the commit description:

When the test hangs, it is blocked mounting the flakey device:

  mount -t ext4 -o acl,user_xattr /dev/mapper/flakey-test /mnt/scratch

which will eventually call into ext4_ext_map_blocks(), triggering the bug.

Also, some more code instrumentation shows that after the call to
ext4_ext_find_hole(), 'hole_start' will be set to '1' and 'len' to
'0xfffffffe'. This '0xfffffffe' value is a bit odd, but it comes from the
fact that, in ext4_ext_find_hole(), the call to
ext4_ext_next_allocated_block() will return EXT_MAX_BLOCKS and 'len' will
thus be set to 'EXT_MAX_BLOCKS - 1'.

Does this make sense?

Cheers,
-- 
Luis
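
For reference, the arithmetic behind the '0xfffffffe' value and the quoted
BUG_ON() condition can be sketched in userspace as below. This is only an
illustration under assumptions: uint32_t stands in for ext4_lblk_t, the
helper names hole_len() and es_range_wraps() are made up, and es_lblk = 1 in
the usage lines is a hypothetical value (only es_len == EXT_MAX_BLOCKS was
observed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value the kernel uses for EXT_MAX_BLOCKS. */
#define EXT_MAX_BLOCKS 0xffffffffU

/* Hole length: distance from hole_start to the next allocated block. */
static inline uint32_t hole_len(uint32_t hole_start, uint32_t next_allocated)
{
	return next_allocated - hole_start;
}

/* Same condition as the quoted BUG_ON(): true if es_lblk + es_len wraps. */
static inline bool es_range_wraps(uint32_t es_lblk, uint32_t es_len)
{
	return (uint32_t)(es_lblk + es_len) < es_lblk;
}

void check_hole_len_example(void)
{
	/* hole_start = 1, next allocated block = EXT_MAX_BLOCKS. */
	uint32_t len = hole_len(1, EXT_MAX_BLOCKS);

	assert(len == 0xfffffffeU);        /* EXT_MAX_BLOCKS - 1 */
	assert(!es_range_wraps(1, len));   /* the queried range itself fits */
	/* Hypothetical entry with es_lblk = 1 and es_len = EXT_MAX_BLOCKS. */
	assert(es_range_wraps(1, EXT_MAX_BLOCKS));
}
```

So the queried range [1, 0xfffffffe] does not wrap, while an entry with
es_len == EXT_MAX_BLOCKS and any non-zero es_lblk would satisfy the BUG_ON()
condition if it ever went through ext4_es_end().
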