From: Mina Almasry
Date: Mon, 27 Jan 2025 11:37:06 -0800
Subject: Re: [PATCH] mm/vma: Fix hugetlb accounting error in copy_vma()
To: Muchun Song
Cc: Lorenzo Stoakes, Daniil Dulov, lvc-project@linuxtesting.org,
 Jann Horn, linux-kernel@vger.kernel.org, Mike Kravetz,
 "Matthew Wilcox (Oracle)", linux-mm@kvack.org, "Liam R. Howlett",
 Andrew Morton, Vlastimil Babka, stable@vger.kernel.com
References: <20250127143201.45453-1-d.dulov@aladdin.ru>
 <83645f1b-cede-455c-abc0-6f105199eee9@lucifer.local>
 <4rmkmv5bgryxawl4qnizozlhwnfkhlebut4n2dcf6cdpuvqacb@c73fcytj6dfi>
In-Reply-To: <4rmkmv5bgryxawl4qnizozlhwnfkhlebut4n2dcf6cdpuvqacb@c73fcytj6dfi>

On Mon, Jan 27, 2025 at 8:45 AM Fedor Pchelkin wrote:
>
> +Cc Mina Almasry and Mike Kravetz
> +Cc current Hugetlb maintainer as well.
>
> On Mon, 27. Jan 14:46, Lorenzo Stoakes wrote:
> > Thanks for the report.
> >
> > On Mon, Jan 27, 2025 at 05:32:01PM +0300, Daniil Dulov wrote:
> > > In copy_vma() allocation of maple tree nodes may fail. Since page accounting
> > > takes place at the close() operation for hugetlb, it is called at the error
> > > path against the new_vma to account pages of the vma that was not successfully
> > > copied and that shares the page_counter with the original vma. Then, when the
> > > process is being terminated, vm_ops->close() is called once again against the
> > > original vma, which results in a page_counter underflow.
> >
> > This seems like a bug in hugetlb.
> >
> > I really hate the solution here, it's hacky and assumes only these fields are
> > meaningful for 'close twice' scenarios.
> >
> > We now use vma_close(), which assigns vma->vm_ops to vma_dummy_vm_ops, meaning
> > no further close() invocations can occur.
>
> Does the "close twice" scenario exactly mean ->close() is called twice
> for the same object of struct vm_area_struct?
>
> For the observed case I think that's not true. ->close() is called for
> two different objects of type vm_area_struct - the first time for the
> new_vma on error path of copy_vma(), the second time for the original
> vma. It turns out then these objects share the same reservation map
> holding page_counters at this point of time.
>
> >
> > If hugetlb is _still_ choosing to internally invoke this, it seems like it
> > should have some if (vma->vm_ops == hugetlb_vm_ops) { ... } check first? That
> > way it'll account for the closing twice issue.
> >
> > Can you easily repro in order to check a solution like that fixes your problem?
> > I don't see why it shouldn't
> >
>
> Seems that wouldn't fix the problem (again, two different vma objects).
> There's presumably no obvious place in hugetlb internals where this may
> be fixed elegantly, at the quick glance. But yep, it does look like a bug
> for hugetlb to care about.. Perhaps somehow defer the reservation map
> copying?
>
> >
> > > page_counter underflow: -1024 nr_pages=1024
> > > WARNING: CPU: 1 PID: 1086 at mm/page_counter.c:55 page_counter_cancel+0xd6/0x130 mm/page_counter.c:55
> > > Modules linked in:
> > > CPU: 1 PID: 1086 Comm: syz-executor200 Not tainted 6.1.108-syzkaller-00078-g9ce77c16947b #0
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > > Call Trace:
> > >  page_counter_uncharge+0x2e/0x70 mm/page_counter.c:158
> > >  hugetlb_cgroup_uncharge_counter+0xd2/0x420 mm/hugetlb_cgroup.c:430
> > >  hugetlb_vm_op_close+0x435/0x700 mm/hugetlb.c:4886
> > >  remove_vma+0x84/0x130 mm/mmap.c:140
> > >  exit_mmap+0x32f/0x7a0 mm/mmap.c:3249
> > >  __mmput+0x11e/0x430 kernel/fork.c:1199
> > >  mmput+0x61/0x70 kernel/fork.c:1221
> > >  exit_mm kernel/exit.c:565 [inline]
> > >  do_exit+0xa4a/0x2790 kernel/exit.c:858
> > >  do_group_exit+0xd0/0x2a0 kernel/exit.c:1021
> > >  __do_sys_exit_group kernel/exit.c:1032 [inline]
> > >  __se_sys_exit_group kernel/exit.c:1030 [inline]
> > >  __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:1030
> > >  do_syscall_x64 arch/x86/entry/common.c:51 [inline]
> > >  do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
> > >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > >
> > > Since there is no sense in vm accounting for a bad copy of vma, set vm_start
> > > to be equal vm_end and vm_pgoff to be equal 0. Previously, a similar issue
> > > has been fixed in __split_vma() in the same way [1].
> > >
> > > [1]: https://lore.kernel.org/all/20220719201523.3561958-1-Liam.Howlett@oracle.com/T/
> >
> > Understood that we do this elsewhere, I think equally we should not do this
> > there either! :)
> >
> > >
> > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> > >
> > > Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
> > > Cc: stable@vger.kernel.com
> > > Signed-off-by: Daniil Dulov
> > > ---
> > >  mm/vma.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/mm/vma.c b/mm/vma.c
> > > index bb2119e5a0d0..dbc68b7cd0ec 100644
> > > --- a/mm/vma.c
> > > +++ b/mm/vma.c
> > > @@ -1772,6 +1772,9 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> > >  	return new_vma;
> > >
> > >  out_vma_link:
> > > +	/* Avoid vm accounting in close() operation */
> > > +	new_vma->vm_start = new_vma->vm_end;
> > > +	new_vma->vm_pgoff = 0;
> > >  	vma_close(new_vma);
> > >
> > >  	if (new_vma->vm_file)
> > > --
> > > 2.34.1
> > >

-- 
Thanks,
Mina