From: Jerome Glisse
Date: Tue, 23 Jul 2024 09:33:44 -0700
Subject: Re: [PATCH] mm: fix maxnode for mbind(), set_mempolicy() and migrate_pages()
To: David Hildenbrand
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20240720173543.897972-1-jglisse@google.com> <0c390494-e6ba-4cde-aace-cd726f2409a1@redhat.com>
In-Reply-To: <0c390494-e6ba-4cde-aace-cd726f2409a1@redhat.com>

On Mon, 22 Jul 2024 at 06:09, David Hildenbrand wrote:
>
> On 20.07.24 19:35, Jerome Glisse wrote:
> > Because of the maxnode bug there is no way to bind or migrate_pages to
> > the last node in a multi-node NUMA system unless you lie about maxnode
> > when making the mbind, set_mempolicy or migrate_pages syscall.
> >
> > The man pages for those syscalls describe maxnode as the number of bits
> > in the node bitmap ("bit mask of nodes containing up to maxnode bits").
> > Thus if maxnode is n then we expect to have an n-bit bitmap, which
> > means that the mask of valid bits is ((1 << n) - 1). The get_nodes()
> > decrement leads to the mask being ((1 << (n - 1)) - 1).
> >
> > The three syscalls use a common helper, get_nodes(), and the first thing
> > this helper does is decrement maxnode by 1, which leads to using n-1 bits
> > in the provided mask of nodes (see get_bitmap(), a helper function to
> > get_nodes()).
> >
> > This leads to two bugs: either the last node in the bitmap provided will
> > not be used in any of the three syscalls, or the syscalls will error
> > out and return EINVAL if the only bit set in the bitmap was the last
> > bit in the mask of nodes (which is ignored because of the bug, and an
> > empty mask of nodes is an invalid argument).
> >
> > I am surprised this bug was never caught ... it has been in the kernel
> > since forever.
>
> Let's look at QEMU: backends/hostmem.c
>
>     /*
>      * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
>      * as argument to mbind() due to an old Linux bug (feature?) which
>      * cuts off the last specified node. This means backend->host_nodes
>      * must have MAX_NODES+1 bits available.
>      */
>
> Which means that it's been known for a long time, and the workaround
> seems to be pretty easy.
>
> So I wonder if we rather want to update the documentation to match reality.

[Sorry resending as text ... gmail insanity]

I think it is kind of weird if we ask users to supply maxnode+1 to work
around the bug. If we apply this patch, qemu would continue to work as is,
while things would be fixed for users that were not aware of the bug. So I
would say applying this patch does more good. Long term, qemu can drop its
workaround or keep it for backward compatibility with old kernels.

Thank you,
Jérôme