In-Reply-To: <202206141145339651323@zte.com.cn>
From: Yury Norov
Date: Tue, 14 Jun 2022 09:40:08 -0700
Subject: Re: [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask()
To: wang.yi59@zte.com.cn, Andrew Morton, linux-mm
Cc: andriy.shevchenko@linux.intel.com, linux@rasmusvillemoes.dk, linux-kernel@vger.kernel.org, xue.zhihong@zte.com.cn, wang.liang82@zte.com.cn, Liu.Jianjun3@zte.com.cn, Yury Norov

+ Andrew Morton
+ linux-mm@kvack.org

On Mon, Jun 13, 2022 at 8:45 PM wrote:
>
> Hi Yury,
>
> Thanks for your quick and clear response!
>
> > On Mon, Jun 13, 2022 at 4:31 AM Yi Wang wrote:
> > >
> > > Consider the following situation:
> > >
> > > The app has two VMAs bound with mbind() to node 1 and node 3,
> > > respectively, and its cpuset.mems is 0-3. Now set its cpuset.mems
> > > to 1,3; with the current bitmap_remap(), we get:
> >
> > Regarding the original problem - can you please confirm that
> > it's reproduced on current kernels, show the execution path etc.
> > From what I see on modern kernels, the only user of nodes_remap()
> > is mpol_rebind_nodemask(). Is that the correct path?
>
> Yes, it's mpol_rebind_nodemask(), called from mpol_rebind_policy(),
> that calls nodes_remap(). The stack is as follows:
> [ 290.836747] bitmap_remap+0x84/0xe0
> [ 290.836753] mpol_rebind_nodemask+0x64/0x2a0
> [ 290.836764] mpol_rebind_mm+0x3a/0x90
> [ 290.836768] update_tasks_nodemask+0x8a/0x1e0
> [ 290.836774] cpuset_write_resmask+0x563/0xa00
> [ 290.836780] cgroup_file_write+0x81/0x150
> [ 290.836784] kernfs_fop_write_iter+0x12d/0x1c0
> [ 290.836791] new_sync_write+0x109/0x190
> [ 290.836800] vfs_write+0x218/0x2a0
> [ 290.836809] ksys_write+0x59/0xd0
> [ 290.836812] do_syscall_64+0x37/0x80
> [ 290.836818] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> To reproduce this situation, I wrote a program that looks like this:
>
>     unsigned int flags = MAP_PRIVATE | MAP_ANONYMOUS;
>     unsigned long size = 262144 << 12;
>     unsigned long node1 = 2; // node 1
>     unsigned long node2 = 8; // node 3
>
>     p1 = vma1 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
>     p2 = vma2 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
>
>     assert(!mbind(vma1, size, MPOL_BIND, &node1, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE));
>     assert(!mbind(vma2, size, MPOL_BIND, &node2, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE));
>
> Start the program, named mbind_tester, and do the following steps:
>
>     mkdir /sys/fs/cgroup/cpuset/mbind && cd /sys/fs/cgroup/cpuset/mbind
>     echo 0-31 > cpuset.cpus
>     echo 0-3 > cpuset.mems
>
>     cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w
>     7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4
>     7ff77e200000 bind:1 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4
>
>     echo 1,3 > cpuset.mems
>     cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w
>     7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4
>     7ff77e200000 bind:3 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4
>
> As you can see, after setting cpuset.mems to 1,3, the node that one of
> the VMAs is bound to changed from 1 to 3.
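For reference, applying the rule from the bitmap_remap() kernel-doc to this reproducer (the n-th set bit of @old, counting from 0, maps to the (n mod w)-th set bit of @new, where w is the weight of @new) gives node 3 for both VMAs, which matches the numa_maps output above. Here ord_old(i) is the 0-based rank of node i among the set bits of @old and new_[m] is the m-th set bit of @new; both are notation for this worked example only.

```latex
\begin{aligned}
\mathit{old} &= \{0,1,2,3\}, \quad \mathit{new} = \{1,3\}, \quad w = |\mathit{new}| = 2 \\
\text{node } 1 &:\; n = \operatorname{ord}_{\mathit{old}}(1) = 1, \quad n \bmod w = 1 \;\Rightarrow\; \mathit{new}_{[1]} = 3 \\
\text{node } 3 &:\; n = \operatorname{ord}_{\mathit{old}}(3) = 3, \quad n \bmod w = 1 \;\Rightarrow\; \mathit{new}_{[1]} = 3
\end{aligned}
```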
>
> This may be confusing: the node originally bound is 1, and after
> cpuset.mems is changed to 1,3, which includes node 3, it becomes 3...

Ok, thanks for the reproducer. I'll take a look at it closer to the weekend.

> > Anyway, as the name suggests, bitmap_remap() is intended to change bit
> > positions, and it doesn't look wrong if it does so.
> >
> > This is not how the function is supposed to work. For example,
> > old: 00111000
> > new: 00011100
> >
> > means:
> > old: 00111 000
> >      || \\\|||
> > new: 000 11100
> >
> > And after this patch it would be:
> > old: 001 11000
> >      || \|||||
> > new: 000 11100
> >
> > Which is not the same, right?
>
> Right.

So, we both agree that bitmap_remap() works as advertised. This is good.
Let's try to figure out a solution without touching it.

> Actually, this is what puts me in a dilemma. If we want to fix this
> situation, we can:
>
> - change bitmap_remap() as this patch did, but this changes the
>   behavior of a routine that appears to do the right thing. The good
>   news is that this function is only called by mpol_rebind_nodemask().

There are users of bitmap_remap() in drivers/gpio/gpio-xilinx.c.

> - not change bitmap_remap(); to be honest, I couldn't figure out
>   a way. Any suggestions?

I haven't had a chance to play with it (because of my day job), but I
have a strong feeling that the proper solution should come from existing
functionality. Did you experiment with MPOL_F_{STATIC,RELATIVE}_NODES?
Those flags enable the nodes_and() and mpol_relative_nodemask() paths,
respectively.

> > If mpol_rebind() wants to keep previous relations, then according to
> > the comment:
> >  * The positions of unset bits in @old are mapped to themselves
> >  * (the identify map).
> >
> > you can just clear @old bits that already have good relations
> > you'd like to preserve.
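To make the @old-clearing idea concrete, here is a minimal userspace sketch. It models the rule documented for bitmap_remap() on single-word masks, assuming bit i of an unsigned long stands for node i (the kernel works on nodemask_t and multi-word bitmaps). remap_word() and remap_preserving() are hypothetical helpers for illustration, not kernel APIs.

```c
#include <assert.h>
#include <limits.h>

/* Single-word model: bit i of a mask stands for NUMA node i. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Models the bitmap_remap() kernel-doc rule: the n-th set bit of oldmap
 * maps to the (n % w)-th set bit of newmap, w = weight(newmap). Bits of
 * src not set in oldmap (or every bit, if newmap is empty) map to
 * themselves, i.e. the identity map.
 */
static unsigned long remap_word(unsigned long src, unsigned long oldmap,
				unsigned long newmap)
{
	unsigned long dst = 0;
	unsigned int w = 0;

	for (unsigned long m = newmap; m; m &= m - 1)
		w++;				/* w = weight(newmap) */

	for (unsigned int pos = 0; pos < WORD_BITS; pos++) {
		unsigned int n = 0, target;

		if (!(src & (1UL << pos)))
			continue;
		if (w == 0 || !(oldmap & (1UL << pos))) {
			dst |= 1UL << pos;	/* identity map */
			continue;
		}
		for (unsigned int b = 0; b < pos; b++)	/* n = rank of pos in oldmap */
			if (oldmap & (1UL << b))
				n++;
		n %= w;
		/* find the n-th (0-based) set bit of newmap */
		for (target = 0; target < WORD_BITS; target++) {
			if (!(newmap & (1UL << target)))
				continue;
			if (n == 0)
				break;
			n--;
		}
		assert(target < WORD_BITS);	/* n < w, so it always exists */
		dst |= 1UL << target;
	}
	return dst;
}

/*
 * Hypothetical helper, not a kernel API: clear the nodes whose binding
 * should be preserved from the old mask before remapping, mirroring a
 * node_clear() on cpuset_mems_allowed followed by nodes_remap().
 */
static unsigned long remap_preserving(unsigned long src, unsigned long oldmap,
				      unsigned long newmap, unsigned long keep)
{
	return remap_word(src, oldmap & ~keep, newmap);
}
```

In this model, remap_word(0x2, 0xF, 0xA) (node 1, mems 0-3 to 1,3) yields node 3, as in the reproducer, while remap_preserving() with nodes 1 and 3 cleared from @old leaves node 1 on node 1 and node 3 on node 3, because both bits now fall under the identity map.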
>
> Actually this does not work for me :)

What I suggested is:

	node_clear(1, pol->w.cpuset_mems_allowed);
	node_clear(3, pol->w.cpuset_mems_allowed);
	nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed,
		    *nodes);
	pol->w.cpuset_mems_allowed = *nodes;

> In the example above, if cpuset.mems is first set to 0,2, the node the
> VMA is bound to changes from 1 to 2. Then, when cpuset.mems is set to
> 1,3, it changes from 2 to 3 again.

I bet you can find a sequence that will finally give you the desired
binding. And probably somebody does this black magic in production.
To me it just looks scary.

Can you try those static/relative flags? If they don't work, we'll
have to invent another policy for node binding.

Adding Andrew and linux-mm, as it's definitely beyond the scope of bitmaps.

Thanks,
Yury

> ---
> Best wishes
> Yi Wang
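To compare the MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES paths mentioned above, here is a userspace model of what they might compute when cpuset.mems changes. It assumes that, on rebind, the static path intersects the user-supplied nodemask with the new mems (nodes_and()), and the relative path folds the user mask modulo the weight of the new mems and maps it onto them (mpol_relative_nodemask(), built from nodes_fold() and nodes_onto()), falling back to the new mems when the result is empty. static_rebind() and relative_rebind() are hypothetical names, and bit i of a single word stands for node i.

```c
#include <assert.h>
#include <limits.h>

/* Single-word model: bit i of a mask stands for NUMA node i. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Count set bits (the bitmap "weight"). */
static unsigned int weight(unsigned long map)
{
	unsigned int w = 0;

	for (; map; map &= map - 1)
		w++;
	return w;
}

/* Assumed MPOL_F_STATIC_NODES rebind: keep only user nodes still allowed. */
static unsigned long static_rebind(unsigned long user, unsigned long mems)
{
	unsigned long tmp = user & mems;

	return tmp ? tmp : mems;	/* assumed fallback when nothing is left */
}

/*
 * Assumed MPOL_F_RELATIVE_NODES rebind: fold the user mask modulo
 * weight(mems) (like bitmap_fold()), then set the m-th set bit of mems
 * for every folded bit m (like bitmap_onto()).
 */
static unsigned long relative_rebind(unsigned long user, unsigned long mems)
{
	unsigned int w = weight(mems);
	unsigned long folded = 0, tmp = 0;
	unsigned int m = 0;

	assert(w > 0);			/* cpuset.mems is never empty */
	for (unsigned int n = 0; n < WORD_BITS; n++)
		if (user & (1UL << n))
			folded |= 1UL << (n % w);

	for (unsigned int n = 0; n < WORD_BITS; n++) {
		if (!(mems & (1UL << n)))
			continue;
		if (folded & (1UL << m))	/* n is the m-th set bit of mems */
			tmp |= 1UL << n;
		m++;
	}
	return tmp ? tmp : mems;
}
```

Under these assumptions, moving cpuset.mems from 0-3 to 1,3 keeps a user mask of {1} on node 1 with the static flag, while the relative flag sends it to node 3, the second allowed node. This is only a prediction of the model and is worth confirming with the mbind_tester reproducer.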