From: Zhongkun He
To: Andrew Morton
Cc: corbet@lwn.net, mhocko@suse.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [External] Re: [PATCH v2] mm: add new syscall pidfd_set_mempolicy().
Date: Mon, 14 Nov 2022 00:41:21 +0800
References: <20221111084051.2121029-1-hezhongkun.hzk@bytedance.com> <20221111112732.30e1696bcd0d5b711c188a9a@linux-foundation.org>
In-Reply-To: <20221111112732.30e1696bcd0d5b711c188a9a@linux-foundation.org>

Hi Andrew, thanks for your reply.

> This sounds a bit suspicious. Please share much more detail about
> these races. If we proceed with this design then mpol_put_async()
> should have comments which fully describe the need for the async free.
>
> How do we *know* that these races are fully prevented with this
> approach? How do we know that mpol_put_async() won't free the data
> until the race window has fully passed?

A mempolicy can be associated either with a process or with a VMA.
All VMA manipulation is somewhat protected by a down_read on mmap_lock.
In process context there is no locking, because until now only the
process itself accessed its own state.

Now we need to change the process-context mempolicy of the task
specified by the pidfd. The mempolicy may be about to be freed by
pidfd_set_mempolicy() while alloc_pages() is still using it, and that
is where the race condition appears.

The process-context mempolicy is used in:

    alloc_pages()
    alloc_pages_bulk_array_mempolicy()
    policy_mbind_nodemask()
    mempolicy_slab_node()
    .....

Say something like the following:

    pidfd_set_mempolicy()            target task stack:
                                       alloc_pages:
                                         mpol = p->mempolicy;
      task_lock(task);
      old = task->mempolicy;
      task->mempolicy = new;
      task_unlock(task);
      mpol_put(old);
      /* old mpol has been freed. */
                                         policy_node(...., mpol)
                                         __alloc_pages(mpol);

To reduce the use of locks and atomic operations (mpol_get/put) in the
hot path, task_work is used in mpol_put_async(). When the target task
exits to user mode, the process-context mempolicy is no longer in use,
so mpol_free_async() is called as task_work to free the mempolicy in
the target's context.
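For illustration only, here is a small single-threaded userspace model of
the deferred free described above. It is a sketch, not the patch: every
name ending in _model is hypothetical, the "free" only sets a flag so the
ordering can be observed, and task_lock, reference counts and real
concurrency are left out. The `exiting` field stands in for "the task has
already run exit_task_work()", in which case the old policy is released
directly.

```c
/* Userspace model of the deferred mempolicy free; NOT kernel code.
 * All *_model names are hypothetical and exist only for this sketch. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mempolicy_model {
	int mode;
	bool freed;                    /* set instead of releasing memory */
	struct mempolicy_model *next;  /* link in the pending-free list */
};

struct task_model {
	struct mempolicy_model *mempolicy; /* process-context policy */
	struct mempolicy_model *pending;   /* old policies awaiting free */
	bool exiting;                      /* models: exit_task_work() has run */
};

static void mpol_free_model(struct mempolicy_model *pol)
{
	assert(!pol->freed);  /* a double free would be a bug in the model */
	pol->freed = true;
}

/* Models mpol_put_async(): defer the free to the target task, or free
 * directly when the target can no longer run queued work. */
static void mpol_put_async_model(struct task_model *t,
				 struct mempolicy_model *old)
{
	if (!old)
		return;
	if (t->exiting) {
		/* Queueing would fail; the policy is no longer used. */
		mpol_free_model(old);
		return;
	}
	old->next = t->pending;
	t->pending = old;
}

/* Models the remote update: swap the pointer, then drop the old policy. */
static void set_mempolicy_model(struct task_model *t,
				struct mempolicy_model *new_pol)
{
	struct mempolicy_model *old = t->mempolicy;

	t->mempolicy = new_pol;
	mpol_put_async_model(t, old);
}

/* Models the target task returning to user mode: it is no longer inside
 * an allocation path, so the queued old policies can be freed now. */
static void exit_to_user_mode_model(struct task_model *t)
{
	while (t->pending) {
		struct mempolicy_model *pol = t->pending;

		t->pending = pol->next;
		mpol_free_model(pol);
	}
}
```

In this model, a pointer that the target loaded before the update stays
valid until exit_to_user_mode_model() runs, which is the property the
deferred free is meant to provide.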
> Also, in some situations mpol_put_async() will free the data
> synchronously anyway, so aren't these races still present?
>

If the task has already run exit_task_work(), task_work_add() will
fail. In that case we can free the mempolicy directly, because the
mempolicy is no longer used.

>
> Secondly, why was the `flags' argument added? We might use it one day?
> For what purpose? I mean, every syscall could have a does-nothing
> `flags' arg, but we don't do that. What's the plan here?
>

I found that some syscalls take a 'flags' argument for extensibility,
such as process_madvise() and set_mempolicy_home_node(). Back to our
case: this operation has per-thread rather than per-process semantics,
so we could use flags to switch between the two as a future extension,
if one is needed. But I'm not sure.

Thanks.