Date: Tue, 20 Dec 2022 15:22:28 +0800
From: Chao Peng
To: "Huang, Kai"
Cc: "tglx@linutronix.de", "linux-arch@vger.kernel.org", "kvm@vger.kernel.org", "Wang, Wei W", "jmattson@google.com", "Lutomirski, Andy", "ak@linux.intel.com", "kirill.shutemov@linux.intel.com", "david@redhat.com", "qemu-devel@nongnu.org", "tabba@google.com", "Hocko, Michal", "michael.roth@amd.com", "corbet@lwn.net", "linux-fsdevel@vger.kernel.org", "dhildenb@redhat.com", "bfields@fieldses.org", "linux-kernel@vger.kernel.org", "x86@kernel.org", "bp@alien8.de", "vannapurve@google.com", "rppt@kernel.org", "shuah@kernel.org", "vkuznets@redhat.com", "vbabka@suse.cz", "mail@maciej.szmigiero.name", "linux-api@vger.kernel.org", "qperret@google.com", "arnd@arndb.de", "pbonzini@redhat.com", "ddutile@redhat.com", "naoya.horiguchi@nec.com", "Christopherson,, Sean", "wanpengli@tencent.com", "yu.c.zhang@linux.intel.com", "hughd@google.com", "aarcange@redhat.com", "mingo@redhat.com", "hpa@zytor.com", "Nakajima, Jun", "jlayton@kernel.org", "joro@8bytes.org", "linux-mm@kvack.org", "steven.price@arm.com", "Hansen, Dave", "linux-doc@vger.kernel.org", "akpm@linux-foundation.org", "linmiaohe@huawei.com"
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
Message-ID: <20221220072228.GA1724933@chaop.bj.intel.com>
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com> <20221202061347.1070246-2-chao.p.peng@linux.intel.com> <5c6e2e516f19b0a030eae9bf073d555c57ca1f21.camel@intel.com> <20221219075313.GB1691829@chaop.bj.intel.com>

On Mon, Dec 19, 2022 at 08:48:10AM +0000, Huang, Kai wrote:
> On Mon, 2022-12-19 at 15:53 +0800, Chao Peng wrote:
> > > 
> > > [...]
> > > 
> > > > +
> > > > +	/*
> > > > +	 * These pages are currently unmovable so don't place them into movable
> > > > +	 * pageblocks (e.g. CMA and ZONE_MOVABLE).
> > > > +	 */
> > > > +	mapping = memfd->f_mapping;
> > > > +	mapping_set_unevictable(mapping);
> > > > +	mapping_set_gfp_mask(mapping,
> > > > +			     mapping_gfp_mask(mapping) & ~__GFP_MOVABLE);
> > > 
> > > But, IIUC removing __GFP_MOVABLE flag here only makes page allocation from
> > > non-movable zones, but doesn't necessarily prevent page from being migrated.  My
> > > first glance is you need to implement either a_ops->migrate_folio() or just
> > > get_page() after faulting in the page to prevent.
> > 
> > The current api restrictedmem_get_page() already does this, after the
> > caller calling it, it holds a reference to the page. The caller then
> > decides when to call put_page() appropriately.
> 
> I tried to dig some history. Perhaps I am missing something, but it seems Kirill
> said in v9 that this code doesn't prevent page migration, and we need to
> increase page refcount in restrictedmem_get_page():
> 
> https://lore.kernel.org/linux-mm/20221129112139.usp6dqhbih47qpjl@box.shutemov.name/
> 
> But looking at this series it seems restrictedmem_get_page() in this v10 is
> identical to the one in v9 (except v10 uses 'folio' instead of 'page')?

restrictedmem_get_page() already started increasing the page refcount
several versions ago, so no change is needed in v10. You probably
missed my reply:

https://lore.kernel.org/linux-mm/20221129135844.GA902164@chaop.bj.intel.com/

The current solution is clear: unless we find a better approach, we
will let the restrictedmem user (KVM in this case) hold the refcount
to prevent page migration.

Thanks,
Chao
> 
> Anyway if this is not fixed, then it should be fixed. Otherwise, a comment at
> the place where page refcount is increased will be helpful to help people
> understand page migration is actually prevented.
> 
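The toy userspace model below illustrates the distinction discussed above. It is not kernel code. Every `toy_*` name and `TOY_GFP_MOVABLE` are hypothetical. It assumes, in simplified form, the kind of expected-refcount check that the migration code in mm/migrate.c performs before moving a page-cache folio: the folio moves only if nobody holds a reference beyond the page cache's references and the migration code's own reference.

In this model, clearing the movable GFP bit only changes where the folio is placed. An extra reference, like the one `restrictedmem_get_page()` leaves with its caller, makes the check back off with -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_MOVABLE; the bit value is arbitrary. */
#define TOY_GFP_MOVABLE 0x08u

/* Toy model of a page-cache folio; this is NOT the kernel's struct folio. */
struct toy_folio {
	int refcount;       /* every reference currently held */
	int nr_pages;       /* the page cache holds one reference per page */
	bool movable_block; /* where the (toy) allocator placed the folio */
};

/*
 * The GFP mask only steers placement: with the movable bit cleared, the
 * folio is recorded as living in a non-movable block. Nothing here pins it.
 */
static void toy_add_to_page_cache(struct toy_folio *f, unsigned int gfp,
				  int nr_pages)
{
	f->nr_pages = nr_pages;
	f->refcount = nr_pages;
	f->movable_block = (gfp & TOY_GFP_MOVABLE) != 0;
}

/* Hypothetical stand-ins for get_page()/put_page() by the memory user. */
static void toy_get(struct toy_folio *f)
{
	f->refcount++;
}

static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Simplified migration attempt: take an isolation reference, then require
 * that the only references left are the page cache's plus our own. Any
 * extra reference means someone may still use the memory, so back off
 * with -EAGAIN. Note that the GFP mask is never consulted here.
 */
static int toy_migrate(struct toy_folio *f)
{
	int expected = f->nr_pages + 1;
	int ret;

	toy_get(f); /* isolation reference */
	ret = (f->refcount == expected) ? 0 : -EAGAIN;
	toy_put(f); /* drop isolation reference */
	return ret;
}
```

With `nr_pages = 1`, the model behaves as follows:

- `toy_migrate()` succeeds even though the movable bit was cleared.
- It returns -EAGAIN while a `toy_get()` reference is outstanding.
- It succeeds again after `toy_put()`.

So in this model the protection lasts exactly as long as the user keeps its reference.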