Date: Tue, 14 Feb 2023 17:35:41 -0500
From: Peter Xu
To: David Stevens
Cc: linux-mm@kvack.org, Matthew Wilcox, Andrew Morton, "Kirill A . Shutemov", Yang Shi, David Hildenbrand, Hugh Dickins, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm/khugepaged: skip shmem with userfaultfd
References: <20230214075710.2401855-1-stevensd@google.com> <20230214075710.2401855-2-stevensd@google.com>
In-Reply-To: <20230214075710.2401855-2-stevensd@google.com>

Hi, David,

On Tue, Feb 14, 2023 at 04:57:10PM +0900, David Stevens wrote:
> From: David Stevens
>
> Make sure that collapse_file respects any userfaultfds registered with
> MODE_MISSING. If userspace has any such userfaultfds registered, then
> for any page which it knows to be missing, it may expect a
> UFFD_EVENT_PAGEFAULT. This means collapse_file needs to take care when
> collapsing a shmem range would result in replacing an empty page with a
> THP, so that it doesn't break userfaultfd.
>
> Synchronization when checking for userfaultfds in collapse_file is
> tricky because the mmap locks can't be used to prevent races with the
> registration of new userfaultfds. Instead, we provide synchronization by
> ensuring that userspace cannot observe the fact that pages are missing
> before we check for userfaultfds. Although this allows registration of a
> userfaultfd to race with collapse_file, it ensures that userspace cannot
> observe any pages transition from missing to present after such a race.
> This makes such a race indistinguishable to the collapse occurring
> immediately before the userfaultfd registration.
>
> The first step to provide this synchronization is to stop filling gaps
> during the loop iterating over the target range, since the page cache
> lock can be dropped during that loop.
> The second step is to fill the
> gaps with XA_RETRY_ENTRY after the page cache lock is acquired the final
> time, to avoid races with accesses to the page cache that only take the
> RCU read lock.
>
> This fix is targeted at khugepaged, but the change also applies to
> MADV_COLLAPSE. MADV_COLLAPSE on a range with a userfaultfd will now
> return EBUSY if there are any missing pages (instead of succeeding on
> shmem and returning EINVAL on anonymous memory). There is also now a
> window during MADV_COLLAPSE where a fault on a missing page will cause
> the syscall to fail with EAGAIN.
>
> The fact that intermediate page cache state can no longer be observed
> before the rollback of a failed collapse is also technically a
> userspace-visible change (via at least SEEK_DATA and SEEK_END), but it
> is exceedingly unlikely that anything relies on being able to observe
> that transient state.
>
> Signed-off-by: David Stevens
> ---
>  mm/khugepaged.c | 66 +++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 58 insertions(+), 8 deletions(-)

Could you attach a changelog in your next post (probably with a cover
letter when there is more than one patch)?

Your patch 1 reminded me that, I think, both lseek and mincore will report
HOLE rather than DATA for the thp holes during collapse, whether we fill
hpage in (as long as hpage is !uptodate) or not (as you do in this patch).

However, I don't understand at all how this new patch avoids the same race
issue I mentioned for the last version.

-- 
Peter Xu