Date: Mon, 24 Jan 2022 16:54:29 +0800
From: Peter Xu
To: Hugh Dickins
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Hildenbrand,
	Andrea Arcangeli, Yang Shi, Vlastimil Babka, Andrew Morton,
	Alistair Popple, "Kirill A . Shutemov", Matthew Wilcox
Subject: Re: [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details specified
References: <20211115134951.85286-1-peterx@redhat.com>
	<20211115134951.85286-2-peterx@redhat.com>
	<9937aaa-d9ab-2839-b0b7-691d85c9141@google.com>
	<391aa58d-ce84-9d4-d68d-d98a9c533255@google.com>
	<93dd745c-5e8b-a50-4ec5-b3f3728ad8b@google.com>
In-Reply-To: <93dd745c-5e8b-a50-4ec5-b3f3728ad8b@google.com>

On Sun, Jan 23, 2022 at 10:29:40PM -0800, Hugh Dickins wrote:
> > However, as I stated before, all these use cases always have another step to
> > take the lock and redo the range.  Then even if some migration entry got
> > wrongly skipped it'll always be fixed.  What we need to find is some caller
> > that calls zap_pte_range() without later taking the page lock and redo that.
> > That's the only possibility to trigger a real issue on the shmem accounting.
>
> I agree that the fallback "if (folio_mapped() unmap_mapping_folio()",
> while holding folio lock, ensures that there cannot be a migration entry
> substituted for present pte at that time, so no problem if migration entry
> was wrongly skipped on the earlier unlocked pass.
>
> But you're forgetting the complementary mistake: that the earlier unlocked
> pass might have zapped a migration entry (corresponding to an anon COWed
> page) when it should have skipped it (while punching a hole).

IMHO we won't wrongly zap a migration entry because when it's file backed we've
got non-NULL zap_details, so we'll skip all migration entries.  IOW, we can
only wrongly skip some entries, not wrongly zap some.

But I get your point, and thanks for pointing out what I missed - I think I
somehow forgot the private mappings completely when writing that up.

I have a quick idea for a reproducer now (perhaps file size shrinking on private
pages being swapped out); I'll try to write a real reproducer and update later.

[...]

> I did not understand what you were asking there; but in your followup
> mail, I think you came to understand what I meant better.  Yes, I
> believe you could safely replace struct address_space *zap_mapping
> by a more understandable boolean (skip_cows? its inverse would be
> easier to understand, but we don't want almost everyone to have to
> pass a zap_details initialized to true there).

The only case of even_cows==true for zap_details is unmap_mapping_range(),
where its caller passes even_cows==true as a parameter.  So IMHO that helper
already constructs the zap_details anyway.

I'll try it out, starting by naming it zap_details.even_cows; I'll make it the
last patch as a cleanup.

> > > > > > > 		rss[mm_counter(page)]--;
> > > > > > > 	}
> > >
> > > I have given no thought as to whether more "else"s are needed there.
> >
> > It's hwpoison that's in the else.  Nothing else should.
> >
> > I didn't mention it probably because I forgot.  I did think about it when
> > drafting, and IMHO we should simply zap that hwpoison entry because:
> >
> >   (1) Zap means the user knows this data is meaningless, so at least we
> >       shouldn't SIGBUS for that anymore.
> >
> >   (2) If we keep it there, it could erroneously trigger SIGBUS later if the
> >       guest accessed that pte again somehow.
> >
> > I plan to mention that in the commit message, but I can also add some comments
> > directly into the code.  Let me know your thoughts.
>
> It comes down, again, to what punching a hole in a file should do
> to the private data that has been COWed from it.  Strictly, it should
> not interfere with it, even when the COWed page has become poisonous:
> the entry should be left in to generate SIGBUS.  Whereas ordinary
> unmapping or truncating or MADV_DONTNEEDing would zap it.

Makes sense, I'll take care of that in the new version too.

Thanks,

-- 
Peter Xu
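
For readers following the thread, the zap policy discussed above can be
modeled in a short userspace sketch.  This is not the kernel implementation:
the entry_kind enum, the zap_details_model struct, and both helper names are
hypothetical, and treating the hwpoison entry the same way as a COWed entry is
an assumption based on the last quoted paragraph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what a pte slot can hold; names are not from the kernel. */
enum entry_kind {
	ENTRY_FILE_PAGE,	/* present pte for a page-cache page */
	ENTRY_FILE_MIGRATION,	/* migration entry for a page-cache page */
	ENTRY_COW_PAGE,		/* present pte for a private COWed (anon) page */
	ENTRY_COW_MIGRATION,	/* swap/migration entry for a COWed page */
	ENTRY_HWPOISON,		/* hwpoison entry */
};

/* Assumed shape once zap_mapping is replaced by a boolean. */
struct zap_details_model {
	bool even_cows;
};

/* NULL details models ordinary unmap or MADV_DONTNEED: zap everything. */
static bool should_zap_cows(const struct zap_details_model *details)
{
	if (!details)
		return true;
	return details->even_cows;
}

/* Decide whether one entry is zapped or skipped under the given details. */
static bool should_zap_entry(const struct zap_details_model *details,
			     enum entry_kind kind)
{
	switch (kind) {
	case ENTRY_FILE_PAGE:
	case ENTRY_FILE_MIGRATION:
		/* File-backed entries are always zapped, never skipped. */
		return true;
	case ENTRY_COW_PAGE:
	case ENTRY_COW_MIGRATION:
	case ENTRY_HWPOISON:
		/* Private data survives a hole punch (even_cows == false). */
		return should_zap_cows(details);
	}
	assert(!"unknown entry kind");
	return false;
}
```

With even_cows false (the hole-punch case), should_zap_entry() leaves the
COWed and hwpoison entries in place while still zapping file-backed migration
entries; with NULL details or even_cows true it zaps everything.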