Date: Fri, 3 Dec 2021 11:21:41 +0800
From: Peter Xu <peterx@redhat.com>
To: Alistair Popple
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Hildenbrand,
    Andrea Arcangeli, Yang Shi, Vlastimil Babka, Hugh Dickins,
    Andrew Morton, Kirill A. Shutemov
Subject: Re: [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details specified
In-Reply-To: <5393877.lttFOZEo4r@nvdebian>
References: <20211115134951.85286-1-peterx@redhat.com> <20211115134951.85286-2-peterx@redhat.com> <5393877.lttFOZEo4r@nvdebian>

On Thu, Dec 02, 2021 at 10:06:46PM +1100, Alistair Popple wrote:
> On Tuesday, 16 November 2021 12:49:50 AM AEDT Peter Xu wrote:
> > This check has existed since the first git commit of the Linux repository,
> > but at that time there was no page migration yet, so I think it was okay.
> >
> > With page migration enabled, it should logically be possible that we zap
> > some shmem pages during migration. When that happens, IIUC the old code
> > could account the RSS counter wrongly on MM_SHMEMPAGES, because we will zap
> > the ptes without decreasing the counters for the migrating entries. I have
> > no unit test to prove it, though, as I don't know an easy way to trigger
> > this condition.
> >
> > Besides, the optimization itself is IMHO already confusing in a few ways:
>
> I've spent a bit of time looking at this and think it would be good to get it
> cleaned up, as I've found it hard to follow in the past. What I haven't been
> able to confirm is whether anything relies on skipping swap entries or not.
> From your description it sounds like skipping swap entries was done as an
> optimisation rather than for some functional reason, is that correct?

Thanks again for looking into this patch, Alistair. I appreciate it a lot.

I should say that this is only how I understand it, and I could be wrong;
that's the major reason I marked this patch as RFC.

As I mentioned, this behavior has existed since the first commit of Linux's
git history, from a time when there were no special swap entries at all and
every swap entry was a "real" swap entry for anonymous memory.

That's why I think it was meant as an optimization: back then, whenever
zap_details (along with zap_details->mapping in the old code) was non-NULL,
the memory was definitely not anonymous, so skipping swap entries for
file-backed memory sounded like a good optimization.

However, since then all kinds of swap entries have been introduced, and as you
spotted, at least a migration entry can exist for a file-backed memory type
(shmem).
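To make that concrete, here is a condensed sketch of the non-present pte path
in zap_pte_range() as I read it today (paraphrased, not a verbatim copy of
mm/memory.c; device private/exclusive entries are already handled and
accounted for before this point):

        entry = pte_to_swp_entry(ptent);

        /* "If details->check_mapping, we leave swap entries." */
        if (unlikely(details))
                continue;
        /*
         * A migration entry of a shmem page bails out right above, so
         * none of the accounting or pte clearing below is reached:
         */
        if (!non_swap_entry(entry))
                rss[MM_SWAPENTS]--;
        else if (is_migration_entry(entry)) {
                struct page *page = pfn_swap_entry_to_page(entry);

                /* this is where MM_SHMEMPAGES would be decremented */
                rss[mm_counter(page)]--;
        }
        if (unlikely(!free_swap_and_cache(entry)))
                print_bad_pte(vma, addr, ptent, NULL);
        pte_clear_not_present_full(mm, addr, pte, ptent);

So with details set we go back to the loop before the accounting and the pte
clear ever run, which is where the MM_SHMEMPAGES concern in the commit message
above comes from.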
> > - The wording "skip swap entries" is confusing, because we're not skipping
> >   all swap entries - we handle device private/exclusive pages before that.
> >
> > - The skip behavior is enabled whenever a zap_details pointer is passed in.
> >   That is very hard to figure out for a new zap caller, because it's unclear
> >   why we should skip swap entries just because zap_details is specified.
> >
> > - On modern systems, especially for performance-critical use cases, swap
> >   entries should be rare, so I doubt the usefulness of this optimization,
> >   since it sits on a slow path anyway.
> >
> > - It is not aligned with what we do with huge pmd swap entries, where in
> >   zap_huge_pmd() we'll do the accounting unconditionally.
> >
> > This patch drops that trick, so we handle swap ptes coherently. Meanwhile we
> > should do the same mapping check upon migration entries too.
>
> I agree, and I'm not convinced the current handling is very good - if we
> skip zapping a migration entry then the page mapping might get restored when
> the migration entry is removed.
>
> In practice I don't think that is a problem, as the migration entry's target
> page will be locked, and if I'm understanding things correctly callers of
> unmap_mapping_*() need to have the page(s) locked anyway if they want to be
> sure the page is unmapped. But it seems that removing the migration entries
> better matches the intent, and I can't think of a reason why they should be
> skipped.

Exactly, that's how I see it too.

I used to think there was a bug in shmem migration (if you remember, I
mentioned it in some of my previous patchset cover letters), but then I found
that migration requires the page lock, so it's probably not a real bug at all.
Still, that is not a convincing reason to keep ignoring swap entries.

I wanted to "ignore" this problem with the "add a flag to skip swap entries"
patch, but as you saw that was not well received, so I had no choice but to
look for the fundamental reason for skipping swap entries. When I found that I
couldn't come up with any good reason, and that the skipping even looks buggy,
I ended up with this patch.

If this is the right way to go, the zap pte path can be simplified quite a lot
after patch 2 of this series.

-- 
Peter Xu