Date: Sat, 19 Sep 2020 17:16:57 -0700 (PDT)
From: Hugh Dickins
To: Matthew Wilcox
Cc: Hugh Dickins, Andrew Morton, alex.shi@linux.alibaba.com, cai@lca.pw,
    hannes@cmpxchg.org, linux-mm@kvack.org, mhocko@suse.com,
    mike.kravetz@oracle.com, mm-commits@vger.kernel.org, shakeelb@google.com,
    shy828301@gmail.com, stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: Re: [patch 04/15] shmem: shmem_writepage() split unlikely i915 THP
In-Reply-To: <20200919161847.GN32101@casper.infradead.org>
References: <20200918211925.7e97f0ef63d92f5cfe5ccbc5@linux-foundation.org>
 <20200919042009.bomzxmrg7%akpm@linux-foundation.org>
 <20200919044408.GL32101@casper.infradead.org>
 <20200919161847.GN32101@casper.infradead.org>

On Sat, 19 Sep 2020, Matthew Wilcox wrote:
> On Fri, Sep 18, 2020 at 10:44:32PM -0700, Hugh Dickins wrote:
> > It behaves a lot better with this patch in than without it; but you're
> > right, only the head will get written to swap, and the tails left in
> > memory; with dirty cleared, so they may be left indefinitely (I've
> > not yet looked to see when if ever PageDirty might get set later).
> >
> > Hmm.
> > It may just be a matter of restyling the i915 code with
> >
> >	if (!page_mapped(page)) {
> >		clear_page_dirty_for_io(page);
> >
> > but I don't want to rush to that conclusion - there might turn
> > out to be a good way of doing it at the shmem_writepage() end, but
> > probably only hacks available. I'll mull it over: it deserves some
> > thought about what would suit, if a THP arrived here some other way.
>
> I think the ultimate solution is to do as I have done for iomap and make
> ->writepage handle arbitrary sized pages. However, I don't know the
> swap I/O path particularly well, and I would rather not learn it just yet.

Understood ;)

> How about this for a band-aid until we sort that out properly? Just mark
> the page as dirty before splitting it so subsequent iterations see the
> subpages as dirty.

This is certainly much better than my original, and I've had no problem
in running with it (briefly); and it's what I'll now keep in my testing
tree - thanks. But it is more obviously a hack, or as you put it, a
band-aid: fit for Linus's tree? This issue has only come up with i915
and shmem_enabled "force", nobody has been bleeding except me: but if
it's something that impedes or may impede your own testing, or that you
feel should go in anyway, say so and I'll push your new improved version
to akpm (adding your signoff to mine).

> Arguably, we should use set_page_dirty() instead of SetPageDirty,

My position on that has changed down the years: I went through a phase
of replacing shmem.c's SetPageDirtys by set_page_dirtys, with the idea
that its "if (!PageDirty)" is good to avoid setting cacheline dirty.
Then Spectre changed the balance, so now I'd rather avoid the indirect
function call, and go with your SetPageDirty. But the bdi flags are such
that they do the same thing, and if they ever diverge, then we make the
necessary changes at that time.

> but I don't think i915 cares.
> In particular, it uses
> an untagged iteration instead of searching for PAGECACHE_TAG_DIRTY.

PAGECACHE_TAG_DIRTY comes with bdi flags that shmem does not use: with
one exception, every shmem page is always dirty (since its swap is freed
as soon as the page is moved from swap cache to file cache); unless I'm
forgetting something (like the tails in my patch!), the only exception
is a page allocated to a hole on fault (after which a write fault will
soon set pte dirty). So I didn't quite get what i915 was doing with its
set_page_dirty()s: but very probably being a good citizen, marking when
it exposed the page to changes itself, making no assumption of what
shmem.c does.

It would be good to have a conversation with i915 guys about huge pages
sometime (they forked off their own mount point in i915_gemfs_init(),
precisely in order to make use of huge pages, but couldn't get them
working, so removed the option to ask for them, but kept the separate
mount point. But not a conversation I'll be responsive on at present.)

Hugh

> diff --git a/mm/shmem.c b/mm/shmem.c
> index 271548ca20f3..6231207ab1eb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1362,8 +1362,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> 	swp_entry_t swap;
> 	pgoff_t index;
>
> -	VM_BUG_ON_PAGE(PageCompound(page), page);
> 	BUG_ON(!PageLocked(page));
> +
> +	/*
> +	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force",
> +	 * then drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
> +	 * and its shmem_writeback() needs them to be split when swapping.
> +	 */
> +	if (PageTransCompound(page)) {
> +		/* Ensure the subpages are still dirty */
> +		SetPageDirty(page);
> +		if (split_huge_page(page) < 0)
> +			goto redirty;
> +		ClearPageDirty(page);
> +	}
> +
> 	mapping = page->mapping;
> 	index = page->index;
> 	inode = mapping->host;