Date: Mon, 24 Feb 2020 19:46:02 -0800 (PST)
From: Hugh Dickins
To: Yang Shi
Cc: Hugh Dickins, kirill.shutemov@linux.intel.com, aarcange@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
In-Reply-To: <00f0bb7d-3c25-a65f-ea94-3e2de8e9bcdd@linux.alibaba.com>
References: <1575420174-19171-1-git-send-email-yang.shi@linux.alibaba.com> <00f0bb7d-3c25-a65f-ea94-3e2de8e9bcdd@linux.alibaba.com>

On Tue, 14 Jan 2020, Yang Shi wrote:
> On 12/4/19 4:15 PM, Hugh Dickins wrote:
> > On Wed, 4 Dec 2019, Yang Shi wrote:
> > >
> > > Currently when truncating a shmem file, if the range covers only part
> > > of a THP (start or end falls in the middle of the THP), the pages will
> > > just get cleared rather than freed, unless the range covers the whole
> > > THP. Even after all the subpages have been truncated (randomly or
> > > sequentially), the THP may still be kept in page cache. This might be
> > > fine for some use cases which prefer preserving THPs.
> > >
> > > But, when doing balloon inflation in QEMU, QEMU actually does hole
> > > punch or MADV_DONTNEED in base-page-size granularity if hugetlbfs is
> > > not used. So, when using shmem THP as the memory backend, QEMU
> > > inflation doesn't work as expected since it doesn't free memory. But
> > > the inflation use case really needs to get the memory freed. Anonymous
> > > THP will not get freed right away either, but it will be freed
> > > eventually once all subpages are unmapped; shmem THP would still stay
> > > in page cache.
> > >
> > > Split the THP right away when doing a partial hole punch, and if the
> > > split fails just clear the pages so that reads from the hole-punched
> > > area return zero.
> > >
> > > Cc: Hugh Dickins
> > > Cc: Kirill A. Shutemov
> > > Cc: Andrea Arcangeli
> > > Signed-off-by: Yang Shi
> > > ---
> > > v2: * Adopted the comment from Kirill.
> > >     * Dropped the fallocate mode flag; THP split is the default
> > >       behavior.
> > >     * Blended Hugh's implementation with my v1 patch. TBH I'm not very
> > >       keen on Hugh's find_get_entries() hack (basically neutral), but
> > >       without that hack
> >
> > Thanks for giving it a try. I'm not neutral about my find_get_entries()
> > hack: it surely had to go (without it, I'd have just pushed my own
> > patch). I've not noticed anything wrong with your patch, and it's in
> > the right direction, but I'm still not thrilled with it. I also
> > remember that I got the looping wrong in my first internal attempt
> > (fixed in what I sent), and need to be very sure of the
> > try-again-versus-move-on-to-next conditions before agreeing to
> > anything. No rush, I'll come back to this in days or months ahead:
> > I'll try to find a less gotoey blend of yours and mine.
>
> Hi Hugh,
>
> Any update on this one?

I apologize for my dreadful unresponsiveness.

I've spent the last day trying to love yours, and considering how mine
might be improved; but repeatedly arrived at the conclusion that mine is
about as nice as we can manage, just needing more comment to dignify it.

I did willingly call my find_get_entries() stopping at PageTransCompound
a hack; but now think we should just document that behavior and accept it.

The contortions of your patch come from the need to release those 14
extra unwanted references: much better not to get them in the first place.

Neither of us handles a failed split optimally: we treat every tail as an
opportunity to retry, which is good for recovering from transient
failures, but probably excessive.
And we both have to restart the pagevec after each attempt, but at least
I don't get 14 unwanted extras each time.

What of the other find_get_entries() users, and its
pagevec_lookup_entries() wrapper: does an argument need to be added to
select this "stop at PageTransCompound" behavior? No. The
pagevec_lookup_entries() calls from mm/truncate.c prefer the new
behavior - evicting the head from page cache removes all the tails along
with it, so getting the tails is a waste of time there too, just as it
was in shmem_undo_range().

Whereas shmem_unlock_mapping() and shmem_seek_hole_data(), as they
stand, are not removing pages from cache, but actually want to plod
through the tails. So those two would be slowed a little, while the
pagevec collects 1 instead of 15 pages. However: if we care about those
two at all, it's clear that we should speed them up, by noticing the
PageTransCompound case and accelerating over it, instead of plodding
through the tails. Since we haven't bothered with that optimization yet,
I'm not very worried about slowing them down a little until it's done.

I must take a look at where we stand with tmpfs 64-bit ino tomorrow,
then recomment my shmem_punch_compound() patch and post it properly,
probably the day after. (Reviewing it, I see I have a
"page->index + HPAGE_PMD_NR <= end" test which needs correcting - I tend
to live in the past, before v4.13 doubled the 32-bit MAX_LFS_FILESIZE.)

I notice that this thread has veered off into QEMU ballooning territory:
which may indeed be important, but there's nothing at all that I can
contribute on that. I certainly do not want to slow down anything
important, but remain convinced that the correct filesystem
implementation for punching a hole is to punch a hole.

Hugh