From: Hillf Danton
To: John Hubbard
Cc: Hillf Danton, linux-mm, Andrew Morton, linux-kernel, Vlastimil Babka, Jan Kara, Mel Gorman, Jerome Glisse, Dan Williams, Ira Weiny, Christoph Hellwig, Jonathan Corbet
Subject: Re: [RFC] mm: gup: add helper page_try_gup_pin(page)
Date: Mon, 4 Nov 2019 12:34:20 +0800
Message-Id: <20191104043420.15648-1-hdanton@sina.com>
In-Reply-To: <20191103112113.8256-1-hdanton@sina.com>
References: <20191103112113.8256-1-hdanton@sina.com>


On Sun, 3 Nov 2019 12:20:10 -0800 John Hubbard wrote:
> On 11/3/19 3:21 AM, Hillf Danton wrote:
> >
> > A helper is added for mitigating the gup issue described at
> > https://lwn.net/Articles/784574/. It is unsafe to write out
> > a dirty page that is already gup pinned for DMA.
> >
> > In the current writeback context, dirty pages are written out with
> > no detecting whether they have been gup pinned; Nor mark to keep
> > gupers off. In the gup context, file pages can be pinned with other
> > gupers and writeback ignored.
> >
> > The factor, that no room, supposedly even one bit, in the current
> > page struct can be used for tracking gupers, makes the issue harder
> > to tackle.
>
> Well, as long as we're counting bits, I've taken 21 bits (!) to track
> "gupers". :)  More accurately, I'm sharing 31 bits with get_page()...please

Would you please explain the reasoning for tracking multiple gupers
on a dirty page? Do you mean that it is all fine for guper-A to change
guper-B's data without warning, and vice versa?

> see my recently posted patchset for tracking dma-pinned pages:
>
> https://lore.kernel.org/r/20191030224930.3990755-1-jhubbard@nvidia.com
>
> Once that is merged, you will have this available:
>
>     static inline bool page_dma_pinned(struct page *page);
>
> ...which will reliably track dma-pinned pages.
>
> After that, we still need to convert some more call sites (block/bio
> in particular) to the new pin_user_pages()/put_user_page() system, in
> order for filesystems to take advantage of it, but this approach has
> the advantage of actually tracking such pages, rather than faking it by
> hoping that there is only one gup caller at a time.
>
>
> >
> > The approach here is, because it makes no sense to allow a file page
> > to have multiple gupers at the same time, looking to make gupers
>
> ohhh...no, that's definitely not a claim you can make.
>
>
> > mutually exclusive, and then guper's singularity helps to tell if a
> > guper is existing by staring at the change in page count.
> >
> > The result of that singularity is not yet 100% correct but something
> > of "best effort" as the effect of random get_page() is perhaps also
> > folded in it.
> > It is assumed the best effort is feasible/acceptable in practice
> > without the cost of growing the page struct size by one bit,
> > were it true that something similar has been applied to the page
> > migrate and reclaim contexts for a while.
> >
> > With the helper in place, we skip writing out a dirty page if a
> > guper is detected; On gupping, we give up pinning a file page due
> > to writeback or losing the race to become a guper.
> >
> > The end result is, no gup-pinned page will be put under writeback.
>
> I think you must have missed the many contentious debates about the
> tension between gup-pinned pages, and writeback. File systems can't
> just ignore writeback in all cases.
> This patch leads to either
> system hangs or filesystem corruption, in the presence of long-lasting
> gup pins.

The current risk of data corruption, which comes from writeback ignoring
long-lived gup references altogether, is eliminated by detecting
gup-pinned dirty pages and skipping them; that may lead to the problems
you mention above.

Though I doubt anything helpful can be expected from the fs side in the
near future, we do have options: for instance, gupers could periodically
release their references and re-pin the pages after a data sync, the same
way as the current flusher does.

> Really, this won't work. sorry.
>
>
> thanks,
>
> John Hubbard
> NVIDIA
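
The bit counting discussed in this thread can be made concrete with a small userspace model. This is only a sketch under stated assumptions, not code from either patch: the `toy_*` names and `PIN_BIAS` are hypothetical, and the bias of 2^10 is inferred from the "21 of 31 bits" remark above (31 - 21 = 10 low bits left for ordinary references). It shows how a single shared counter can record several pins at once, and how a writeback path might skip a dirty page that looks pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each pin adds 2^10 to the shared count (inferred, see above). */
#define PIN_BIAS (1u << 10)

/* Hypothetical stand-in for struct page; only the fields this model needs. */
struct toy_page {
    uint32_t refcount; /* shared by ordinary references and pins */
    bool dirty;
};

/* Ordinary reference, as get_page()/put_page() would take and drop. */
static inline void toy_get_page(struct toy_page *p) { p->refcount += 1; }
static inline void toy_put_page(struct toy_page *p) { p->refcount -= 1; }

/* A pin adds a whole bias, so several pinners can coexist in one counter. */
static inline void toy_pin_page(struct toy_page *p) { p->refcount += PIN_BIAS; }

static inline void toy_unpin_page(struct toy_page *p)
{
    assert(p->refcount >= PIN_BIAS); /* unpin without a matching pin is a bug */
    p->refcount -= PIN_BIAS;
}

/*
 * Best-effort test: true once the count reaches the bias. PIN_BIAS or more
 * ordinary references on an unpinned page would also make this return true.
 */
static inline bool toy_page_dma_pinned(const struct toy_page *p)
{
    return p->refcount >= PIN_BIAS;
}

/* Estimated number of pins: the count with the low 10 bits discarded. */
static inline uint32_t toy_pin_count(const struct toy_page *p)
{
    return p->refcount / PIN_BIAS;
}

/* Writeback policy under debate: write a dirty page only if it is not pinned. */
static inline bool toy_should_write_back(const struct toy_page *p)
{
    return p->dirty && !toy_page_dma_pinned(p);
}
```

In this model a dirty page holding one ordinary reference is eligible for writeback; after two `toy_pin_page()` calls it reports a pin count of 2 and is skipped until both pins are dropped. The model says nothing about how long a pin may last, which is the point of contention above: a page that stays pinned indefinitely is never written back under this policy.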