From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8786CC47258 for ; Tue, 5 May 2020 14:04:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 48E7220735 for ; Tue, 5 May 2020 14:04:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 48E7220735 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=de.ibm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DBA998E00D1; Tue, 5 May 2020 10:04:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D442C8E00C0; Tue, 5 May 2020 10:04:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C0C438E00D1; Tue, 5 May 2020 10:04:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0081.hostedemail.com [216.40.44.81]) by kanga.kvack.org (Postfix) with ESMTP id A8A2D8E00C0 for ; Tue, 5 May 2020 10:04:05 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 6E19A19B13 for ; Tue, 5 May 2020 14:04:05 +0000 (UTC) X-FDA: 76782834450.17.net22_40afdb1c8ba16 X-HE-Tag: net22_40afdb1c8ba16 X-Filterd-Recvd-Size: 11420 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf01.hostedemail.com (Postfix) with ESMTP for ; Tue, 5 May 2020 14:04:04 +0000 (UTC) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 045E3dOg071480; Tue, 5 May 2020 10:03:57 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 30u8sgsyya-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 10:03:57 -0400 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 045E3uaf073628; Tue, 5 May 2020 10:03:57 -0400 Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com with ESMTP id 30u8sgsybh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 10:03:56 -0400 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.16.0.27/8.16.0.27) with SMTP id 045E0B3j017794; Tue, 5 May 2020 14:03:06 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma02fra.de.ibm.com with ESMTP id 30s0g5aw53-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 05 May 2020 14:03:06 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 045E31PZ57868530 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 5 May 2020 14:03:01 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B427011C05C; Tue, 5 May 2020 14:03:01 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A9A7311C050; Tue, 5 May 2020 14:03:00 +0000 (GMT) Received: from oc7455500831.ibm.com (unknown [9.145.49.139]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 5 May 2020 14:03:00 +0000 (GMT) Subject: Re: [PATCH v2 1/1] fs/splice: add missing callback for inaccessible pages From: Christian Borntraeger To: Ulrich Weigand , Dave Hansen Cc: Claudio Imbrenda , viro@zeniv.linux.org.uk, david@redhat.com, akpm@linux-foundation.org, aarcange@redhat.com, linux-mm@kvack.org, frankja@linux.ibm.com, sfr@canb.auug.org.au, jhubbard@nvidia.com, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, jack@suse.cz, kirill@shutemov.name, peterz@infradead.org, sean.j.christopherson@intel.com, Ulrich.Weigand@de.ibm.com References: <20200430143825.3534128-1-imbrenda@linux.ibm.com> <1a3f5107-9847-73d4-5059-c6ef9d293551@de.ibm.com> <3d379d9e-241c-ef3b-dcef-20fdd3b8740d@de.ibm.com> <20200504134154.GA21001@oc3748833570.ibm.com> <231da2f1-a6ef-0cf9-7f57-95e8b925997b@intel.com> <20200505135556.GA9920@oc3748833570.ibm.com> Autocrypt: addr=borntraeger@de.ibm.com; prefer-encrypt=mutual; keydata= xsFNBE6cPPgBEAC2VpALY0UJjGmgAmavkL/iAdqul2/F9ONz42K6NrwmT+SI9CylKHIX+fdf J34pLNJDmDVEdeb+brtpwC9JEZOLVE0nb+SR83CsAINJYKG3V1b3Kfs0hydseYKsBYqJTN2j CmUXDYq9J7uOyQQ7TNVoQejmpp5ifR4EzwIFfmYDekxRVZDJygD0wL/EzUr8Je3/j548NLyL 4Uhv6CIPf3TY3/aLVKXdxz/ntbLgMcfZsDoHgDk3lY3r1iwbWwEM2+eYRdSZaR4VD+JRD7p8 0FBadNwWnBce1fmQp3EklodGi5y7TNZ/CKdJ+jRPAAnw7SINhSd7PhJMruDAJaUlbYaIm23A +82g+IGe4z9tRGQ9TAflezVMhT5J3ccu6cpIjjvwDlbxucSmtVi5VtPAMTLmfjYp7VY2Tgr+ T92v7+V96jAfE3Zy2nq52e8RDdUo/F6faxcumdl+aLhhKLXgrozpoe2nL0Nyc2uqFjkjwXXI OBQiaqGeWtxeKJP+O8MIpjyGuHUGzvjNx5S/592TQO3phpT5IFWfMgbu4OreZ9yekDhf7Cvn /fkYsiLDz9W6Clihd/xlpm79+jlhm4E3xBPiQOPCZowmHjx57mXVAypOP2Eu+i2nyQrkapaY IdisDQfWPdNeHNOiPnPS3+GhVlPcqSJAIWnuO7Ofw1ZVOyg/jwARAQABzUNDaHJpc3RpYW4g Qm9ybnRyYWVnZXIgKDJuZCBJQk0gYWRkcmVzcykgPGJvcm50cmFlZ2VyQGxpbnV4LmlibS5j b20+wsF5BBMBAgAjBQJdP/hMAhsDBwsJCAcDAgEGFQgCCQoLBBYCAwECHgECF4AACgkQEXu8 gLWmHHy/pA/+JHjpEnd01A0CCyfVnb5fmcOlQ0LdmoKWLWPvU840q65HycCBFTt6V62cDljB kXFFxMNA4y/2wqU0H5/CiL963y3gWIiJsZa4ent+KrHl5GK1nIgbbesfJyA7JqlB0w/E/SuY NRQwIWOo/uEvOgXnk/7+rtvBzNaPGoGiiV1LZzeaxBVWrqLtmdi1iulW/0X/AlQPuF9dD1Px hx+0mPjZ8ClLpdSp5d0yfpwgHtM1B7KMuQPQZGFKMXXTUd3ceBUGGczsgIMipZWJukqMJiJj QIMH0IN7XYErEnhf0GCxJ3xAn/J7iFpPFv8sFZTvukntJXSUssONnwiKuld6ttUaFhSuSoQg OFYR5v7pOfinM0FcScPKTkrRsB5iUvpdthLq5qgwdQjmyINt3cb+5aSvBX2nNN135oGOtlb5 tf4dh00kUR8XFHRrFxXx4Dbaw4PKgV3QLIHKEENlqnthH5t0tahDygQPnSucuXbVQEcDZaL9 WgJqlRAAj0pG8M6JNU5+2ftTFXoTcoIUbb0KTOibaO9zHVeGegwAvPLLNlKHiHXcgLX1tkjC DrvE2Z0e2/4q7wgZgn1kbvz7ZHQZB76OM2mjkFu7QNHlRJ2VXJA8tMXyTgBX6kq1cYMmd/Hl OhFrAU3QO1SjCsXA2CDk9MM1471mYB3CTXQuKzXckJnxHkHOwU0ETpw8+AEQAJjyNXvMQdJN t07BIPDtbAQk15FfB0hKuyZVs+0lsjPKBZCamAAexNRk11eVGXK/YrqwjChkk60rt3q5i42u PpNMO9aS8cLPOfVft89Y654Qd3Rs1WRFIQq9xLjdLfHh0i0jMq5Ty+aiddSXpZ7oU6E+ud+X Czs3k5RAnOdW6eV3+v10sUjEGiFNZwzN9Udd6PfKET0J70qjnpY3NuWn5Sp1ZEn6lkq2Zm+G 9G3FlBRVClT30OWeiRHCYB6e6j1x1u/rSU4JiNYjPwSJA8EPKnt1s/Eeq37qXXvk+9DYiHdT PcOa3aNCSbIygD3jyjkg6EV9ZLHibE2R/PMMid9FrqhKh/cwcYn9FrT0FE48/2IBW5mfDpAd YvpawQlRz3XJr2rYZJwMUm1y+49+1ZmDclaF3s9dcz2JvuywNq78z/VsUfGz4Sbxy4ShpNpG REojRcz/xOK+FqNuBk+HoWKw6OxgRzfNleDvScVmbY6cQQZfGx/T7xlgZjl5Mu/2z+ofeoxb vWWM1YCJAT91GFvj29Wvm8OAPN/+SJj8LQazd9uGzVMTz6lFjVtH7YkeW/NZrP6znAwv5P1a DdQfiB5F63AX++NlTiyA+GD/ggfRl68LheSskOcxDwgI5TqmaKtX1/8RkrLpnzO3evzkfJb1 D5qh3wM1t7PZ+JWTluSX8W25ABEBAAHCwV8EGAECAAkFAk6cPPgCGwwACgkQEXu8gLWmHHz8 2w//VjRlX+tKF3szc0lQi4X0t+pf88uIsvR/a1GRZpppQbn1jgE44hgF559K6/yYemcvTR7r 6Xt7cjWGS4wfaR0+pkWV+2dbw8Xi4DI07/fN00NoVEpYUUnOnupBgychtVpxkGqsplJZQpng v6fauZtyEcUK3dLJH3TdVQDLbUcL4qZpzHbsuUnTWsmNmG4Vi0NsEt1xyd/Wuw+0kM/oFEH1 4BN6X9xZcG8GYUbVUd8+bmio8ao8m0tzo4pseDZFo4ncDmlFWU6hHnAVfkAs4tqA6/fl7RLN JuWBiOL/mP5B6HDQT9JsnaRdzqF73FnU2+WrZPjinHPLeE74istVgjbowvsgUqtzjPIG5pOj cAsKoR0M1womzJVRfYauWhYiW/KeECklci4TPBDNx7YhahSUlexfoftltJA8swRshNA/M90/ i9zDo9ySSZHwsGxG06ZOH5/MzG6HpLja7g8NTgA0TD5YaFm/oOnsQVsf2DeAGPS2xNirmknD jaqYefx7yQ7FJXXETd2uVURiDeNEFhVZWb5CiBJM5c6qQMhmkS4VyT7/+raaEGgkEKEgHOWf ZDP8BHfXtszHqI3Fo1F4IKFo/AP8GOFFxMRgbvlAs8z/+rEEaQYjxYJqj08raw6P4LFBqozr nS4h0HDFPrrp1C2EMVYIQrMokWvlFZbCpsdYbBI= Message-ID: Date: Tue, 5 May 2020 16:03:00 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138,18.0.676 definitions=2020-05-05_08:2020-05-04,2020-05-05 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 clxscore=1015 priorityscore=1501 adultscore=0 malwarescore=0 bulkscore=0 lowpriorityscore=0 mlxscore=0 impostorscore=0 phishscore=0 suspectscore=0 mlxlogscore=927 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005050114 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 05.05.20 16:01, Christian Borntraeger wrote: > > > On 05.05.20 15:55, Ulrich Weigand wrote: >> On Tue, May 05, 2020 at 05:34:45AM -0700, Dave Hansen wrote: >>> On 5/4/20 6:41 AM, Ulrich Weigand wrote: >>>> You're right that there is no mechanism to prevent new references, >>>> but that's really never been the goal either. We're simply trying >>>> to ensure that no I/O is ever done on a page that is in the "secure" >>>> (or inaccessible) state. To do so, we rely on the assumption that >>>> all code that starts I/O on a page cache page will *first*: >>>> - mark the page as pending I/O by either taking an extra page >>>> count, or by setting the Writeback flag; then: >>>> - call arch_make_page_accessible(); then: >>>> - start I/O; and only after I/O has finished: >>>> - remove the "pending I/O" marker (Writeback and/or extra ref) >>> >>> Let's ignore writeback for a moment because get_page() is the more >>> general case. The locking sequence is: >>> >>> 1. get_page() (or equivalent) "locks out" a page from converting to >>> inaccessbile, >>> 2. followed by a make_page_accessible() guarantees that the page >>> *stays* accessible until >>> 3. I/O is safe in this region >>> 4. put_page(), removes the "lock out", I/O now unsafe >> >> Yes, exactly. >> >>> They key is, though, the get_page() must happen before >>> make_page_accessible() and *every* place that acquires a new reference >>> needs a make_page_accessible(). >> >> Well, sort of: every place that acquires a new reference *and then >> performs I/O* needs a make_page_accessible(). There seem to be a >> lot of plain get_page() calls that aren't related to I/O. >> >>> try_get_page() is obviously one of those "new reference sites" and it >>> only has one call site outisde of the gup code: generic_pipe_buf_get(), >>> which is effectively patched by the patch that started this thread. The >>> fact that this one oddball site _and_ gup are patched now is a good sign. >>> >>> *But*, I still don't know how that could work nicely: >>> >>>> static inline __must_check bool try_get_page(struct page *page) >>>> { >>>> page = compound_head(page); >>>> if (WARN_ON_ONCE(page_ref_count(page) <= 0)) >>>> return false; >>>> page_ref_inc(page); >>>> return true; >>>> } >>> >>> If try_get_page() collides with a freeze_page_refs(), it'll hit the >>> WARN_ON_ONCE(), which is surely there for a good reason. I'm not sure >>> that warning is _actually_ valid since freeze_page_refs() isn't truly a >>> 0 refcount. But, the fact that this hasn't been encountered means that >>> the testing here is potentially lacking. >> >> This is indeed interesting. In particular if you compare try_get_page >> with try_get_compound_head in gup.c, which does instead: >> >> if (WARN_ON_ONCE(page_ref_count(head) < 0)) >> return NULL; >> >> which seems more reasonable to me, given the presence of the >> page_ref_freeze method. So I'm not sure why try_get_page has <= 0. > > > Just looked at > commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 mm: add 'try_get_page()' helper function > > which says: > Also like 'get_page()', you can't use this function unless you already > had a reference to the page. The intent is that you can use this > exactly like get_page(), but in situations where you want to limit the > maximum reference count. > > The code currently does an unconditional WARN_ON_ONCE() if we ever hit > the reference count issues (either zero or negative), as a notification > that the conditional non-increment actually happened. > > If try_get_page must not be called with an existing reference, that means s/not// > that when we call it the page reference is already higher and our freeze > will never succeed. That would imply that we cannot trigger this. No? >