From: Ryan Roberts <ryan.roberts@arm.com>
Date: Mon, 3 Jul 2023 10:48:27 +0100
Subject: Re: [PATCH v1 11/14] arm64/mm: Wire up PTE_CONT for user mappings
To: John Hubbard, Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230622144210.2623299-1-ryan.roberts@arm.com> <20230622144210.2623299-12-ryan.roberts@arm.com>

On 30/06/2023 02:54, John Hubbard wrote:
> On 6/22/23 07:42, Ryan Roberts wrote:
>> With the ptep API sufficiently refactored, we can now introduce a new
>> "contpte" API layer, which transparently manages the PTE_CONT bit for
>> user mappings. Whenever it detects a set of PTEs that meet the
>> requirements for a contiguous range, the PTEs are re-painted with the
>> PTE_CONT bit.
>>
>> This initial change provides a baseline that can be optimized in future
>> commits.
>> That said, fold/unfold operations (which imply tlb
>> invalidation) are avoided where possible with a few tricks for
>> access/dirty bit management.
>>
>> Write-enable and write-protect modifications are likely non-optimal and
>> likely incur a regression in fork() performance. This will be addressed
>> separately.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>
> Hi Ryan!
>
> While trying out the full series from your gitlab features/granule_perf/all
> branch, I found it necessary to EXPORT a symbol in order to build this.

Thanks for the bug report!

> Please see below:
>
> ...
>> +
>> +pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
>> +{
>> +    /*
>> +     * Gather access/dirty bits, which may be populated in any of the ptes
>> +     * of the contig range. We are guaranteed to be holding the PTL, so any
>> +     * contiguous range cannot be unfolded or otherwise modified under our
>> +     * feet.
>> +     */
>> +
>> +    pte_t pte;
>> +    int i;
>> +
>> +    ptep = contpte_align_down(ptep);
>> +
>> +    for (i = 0; i < CONT_PTES; i++, ptep++) {
>> +        pte = __ptep_get(ptep);
>> +
>> +        /*
>> +         * Deal with the partial contpte_ptep_get_and_clear_full() case,
>> +         * where some of the ptes in the range may be cleared but others
>> +         * are still to do. See contpte_ptep_get_and_clear_full().
>> +         */
>> +        if (pte_val(pte) == 0)
>> +            continue;
>> +
>> +        if (pte_dirty(pte))
>> +            orig_pte = pte_mkdirty(orig_pte);
>> +
>> +        if (pte_young(pte))
>> +            orig_pte = pte_mkyoung(orig_pte);
>> +    }
>> +
>> +    return orig_pte;
>> +}
>
> Here we need something like this, in order to get it to build in all
> possible configurations:
>
> EXPORT_SYMBOL_GPL(contpte_ptep_get);
>
> (and a corresponding "#include <linux/export.h>" at the top of the file).
>
> Because, the static inline functions invoke this routine, above.

A quick grep through the drivers directory shows:

ptep_get() is used by:
  - drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
  - drivers/misc/sgi-gru/grufault.c
  - drivers/vfio/vfio_iommu_type1.c
  - drivers/xen/privcmd.c

set_pte_at() is used by:
  - drivers/gpu/drm/i915/i915_mm.c
  - drivers/xen/xlate_mmu.c

None of the other symbols are called, but I guess it is possible that
out-of-tree modules are calling others. So on the basis that these symbols
were previously pure inline, I propose to export all the contpte_* symbols
using EXPORT_SYMBOL() so that anything that was previously calling them
successfully can continue to do so. Will include in v2.

Thanks,
Ryan

>
> thanks,
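
For readers following the thread, the access/dirty gathering done by the quoted contpte_ptep_get() can be modelled in user space. This is only a rough sketch under stated assumptions: the pte_t type, the flag values, the CONT_PTES value of 16 and the names cont_align_down() and model_contpte_get() are all hypothetical stand-ins chosen for illustration, not the kernel's definitions, and a plain array index takes the place of the pte pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these are the kernel's definitions. */
#define CONT_PTES  16          /* assumed number of entries per contiguous block */
#define PTE_VALID  (1u << 0)   /* stand-in "entry is populated" bit */
#define PTE_YOUNG  (1u << 1)   /* stand-in for the access flag */
#define PTE_DIRTY  (1u << 2)   /* stand-in for the dirty flag */

typedef uint32_t pte_t;

/* Round an index down to the first entry of its CONT_PTES-sized block. */
static size_t cont_align_down(size_t idx)
{
    return idx & ~((size_t)CONT_PTES - 1);
}

/*
 * Walk every entry of the block containing idx and fold any young/dirty
 * bits found there into orig. Entries that are entirely zero are skipped,
 * mirroring the "partially cleared range" case in the quoted patch.
 */
static pte_t model_contpte_get(const pte_t *table, size_t idx, pte_t orig)
{
    size_t start = cont_align_down(idx);

    for (size_t i = 0; i < CONT_PTES; i++) {
        pte_t pte = table[start + i];

        if (pte == 0)
            continue;

        orig |= pte & (PTE_YOUNG | PTE_DIRTY);
    }

    return orig;
}
```

Calling model_contpte_get() with any index inside a block returns the caller's entry with the young and dirty bits of all sibling entries merged in, which is why the real helper can report access/dirty state for the whole range without unfolding it.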