From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83EEEC4CEC4 for ; Thu, 19 Sep 2019 15:59:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 392F821D79 for ; Thu, 19 Sep 2019 15:59:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 392F821D79 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.ibm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B9EE56B0386; Thu, 19 Sep 2019 11:59:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B4E586B0387; Thu, 19 Sep 2019 11:59:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9EF0C6B0389; Thu, 19 Sep 2019 11:59:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0178.hostedemail.com [216.40.44.178]) by kanga.kvack.org (Postfix) with ESMTP id 6D1316B0386 for ; Thu, 19 Sep 2019 11:59:27 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id 18DC822007 for ; Thu, 19 Sep 2019 15:59:27 +0000 (UTC) X-FDA: 75952129974.09.goose72_6f839c381e531 X-HE-Tag: goose72_6f839c381e531 X-Filterd-Recvd-Size: 15764 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Thu, 19 Sep 2019 15:59:26 +0000 (UTC) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x8JFw0Cl056129 for ; Thu, 19 Sep 2019 11:59:25 -0400 Received: from e06smtp02.uk.ibm.com (e06smtp02.uk.ibm.com [195.75.94.98]) by mx0a-001b2d01.pphosted.com with ESMTP id 2v4cnfrd52-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 19 Sep 2019 11:59:24 -0400 Received: from localhost by e06smtp02.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 19 Sep 2019 16:59:22 +0100 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp02.uk.ibm.com (192.168.101.132) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 19 Sep 2019 16:59:18 +0100 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x8JFxHD652232332 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 19 Sep 2019 15:59:17 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B1AB4A4040; Thu, 19 Sep 2019 15:59:17 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4E19CA4059; Thu, 19 Sep 2019 15:59:17 +0000 (GMT) Received: from pomme.tls.ibm.com (unknown [9.101.4.33]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 19 Sep 2019 15:59:17 +0000 (GMT) Subject: Re: [PATCH v2 1/2] powperc/mm: read TLB Block Invalidate Characteristics To: Michael Ellerman , benh@kernel.crashing.org, paulus@samba.org, aneesh.kumar@linux.ibm.com, npiggin@gmail.com, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20190916095543.17496-1-ldufour@linux.ibm.com> <20190916095543.17496-2-ldufour@linux.ibm.com> <87zhj1vhz6.fsf@mpe.ellerman.id.au> From: Laurent Dufour Date: Thu, 19 Sep 2019 17:59:16 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <87zhj1vhz6.fsf@mpe.ellerman.id.au> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-TM-AS-GCONF: 00 x-cbid: 19091915-0008-0000-0000-00000318818E X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19091915-0009-0000-0000-00004A37079B Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-09-19_05:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1908290000 definitions=main-1909190145 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Le 18/09/2019 =C3=A0 15:42, Michael Ellerman a =C3=A9crit=C2=A0: > Hi Laurent, >=20 > Comments below ... Hi Michael, Thanks for your review. One comment below (at the end). >=20 > Laurent Dufour writes: >> The PAPR document specifies the TLB Block Invalidate Characteristics w= hich >> tells for each couple segment base page size, actual page size, the si= ze of > ^ > "pair of" again >=20 >> the block the hcall H_BLOCK_REMOVE is supporting. > ^ > "supports" is fine. >=20 >> These characteristics are loaded at boot time in a new table hblkr_siz= e. >> The table is appart the mmu_psize_def because this is specific to the > ^ > "separate from" >=20 >> pseries architecture. > ^ > platform >> >> A new init service, pseries_lpar_read_hblkr_characteristics() is added= to > ^ > function >=20 >> read the characteristics. In that function, the size of the buffer is = set >> to twice the number of known page size, plus 10 bytes to ensure we hav= e >> enough place. This new init function is called from pSeries_setup_arch= (). > ^ > space >> >> Signed-off-by: Laurent Dufour >> --- >> .../include/asm/book3s/64/tlbflush-hash.h | 1 + >> arch/powerpc/platforms/pseries/lpar.c | 138 ++++++++++++++++= ++ >> arch/powerpc/platforms/pseries/setup.c | 1 + >> 3 files changed, 140 insertions(+) >> >> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch= /powerpc/include/asm/book3s/64/tlbflush-hash.h >> index 64d02a704bcb..74155cc8cf89 100644 >> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h >> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h >> @@ -117,4 +117,5 @@ extern void __flush_hash_table_range(struct mm_str= uct *mm, unsigned long start, >> unsigned long end); >> extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd, >> unsigned long addr); >> +extern void pseries_lpar_read_hblkr_characteristics(void); >=20 > This doesn't need "extern", and also should go in > arch/powerpc/platforms/pseries/pseries.h as it's local to that director= y. >=20 > You're using "hblkr" in a few places, can we instead make it "hblkrm" - > "rm" is a well known abbreviation for "remove". >=20 >=20 >> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/plat= forms/pseries/lpar.c >> index 36b846f6e74e..98a5c2ff9a0b 100644 >> --- a/arch/powerpc/platforms/pseries/lpar.c >> +++ b/arch/powerpc/platforms/pseries/lpar.c >> @@ -56,6 +56,15 @@ EXPORT_SYMBOL(plpar_hcall); >> EXPORT_SYMBOL(plpar_hcall9); >> EXPORT_SYMBOL(plpar_hcall_norets); >> =20 >> +/* >> + * H_BLOCK_REMOVE supported block size for this page size in segment = who's base >> + * page size is that page size. >> + * >> + * The first index is the segment base page size, the second one is t= he actual >> + * page size. >> + */ >> +static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT]; >=20 > Can you make that __ro_after_init, it goes at the end, eg: >=20 > static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT] __ro_after_init; >=20 >> @@ -1311,6 +1320,135 @@ static void do_block_remove(unsigned long numb= er, struct ppc64_tlb_batch *batch, >> (void)call_block_remove(pix, param, true); >> } >> =20 >> +/* >> + * TLB Block Invalidate Characteristics These characteristics define = the size of > ^ > newline before here? >=20 >> + * the block the hcall H_BLOCK_REMOVE is able to process for each cou= ple segment >> + * base page size, actual page size. >> + * >> + * The ibm,get-system-parameter properties is returning a buffer with= the >> + * following layout: >> + * >> + * [ 2 bytes size of the RTAS buffer (without these 2 bytes) ] > ^ > "excluding" >=20 >> + * ----------------- >> + * TLB Block Invalidate Specifiers: >> + * [ 1 byte LOG base 2 of the TLB invalidate block size being specifi= ed ] >> + * [ 1 byte Number of page sizes (N) that are supported for the speci= fied >> + * TLB invalidate block size ] >> + * [ 1 byte Encoded segment base page size and actual page size >> + * MSB=3D0 means 4k segment base page size and actual page s= ize >> + * MSB=3D1 the penc value in mmu_psize_def ] >> + * ... >> + * ----------------- >> + * Next TLB Block Invalidate Specifiers... >> + * ----------------- >> + * [ 0 ] >> + */ >> +static inline void __init set_hblk_bloc_size(int bpsize, int psize, >> + unsigned int block_size) >=20 > "static inline" and __init are sort of contradictory. >=20 > Just make it "static void __init" and the compiler will inline it > anyway, but if it decides not to the section will be correct. >=20 > This one uses "hblk"? Should it be set_hblkrm_block_size() ? >=20 >> +{ >> + if (block_size > hblkr_size[bpsize][psize]) >> + hblkr_size[bpsize][psize] =3D block_size; >> +} >> + >> +/* >> + * Decode the Encoded segment base page size and actual page size. >> + * PAPR specifies with bit 0 as MSB: >> + * - bit 0 is the L bit >> + * - bits 2-7 are the penc value >=20 > Can we just convert those to normal bit ordering for the comment, eg: >=20 > PAPR specifies: > - bit 7 is the L bit > - bits 0-5 are the penc value >=20 >> + * If the L bit is 0, this means 4K segment base page size and actual= page size >> + * otherwise the penc value should be readed. > ^ > "read" ? >> + */ >> +#define HBLKR_L_BIT_MASK 0x80 >=20 > "HBLKRM_L_MASK" would do I think? >=20 >> +#define HBLKR_PENC_MASK 0x3f >> +static inline void __init check_lp_set_hblk(unsigned int lp, >> + unsigned int block_size) >> +{ >> + unsigned int bpsize, psize; >> + >=20 > One blank line is sufficient :) >=20 >> + >> + /* First, check the L bit, if not set, this means 4K */ >> + if ((lp & HBLKR_L_BIT_MASK) =3D=3D 0) { >> + set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size); >> + return; >> + } >> + >> + lp &=3D HBLKR_PENC_MASK; >> + for (bpsize =3D 0; bpsize < MMU_PAGE_COUNT; bpsize++) { >> + struct mmu_psize_def *def =3D &mmu_psize_defs[bpsize]; > ^ > stray space there >> + >> + for (psize =3D 0; psize < MMU_PAGE_COUNT; psize++) { >> + if (def->penc[psize] =3D=3D lp) { >> + set_hblk_bloc_size(bpsize, psize, block_size); >> + return; >> + } >> + } >> + } >> +} >> + >> +#define SPLPAR_TLB_BIC_TOKEN 50 >> +#define SPLPAR_TLB_BIC_MAXLENGTH (MMU_PAGE_COUNT*2 + 10) >=20 > The +10 is just a guess I think? >=20 > If I'm counting right that ends up as 42 bytes, which is not much at > all. You could probably just make it a cache line, 128 bytes, which > should be plenty of space? >=20 >> +void __init pseries_lpar_read_hblkr_characteristics(void) >> +{ >> + int call_status; >=20 > This should be grouped with the other ints below on one line. >=20 >> + unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH]; >> + int len, idx, bpsize; >> + >> + if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) { >> + pr_info("H_BLOCK_REMOVE is not supported"); >=20 > That's going to trigger on a lot of older machines, and all KVM VMs, so > just return silently. >=20 >> + return; >> + } >> + >> + memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH); >=20 > Here you memset the whole buffer ... >=20 >> + spin_lock(&rtas_data_buf_lock); >> + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE); >> + call_status =3D rtas_call(rtas_token("ibm,get-system-parameter"), 3,= 1, >> + NULL, >> + SPLPAR_TLB_BIC_TOKEN, >> + __pa(rtas_data_buf), >> + RTAS_DATA_BUF_SIZE); >> + memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH); >=20 > But then here you memcpy over the entire buffer, making the memset abov= e > unnecessary. >=20 >> + local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] =3D '\0'; >> + spin_unlock(&rtas_data_buf_lock); >> + >> + if (call_status !=3D 0) { >> + pr_warn("%s %s Error calling get-system-parameter (0x%x)\n", >> + __FILE__, __func__, call_status); >=20 > __FILE__ and __func__ is pretty verbose for what should be a rare case > (I realise you copied that from existing code). >=20 > pr_warn("Error calling get-system-parameter(%d, ...) (0x%x)\n", > SPLPAR_TLB_BIC_TOKEN, call_status); >=20 > Should do I think? >=20 >> + return; >> + } >> + >> + /* >> + * The first two (2) bytes of the data in the buffer are the length = of >> + * the returned data, not counting these first two (2) bytes. >> + */ >> + len =3D local_buffer[0] * 256 + local_buffer[1] + 2; >=20 > This took me a minute to parse. I guess I was expecting something more > like: > len =3D be16_to_cpu(local_buffer) + 2; >=20 > ?? >=20 >> + if (len >=3D SPLPAR_TLB_BIC_MAXLENGTH) { >=20 > I think it's allowed for len to be =3D=3D SPLPAR_TLB_BIC_MAXLENGTH isn'= t it? >=20 >> + pr_warn("%s too large returned buffer %d", __func__, len); >> + return; >> + } >> + >> + idx =3D 2; >> + while (idx < len) { >> + unsigned int block_size =3D local_buffer[idx++]; >=20 > This is a little subtle, local_buffer is char, so no endian issue, but > you're loading that into an unsigned int which makes it look like there > should be an endian swap. >=20 > Maybe instead do: > u8 block_shift =3D local_buffer[idx++]; > u32 block_size; >=20 > if (!block_shift) > break; >=20 > block_size =3D 1 << block_shift; >=20 > ?? >=20 >> + unsigned int npsize; >> + >> + if (!block_size) >> + break; >> + >> + block_size =3D 1 << block_size; >> + >> + for (npsize =3D local_buffer[idx++]; npsize > 0; npsize--) >> + check_lp_set_hblk((unsigned int) local_buffer[idx++], >> + block_size); >=20 > This could overflow if npsize is too big. I think you can just ad > "&& idx < len" to the for condition to make it safe? >=20 >> + } >> + >> + for (bpsize =3D 0; bpsize < MMU_PAGE_COUNT; bpsize++) >> + for (idx =3D 0; idx < MMU_PAGE_COUNT; idx++) >> + if (hblkr_size[bpsize][idx]) >> + pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block siz= e:%d", >> + bpsize, idx, hblkr_size[bpsize][idx]); >=20 > I think this is probably too verbose, except for debugging. Probably > just remove it? I'd like to keep that displayed because this is the only way to get these= =20 values. I tried to use pr_debug() and rely on a dyndbg kernel command line option= ,=20 but the dynamic debug system is not yet initialised at this time of boot. When a system is booting, those messages are interleaved with other user=20 cryptic messages ;): [ 0.000000] numa: NODE_DATA [mem 0x7ffdd7000-0x7ffddbfff] [ 0.000000] numa: NODE_DATA(0) on node 2 [ 0.000000] numa: NODE_DATA [mem 0x7ffdd2000-0x7ffdd6fff] [ 0.000000] rfi-flush: fallback displacement flush available [ 0.000000] rfi-flush: mttrig type flush available [ 0.000000] count-cache-flush: full software flush sequence enabled. [ 0.000000] stf-barrier: eieio barrier available [ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:0 psize:0 block s= ize:8 [ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:0 psize:2 block s= ize:8 [ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:0 psize:10 block = size:8 [ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:2 psize:2 block s= ize:8 [ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:2 psize:10 block = size:8 [ 0.000000] PPC64 nvram contains 15360 bytes [ 0.000000] barrier-nospec: using ORI speculation barrier [ 0.000000] Zone ranges: [ 0.000000] Normal [mem 0x0000000000000000-0x00000007ffffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 2: [mem 0x0000000000000000-0x00000007ffffffff] [ 0.000000] Could not find start_pfn for node 0 Is this an issue to keep those lines ? Cheers