[PATCH v3 00/12] J-core J2 cpu and SoC peripherals support

Re: [PATCH v3 05/12] of: add J-Core SPI master bindings

Rob Herring-3
On Wed, May 25, 2016 at 05:43:03AM +0000, Rich Felker wrote:
> Signed-off-by: Rich Felker <[hidden email]>
> ---
>  .../devicetree/bindings/spi/jcore,spi.txt          | 23 ++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/spi/jcore,spi.txt

Acked-by: Rob Herring <[hidden email]>

Re: [PATCH v3 00/12] J-core J2 cpu and SoC peripherals support

Rich Felker-2
In reply to this post by Mark Brown-2
On Wed, May 25, 2016 at 10:54:44AM +0100, Mark Brown wrote:
> On Wed, May 25, 2016 at 05:43:02AM +0000, Rich Felker wrote:
>
> > As arch/sh co-maintainer my intent is to include as much as possible
> > in my pull request for the linux-sh tree. If there are parts outside
> > of arch/sh that can be included in this, please let me know. I'm not
>
> Do *not* include the SPI driver, you shouldn't be including any drivers
> unless it's been explicitly discussed with the subsystem maintainers.

See the "please let me know". I thought this was plenty clear that I
was asking for permission for including things outside of arch/sh, and
that short of getting an ack, the default permission is no. You also
snipped the part of my message that mentioned the specific subsystems
I was asking about (which were non-SPI because you already made quite
a point about not taking the SPI driver):

> > clear yet on what the right path to upstream is for the clocksource
> > and irq drivers that are currently only useful/interesting for one
> > arch, or for the DT binding patches. Even if some drivers are delayed
> > [...]

> Quite aside from the fact that like Geert says drivers are expected to
> go through the subsystem trees to repeat what I said last time it wasn't
> posted until after the merge window and we're now a few days before the
> end of the merge window and a new version is being posted.  The
> turnaround times you are demanding on review are unreasonable - people
> get busy, have holidays and so on - and you really need to pay attention
> to what people are telling you about the process or you're just going to
> annoy people.

If you can't review and ack the code on short notice, that's fine;
just say so. There's no need to be overly hostile about it. I've
gotten arch/sh patches during the merge window before and I try to be
polite with the contributor and ask if there's something seriously
broken that would be improved by my making an effort to check it at
the last minute, or if it can happily wait until next time.

Being that the driver in question here is for a new platform that was
not previously supported upstream and has zero chance of breaking
anything else, and that its inclusion would be a big plus for users of
the platform, I don't see any reason for you making such a big deal
out of it unless enforcing policy for its own sake makes you feel
good, but I have better things to do than argue about it.

> > arch, or for the DT binding patches. Even if some drivers are delayed
> > going upstream, I would really like to get DT bindings acked and
> > ideally merged, because we want to go ahead with moving the DTB into
> > J2 boot rom where it belongs, and that should only happen with stable
>
> If you want people to review DT bindings you're going to need to submit
> them.

I have, twice now, and I Cc'd the linux-spi list too on v3 for the
spi binding. Rob Herring acked it.

Rich

Re: [PATCH v3 02/12] of: add J-Core cpu bindings

Rich Felker-2
In reply to this post by Mark Rutland
On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote:

> > +Optional properties:
> > +
> > +- enable-method: Required only for SMP systems. If present, must be
> > +  "jcore,spin-table".
> > +
> > +
> > +--------------------
> > +Individual cpu nodes
> > +--------------------
> > +
> > +Required properties:
> > +
> > +- device_type: Must be "cpu".
> > +
> > +- compatible: Must be "jcore,j2".
> > +
> > +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based
> > +  hardware cpu id on SMP systems.
> > +
> > +Optional properties:
> > +
> > +- clock-frequency: Clock frequency of the cpu in Hz.
> > +
> > +- cpu-release-addr: Necessary only for secondary processors on SMP systems
> > +  using the "jcore,spin-table" enable method. If present, must consist of
> > +  two cells containing physical addresses. The first cell contains an
> > +  address which, when written, unblocks the secondary cpu. The second cell
> > +  contains an address from which the cpu will read its initial program
> > +  counter when unblocked.
>
> I take it this follows the example of the arm64 spin-table rather than
> the ePAPR spin-table, given the lack of an ePAPR reference or struct
> definition.

Indeed, I wasn't aware of the ePAPR spec for it. Would you prefer that
we use a different name or something?

> From my experience with the arm64 spin-table, I would *strongly*
> recommend that you tighten this up, and define things more thoroughly,
> before a variety of FW/bootloader implementations appear. e.g.
>
> * What initial/invalid value should the location contain? Is zero a
>   valid address that you might want to jump a secondary CPU to?

I believe the hardware is implemented such that just the appearance of
the address for write on the bus unblocks the secondary cpu waiting on
it. As for the address to jump to, this is provided by the kernel and,
for Linux purposes, is always _stext. As I understand the mechanism,
the actual initial PC for secondary cpus is in rom, and the code there
is responsible for loading the application-desired (i.e.
kernel-desired) initial PC and jumping to it.

> * How must the value be written?
>   - Which endianness?

CPU native.

>   - With a single store? Or is there a more involved sequence to prevent
>     the secondary CPU from seeing a torn value?

The start address is just a physical ram address (internal sram) and
how it's written does not matter, because it's only read once the
release write occurs.
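
A minimal sketch of the ordering described above, assuming the two cells of
cpu-release-addr are plain 32-bit locations and that the value of the release
write is irrelevant. The function name and parameters are hypothetical and do
not come from the patch series:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch series.
 * release_reg:   first cell of cpu-release-addr; a write here unblocks
 *                the secondary cpu.
 * start_pc_slot: second cell; the secondary cpu reads its initial program
 *                counter from here once it has been unblocked.
 */
void jcore_release_secondary(volatile uint32_t *release_reg,
                             volatile uint32_t *start_pc_slot,
                             uint32_t entry_pc)
{
    /* Publish the entry point first; it is only read after the release. */
    *start_pc_slot = entry_pc;

    /*
     * Both accesses are volatile, so the compiler keeps them in program
     * order. Any hardware-level ordering requirement is outside the scope
     * of this sketch.
     */
    *release_reg = 1; /* written value is assumed not to matter */
}
```

Because the start slot is written before the release, how the slot itself is
written (one store or several) does not affect what the secondary cpu reads.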

> * Must CPUs have a unique cpu-release-addr? I would *strongly* recommend
>   that they do.

There is currently no spec or implementation with more than one
secondary cpu.

>   - Is any minimal padding required around cpu-release-addrs? e.g. can
>     FW or bootloader put data in the same cacheline-aligned region and
>     expect the OS not to corrupt this?

The word-sized memory (for J2, 32-bit) at the address is what's being
addressed. There is no implicit license for the kernel to clobber
other nearby data.

> * How should the OS map the region (i.e. which MMU/cache attributes)?

For J2, there is no mmu. All these specs need extension for future
models with mmu, because the properties of the mmu should be described
in the DT.

>   - Is the address permitted to be a device register?

I don't see a strong reason to disallow it. Do you?

> * Where can this memory live?
>   - Should it be absent from any memory node? To which padding?
>   - Should the memory be described with a memreserve? If so, what are
>     your architecture-specific memreserve semantics (i.e. which
>     MMU/cache attributes are permitted for a memreserve'd region)?

If it's in the memory regions exposed by the DT to the kernel, it
should be reserved, I think. In practice it's not.

> * What state should the CPU be in when it branches to the provided
>   address?
>   - Must the MMU be off?

Current models are nommu.

>   - What state must any cache be in?
>     Should FW perform any implementation defined coherency and cache
>     management prior to branching?

The current dcache implementation is fully coherent, but we want to
relax that. If this changes, the hw/fw should ensure that the secondary
cpu being started does not see stale dcache. The icache requires
explicit flush so secondary should start with it off (as it does now)
or ensure that it's flushed before jumping to kernel-provided start
address.

>   - Must the CPU be in a particular endianness?

There are not any switchable-endian J-Core cpus. If such a thing is
added it would make sense to describe this.

>   - Which exceptions must be masked?

In practice I think it's the same as how cpu0 starts. I suspect that's
with everything masked, but I'm not sure right off.

>   - Are interrupts permitted to be pending?

In practice they won't be for current implementations, but I don't
have an answer that can foresee the future.

>   - Should debug logic (e.g. breakpoint and watchpoints) be placed into
>     a quiescent state?

This depends on having a model that has these features.

>   - Should any registers be programmed with specific values?

No, beyond perhaps things like control registers (IMASK).

> At some point, you are likely to want CPU hotplug and/or cpuidle. We
> didn't provision the arm64 spin-table for either of these and never
> extended it, but you may want to put in place some discoverability now
> to allow future OSs to use that new support while allowing current OSs
> to remain functional (e.g. not requiring a new enable-method string).
>
> > +---------------------
> > +Cache controller node
> > +---------------------
> > +
> > +Required properties:
> > +
> > +- compatible: Must be "jcore,cache".
> > +
> > +- reg: A memory range for the cache controller registers.
>
> There is a well-defined memory map for the cache controller?
>
> If so, please refer to documentation for it here (either in this
> section, or the top of this document if common with other elements
> described herein).

The current version "jcore,cache" has a single 32-bit control register
per cpu that can be used to enable/disable/flush icache and/or dcache.
There is no finer-grained control. If/when we do larger caches in the
future where it makes sense, there will be a new binding for it. (For
example it may make sense to do one that matches the original SH
memory-mapped cache interface.)

> > +--------
> > +IPI node
> > +--------
> > +
> > +Device trees for SMP systems must have an IPI node representing the mechanism
> > +used for inter-processor interrupt generation.
> > +
> > +Required properties:
> > +
> > +- compatible: Must be "jcore,ipi-controller".
> > +
> > +- reg: A memory range used to IPI generation.
> > +
> > +- interrupts: An irq on which IPI will be received.
>
> Likewise.

It's the same (actually even the same memory range, though I didn't
see a sense in requiring that; there's also an IPI-generate bit).

> > +----------
> > +CPUID node
> > +----------
> > +
> > +Device trees for SMP systems must have a CPUID node representing the mechanism
> > +used to identify the current processor on which execution is taking place.
> > +
> > +Required properties:
> > +
> > +- compatible: Must be "jcore,cpuid-mmio".
> > +
> > +- reg: A memory range containing a single 32-bit mmio register which produces
> > +  the current cpu id (matching the "reg" property of the cpu performing the
> > +  read) when read.
>
> Likewise.

One general question I have about all of your comments -- is the DT
binding file really supposed to amount to a hardware programming
manual, fully specifying all of the programming interfaces? I don't
see that in other binding files, and it feels like what you're asking
for is beyond the scope of just specifying the bindings.

If you really do want a lot more detail for SMP-related bindings, I
could consider submitting a version with SMP omitted for now (since
the kernel patches submitted at this point don't include SMP) and do
the addition of SMP as a separate patch later. But with the launch of
open-hardware boards capable of running SMP J2 systems (see
https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like
to be getting bindings we can use stabilized so that we're properly
including DTB in the boot rom and not relying on external DTB files or
linking DTB in kernel.

Rich

Re: [PATCH v3 03/12] of: add J-Core interrupt controller bindings

Rich Felker-2
In reply to this post by Mark Rutland
On Wed, May 25, 2016 at 11:25:04AM +0100, Mark Rutland wrote:

> On Wed, May 25, 2016 at 05:43:03AM +0000, Rich Felker wrote:
> > Signed-off-by: Rich Felker <[hidden email]>
> > ---
> >  .../bindings/interrupt-controller/jcore,aic.txt    | 29 ++++++++++++++++++++++
> >  1 file changed, 29 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt
> >
> > diff --git a/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt b/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt
> > new file mode 100644
> > index 0000000..5dc99b9
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt
> > @@ -0,0 +1,29 @@
> > +J-Core Advanced Interrupt Controller
> > +
> > +Required properties:
> > +
> > +- compatible : Should be "jcore,aic1" for the (obsolete) first-generation aic
> > +  with 8 interrupt lines with programmable priorities, or "jcore,aic2" for
> > +  the "aic2" core with 64 interrupts.
> > +
> > +- reg : Memory region for configuration.
> > +
> > +- interrupt-controller : Identifies the node as an interrupt controller
> > +
> > +- #interrupt-cells : Specifies the number of cells needed to encode an
> > +  interrupt source. The value shall be 1.
> > +
> > +Optional properties:
> > +
> > +- cpu-offset : For SMP, the offset to the per-cpu memory region for
> > +  configuration, to be scaled by the cpu number.
>
> I take it that "cpu number" means the "sequential, zero-based hardware
> cpu id" defined in patch 2. I would recommend that you explicitly
> mention that (e.g. here say "hardware cpu id" rather than "cpu number"),
> so as to not have this confused with Linux logical IDs.

OK. The current arch/sh SMP framework only has nominal support for hw
cpuid != logical cpuid; it's not actually used/usable right now. But
the DT binding spec should be clear on this anyway.
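
For illustration, one reading of "scaled by the cpu number" is sketched below,
assuming the per-cpu region starts at the base of the "reg" range and that the
index is the hardware cpu id from the cpu node's "reg" property. The function
name is hypothetical:

```c
#include <stdint.h>

/*
 * Hypothetical helper: address of the per-cpu configuration region,
 * computed as base + cpu-offset * hardware cpu id. The hardware cpu id is
 * the value of the cpu node's "reg" property, not a Linux logical cpu id.
 */
uint32_t aic_percpu_base(uint32_t reg_base, uint32_t cpu_offset,
                         uint32_t hw_cpu_id)
{
    return reg_base + cpu_offset * hw_cpu_id;
}
```

With this reading, cpu 0 always uses the base of the range, so a uniprocessor
device tree can simply omit cpu-offset.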

Rich

Re: [PATCH v3 12/12] sh: add device tree source for J2 FPGA on Mimas v2 board

Rich Felker-2
In reply to this post by Mark Rutland
On Wed, May 25, 2016 at 11:33:50AM +0100, Mark Rutland wrote:

> On Wed, May 25, 2016 at 05:43:03AM +0000, Rich Felker wrote:
> > Signed-off-by: Rich Felker <[hidden email]>
> > ---
> >  arch/sh/boot/dts/j2_mimas_v2.dts | 87 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 87 insertions(+)
> >  create mode 100755 arch/sh/boot/dts/j2_mimas_v2.dts
> >
> > diff --git a/arch/sh/boot/dts/j2_mimas_v2.dts b/arch/sh/boot/dts/j2_mimas_v2.dts
> > new file mode 100755
> > index 0000000..4a66cda
> > --- /dev/null
> > +++ b/arch/sh/boot/dts/j2_mimas_v2.dts
> > @@ -0,0 +1,87 @@
> > +/dts-v1/;
> > +
> > +/ {
> > + compatible = "jcore,j2-soc";
> > + model = "J2 FPGA SoC on Mimas v2 board";
> > +
> > + #address-cells = <1>;
> > + #size-cells = <1>;
> > +
> > + interrupt-parent = <&aic>;
> > +
> > + cpus {
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > +
> > + cpu@0 {
> > + device_type = "cpu";
> > + compatible = "jcore,j2";
> > + reg = < 0 >;
> > + clock-frequency = < 50000000 >;
>
> Nit: please remove the spacing around the '<' and '>' here. If nothing
> else, it's inconsistent with the rest of this file.
>
> > + };
> > + };
> > +
> > + memory@10000000 {
> > + device_type = "memory";
> > + reg = < 0x10000000 0x4000000 >;
> > + };
>
> Likewise.

OK, I'll change both of these.

> > +
> > + chosen {
> > + stdout-path = "/soc@abcd0000/serial@100";
> > + };
>
> Please use a label for the serial node, have an alias, and describe the
> pre-configured rate per the stdout-path binding, e.g.

Per Documentation/devicetree/bindings/xilinx.txt, current-speed is a
required property for "xlnx,xps-uartlite-1.00.a". Note that uartlite
does not actually have a programmable baud rate; the property is
instead being used to describe the hardware-provided rate.

BTW our uartlite is not actually derived from Xilinx's IP core, just
interface-compatible with it, so I'd actually like to add a
"jcore,uartlite" binding too in order to express this, with
"xlnx,xps-uartlite-1.00.a" as the fallback compatible-tag. I'll
propose that as a separate patch later.

Rich

Re: [PATCH v3 00/12] J-core J2 cpu and SoC peripherals support

Mark Brown-2
In reply to this post by Rich Felker-2
On Wed, May 25, 2016 at 06:24:41PM -0400, Rich Felker wrote:
> On Wed, May 25, 2016 at 10:54:44AM +0100, Mark Brown wrote:
> > On Wed, May 25, 2016 at 05:43:02AM +0000, Rich Felker wrote:

> > > As arch/sh co-maintainer my intent is to include as much as possible
> > > in my pull request for the linux-sh tree. If there are parts outside
> > > of arch/sh that can be included in this, please let me know. I'm not

> > Do *not* include the SPI driver, you shouldn't be including any drivers
> > unless it's been explicitly discussed with the subsystem maintainers.

> See the "please let me know". I thought this was plenty clear that I
> was asking for permission for including things outside of arch/sh, and
> that short of getting an ack, the default permission is no. You also
> snipped the part of my message that mentioned the specific subsystems
> I was asking about (which were non-SPI because you already made quite
> a point about not taking the SPI driver):

Given that you started off with "my intent is to include as much as
possible" and the general apparent lack of clarity about the process
it's really not sufficiently obvious to me based on your message that
this is clear to you.  The presentation of your message especially in
the context of the prior discussion suggests that it is expected for
things to go in at this point which haven't even been in -next yet and
that this is all perfectly normal, it is really not clear enough to me
reading it that you are looking for acks but instead sounds like you
might possibly intend to try to send anything that doesn't get
explicitly nacked.

It would really have helped to have some explicit mention of the fact
that you understand that what you are asking for is unusual and some
discussion of why you think it should still go in.  The best and
clearest thing to do would have been to post the series you were
considering sending as one series and everything else separately.  This
is one of the reasons why it was recommended to you that you should
split things up, it helps make things clear - the normal thing would be
that a series like this would be what you were planning to send.
Failing that another thing that'd have helped would be an explicit
mention of the bits you knew weren't going to be included in any pull
request.

> > end of the merge window and a new version is being posted.  The
> > turnaround times you are demanding on review are unreasonable - people
> > get busy, have holidays and so on - and you really need to pay attention
> > to what people are telling you about the process or you're just going to
> > annoy people.

> If you can't review and ack the code on short notice, that's fine;
> just say so. There's no need to be overly hostile about it. I've

Since you were still talking about sending pull requests for this code
during this merge window after the previous thread I want to be as sure
as I can be that you do understand the process and remove any hint of
ambiguity.

Note that you should not expect that people are going to send you an
explicit message about when they intend to review things and usually a
week is considered the lowest bound for chasing on things that aren't
urgent.

> gotten arch/sh patches during the merge window before and I try to be
> polite with the contributor and ask if there's something seriously
> broken that would be improved by my making an effort to check it at
> the last minute, or if it can happily wait until next time.

You're not a new contributor posting some patches here, you are talking
about sending pull requests as the architecture maintainer.  That's
rather different.  

If you were just sending patches that would be fine and not at all
disruptive, that is a perfectly normal part of the workflow, but that's
not what's happening here.  I'm actually one of the people who's more
gung ho about applying things - I do tend to apply patches right up to
the wire and will carry on reviewing and applying new code through the
merge window (I looked at your first version after all) but only fixes
will get queued up for Linus; anything else sits on topic branches
until after the merge window.

> Being that the driver in question here is for a new platform that was
> not previously supported upstream and has zero chance of breaking
> anything else, and that its inclusion would be a big plus for users of

Even if people aren't going to run the code it's buildable on other
architectures (as it should be so we can compile test things, do static
analysis and so on) and breaking the build or even introducing new
warnings during the merge window isn't helpful.  People build and test
things like all*config as a matter of routine and if those get broken
then that takes people's time and can mask other issues.

> the platform, I don't see any reason for you making such a big deal
> out of it unless enforcing policy for its own sake makes you feel
> good, but I have better things to do than argue about it.

Like I say it's the bit where you're talking about sending pull requests
that's really flagging up here.

Most people's changes are important for them and only affect some
specific subset of platforms or systems which often aren't widely used
outside a given company but we still expect their changes to be in -next
before the merge window.  It's a lot easier for everyone to just follow
that rule, we have time based releases and things like -next available
so it is really not the end of the world to wait for one release, doing
this is fairer, minimises the risk of disruption for everyone and it
saves effort with evaluating the different bits of special pleading or
trying to rush to get reviews done quickly.

> > If you want people to review DT bindings you're going to need to submit
> > them.

> I have, twice now, and I Cc'd the linux-spi list too on v3 for the
> spi binding. Rob Herring acked it.

To repeat what was said last time the code and binding need to be
reviewed together - they will generally be merged together.  This means
that you need to copy subsystem maintainers on bindings for their
subsystem along with the code, as a rule you should send the binding to
at least everyone you send the code to.  Sending things to the lists
alone is not enough to ensure they will be seen with the code using
them.


Re: [PATCH v3 02/12] of: add J-Core cpu bindings

Geert Uytterhoeven
In reply to this post by Rich Felker-2
Hi Rich,

On Thu, May 26, 2016 at 1:04 AM, Rich Felker <[hidden email]> wrote:
> If you really do want a lot more detail for SMP-related bindings, I
> could consider submitting a version with SMP omitted for now (since
> the kernel patches submitted at this point don't include SMP) and do
> the addition of SMP as a separate patch later. But with the launch of
> open-hardware boards capable of running SMP J2 systems (see
> https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like
> to be getting bindings we can use stabilized so that we're properly
> including DTB in the boot rom and not relying on external DTB files or
> linking DTB in kernel.

Submitting a version now without SMP is indeed a good idea, and allows
to move forward.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [hidden email]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Re: [PATCH v3 02/12] of: add J-Core cpu bindings

Mark Rutland
In reply to this post by Rich Felker-2
On Wed, May 25, 2016 at 07:04:08PM -0400, Rich Felker wrote:

> On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote:
> > > +Optional properties:
> > > +
> > > +- enable-method: Required only for SMP systems. If present, must be
> > > +  "jcore,spin-table".
> > > +
> > > +
> > > +--------------------
> > > +Individual cpu nodes
> > > +--------------------
> > > +
> > > +Required properties:
> > > +
> > > +- device_type: Must be "cpu".
> > > +
> > > +- compatible: Must be "jcore,j2".
> > > +
> > > +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based
> > > +  hardware cpu id on SMP systems.
> > > +
> > > +Optional properties:
> > > +
> > > +- clock-frequency: Clock frequency of the cpu in Hz.
> > > +
> > > +- cpu-release-addr: Necessary only for secondary processors on SMP systems
> > > +  using the "jcore,spin-table" enable method. If present, must consist of
> > > +  two cells containing physical addresses. The first cell contains an
> > > +  address which, when written, unblocks the secondary cpu. The second cell
> > > +  contains an address from which the cpu will read its initial program
> > > +  counter when unblocked.
> >
> > I take it this follows the example of the arm64 spin-table rather than
> > the ePAPR spin-table, given the lack of an ePAPR reference or struct
> > definition.
>
> Indeed, I wasn't aware of the ePAPR spec for it. Would you prefer that
> we use a different name or something?

No, the "jcore,spin-table" name is fine.

> > From my experience with the arm64 spin-table, I would *strongly*
> > recommend that you tighten this up, and define things more thoroughly,
> > before a variety of FW/bootloader implementations appear. e.g.
> >
> > * What initial/invalid value should the location contain? Is zero a
> >   valid address that you might want to jump a secondary CPU to?
>
> I believe the hardware is implemented such that just the appearance of
> the address for write on the bus unblocks the secondary cpu waiting on
> it.

Ok, so this is effectively a device register, rather than a location in
"real" memory.

> As for the address to jump to, this is provided by the kernel and,
> for Linux purposes, is always _stext. As I understand the mechanism,
> the actual initial PC for secondary cpus is in rom, and the code there
> is responsible for loading the application-desired (i.e.
> kernel-desired) initial PC and jumping to it.

Ok. Is this second address also a device register, or might this be in
"real" memory?

> > * How must the value be written?
> >   - Which endianness?
>
> CPU native.

Ok. I take it that a CPU's endianness cannot be switched on the fly,
then? Or does the hardware backing the release-addr register handle
arbitrary endianness dynamically?

If you want to reuse the same HW block across configurations where CPU
endianness differs, it may make sense to define an endianness
regardless, to simplify integration concerns.
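
As a sketch of what a fixed endianness could look like on the software side,
assuming the start-address slot is ordinary memory that tolerates byte
accesses, and picking big-endian purely for illustration (the binding as
posted says "CPU native"). The helper names are hypothetical:

```c
#include <stdint.h>

/* Store a 32-bit value most-significant byte first, whatever the host
 * byte order is. */
void store32_be(volatile uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Read back a value written by store32_be(). */
uint32_t load32_be(const volatile uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

A device register would more likely need a single access of a defined width,
so this only applies to the memory-backed slot.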

> >   - With a single store? Or is there a more involved sequence to prevent
> >     the secondary CPU from seeing a torn value?
>
> The start address is just a physical ram address (internal sram) and
> how it's written does not matter, because it's only read once the
> release write occurs.

Sure. I had initially mis-read the documentation and applied my
understanding of the arm64 spin-table sequence (which only has a single
write for both purposes).

For the actual release write are there any constraints? e.g. value, size
of access?

> > * Must CPUs have a unique cpu-release-addr? I would *strongly* recommend
> >   that they do.
>
> There is currently no spec or implementation with more than one
> secondary cpu.

Ok. Please bear the above in mind if/when implementations with more than
two secondary CPUs are conceivable.

> >   - Is any minimal padding required around cpu-release-addrs? e.g. can
> >     FW or bootloader put data in the same cacheline-aligned region and
> >     expect the OS not to corrupt this?
>
> The word-sized memory (for J2, 32-bit) at the address is what's being
> addressed. There is no implicit license for the kernel to clobber
> other nearby data.

My concern was that if your memory system is not fully coherent, and
the CPU has a cacheable mapping of the initial program counter field, it
would need to perform a cache clean to ensure visibility of that field.
If the cache line for that region were stale, that would clobber data in
the same cache line (e.g. something owned/used by FW).

Per your comments below that doesn't matter now but may in future.
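
To make the concern concrete, here is a small sketch of the overlap check,
assuming a power-of-two or otherwise fixed cache line size that the caller
supplies. The function name and the line size used in any call are
hypothetical, not properties of J2:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: do two physical addresses fall in the same cache
 * line? If they do, cleaning or invalidating the line for one of them
 * also affects the other.
 */
bool shares_cache_line(uint32_t a, uint32_t b, uint32_t line_size)
{
    return (a / line_size) == (b / line_size);
}
```

Firmware data that shares a line with the initial program counter field is
the case that padding around cpu-release-addr would rule out.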

> > * How should the OS map the region (i.e. which MMU/cache attributes)?
>
> For J2, there is no mmu. All these specs need extension for future
> models with mmu, because the properties of the mmu should be described
> in the DT.

I was going by the fact you had a binding for a cache, which I assumed
was SW configurable. If that's not the case, then my questions about
caches and MMU attributes do not apply for the time being.

> >   - Is the address permitted to be a device register?
>
> I don't see a strong reason to disallow it. Do you?

So long as you can guarantee that OS does not have a cacheable mapping
of this region, and the size of the access is well-defined, I do not
see a reason to disallow it.

> > * Where can this memory live?
> >   - Should it be absent from any memory node? To which padding?
> >   - Should the memory be described with a memreserve? If so, what are
> >     your architecture-specific memreserve semantics (i.e. which
> >     MMU/cache attributes are permitted for a memreserve'd region)?
>
> If it's in the memory regions exposed by the DT to the kernel, it
> should be reserved, I think. In practice it's not.

Ok. This should be documented (as we do for the arm64 spin-table).

Perhaps that is not a major problem if the OS never pokes the release
register.

If you do /memreserve/ the region rather than carving it out of memory
nodes, you will also need to define semantics for memreserve. Typically
memreserve means that the OS should not perform any stores to the
region, but is permitted to map it with some architecture-specific
cacheable attributes.

> > * What state should the CPU be in when it branches to the provided
> >   address?
> >   - Must the MMU be off?
>
> Current models are nommu.
>
> >   - What state must any cache be in?
> >     Should FW perform any implementation defined coherency and cache
> >     management prior to branching?
>
> The current dcache implementation is fully coherent, but we want to
> relax that. If this changes, the hw/fw should ensure that the secondary
> cpu being started does not see stale dcache.

Ok.

> The icache requires explicit flush so secondary should start with it
> off (as it does now) or ensure that it's flushed before jumping to
> kernel-provided start address.
>
> >   - Must the CPU be in a particular endianness?
>
> There are not any switchable-endian J-Core cpus. If such a thing is
> added it would make sense to describe this.
>
> >   - Which exceptions must be masked?
>
> In practice I think it's the same as how cpu0 starts. I suspect that's
> with everything masked, but I'm not sure right off.
>
> >   - Are interrupts permitted to be pending?
>
> In practice they won't be for current implementations, but I don't
> have an answer that can foresee the future.
>
> >   - Should debug logic (e.g. breakpoint and watchpoints) be placed into
> >     a quiescent state?
>
> This depends on having a model that has these features.
>
> >   - Should any registers be programmed with specific values?
>
> No, beyond perhaps things like control registers (IMASK).
>
> > At some point, you are likely to want CPU hotplug and/or cpuidle. We
> > didn't provision the arm64 spin-table for either of these and never
> > extended it, but you may want to put in place some discoverability now
> > to allow future OSs to use that new support while allowing current OSs
> > to remain functional (e.g. not requiring a new enable-method string).
> >
> > > +---------------------
> > > +Cache controller node
> > > +---------------------
> > > +
> > > +Required properties:
> > > +
> > > +- compatible: Must be "jcore,cache".
> > > +
> > > +- reg: A memory range for the cache controller registers.
> >
> > There is a well-defined memory map for the cache controller?
> >
> > If so, please refer to documentation for it here (either in this
> > section, or the top of this document if common with other elements
> > described herein).
>
> The current version "jcore,cache" has a single 32-bit control register
> per cpu that can be used to enable/disable/flush icache and/or dcache.
> There is no finer-grained control. If/when we do larger caches in the
> future where it makes sense, there will be a new binding for it. (For
> example it may make sense to do one that matches the original SH
> memory-mapped cache interface.)

Ok, this is simpler than I had anticipated.

> > > +--------
> > > +IPI node
> > > +--------
> > > +
> > > +Device trees for SMP systems must have an IPI node representing the mechanism
> > > +used for inter-processor interrupt generation.
> > > +
> > > +Required properties:
> > > +
> > > +- compatible: Must be "jcore,ipi-controller".
> > > +
> > > > +- reg: A memory range used for IPI generation.
> > > +
> > > +- interrupts: An irq on which IPI will be received.
> >
> > Likewise.
>
> It's the same (actually even the same memory range, though I didn't
> see a sense in requiring that; there's also an IPI-generate bit).
>
> > > +----------
> > > +CPUID node
> > > +----------
> > > +
> > > +Device trees for SMP systems must have a CPUID node representing the mechanism
> > > +used to identify the current processor on which execution is taking place.
> > > +
> > > +Required properties:
> > > +
> > > +- compatible: Must be "jcore,cpuid-mmio".
> > > +
> > > +- reg: A memory range containing a single 32-bit mmio register which produces
> > > +  the current cpu id (matching the "reg" property of the cpu performing the
> > > +  read) when read.
> >
> > Likewise.
>
> One general question I have about all of your comments -- is the DT
> binding file really supposed to amount to a hardware programming
> manual, fully specifying all of the programming interfaces? I don't
> see that in other binding files, and it feels like what you're asking
> for is beyond the scope of just specifying the bindings.

The binding file is not intended to be a full HW description, but where
possible relevant documentation should be referred to, in case there is
some ambiguity.

My questions about SMP are largely orthogonal to DT; I simply have
experience in dealing with that for arm64, and am aware of some of the
pain points that were not immediately obvious.

> If you really do want a lot more detail for SMP-related bindings, I
> could consider submitting a version with SMP omitted for now (since
> the kernel patches submitted at this point don't include SMP) and do
> the addition of SMP as a separate patch later. But with the launch of
> open-hardware boards capable of running SMP J2 systems (see
> https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like
> to be getting bindings we can use stabilized so that we're properly
> including DTB in the boot rom and not relying on external DTB files or
> linking DTB in kernel.

I would argue that the SMP bindings can be added at the same point as
the code. If there's any chance that something may change, having the
bindings in the kernel early gives a potentially misleading impression
of stability.

I assume that you have the facility to upgrade the boot ROM?

Thanks,
Mark.

Re: [PATCH v3 12/12] sh: add device tree source for J2 FPGA on Mimas v2 board

Mark Rutland
In reply to this post by Rich Felker-2
On Wed, May 25, 2016 at 07:15:25PM -0400, Rich Felker wrote:

> On Wed, May 25, 2016 at 11:33:50AM +0100, Mark Rutland wrote:
> > On Wed, May 25, 2016 at 05:43:03AM +0000, Rich Felker wrote:
> > > Signed-off-by: Rich Felker <[hidden email]>
> > > ---
> > >  arch/sh/boot/dts/j2_mimas_v2.dts | 87 ++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 87 insertions(+)
> > >  create mode 100755 arch/sh/boot/dts/j2_mimas_v2.dts
> > >
> > > diff --git a/arch/sh/boot/dts/j2_mimas_v2.dts b/arch/sh/boot/dts/j2_mimas_v2.dts
> > > new file mode 100755
> > > index 0000000..4a66cda
> > > --- /dev/null
> > > +++ b/arch/sh/boot/dts/j2_mimas_v2.dts
> > > @@ -0,0 +1,87 @@
> > > +/dts-v1/;
> > > +
> > > +/ {
> > > + compatible = "jcore,j2-soc";
> > > + model = "J2 FPGA SoC on Mimas v2 board";
> > > +
> > > + #address-cells = <1>;
> > > + #size-cells = <1>;
> > > +
> > > + interrupt-parent = <&aic>;
> > > +
> > > + cpus {
> > > + #address-cells = <1>;
> > > + #size-cells = <0>;
> > > +
> > > + cpu@0 {
> > > + device_type = "cpu";
> > > + compatible = "jcore,j2";
> > > + reg = < 0 >;
> > > + clock-frequency = < 50000000 >;
> >
> > Nit: please remove the spacing around the '<' and '>' here. If nothing
> > else, it's inconsistent with the rest of this file.
> >
> > > + };
> > > + };
> > > +
> > > + memory@10000000 {
> > > + device_type = "memory";
> > > + reg = < 0x10000000 0x4000000 >;
> > > + };
> >
> > Likewise.
>
> OK, I'll change both of these.
>
> > > +
> > > + chosen {
> > > + stdout-path = "/soc@abcd0000/serial@100";
> > > + };
> >
> > Please use a label for the serial node, have an alias, and describe the
> > pre-configured rate per the stdout-path binding, e.g.
>
> Per Documentation/devicetree/bindings/xilinx.txt, current-speed is a
> required property for "xlnx,xps-uartlite-1.00.a". Note that uartlite
> does not actually have a programmable baud rate; the property is
> instead being used to describe the hardware-provided rate.

Ah, ok. I was not aware of this.

> BTW our uartlite is not actually derived from Xilinx's IP core, just
> interface-compatible with it, so I'd actually like to add a
> "jcore,uartlite" binding too in order to express this, with
> "xlnx,xps-uartlite-1.00.a" as the fallback compatible-tag. I'll
> propose that as a separate patch later.

Please do.
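
For reference, a minimal sketch of how the pieces discussed above might fit
together: a label and alias for the serial node, stdout-path going through
the alias, current-speed describing the fixed hardware rate, and the
proposed "jcore,uartlite" compatible with the Xilinx string as fallback.
The node path comes from the patch. The label name, the ranges and reg
sizes, and the 19200 rate are assumptions for illustration only, and
"jcore,uartlite" is only a proposal at this point, not an accepted binding.

```dts
/dts-v1/;

/ {
	#address-cells = <1>;
	#size-cells = <1>;

	aliases {
		serial0 = &uart0;
	};

	chosen {
		/* Refers to the alias rather than spelling out the full path. */
		stdout-path = "serial0";
	};

	soc@abcd0000 {
		compatible = "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		/* Window size is hypothetical. */
		ranges = <0 0xabcd0000 0x100000>;

		uart0: serial@100 {
			/* Most specific string first, fallback second. */
			compatible = "jcore,uartlite", "xlnx,xps-uartlite-1.00.a";
			/* Register block size is hypothetical. */
			reg = <0x100 0x10>;
			/* Fixed hardware rate; the value here is hypothetical. */
			current-speed = <19200>;
		};
	};
};
```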

Thanks,
Mark.

Re: [PATCH v3 02/12] of: add J-Core cpu bindings

Rich Felker-2
In reply to this post by Mark Rutland
On Thu, May 26, 2016 at 11:38:29AM +0100, Mark Rutland wrote:

> On Wed, May 25, 2016 at 07:04:08PM -0400, Rich Felker wrote:
> > On Wed, May 25, 2016 at 11:22:15AM +0100, Mark Rutland wrote:
> > > > +Optional properties:
> > > > +
> > > > +- enable-method: Required only for SMP systems. If present, must be
> > > > +  "jcore,spin-table".
> > > > +
> > > > +
> > > > +--------------------
> > > > +Individual cpu nodes
> > > > +--------------------
> > > > +
> > > > +Required properties:
> > > > +
> > > > +- device_type: Must be "cpu".
> > > > +
> > > > +- compatible: Must be "jcore,j2".
> > > > +
> > > > +- reg: Must be 0 on uniprocessor systems, or the sequential, zero-based
> > > > +  hardware cpu id on SMP systems.
> > > > +
> > > > +Optional properties:
> > > > +
> > > > +- clock-frequency: Clock frequency of the cpu in Hz.
> > > > +
> > > > +- cpu-release-addr: Necessary only for secondary processors on SMP systems
> > > > +  using the "jcore,spin-table" enable method. If present, must consist of
> > > > +  two cells containing physical addresses. The first cell contains an
> > > > +  address which, when written, unblocks the secondary cpu. The second cell
> > > > +  contains an address from which the cpu will read its initial program
> > > > +  counter when unblocked.
> > >
> > > I take it this follows the example of the arm64 spin-table rather than
> > > the ePAPR spin-table, given the lack of an ePAPR reference or struct
> > > definition.
> >
> > Indeed, I wasn't aware of the ePAPR spec for it. Would you prefer that
> > we use a different name or something?
>
> No, the "jcore,spin-table" name is fine.
>
> > > From my experience with the arm64 spin-table, I would *strongly*
> > > recommend that you tighten this up, and define things more thoroughly,
> > > before a variety of FW/bootloader implementations appear. e.g.
> > >
> > > * What initial/invalid value should the location contain? Is zero a
> > >   valid address that you might want to jump a secondary CPU to?
> >
> > I believe the hardware is implemented such that just the appearance of
> > the address for write on the bus unblocks the secondary cpu waiting on
> > it.
>
> Ok, so this is effectively a device register, rather than a location in
> "real" memory.

Yes, I re-checked and it actually works as a device register. Sorry
for the confusion. I think what happened is that I modeled the
binding/kernel with the intent that it could just as well be in normal
memory with the cpu spinning on it.

> > As for the address to jump to, this is provided by the kernel and,
> > for Linux purposes, is always _stext. As I understand the mechanism,
> > the actual initial PC for secondary cpus is in rom, and the code there
> > is responsible for loading the application-desired (i.e.
> > kernel-desired) initial PC and jumping to it.
>
> Ok. Is this second address also a device register, or might this be in
> "real" memory?

In practice it's real memory, but aside from possible constraints on
how it's written I don't think it matters a lot.
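
To illustrate the two-cell layout as the v3 binding text describes it, a
rough sketch of a two-cpu tree follows. Both addresses are made up for the
example: the first cell stands for the release (device) register, the
second for the location the secondary reads its initial PC from. Placing
enable-method on the "cpus" node is my reading of the binding layout quoted
above.

```dts
/dts-v1/;

/ {
	#address-cells = <1>;
	#size-cells = <1>;

	cpus {
		#address-cells = <1>;
		#size-cells = <0>;
		enable-method = "jcore,spin-table";

		cpu@0 {
			device_type = "cpu";
			compatible = "jcore,j2";
			reg = <0>;
		};

		cpu@1 {
			device_type = "cpu";
			compatible = "jcore,j2";
			reg = <1>;
			/*
			 * <release-register initial-pc-location>
			 * Both values are hypothetical.
			 */
			cpu-release-addr = <0xabcd0f00 0x00008000>;
		};
	};
};
```

The OS would store the entry point at the second address first, then write
to the first address to unblock the cpu.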

> > > * How must the value be written?
> > >   - Which endianness?
> >
> > CPU native.
>
> Ok. I take it that a CPU's endianness cannot be switched on the fly,
> then? Or does the hardware backing the release-addr register handle
> arbitrary endianness dynamically?

No, it's not switched on the fly.

> If you want to reuse the same HW block across configurations where CPU
> endianness differs, it may make sense to define an endianness
> regardless, to simplify integration concerns.

The existing cpus are all big-endian, but I believe at one point there
was talk that it's easy to get a little-endian version if you want. In
any case the value is to be read by the cpu, so cpu endianness (i.e.
no endianness, just a value) is the only thing that makes sense to
specify. Adding a fixed endian spec independent of cpu endianness just
complicates both hardware and kernel implementation and its only
benefit seems to be supporting runtime-switchable chips where the
entry-point code has to select the endianness to match the rest of the
kernel.

> > >   - With a single store? Or is there a more involved sequence to prevent
> > >     the secondary CPU from seeing a torn value?
> >
> > The start address is just a physical ram address (internal sram) and
> > how it's written does not matter, because it's only read once the
> > release write occurs.
>
> Sure. I had initially mis-read the documentation and applied my
> understanding of the arm64 spin-table sequence (which only has a single
> write for both purposes).
>
> For the actual release write are there any constraints? e.g. value, size
> of access?

I'm not sure. If so, native word (32-bit) would be the right size to
specify, but it's possible that any size works.

> > > * Must CPUs have a unique cpu-release-addr? I would *strongly* recommend
> > >   that they do.
> >
> > There is currently no spec or implementation with more than one
> > secondary cpu.
>
> Ok. Please bear the above in mind if/when implementations with more than
> two secondary CPUs are conceivable.

Yes.

> > >   - Is any minimal padding required around cpu-release-addrs? e.g. can
> > >     FW or bootloader put data in the same cacheline-aligned region and
> > >     expect the OS not to corrupt this?
> >
> > The word-sized memory (for J2, 32-bit) at the address is what's being
> > addressed. There is no implicit license for the kernel to clobber
> > other nearby data.
>
> My concern was that if your memory system is not fully coherent, and
> the CPU has a cacheable mapping of the initial program counter field, it
> would need to perform a cache clean to ensure visibility of that field.
> If the cache line for that region were stale, that would clobber data in
> the same cache line (e.g. something owned/used by FW).
>
> Per your comments below that doesn't matter now but may in future.

I agree, but from a practical standpoint I think it's irrelevant. The
secondary cpu(s) should start with either empty cache or just some
boot rom code in their caches (and current architecture does not even
cache sram/"rom"). Flushing would be a good idea just to be safe but
it's going to be an effective no-op.

> > > * How should the OS map the region (i.e. which MMU/cache attributes)?
> >
> > For J2, there is no mmu. All these specs need extension for future
> > models with mmu, because the properties of the mmu should be described
> > in the DT.
>
> I was going by the fact you had a binding for a cache, which I assumed
> was SW configurable. If that's not the case, then my questions about
> caches and MMU attributes do not apply for the time being.

The only configuration possible via the current binding is
enabling/disabling the cache. If it's enabled, it needs to be used for
icache flush when new code is written, and for dcache flush
before/after DMA. (Note that there is no DMA binding yet and no DMA
driver; these are coming later.)
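
As a rough sketch of what the node looks like for a uniprocessor system
under the current binding (compatible plus a reg covering the one 32-bit
control register), assuming a made-up register offset and bus window:

```dts
/dts-v1/;

/ {
	#address-cells = <1>;
	#size-cells = <1>;

	soc@abcd0000 {
		compatible = "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		/* Window size is hypothetical. */
		ranges = <0 0xabcd0000 0x100000>;

		cache-controller@c0 {
			compatible = "jcore,cache";
			/* One 32-bit control register; the offset is hypothetical. */
			reg = <0xc0 4>;
		};
	};
};
```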

> > >   - Is the address permitted to be a device register?
> >
> > I don't see a strong reason to disallow it. Do you?
>
> So long as you can guarantee that the OS does not have a cacheable mapping
> of this region, and the size of the access is well-defined, I do not
> see a reason to disallow it.
>
> > > * Where can this memory live?
> > >   - Should it be absent from any memory node? To which padding?
> > >   - Should the memory be described with a memreserve? If so, what are
> > >     your architecture-specific memreserve semantics (i.e. which
> > >     MMU/cache attributes are permitted for a memreserve'd region)?
> >
> > If it's in the memory regions exposed by the DT to the kernel, it
> > should be reserved, I think. In practice it's not.
>
> Ok. This should be documented (as we do for the arm64 spin-table).
>
> Perhaps that is not a major problem if the OS never pokes the release
> register.
>
> If you do /memreserve/ the region rather than carving it out of memory
> nodes, you will also need to define semantics for memreserve. Typically
> memreserve means that the OS should not perform any stores to the
> region, but is permitted to map it with some architecture-specific
> cacheable attributes.

My interpretation of memreserve is just that it marks memory ranges
that the kernel cannot use for allocatable memory for its own
purposes, despite otherwise possibly lying in the range of a "memory"
node. I would not interpret it as excluding accesses by drivers that
were told to use specific addresses in the "reserved" range as part of
their DT bindings.

> > One general question I have about all of your comments -- is the DT
> > binding file really supposed to amount to a hardware programming
> > manual, fully specifying all of the programming interfaces? I don't
> > see that in other binding files, and it feels like what you're asking
> > for is beyond the scope of just specifying the bindings.
>
> The binding file is not intended to be a full HW description, but where
> possible relevant documentation should be referred to, in case there is
> some ambiguity.
>
> My questions about SMP are largely orthogonal to DT; I simply have
> experience in dealing with that for arm64, and am aware of some of the
> pain points that were not immediately obvious.

OK, thanks for clarifying that. This is actually really helpful
feedback to have but I wasn't sure if you wanted me to consider it
part of what needs to be done for DT binding or for consideration in
designing and documenting the SMP architecture.

> > If you really do want a lot more detail for SMP-related bindings, I
> > could consider submitting a version with SMP omitted for now (since
> > the kernel patches submitted at this point don't include SMP) and do
> > the addition of SMP as a separate patch later. But with the launch of
> > open-hardware boards capable of running SMP J2 systems (see
> > https://twitter.com/jcoreeng/status/730330848306700288) near, I'd like
> > to be getting bindings we can use stabilized so that we're properly
> > including DTB in the boot rom and not relying on external DTB files or
> > linking DTB in kernel.
>
> I would argue that the SMP bindings can be added at the same point as
> the code. If there's any chance that something may change, having the
> bindings in the kernel early gives a potentially misleading impression
> of stability.

OK. I'll strip it down to just the parts that are relevant for non-SMP
and submit the patch adding SMP bindings along with the SMP kernel
patches.

> I assume that you have the facility to upgrade the boot ROM?

Yes. For FPGA implementations it's just part of the FPGA bitstream and
you upgrade it the same way you load a new bitstream onto the FPGA.

Rich