Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
On Thu, 2005-04-21 at 09:31 -0500, Dmitry Torokhov wrote:

> Hi Evgeniy,
>
> On 4/21/05, Evgeniy Polyakov <[hidden email]> wrote:
> > On Thu, 2005-04-21 at 02:07 -0500, Dmitry Torokhov wrote:
> > > Hi,
> >
> > Hello, Dmitry.
> >
> > > I happened to take a look into drivers/w1 and found a bunch of things
> > > there that IMO should be changed:
> > >
> > > - custom-made refcounting is racy
> >
> > Why do you think so?
> > Did you find exactly the place which races against something?
> >
> > > - lifetime rules need to be better enforced
> >
> > Hmm, I do not understand you.
> >
>
> Consider the following:
>
>         while (atomic_read(&sl->refcnt)) {
>                 printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
>                                 sl->name, atomic_read(&sl->refcnt));
>
>                 if (msleep_interruptible(1000))
>                         flush_signals(current);
>         }
>
>         sysfs_remove_bin_file(&sl->dev.kobj, &sl->attr_bin);
>         device_remove_file(&sl->dev, &sl->attr_name);
>         device_remove_file(&sl->dev, &sl->attr_val);
>         device_unregister(&sl->dev);
>         w1_family_put(sl->family);
> ... and the caller does kfree(sl);
>
> Now, if an application opens the slave's sysfs attribute while another
> thread has exited the loop and is about to remove the attributes, then
> you will kfree an object that is in use, and who knows what will happen.
> This is an example of both the refcounting being racy and the lifetime
> rules being violated.
Yes, I see now.
I will think of it some more...
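
The window described above can be walked through step by step. What follows is only a single-threaded userspace model of the interleaving, not the w1 code: struct slave_model and every function name in it are hypothetical, and the "free" is represented by a flag so that the outcome can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a slave object; not the kernel's struct w1_slave. */
struct slave_model {
	int refcnt;	/* stands in for atomic_t refcnt */
	bool freed;	/* stands in for kfree(sl) having run */
};

/* Removal, step 1: the polling loop exits once it sees refcnt == 0. */
static bool remover_sees_zero(const struct slave_model *sl)
{
	return sl->refcnt == 0;
}

/* A sysfs open that arrives after the check but before the free. */
static void opener_takes_ref(struct slave_model *sl)
{
	sl->refcnt++;
}

/* Removal, step 2: the caller frees without looking at refcnt again. */
static void remover_frees(struct slave_model *sl)
{
	sl->freed = true;
}

/* True when somebody still holds a reference to freed memory. */
static bool use_after_free(const struct slave_model *sl)
{
	return sl->freed && sl->refcnt > 0;
}
```

Running the steps in the order check, open, free leaves use_after_free() true: nothing ties the zero check to the free, so a reference taken in between is not noticed.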

> > > - family framework is insufficient for many advanced w1 devices
> >
> > No, the family framework is just an indication of which family is used.
> > Feel free to implement additional methods for various devices
> > and store them in the driver's private areas like ipaq does.
> >
>
> OK, that is what I am saying. But why do you need that attribute with
> a variable name and a bin attribute that is not really binary but just
> a dump for all kinds of data (it looks like a debug one)?
The bin attribute was created for the lm_sensors scripts format - it only
caches the read value.
I think there might be only 2 "must have" methods - read and write.
I plan to implement them using connector, so they will probably go away
completely.

> > > - custom-made hotplug notification over netlink should be removed in favor
> > >   of standard hotplug notification
> >
> > It is not hotplug, and your changes broke it completely.
> > I'm waiting for connector to be included or discarded, so I can move
> > w1 on top of its interface or move connector's bits into the w1
> > subsystem.
> >
>
> You will not be able to cram all 1-wire devices into a unified
> interface. You will need to build classes on top of it and you might
> use connector (I am not sure), but not at the w1 bus level.
> ...
connector allows different objects to live inside one netlink group,
so w1 will use it that way.
I think only two w1 methods must exist - read and write,
and they must follow the protocol defined in the family driver.


> > > w1-drop-owner.patch
> > >    Drop owner field from w1_master and w1_slave structures. Just having it
> > >    there does not magically fix lifetime rules.
> >
> > They do not even pretend to. I still do not understand what
> > "lifetime rules" are.
>
> So there is no point in having them, right?

If I understand you correctly, "lifetime rules" are implemented in the
following way:
when an object is created it has a refcnt of 0; each access increments it
and must decrement it when the access is finished.
Given the mix of sysfs and the current refcnt, I see a possibility of
freeing the object in the ->release()/remove() method and that is all,
but I need to investigate it more deeply.

> >
> > > w1-bus-ops.patch
> > >    Cleanup bus operations code:
> > >    - have bus operations accept w1_master instead of unsigned long and
> > >      drop data field from w1_bus_master so the structure can be statically
> > >      allocated by driver implementing it;
> > >    - rename w1_bus_master to w1_bus_ops to avoid confusion with w1_master;
> > >    - separate master registration and allocation so drivers can set up a
> > >      proper link between private data and master and set a usable master
> > >      name.
> >
> > I strongly object to such changes.
> > 1. w1 was designed in such a way that w1 bus master drivers do not
> > know about the rest of the w1 world. It is a very simple and very
> > low-level abstraction that only understands how to do low-level
> > functions. It does not need to know about the w1_master structure or
> > even about its existence.
>
> Well, it does need to know about w1_bus_master structure, which is
> pretty much the same. And it allows having static bus_ops allocation
> and removes need for casting from unsigned longs...
The w1_bus_master structure is low-level physical operations.
It _completely_ does not know about logical links to other
w1 core objects. Why should a _bus_ driver know about logical
objects on top of it? That is why they are separate.
Even if they are not too different (and actually they _are_ different)
from your point of view, they differ in abstraction model.
A bus master driver is like a NIC driver: it does not know about the rest
of the network stack, but you want it to have all info about neighbours,
routes and so on...

> > 2. All the renaming is superfluous. I'm not against it, but I completely
> > do not understand its merits.
>
> Because now it represents operations only, data field has been
> dropped. My head hurts when I see w1_master and w1_bus_master
> together as 2 separate objects both representing the same piece of
> hardware.

No.

w1_bus_master is the low-level driver for physical hardware operations.
w1_master is a logical structure which is operated by the w1 core stack.
It has its logical slaves, it has its own attributes and features;
it _completely_ does not belong to the low-level physical layer where you
want to place them both.

> > 3. You broke netlink allocation routing - it may fail and it is not
> > fatal.
>
> Because it is going away in later patches ;)

This is wrong - netlink notification is used and will be moved to the
connector interface later.

> >
> > > w1-fold-w1-int.patch
> > >    Fold w1_int.c into w1.c - there is no point in artificially separating
> > >    code for master devices between 2 files.
> >
> > w1_int.c was created to store external interface implementation,
> > why do you want to move it into w1 core code?
> > It will only soil the code...
>
> Because I do not understand why code creating master devices is
> separate from code creating master device's attributes.
It is better to move all attributes into the code creating the master
device, so move that attribute creation into w1_int.c.

> >
> > > w1-drop-netlink.patch
> > >    Drop custom-made hotplug over netlink notification from w1 core.
> > >    Standard hotplug mechanism should work just fine (patch will follow).
> >
> > netlink notification was not created for hotplug.
> > Also I'm against w1 hotplug support, since hotplug is quite rarely used
> > in embedded platforms where the majority of w1 devices live.
>
> kobject_uevent does notification over netlink so I do not understand
> why a custom approach is better. You don't really need to use a script.
kobject is too big for that. It is used exactly for kobject changes.
Custom netlink notifications were created for w1-specific objects
and their control. You cannot control w1 slaves/masters using hotplug.

> > > w1-drop-control-thread.patch
> > >    Drop control thread from w1 core, whatever it does can also be done in
> > >    the context of w1_remove_master_device. Also, pin the module when
> > >    registering new master device to make sure that w1 core is not unloaded
> > >    until last device is gone. This simplifies logic a lot.
> >
> > Why do you think a master can be removed in a safe context only?
>
> Can you show me an example where you remove a master from an interrupt
> context or a tasklet? I doubt you will ever see one.
As I said, I have feature requests for the ability to export w1 devices
outside the w1 core.
Probably it is due to its private non-GPL usage, so it is not created,
but it is actually a useful feature and we cannot know what will
happen in what context when we export master/slave devices.
By the way, w1 slaves can be present on the bus without a reaction to
the search method implemented in their ASIC.
And it is _very_ useful to add/remove slaves using an external command
rather than automatic detection in search methods.
 
> > I have feature requests for both adding/removing and exporting
> > master devices and slaves to the external world.
>
> External as in userspace? It (user thread) can wait just fine...

Exporting them into other kernel modules.
We do not know in what context those structures will be used there.

> > The control thread is also the place in which we kick all devices
> > whenever we need to, not only when we need to remove the w1 core module.
>
> Define kicking for me please...

Removing a master device using a netlink command, for example.

> >
> > > w1-move-search-to-io.patch
> > >    Move w1_search function to w1_io.c to be with the rest of IO code.
> >
> > w1_search() is a high-level protocol method; w1_io.c only contains
> > calls for low-level methods like bit/byte banging, reset, HW search and
> > so on.
>
> Well, it does bit banging and is completely foreign to the rest of the w1
> code. It may be a high-level operation as far as the 1-wire on-wire
> protocol goes, but it surely does not belong with the kernel's W1 bus
> implementation code.
It does not do only low-level operations,
but, well, it is probably better to place it there.

> > > w1-master-attr-cleanup.patch
> > >    Clean-up master attribute implementation:
> > >    - drop unnecessary "w1_master" prefix from attribute names;
> > >    - do not acquire master->mutex when accessing attributes;
> > >    - move attribute code "closer" to the rest of master code.
> >
> > Ok, but the slave count and slaves attributes themselves require that
> > mutex.
>
> They are gone. You can scan sysfs to get your slaves and count. Kernel
> does not need to do that.
I created those files exactly so that one does not need to scan the tree,
but only read one [two] files :)

> > > w1-master-scan-interval.patch
> > >    More master attributes changes:
> > >    - rename timeout parameter/attribute to scan_interval to better
> > >      reflect its purpose;
> > >    - make scan_timeout be a per-device attribute and allow changing
> > >      it from userspace via sysfs;
> > >    - allow changing max_slave_count from userspace as well.
> >
> > I like that change, but why do you need to change the name?
>
> Because nothing times out (as in error). It defines interval between
> scans -> scan_interval.
Ok.

> > > w1-master-cleanup.patch
> > >    Clean-up master device implementation:
> > >    - get rid of separate refcount, rely on driver model to enforce
> > >      lifetime rules;
> > >    - use atomic to generate unique master IDs;
> > >    - drop unused fields.
> >
> > That patch is very broken.
> > I am completely against it:
> > 1. it breaks process logic - searching can be interrupted and stopped,
> > thread will exit on signals.
>
> Interrupted/stopped from userspace?
Your loop waits only until an interrupt happens - it can be delivered
from anywhere.

> > 2. Your changes will break master/slave structure exporting.
>
> -ENEEDMOREDATA.

I think I described it in a master exporting paragraph, let's drop it
here.

> >
> > > w1-slave-cleanup.patch
> > >    Clean-up slave device implementation:
> > >    - get rid of separate refcount, rely on driver model to enforce
> > >      lifetime rules;
> > >    - pin w1 module until slave device is registered with sysfs to make
> > >      sure W1 core stays loaded.
> > >    - drop 'name' attribute as we already have it in bus_id.
> >
> > The same and even worse.
>
> You need to fix lifetime rules.
You moved everything to the device model, while I want to keep the
existing model because of the actions described above.
So the right solution is not to break all the existing locking, which
is mixed with the device driver model, but to create a proper
interoperability model.
I will think about it some more. I will integrate your and Adrian Bunk's
cleanups first, once the current pending patches are pushed.

> > > w1-family-cleanup.patch
> > >    Clean-up family implementation:
> > >    - get rid of w1_family_ops and template attributes in w1_slave
> > >      structure and have family drivers create necessary attributes
> > >      themselves. There are too many different devices using 1-Wire
> > >      interface and it is impossible to fit them all into single
> > >      attribute model. If interface unification is needed it can be
> > >      done by building cross-bus class hierarchy.
> > >    - rename w1_smem to w1_sernum because devices are called Silicon
> > >      serial numbers, they have address (ID) but don't have memory
> > >      in regular sense.
> > >    - rename w1_therm to w1_thermal.
> >
> > smem == simple memory id, it is the official name AFAIR.
> > The renames are superfluous; family_ops contains the "must have"
> > operations, and a driver writer can easily add their own if needed.
>
> What is so "must have" about 2 attributes? smem does not need anything,
> for example...
It is not read and read_bin, but read and write.
I did not implement the write methods months ago since I did not
have such hardware, but now I see it was a bad decision which
confused people.

> > > w1-family-is-driver.patch
> > >    Convert family into proper device-model drivers:
> > >    - embed driver structure into w1_family and register with the
> > >      driver core;
> > >    - do not try to manually bind slaves to families, leave it to
> > >      the driver core;
> > >    - fold w1_family.c into w1.c
> >
> > Why do you break it?
> > They were separated intentionally - it simplifies code review,
> > they are completely different logical models, family processing
> > is not the same as slave processing.
>
> Masters, slaves and families are all objects of W1 kernel bus. With
> cutting a bunch of fat from family code it does not make sense to keep
> them separate anymore.
What you do is exactly the same as what already exists, but using another
model.
There is no need to dig into the device model in such a way.

> > Thank you very much for your changes and ideas,
> > but as you can see I'm against several of them.
> > The main question is: why dig into the driver model in such a way?
> > It will complicate structure exporting too much.
>
> Because it is the standard for implementing devices and drivers in
> the Linux kernel. You need to explain about exporting structures since
> the rest of the system seems to be doing just fine.
>
> > The existing locking/refcnt scheme is very flexible and allows device
> > manipulation while it "holds" the reference counter,
>
> It is also racy and buggy.
It is not.
You pointed out an error, and it will be fixed after I think about it some
more.
The problem is that the device model [which is not the main part of the w1
system] interferes with the existing locking scheme [which is quite big and
allows very flexible object manipulation], and you suggest almost
completely replacing one with the other.
With such changes, how does one increment a slave's usage counter?
module_get(w1_family)?

> > and it will not be possible if one just blindly gets/puts module's
> > refcnt.
>
> Only wire.ko is pinned. You are still free to remove family drivers or
> master drivers (or killing their objects somehow). It is only core
> that is pinned to make sure that release functions are available when
> object finally goes away.

If we remove a slave device, we must be sure its object is freed
when the appropriate kobject is released.
The same goes for the family itself.

No, we need either to replace all locking with the device driver model,
or to operate properly with the existing one.
I will investigate it deeply; I'm sure we will find the best
solution.

> > Object has reference counter which is incremented and decremented when
> > object is in use, not the whole module reference counter,
> > one may remove and add separate objects without knowledge of
> > what family or bus master driver handles that.
>
> > Your changes mix low-level driver logic with the w1 core.
> > You have removed netlink notification and replaced it with hotplug,
> > but it cannot be used on systems without shell userspace support.
>
> kobject_uevent does not require a shell account.
Exactly like the existing netlink notifications, which allow one
to carry not only simple predefined actions, but any information you
like.
As you said, we cannot fit all existing w1 devices into very confined
limits, so hotplug events, which are add/remove, cannot solve the
whole notification problem.
The simple notification mechanism that was created for w1 is only a first
step. Of course it can be replaced with hotplug notification as it stands,
but if we want to extend it a bit, and I strongly believe we will, we
will fail with the current hotplug approach.

--
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Evgeniy Polyakov
On Thu, 2005-04-21 at 11:09 -0500, Dmitry Torokhov wrote:

> One more thing...
>
> On 4/21/05, Evgeniy Polyakov <[hidden email]> wrote:
> > On Thu, 2005-04-21 at 02:07 -0500, Dmitry Torokhov wrote:
> >
> > > w1-master-drop-attrs.patch
> > >    Get rid of unneeded master device attributes:
> > >    - 'pointer' and 'attempts' are meaningless for userspace;
> > >    - information provided by 'slaves' and 'slave_count' can be gathered
> > >      from other sysfs bits;
> > >    - w1_slave_found has to be rearranged now that slave_count field is gone.
> >
> > attempts is useful for broken lines.
>
> It simply increments with every search, i.e. every 10 seconds by default,
> and does not provide an indication of the quality of the wire.
When slaves cannot be found until after several attempts, it means the
line is broken; how many times an existing slave appeared/disappeared
over the count in /sys/bus/w1/devices/w1_master1/attempts says something
about link quality.

--
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Dmitry Torokhov-4
In reply to this post by Evgeniy Polyakov
On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:

> On Thu, 2005-04-21 at 09:31 -0500, Dmitry Torokhov wrote:
> >
> > OK, that is what I am saying. But why do you need that attribute with
> > a variable name and a bin attribute that is not really binary but just
> > a dump for all kinds of data (it looks like a debug one)?
>
> The bin attribute was created for the lm_sensors scripts format - it only
> caches the read value.
> I think there might be only 2 "must have" methods - read and write.
> I plan to implement them using connector, so they will probably go away
> completely.
...
> > You will not be able to cram all 1-wire devices into a unified
> > interface. You will need to build classes on top of it and you might
> > use connector (I am not sure), but not at the w1 bus level.
> > ...
>
> connector allows different objects to live inside one netlink group,
> so w1 will use it that way.
> I think only two w1 methods must exist - read and write,
> and they must follow the protocol defined in the family driver.

No, I think there should not be any "must have" methods at the w1_bus
level. What you really need (and this needs to be coordinated with
other sensors people) is a "sensors" class hierarchy that will define
classes like "temperature sensor", "fan", "vid", etc. Then your w1
family drivers, when bound to a slave, will create the needed class
devices. i2c drivers will do the same, and your superio, and I'll be
able to change the i8k driver just for kicks. Then your userspace would
not care what _bus_ a particular sensor is sitting on and will be
presented with a unified interface. Look at your NIC example -
userspace does not care if a NIC is sitting on a PCI, ISA, PCMCIA or USB
bus - it's all the same. And your classes can use netlink as a
transport mechanism - fine, why shouldn't they... But it will be
available for the entire kernel, not only the w1 bus.

Once again, bus code is not the right level to define an interface with
userspace. There are just way too many different devices that can be
connected to the same bus. You need to separate them into classes and
define an interface for each class. And a class does not have to be
confined to a single bus; it can span several buses, providing a unified
interface to a group of similar objects.
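
As a rough illustration of a class that spans buses, here is a minimal userspace sketch. It is not the kernel's class_device API: struct temp_sensor, its fields and both read functions are hypothetical names made up for the example, and the "hardware" is just an int behind a pointer.

```c
#include <assert.h>

/* Hypothetical bus-independent class: anything that reports a temperature. */
struct temp_sensor {
	const char *bus;			/* informational only */
	int (*read_millicelsius)(void *priv);	/* supplied by the bus driver */
	void *priv;				/* driver-private state */
};

/* What a w1 family driver might supply (value already in millidegrees). */
static int w1_therm_read(void *priv)
{
	return *(int *)priv;
}

/* What an i2c driver might supply (register holds whole degrees). */
static int i2c_temp_read(void *priv)
{
	return *(int *)priv * 1000;
}

/* Class-level consumer: never looks at which bus the sensor sits on. */
static int temp_sensor_read(const struct temp_sensor *s)
{
	return s->read_millicelsius(s->priv);
}
```

The consumer only ever calls temp_sensor_read(); which bus a given sensor sits on is visible to the driver that filled in the structure and to nobody else.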

...
> If I understand you correctly, "lifetime rules" are implemented in the
> following way:
> when an object is created it has a refcnt of 0; each access increments it
> and must decrement it when the access is finished.

No, not each access (well, depending on what you mean by access).
Normally, when you create an object you set its refcount to 1 (because
there is 1 owner - you). Every time you pass that object to another
thread of execution (process) you need to increment the reference count,
and every time you are done using the object you decrement the reference
count. The last user needs to destroy the object. See
Documentation/kref.txt
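
The rule above (start at 1, get on hand-off, put when done, last put destroys) can be sketched in userspace with C11 atomics. This is only an illustration of the pattern, not the kernel's kref implementation; struct obj, obj_create(), obj_get(), obj_put() and the released_count test hook are all hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted object. */
struct obj {
	atomic_int refcount;
	void (*release)(struct obj *obj);
};

static int released_count;	/* test hook: counts release() calls */

static void obj_release(struct obj *obj)
{
	released_count++;
	free(obj);
}

/* The creator is the first owner, so the count starts at 1, not 0. */
static struct obj *obj_create(void)
{
	struct obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	atomic_init(&obj->refcount, 1);
	obj->release = obj_release;
	return obj;
}

/* Take a reference before handing the object to another user. */
static struct obj *obj_get(struct obj *obj)
{
	atomic_fetch_add(&obj->refcount, 1);
	return obj;
}

/* Drop a reference; returns 1 if this call destroyed the object. */
static int obj_put(struct obj *obj)
{
	if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
		obj->release(obj);
		return 1;
	}
	return 0;
}
```

Because destruction happens inside the final obj_put(), there is no separate "wait until the count reaches zero, then free" step for another user to slip into.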

> > >
> > > > w1-bus-ops.patch
> > > >    Cleanup bus operations code:
> > > >    - have bus operations accept w1_master instead of unsigned long and
> > > >      drop data field from w1_bus_master so the structure can be statically
> > > >      allocated by driver implementing it;
> > > >    - rename w1_bus_master to w1_bus_ops to avoid confusion with w1_master;
> > > >    - separate master registration and allocation so drivers can set up a
> > > >      proper link between private data and master and set a usable master
> > > >      name.
> > >
> > > I strongly object to such changes.
> > > 1. w1 was designed in such a way that w1 bus master drivers do not
> > > know about the rest of the w1 world. It is a very simple and very
> > > low-level abstraction that only understands how to do low-level
> > > functions. It does not need to know about the w1_master structure or
> > > even about its existence.
> >
> > Well, it does need to know about w1_bus_master structure, which is
> > pretty much the same. And it allows having static bus_ops allocation
> > and removes need for casting from unsigned longs...
>
> The w1_bus_master structure is low-level physical operations.
> It _completely_ does not know about logical links to other
> w1 core objects. Why should a _bus_ driver know about logical
> objects on top of it? That is why they are separate.
> Even if they are not too different (and actually they _are_ different)
> from your point of view, they differ in abstraction model.
> A bus master driver is like a NIC driver: it does not know about the rest
> of the network stack, but you want it to have all info about neighbours,
> routes and so on...
>

Ok, look at the drivers implementing NICs - struct net_device. They
define open() and close() methods, but they also know a little bit
about net_device, like how to get private data (netdev_priv) and some
other stuff. That is exactly what I have done WRT w1_master and
w1_bus_master. Again, this allows w1_bus_master (w1_bus_ops) to be
statically allocated rather than piggy-backed onto the w1_master memory
allocation. And we now have better type-safety because we don't pass
unsigned longs and up/down cast them everywhere. And you don't need to
"search" for a master device using "data" as a cookie. If you want, we
can have w1_master_priv to access w1_master->priv instead of
referencing it directly.
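
A simplified userspace sketch of that arrangement follows. The structure layouts are assumptions made for illustration and are much smaller than the real ones; w1_master_priv() is the accessor proposed above, and struct demo_hw with its two callbacks is a made-up bus driver.

```c
#include <assert.h>

struct w1_master;

/* Operations only, no data: one static table can serve every instance. */
struct w1_bus_ops {
	int (*read_bit)(struct w1_master *master);
	void (*write_bit)(struct w1_master *master, int bit);
};

struct w1_master {
	const struct w1_bus_ops *ops;
	void *priv;		/* driver-private data, set by the bus driver */
};

/* Accessor in the spirit of netdev_priv(); the layout here is assumed. */
static inline void *w1_master_priv(struct w1_master *master)
{
	return master->priv;
}

/* Made-up bus driver: its private state is one simulated line level. */
struct demo_hw {
	int line;
};

static int demo_read_bit(struct w1_master *master)
{
	struct demo_hw *hw = w1_master_priv(master);

	return hw->line;
}

static void demo_write_bit(struct w1_master *master, int bit)
{
	struct demo_hw *hw = w1_master_priv(master);

	hw->line = bit ? 1 : 0;
}

/* Statically allocated, shared, read-only. */
static const struct w1_bus_ops demo_ops = {
	.read_bit = demo_read_bit,
	.write_bit = demo_write_bit,
};
```

Each callback receives a typed struct w1_master pointer and reaches its own state through the accessor, so there is no unsigned long cookie to cast back and forth and no lookup of the master by that cookie.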

> > > 3. You broke netlink allocation routing - it may fail and it is not
> > > fatal.
> >
> > Because it is going away in later patches ;)
>
> This is wrong - netlink notification is used and will be moved to the
> connector interface later.

But not at the w1_bus level, please.

> > >
> > > > w1-drop-netlink.patch
> > > >    Drop custom-made hotplug over netlink notification from w1 core.
> > > >    Standard hotplug mechanism should work just fine (patch will follow).
> > >
> > > netlink notification was not created for hotplug.
> > > Also I'm against w1 hotplug support, since hotplug is quite rarely used
> > > in embedded platforms where the majority of w1 devices live.
> >
> > kobject_uevent does notification over netlink so I do not understand
> > why a custom approach is better. You don't really need to use a script.
>
> kobject is too big for that. It is used exactly for kobject changes.
> Custom netlink notifications were created for w1-specific objects
> and their control. You cannot control w1 slaves/masters using hotplug.

Hotplug is a unified mechanism for notifying userspace of new devices.
Not only on the w1 bus but everywhere. Stop inventing solutions usable
for the w1 bus only. And if I for some reason don't want to use netlink -
well, I can with the generic hotplug solution but not with yours.

> > > > w1-drop-control-thread.patch
> > > >    Drop control thread from w1 core, whatever it does can also be done in
> > > >    the context of w1_remove_master_device. Also, pin the module when
> > > >    registering new master device to make sure that w1 core is not unloaded
> > > >    until last device is gone. This simplifies logic a lot.
> > >
> > > Why do you think a master can be removed in a safe context only?
> >
> > Can you show me an example where you remove a master from an interrupt
> > context or a tasklet? I doubt you will ever see one.
>
> As I said, I have feature requests for the ability to export w1 devices
> outside the w1 core.
> Probably it is due to its private non-GPL usage, so it is not created,
> but it is actually a useful feature and we cannot know what will
> happen in what context when we export master/slave devices.

Look at your present w1_remove_master_device. It sleeps. So there is
no need for a separate thread; callers must be able to sleep anyway.

> By the way, w1 slaves can be present on the bus without a reaction to
> the search method implemented in their ASIC.
> And it is _very_ useful to add/remove slaves using an external command
> rather than automatic detection in search methods.

But the request for that will come from userspace, which is perfectly
able to sleep. You are over-engineering and making kernel code
unnecessarily complex without thinking it through.

> > > I have feature requests for both adding/removing and exporting
> > > master devices and slaves to the external world.
> >
> > External as in userspace? It (user thread) can wait just fine...
>
> Exporting them into other kernel modules.
> We do not know in what context those structures will be used there.

Why would other kernel modules be interested in raw access to w1_slaves?
Care to give an example?
 
> > > The control thread is also the place in which we kick all devices
> > > whenever we need to, not only when we need to remove the w1 core module.
> >
> > Define kicking for me please...
>
> Removing a master device using a netlink command, for example.

Wrong level. You need to start with the device implementing w1_bus_master
(w1_bus_ops) to remove dangling data structures. The easiest way, I think,
is to have the driver compiled as a module and remove it altogether - why
keep it if you don't need the master?

> > > > w1-master-attr-cleanup.patch
> > > >    Clean-up master attribute implementation:
> > > >    - drop unnecessary "w1_master" prefix from attribute names;
> > > >    - do not acquire master->mutex when accessing attributes;
> > > >    - move attribute code "closer" to the rest of master code.
> > >
> > > Ok, but the slave count and slaves attributes themselves require that
> > > mutex.
> >
> > They are gone. You can scan sysfs to get your slaves and count. Kernel
> > does not need to do that.
>
> I created those files exactly so that one does not need to scan the tree,
> but only read one [two] files :)

The less code in the kernel that produces data available elsewhere, the
better :)

> > > > w1-master-cleanup.patch
> > > >    Clean-up master device implementation:
> > > >    - get rid of separate refcount, rely on driver model to enforce
> > > >      lifetime rules;
> > > >    - use atomic to generate unique master IDs;
> > > >    - drop unused fields.
> > >
> > > That patch is very broken.
> > > I am completely against it:
> > > 1. it breaks process logic - searching can be interrupted and stopped,
> > > thread will exit on signals.
> >
> > Interrupted/stopped from userspace?
>
> Your loop waits only until an interrupt happens - it can be delivered
> from anywhere.

No, only root can kill a kernel thread so it is pretty safe. And hey, if
a thread goes mad maybe it's a good thing that it can be killed.
 

> > > > w1-family-is-driver.patch
> > > >    Convert family into proper device-model drivers:
> > > >    - embed driver structure into w1_family and register with the
> > > >      driver core;
> > > >    - do not try to manually bind slaves to families, leave it to
> > > >      the driver core;
> > > >    - fold w1_family.c into w1.c
> > >
> > > Why do you break it?
> > > They were separated intentionally - it simplifies code review,
> > > they are completely different logical models, family processing
> > > is not the same as slave processing.
> >
> > Masters, slaves and families are all objects of W1 kernel bus. With
> > cutting a bunch of fat from family code it does not make sense to keep
> > them separate anymore.
>
> What you do is exactly the same as what already exists, but using another
> model.
> There is no need to dig into the device model in such a way.
...
> The problem is that the device model [which is not the main part of the w1
> system] interferes with the existing locking scheme [which is quite big and
> allows very flexible object manipulation], and you suggest almost
> completely replacing one with the other.

The device model is there to be _used_. It already implements a bunch of
stuff you have to re-implement if you do it "your own way", and it is
already debugged much better than your solution. And if there is a problem
with the driver core implementation - well, more people are looking at it
and are more likely to discover a problem and offer a solution.

I do not understand why you are against full integration with the device
model - it does simplify and unify the code.

> With such changes, how does one increment a slave's usage counter?
> module_get(w1_family)?

Actually, if you need it, it would be get_device(&w1_slave->dev). And if
you need to pin the family object you would use
get_driver(&w1_family->driver). But I don't think you will need it.
Actually, at some point I had w1_family_get() implemented as a wrapper
around get_driver(), but since it does not seem to be needed I dropped it.
 

> > > and it will not be possible if one just blindly gets/puts module's
> > > refcnt.
> >
> > Only wire.ko is pinned. You are still free to remove family drivers or
> > master drivers (or killing their objects somehow). It is only core
> > that is pinned to make sure that release functions are available when
> > object finally goes away.
>
> If we remove a slave device, we must be sure its object is freed
> when the appropriate kobject is released.
> The same goes for the family itself.

Right.

> No, we need either to replace all locking with the device driver model,
> or to operate properly with the existing one.

And it was done. Well, not locking, but pinning of the objects. It was
all moved to device model. It may not be visible ;) but it is there.
Locking is still there as well.

--
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Dmitry Torokhov-4
In reply to this post by Evgeniy Polyakov
On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:

> On Thu, 2005-04-21 at 11:09 -0500, Dmitry Torokhov wrote:
> > One more thing...
> >
> > On 4/21/05, Evgeniy Polyakov <[hidden email]> wrote:
> > > On Thu, 2005-04-21 at 02:07 -0500, Dmitry Torokhov wrote:
> > >
> > > > w1-master-drop-attrs.patch
> > > >    Get rid of unneeded master device attributes:
> > > >    - 'pointer' and 'attempts' are meaningless for userspace;
> > > >    - information provided by 'slaves' and 'slave_count' can be gathered
> > > >      from other sysfs bits;
> > > >    - w1_slave_found has to be rearranged now that slave_count field is gone.
> > >
> > > attempts is usefull for broken lines.
> >
> > It simply increments with every search i.e. every 10 secondsby default
> > and does not provide indication of the quality of the wire.
>
> When slaves can not be found until several attempts, it means line
> is broken, how many time existing slave appeared/dissapeared during
> /sys/bus/w1/devices/w1_master1/attempts says about link quality.

Heh, if you are debugging, all you need is the "date" command to see how
quickly a slave appears. If you want to keep statistics, your program
needs to listen to hotplug events for the master and slaves and count
these. I do not see a reason for a counter that simply increments every
10 seconds.
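
As a rough illustration of that suggestion, a userspace monitor could
tally add/remove events per slave instead of reading a free-running
counter. The sketch below assumes events arrive as ACTION/DEVPATH
string pairs (the variables the hotplug helper gets in its
environment); struct slave_stats, slave_stats_event() and the device
path used with them are hypothetical names, not part of w1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-slave statistics kept by a userspace monitor. */
struct slave_stats {
	const char *devpath;	/* sysfs path of the slave being watched */
	unsigned int added;	/* number of "add" events seen */
	unsigned int removed;	/* number of "remove" events seen */
};

/*
 * Account one hotplug event for the watched slave.
 * Returns 1 if the event was counted, 0 if it was ignored.
 */
static int slave_stats_event(struct slave_stats *st,
			     const char *action, const char *devpath)
{
	if (strcmp(devpath, st->devpath) != 0)
		return 0;	/* event for some other device */
	if (strcmp(action, "add") == 0) {
		st->added++;
		return 1;
	}
	if (strcmp(action, "remove") == 0) {
		st->removed++;
		return 1;
	}
	return 0;		/* unknown action */
}
```

How often a slave disappears and reappears then falls out of the
added/removed counts, without any kernel-side attribute.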

--
Dmitry

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Dmitry Torokhov-4
On Mon, 25 Apr 2005 11:32:14 -0500
Dmitry Torokhov <[hidden email]> wrote:

> On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > On Thu, 2005-04-21 at 09:31 -0500, Dmitry Torokhov wrote:
> > >
> > > OK, that is what I am aying. But why do you need that attribute with
> > > variable name and a bin attribute that is not really bin but just a
> > > dump for all kind of data (looks like debug one).
> >
> > bin attribute was created for lm_sensors scripts format - it only caches
> > read value.
> > I think there might be only 2 "must have" methods - read and write.
> > I plan to implement them using connector, so probably they will go away
> > completely.
> ...
> > > You will not be able to cram all 1-wire devices into unified
> > > interface. You will need to build classes on top of it and you might
> > > use connector (I am not sure) bit not on w1 bus level.
> > > ...
> >
> > connector allows to have different objects inside one netlink group,
> > so it will use it in that way.
> > I think only two w1 methods must exist - read and write,
> > and they must follow protocol, defined in family driver.
>
> No, I think there should not be any "must have" methods on w1_bus
> level. What you really need (and this needs to be coordinated with
> other sensors people) is a "sensors" class hierarchy that will define
> classes like "temperature sensor", "fan", "vid", etc. Then your w1
> family drivers, when bound to a slave, will create needed class
> devices. i2c drivers will do the same, and your superio, and I'll be
> able to change i8k driver just for kicks. Then your userspace would not
> care what _bus_ a particular sensor is sitting on and will be
> presented with a unified interface. Look at your NIC example -
> userspace does not care if NIC is sitting on a PCI, ISA, PCMCIA or USB
> bus - it's all the same. And your classes can use netlink as a
> transport mechanism - fine, why shouldn't they... But it will be
> available for entire kernel, not only w1 bus..
>
> Once again, bus code is not the right level to define interface with
> userspace. There are just way too many different devices that can be
> connected to the same bus. You need to separate them into classes and
> define an interface for a class. And a class does not have to be
> confined to a single bus, it can span across several buses, providing
> a unified interface to a group of similar objects.

Heh, that would be nice, but that requires a _lot_ of changes,
so it is only theory, at least for now.

The reality is that w1 devices must be managed from userspace, and
that can be implemented with 2 system calls - read and write.
While they are implemented through sysfs, and thus through the
device/driver model, they require different locking, which may race -
and I will fix it.
Another implementation could use connector calls - they
can be used in the i2c core and _are_ used in the superio core;
I will put it into w1 if the connector gets committed.
The implementation will probably be similar to the superio one...

> ...
> > If I understand you correctly, "lifetime rules" are implemented in a
> > following way:
> > when object is created it has 0 refcnt, each access increments it and
> > must
> > decrements when access is finished.
>
> No, not each access (well, depending on what you mean by access).
> Normally, when you create an object you set its refcount to 1 (because
> there is 1 owner - you). Every time you pass that object to another
> thread of execution (process) you need to increment the reference count,
> and every time you are done using the object you decrement the reference
> count. The last user needs to destroy the object. See
> Documentation/kref.txt

The behaviour is exactly the same, but the implementation always
waits on remove until the others finish. No problem here.
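
For reference, the rule quoted above (count starts at 1, take a
reference when sharing, last user destroys) can be sketched in plain C
roughly as follows. This is a userspace model built on C11 atomics,
not the kernel's kref; struct obj, the obj_* helpers and the
"released" counter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object, modelled loosely on the kref rules. */
struct obj {
	atomic_int refcnt;
	void (*release)(struct obj *);	/* called by the last user */
};

static int released;	/* counts release calls so the effect is visible */

static void obj_release(struct obj *o)
{
	released++;
	free(o);
}

/* The creator is the first owner, so the count starts at 1, not 0. */
static struct obj *obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->release = obj_release;
	return o;
}

/* Take a reference before handing the object to another user. */
static struct obj *obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
	return o;
}

/* Drop a reference; whoever drops the last one destroys the object. */
static int obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
		o->release(o);
		return 1;
	}
	return 0;
}
```

In this model nobody polls the counter in a loop: the remove path only
drops its own reference, and the memory goes away when the last holder
is done with it.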
 

> > > >
> > > > > w1-bus-ops.patch
> > > > >    Cleanup bus operations code:
> > > > >    - have bus operatiions accept w1_master instead of unsigned long and
> > > > >      drop data field from w1_bus_master so the structure can be statically
> > > > >      allocated by driver implementing it;
> > > > >    - rename w1_bus_master to w1_bus_ops to avoid confusion with w1_master;
> > > > >    - separate master registering and allocation so drivers can setup proper
> > > > >      link between private data and master and set useable master's name.
> > > >
> > > > I strongly object against such changes.
> > > > 1. w1 was designed in the way that w1 bus master drivers do not
> > > > know about other w1 world. It is very simple and very low-level
> > > > abstraction,
> > > > that only understands how to do low-level functions. It is not needed
> > > > do know about w1_master structure and even about it's existence.
> > >
> > > Well, it does need to know about w1_bus_master structure, which is
> > > pretty much the same. And it allows having static bus_ops allocation
> > > and removes need for casting from unsigned longs...
> >
> > w1_bus_master structure is low-level physical operations.
> > It _completely_ does not know abou logical links to other
> > w1 core objects, Why _bus_ driver should know about logical
> > objects on top of it? That is why they are separate.
> > Even if they are not too different (and actually they _are_ different)
> > from your point of view, they differ in abstration model.
> > Bus master driver - is like NIC driver, it does not know about the rest
> > of
> > the network stack, but you want it to have all info about neighbours,
> > routes
> > and so on...
> >
>
> Ok, look at the drivers implementing NICs - struct netdevice. They
> define open() and close() methods, but they also know a little bit
> about netdevice, like how to get private data (netdev_priv) and some
> other stuff. That is exactly what I have done WRT w1_master and
> w1_bus_master. Again, this allows to have w1_bus_master (w1_bus_ops)
> statically allocated and not piggy-back it to w1_master memory
> allocation. And we now have better type-safety because we don't pass
> unsigned longs and up/down cast them everywhere. And you don't need to
> "search" for a master device using "data" as a cookie. If you want we
> can have w1_master_priv to access w1_master->priv instead of
> referencing it directly.

No network driver knows how skbs are used.
A NIC's device driver does not know about neighbours, routes, filters and so on.
It only moves data to the physical layer; how data is managed above it
is not in the driver's competence.
A w1 bus master driver only knows how to move data to the wire;
it does not know how that data got there, from/to which
slave it originated and so on.

The bus master driver is the low-level part that lives in its own driver,
while the w1 core only knows about higher-layer w1_master objects.
The network stack operates on the device driver (in our case the
w1 bus master driver) through dev->something(), and we have
dev->bus_master->something() here; the network core operates
on routes using dst/rt - w1 has dev->slist and so on.
You may ask why call through dev->bus_master->something()
when we could call dev->something(); I would say that
w1_master itself is like a stack in networking -
it knows about its routes (slave devices),
it knows about its low-level driver (bus master),
it has proper locking (xmit lock in networking).

The bus master driver is an absolutely separate object
from the w1_master structure and from the logical object itself.
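
To make the layout under discussion concrete, here is a rough sketch
of the variant described for w1-bus-ops.patch above: a statically
allocated ops table whose callbacks take the master instead of an
unsigned long cookie, plus a netdev_priv-style accessor. Only the
names w1_master, w1_bus_ops and w1_master_priv come from the thread;
the fields, struct demo_hw and the demo_* callbacks are invented for
the example, and the real structures have more members.

```c
#include <assert.h>
#include <stddef.h>

struct w1_master;

/* Low-level operations; callbacks take the master, not an unsigned long. */
struct w1_bus_ops {
	unsigned char (*read_bit)(struct w1_master *);
	void (*write_bit)(struct w1_master *, unsigned char);
};

struct w1_master {
	const struct w1_bus_ops *ops;	/* may point at a static table */
	void *priv;			/* driver-private data */
};

/* Accessor in the spirit of netdev_priv(). */
static void *w1_master_priv(struct w1_master *m)
{
	return m->priv;
}

/* Hypothetical bus master driver: private state is one line level. */
struct demo_hw {
	unsigned char line;
};

static unsigned char demo_read_bit(struct w1_master *m)
{
	struct demo_hw *hw = w1_master_priv(m);

	return hw->line;
}

static void demo_write_bit(struct w1_master *m, unsigned char bit)
{
	struct demo_hw *hw = w1_master_priv(m);

	hw->line = bit & 1;
}

/* Statically allocated, shared by every instance of this driver. */
static const struct w1_bus_ops demo_ops = {
	.read_bit = demo_read_bit,
	.write_bit = demo_write_bit,
};
```

The driver here still only touches its own private data; whether
handing it the w1_master pointer at all breaks the layering is the
point being argued.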

> > > > 3. You broke netlink allocation routing - it may fail and it is not
> > > > fatal.
> > >
> > > Because it is going away in later patcehs ;)
> >
> > This is wrong - netlink notification is used and will be moved to
> > connector
> > interface later.
>
> But not at the w1_bus level, please.

How do you propose to notify about an alarm condition?
Not from the bus layer?
Who sends "link is down" messages? It is not the same
as "device is present and found"; it is like "w1 device has something to read".
For example, the w1 ds18s20 thermal sensor may report
the "85 degree problem" - that is what is read when the sensor has not
finished the temperature conversion yet. How would a non-bus layer know about it?

Your idea about classes over the various buses is good,
but unfortunately it is a utopia, at least for now,
so let's create a w1 core layer (which, btw, is not only the bus,
which is managed by the bus master driver, but also some logic over it;
one may call it a w1 stack, and a stack can send such messages, can't it?)

> > > >
> > > > > w1-drop-netlink.patch
> > > > >    Drop custom-made hotplug over netlink notification from w1 core.
> > > > >    Standard hotplug mechanism should work just fine (patch will follow).
> > > >
> > > > netlink notification was not created for hotplug.
> > > > Also I'm against w1 hotplug support, since hotplug is quite rarely used
> > > > in embedded platforms where the majority of w1 devices live.
> > >
> > > kobject_uevent does notification over netlink so I do not understand
> > > why custom approach is better. You don't really need to use script.
> >
> > kobject is too big for that. It is used exactly for kobject changes.
> > Custom netlink notifications are created for w1 specific objects
> > and it's control. You can not control w1 slaves/masters using hotplug.
>
> Hotplug is a unified mechanism for notifying userspace of new devices.
> Not only on w1 bus but everywhere. Stop inventing solutions useable
> for w1 bus only. And if I for some reason don't want to use netlink -
> well I can with generic hotplug solution but not with yours.

The problem is that exist/not-exist events are not the only ones that
may be sent, as I said in the previous mail.
You suggest limiting it to those - this is wrong.
Feel free to _add_ hotplug support, but do not replace the notification with it.

> > > > > w1-drop-control-thread.patch
> > > > >    Drop control thread from w1 core, whatever it does can also be done in
> > > > >    the context of w1_remove_master_device. Also, pin the module when
> > > > >    registering new master device to make sure that w1 core is not unloaded
> > > > >    until last device is gone. This simplifies logic a lot.
> > > >
> > > > Why do you think master can be removed in safe context only?
> > >
> > > Can you show me example where you remove master from an interrupt
> > > context or a tasklet? I doubt you will ever see one.
> >
> > As I said I have feature requests for ability to export w1 devices
> > outside w1 core.
> > Probably it is due to it's private non-GPL usage, so it is not created,
> > but it is usefull feature actually and we can not know what will
> > happen in what context when we export master/slave devices.
>
> Look at your present w1_remove_master_device. It sleeps. So there is
> no need for a separate thread, callers must be able to sleep anyway.


That is exactly why the control thread exists - to manage sleepable operations!


> > w1 slaves can be found on the bus without search method reaction
> > implemented in it's asic, btw.
> > And it is _very_ usefull to add/remove slaves using external command but
> > not using
> > automatic detection in search methods.
>
> But the request for that will come from userspace which is perfectly
> able to sleep. You are over-engineering and making kernel code
> unnecessarily complex without thinking it through.

Connector's requests come from BH context.
Only module unloading comes in a good context (in our case we do not
get read/write operations),
but I do not want to limit the system to only that kind of event.

> > > > I have feature requests for both adding/removing and exporting
> > > > master devices and slaves to the external world.
> > >
> > > External as in userspace? It (user thread) can wait just fine...
> >
> > Exporting them into other kernel modules.
> > We do not know in what context that structures will be used there.
>
> Why would other kernel modules be interested in raw access to w1_slaves?
> Care to give an example?

Consider a w1 battery slave device which exports a unified interface to
userspace.
It requires access that can be obtained from a different, generic module
[like an existing kernelspace/userspace protocol,
proper device files and so on],
which will implement only a read/write interface.
Its read method will call get_slave_by_something() and read its data
or do something else, and then the generic module will use that data.

A generic battery monitor will not scan the w1 bus for its devices,
since it does not even know about w1; it only understands read/write
operations.

Placing the operations needed for that module into w1_bat.c and hoping
that no one will implement a new battery monitor subsystem,
or that all battery users will be adapted to use only that interface, is naive.

There is at least the _possibility_ to create such a model
with the existing design [and it is _very_ easy],
but your changes broke it, although that could be changed...

> > > > Control thread is also the place in which we kick all devices
> > > > when we need it, but not only when we need to remove w1 core module.
> > >
> > > Define kicking for me please...
> >
> > Removing master device using netlink command for exaple.
>
> Wrong level. You need to start with the device implementing w1_bus_master
> (w1_bus_ops) to remove dangling data structures. The easiest way, I think,
> is to have the driver compiled as a module and remove it altogether - why
> keep it if you don't need the master?

1. A master can be removed by command. That is not in process context.
The current thread can only remove all objects at once, but nevertheless
I do not want to limit it.
2. The control thread can add/remove new slaves by request.
The usefulness of that ability was pointed out in my previous e-mail,
but you skipped that part.

> > > > > w1-master-attr-cleanup.patch
> > > > >    Clean-up master attribute implementation:
> > > > >    - drop unnecessary "w1_master" prefix from attribute names;
> > > > >    - do not acquire master->mutex when accessing attributes;
> > > > >    - move attribute code "closer" to the rest of master code.
> > > >
> > > > Ok, but slave count and slaves attributes itself requires that mutex.
> > >
> > > They are gone. You can scan sysfs to get your slaves and count. Kernel
> > > does not need to do that.
> >
> > I created that files exactly for reaason to not scan the tree, but only
> > read one [two] files :)
>
> The less code in kernel that produces data available elsewhere the better :)

Are 3 lines of code for reading slaves' names too big a
price for not scanning the whole /sys/bus/w1/w1_master1/ directory?


> > > > > w1-master-cleanup.patch
> > > > >    Clean-up master device implementation:
> > > > >    - get rid of separate refcount, rely on driver model to enforce
> > > > >      lifetime rules;
> > > > >    - use atomic to generate unique master IDs;
> > > > >    - drop unused fields.
> > > >
> > > > That patch is very broken.
> > > > I completely against it:
> > > > 1. it breaks process logic - searching can be interrupted and stopped,
> > > > thread will exit on signals.
> > >
> > > Interrupted/stopped from userspace?
> >
> > Your loop waits only until interrupt happens - it can be delivered from
> > anywhere.
>
> No, only root can kill a kernel thread so it is pretty safe. And hey, if
> a thread goes mad maybe it's a good thing that it can be killed.

And if it exits, it breaks the logic - the user cannot know the state
of the master device when the thread has exited but the module was not
removed by request.

> > > > > w1-family-is-driver.patch
> > > > >    Convert family into proper device-model drivers:
> > > > >    - embed driver structure into w1_family and register with the
> > > > >      driver core;
> > > > >    - do not try to manually bind slaves to familes, leave it to
> > > > >      the driver core;
> > > > >    - fold w1_family.c into w1.c
> > > >
> > > > Why do you break it?
> > > > They were separated intentionally - it simplifies code review,
> > > > it is completely different logical models, family processing
> > > > is not hte same as slave.
> > >
> > > Masters, slaves and families are all objects of W1 kernel bus. With
> > > cutting a bunch of fat from family code it does not make sense to keep
> > > them separate anymore.
> >
> > What you do is exactly the same that already exist, but using other
> > model.
> > No need to dig into device model in a such way.
> ...
> > the problem is that device model[which is not the main part of the w1
> > system]
> > is interfere to the existing locking schema [which is quite big and
> > allows very flexible object manipulation], and you suggest to almost
> > completely
> > replace one with another.
>
> device model is here to _use_ it. It already implements bunch of stuff
> you have to re-implement if you do it "your own way" and it is already
> debugged much better than your solution. And if there is a problem
> with driver core implementation - well, more people are looking at it
> and are more likely to discover a problem and offer a solution.
>
> I do not understand why you are against full integration with device
> model - it does simplifies and unifies the code.

Because it is not needed here - and even if it could be integrated
more closely - your changes broke too many special design cases,
which is not acceptable.
 
> > With such changes how to increment slave's usage counter? module_get
> > (w1_family)?
>
> Actually if you need it it would be get_device(&w1_slave->dev). And if
> you need pin family object you would get
> get_driver(&w1_family->driver). But I don't think you will needed it.
> Actually, at some point I had w1_family_get() implemented as a wrapper
> to get_driver() but since it does not seem to be needed I dropped it.

We cannot do it that way.
1. low-level bus master code differs from what w1_master is.
2. family itself is different from slave object.

Given that, we would need to create w1_master/w1_family device/driver
model locking + w1_bus_master/w1_slave locking, or move them to the
device/driver model too. I still do not see why that is needed: there is
a proper locking scheme, which is broken in the places where the
existing model touches the device/driver one, and it will be fixed.

> > > > and it will not be possible if one just blindly gets/puts module's
> > > > refcnt.
> > >
> > > Only wire.ko is pinned. You are still free to remove family drivers or
> > > master drivers (or killing their objects somehow). It is only core
> > > that is pinned to make sure that release functions are available when
> > > object finally goes away.
> >
> > If we remove slave deivice, we must be sure it's object is freed
> > when appripriate kobject is released.
> > The same is with family itself.
>
> Right.
>
> > No, we need either replace all locking with device driver model,
> > or properly operate with existing one.
>
> And it was done. Well, not locking, but pinning of the objects. It was
> all moved to device model. It may not be visible ;) but it is there.
> Locking is still there as well.

Here we need either _only_ device/driver locking (locking here and above
means not only locking itself, but usage counters and all corresponding
events too), or a mix of the existing scheme and the device/driver model.
The first approach already has too many issues, which
may or may not be resolvable; the existing mixed
model works except for one case, which I will fix.
It will use the device/driver model and its remove callback
to free resources.


Now I have the following agenda for w1:
1. wait until existing changes are committed
2. put in your and Adrian's cleanups
3. fix w1 sysfs usage
4. commit 2 and 3
5. implement hotplug using your patch, but not instead of existing notification.
6. commit hotplug changes

I strongly believe that after [3] there will be no moot points.


Thank you.

> --
> Dmitry


        Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Dmitry Torokhov-4
On Mon, 25 Apr 2005 11:36:05 -0500
Dmitry Torokhov <[hidden email]> wrote:

> On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > On Thu, 2005-04-21 at 11:09 -0500, Dmitry Torokhov wrote:
> > > One more thing...
> > >
> > > On 4/21/05, Evgeniy Polyakov <[hidden email]> wrote:
> > > > On Thu, 2005-04-21 at 02:07 -0500, Dmitry Torokhov wrote:
> > > >
> > > > > w1-master-drop-attrs.patch
> > > > >    Get rid of unneeded master device attributes:
> > > > >    - 'pointer' and 'attempts' are meaningless for userspace;
> > > > >    - information provided by 'slaves' and 'slave_count' can be gathered
> > > > >      from other sysfs bits;
> > > > >    - w1_slave_found has to be rearranged now that slave_count field is gone.
> > > >
> > > > attempts is usefull for broken lines.
> > >
> > > It simply increments with every search i.e. every 10 secondsby default
> > > and does not provide indication of the quality of the wire.
> >
> > When slaves can not be found until several attempts, it means line
> > is broken, how many time existing slave appeared/dissapeared during
> > /sys/bus/w1/devices/w1_master1/attempts says about link quality.
>
> Heh, if you are debugging, all you need is the "date" command to see how
> quickly a slave appears. If you want to keep statistics, your program
> needs to listen to hotplug events for the master and slaves and count
> these. I do not see a reason for a counter that simply increments every
> 10 seconds.

It is not the counter but the attempt number that matters. One can of
course simply calculate the attempt number from the timeout value, but
that requires knowing the timeout, which is not accessible after the
driver is loaded.

> --
> Dmitry


        Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Dmitry Torokhov-4
While thinking about the locking scheme
with respect to sysfs files I recalled
why I implemented such logic -
now one can _always_ remove _any_ module
[the corresponding object is removed from accessible
paths and waits until all existing users are gone],
which is very good - I really like it in the networking model,
while with the whole device driver model,
if we read a device's file very quickly
in several threads, we may end up not unloading it at all.

So the decision is simple at first glance -
just remove the appropriate objects from accessible paths
and free them from the final callbacks [the device's remove method].

        Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Dmitry Torokhov-4
On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:

> While thinking about the locking scheme
> with respect to sysfs files I recalled
> why I implemented such logic -
> now one can _always_ remove _any_ module
> [the corresponding object is removed from accessible
> paths and waits until all existing users are gone],
> which is very good - I really like it in the networking model,
> while with the whole device driver model,
> if we read a device's file very quickly
> in several threads, we may end up not unloading it at all.

I am sorry, that is complete bull*. sysfs also allows removing
modules at an arbitrary time (and usually without the annoying "waiting
for refcount" at that)... You just seem to not understand how driver
code works, thus the need to invent your own scheme.

BTW, I am looking at the connector code ATM and I am just amazed at
all the weird refcounting stuff that is going on there. What is a single
atomic_dec_and_test() call without checking the return value supposed
to do again?
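
For context: atomic_dec_and_test() decrements the counter and returns
true only when the result is zero, so the return value is what tells
the caller that it was the last user. A small userspace model with C11
atomics shows the difference; dec_and_test(), struct thing and both
put helpers are stand-ins made up for this sketch, not the kernel
implementation and not the connector code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel helper: true when the count hits 0. */
static bool dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) - 1 == 0;
}

struct thing {
	atomic_int refcnt;
	bool freed;	/* stands in for kfree() so the effect is observable */
};

/* The last user acts on the return value and cleans up. */
static void thing_put(struct thing *t)
{
	if (dec_and_test(&t->refcnt))
		t->freed = true;
}

/* The result is thrown away, so nobody ever learns it was the last put. */
static void thing_put_ignored(struct thing *t)
{
	(void)dec_and_test(&t->refcnt);
}
```

With the result ignored the call is just a decrement: the count
reaches zero but nothing is released.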

--
Dmitry

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Dmitry Torokhov-4
In reply to this post by Evgeniy Polyakov
On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
>
> How do you propose to notify about an alarm condition?
> Not from the bus layer?
> Who sends "link is down" messages? It is not the same
> as "device is present and found"; it is like "w1 device has something to read".
> For example, the w1 ds18s20 thermal sensor may report
> the "85 degree problem" - that is what is read when the sensor has not
> finished the temperature conversion yet. How would a non-bus layer know about it?

Classes. And return -EAGAIN when reading is not ready (for some reason).
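
A minimal sketch of the -EAGAIN idea, assuming a hypothetical
class-level read hook that knows whether the conversion has finished.
struct temp_sensor, its fields and temp_read() are made up for
illustration; only the errno value is real.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sensor state as a class driver might track it. */
struct temp_sensor {
	int conversion_done;	/* nonzero once the chip has a fresh value */
	int millicelsius;	/* last value read from the chip */
};

/*
 * Return 0 and store the temperature, or -EAGAIN if the sensor has
 * not finished converting yet, so the caller retries instead of
 * being handed a placeholder reading.
 */
static int temp_read(const struct temp_sensor *s, int *out)
{
	if (!s->conversion_done)
		return -EAGAIN;
	*out = s->millicelsius;
	return 0;
}
```

The "not ready" condition then travels as an ordinary error code
through whatever interface the class exposes, with no separate
bus-level notification.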

>
> Your idea about classes over the various buses is good,
> but unfortunately it is a utopia, at least for now,

It is not a utopia, it is the Linux kernel, and we do not need to get
an unfinished solution in.

> so let's create a w1 core layer (which, btw, is not only the bus,
> which is managed by the bus master driver, but also some logic over it;
> one may call it a w1 stack, and a stack can send such messages, can't it?)
>

Heh ;) Let's concentrate on sensors stack, please...

> > >
> > > As I said I have feature requests for ability to export w1 devices
> > > outside w1 core.
> > > Probably it is due to it's private non-GPL usage, so it is not created,
> > > but it is usefull feature actually and we can not know what will
> > > happen in what context when we export master/slave devices.
> >
> > Look at your present w1_remove_master_device. It sleeps. Sop there is
> > no need for a separate thread, callers must be able to sleep anyway.
>
> That is exactly why control thread exists - to manage sleepable operations!

Read what I wrote again - if the caller is sleeping on the thread's
completion there is really no need for a thread. The caller can do
whatever the thread does. Only if the caller sent the request and
continued would one need to set up a thread.

> > > w1 slaves can be found on the bus without search method reaction
> > > implemented in it's asic, btw.
> > > And it is _very_ usefull to add/remove slaves using external command but
> > > not using
> > > automatic detection in search methods.
> >
> > But the request for that will come from userspace with is perfectly
> > able to sleep. You are over-engineering and making kernel code
> > unnecessarily complex without thinking it through.
>
> Connector's requests come from BH context.
> Only module unloading comes with good context  (in our case we do not
> get read/write operations),
> but I do not want to limit the system only for that kinds of events.
>

And you still have master's thread to manage slaves. Although I don't
quite understand why you would need manual addition of slaves. Either
they speak W1 protocol and will be added automatically, or they don't
speak w1 and then they are not w1 devices.

> > > > > I have feature requests for both adding/removing and exporting
> > > > > master devices and slaves to the external world.
> > > >
> > > > External as in userspace? It (user thread) can wait just fine...
> > >
> > > Exporting them into other kernel modules.
> > > We do not know in what context that structures will be used there.
> >
> > Why other kernel modules would be interested in raw access w1_slaves?
> > C are to give an example?
>
> Consider a w1 battery slave device which exports a unified interface to
> userspace.
> It requires access that can be obtained from a different, generic module
> [like an existing kernelspace/userspace protocol,
> proper device files and so on],
> which will implement only a read/write interface.
> Its read method will call get_slave_by_something() and read its data
> or do something else, and then the generic module will use that data.
>
> A generic battery monitor will not scan the w1 bus for its devices,
> since it does not even know about w1; it only understands read/write
> operations.
>

And here you are thinking of classes again. Writing separate battery
monitor applets for i2c, w1, superio and the rest is no less silly.
You are trying to move them over to your connector code. Alternatively,
you could move them to class code and build netlink notification
on top of it. This way you'd separate buses (physical interface)
from userspace interfaces and allow the use of different transports.
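
Roughly what that separation could look like, as a sketch only: a
class-level battery interface with bus-specific backends behind it.
Everything here (struct battery_class_dev, the *_read_charge()
callbacks, battery_lowest_charge()) is hypothetical and does not
correspond to an existing kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class-level interface: all that a generic monitor sees. */
struct battery_class_dev {
	const char *name;
	int (*read_charge)(const struct battery_class_dev *);	/* percent */
	const void *bus_data;	/* owned by the bus-specific driver */
};

/* Two made-up backends sitting on different buses. */
static int w1_bat_read_charge(const struct battery_class_dev *b)
{
	return *(const int *)b->bus_data;	/* pretend 1-wire transfer */
}

static int i2c_bat_read_charge(const struct battery_class_dev *b)
{
	return *(const int *)b->bus_data;	/* pretend i2c transfer */
}

/* A generic monitor only uses the class interface, never the bus. */
static int battery_lowest_charge(const struct battery_class_dev *devs,
				 size_t n)
{
	int min = 100;
	size_t i;

	for (i = 0; i < n; i++) {
		int c = devs[i].read_charge(&devs[i]);

		if (c < min)
			min = c;
	}
	return min;
}
```

The monitor code never names w1 or i2c, and a notification transport
such as netlink could be attached at the class level in the same way.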

> >
> > Wrong level. You need to start with device implementing w1_bus_master
> > (w1_bus_ops) to remova dangling data structures). Easiest way I think
> > it have the driver compiled as a module and remove it altogether - why
> > keep it if you don't need master?
>
> 1. master can be removed by command. It is not in process' context.
> Current thread can remove only all object at once, but nevertheless
> I do not want to limit it.

You also need to remove the code controlling the physical device
presented as w1_master. The request will go to a different system (module).

> 2. The control thread can add/remove new slaves by request.
> The usefulness of that ability was pointed out in my previous e-mail,
> but you skipped that part.

I am still missing the usefulness of it.

> >
> > The less code in kernel that produces data availavle elsewhere the better :)
>
> Does 3 lines of code for reading slave's names is too big
> price for not scanning the whole /sys/bus/w1/w1_master1/ directory?
>

How often do you use them? While debugging only?
 

> > > > > > w1-master-cleanup.patch
> > > > > >    Clean-up master device implementation:
> > > > > >    - get rid of separate refcount, rely on driver model to enforce
> > > > > >      lifetime rules;
> > > > > >    - use atomic to generate unique master IDs;
> > > > > >    - drop unused fields.
> > > > >
> > > > > That patch is very broken.
> > > > > I completely against it:
> > > > > 1. it breaks process logic - searching can be interrupted and stopped,
> > > > > thread will exit on signals.
> > > >
> > > > Interrupted/stopped from userspace?
> > >
> > > Your loop waits only until interrupt happens - it can be delivered from
> > > anywhere.
> >
> > No, only root can kill kernel thread so it is pretty safe. And hey, if
> > a thread goes mad maybe it's a good thing that it can be killed.
>
> And if it exits - it breaks the logic - user can not know the state
> of the master device when thread is exited, but module was not removed
> by request.
>

I (as root) have a zillion ways to break the system. There is nothing
new. The point is that an ordinary user can't do anything with that
thread.

> >
> > device model is here to _use_ it. It already implements bunch of stuff
> > you have to re-implement if you do it "your own way" and it is already
> > debugged much better than your solution. And if there is a problem
> > with driver core implementation - well, more people are looking at it
> > and are more likely to discover a problem and offer a solution.
> >
> > I do not understand why you are against full integration with device
> > model - it does simplify and unify the code.
>
> Because it is not needed here - and even if it could be integrated
> more closely - your changes broke too many special design cases,
> which is not acceptable.
>

Having too many special design cases in an otherwise pretty simple bus
indicates that there is something wrong with the design.

> > > With such changes how to increment slave's usage counter? module_get
> > > (w1_family)?
> >
> > Actually if you need it, it would be get_device(&w1_slave->dev). And if
> > you need to pin the family object you would use
> > get_driver(&w1_family->driver). But I don't think you will need it.
> > Actually, at some point I had w1_family_get() implemented as a wrapper
> > to get_driver() but since it does not seem to be needed I dropped it.
>
> We can not do it in that way.
> 1. low-level bus master code differs from what w1_master is.
> 2. family itself is different from slave object.
>
> Having that we need to create w1_master/w1_family device/driver model
> locking + w1_bus_master/w1_slave locking or move them to device/driver
> model too. Why it is needed I still do not see, there is
> proper locking schema, which is broken in the places where
> existing model touches device/driver one, and it will be fixed.
>

You have exactly the same problem with master devices too. What
escapes me is the desire to have 2 separate refcounts for the same
object. Except for the ability to introduce 2x more bugs.
 
--
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Dmitry Torokhov-4
On Mon, 2005-04-25 at 15:22 -0500, Dmitry Torokhov wrote:

> On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > While thinking about locking schema
> > with respect to sysfs files I recalled,
> > why I implemented such a logic -
> > now one can _always_ remove _any_ module
> > [corresponding object is removed from accessible
> > paths and waits until all existing users are gone],
> > which is very good - I really like it in networking model,
> > while with whole device driver model
> > if we will read device's file very quickly
> > in several threads we may end up not unloading it at all.
>
> I am sorry, that is complete bull*. sysfs also allows removing
> modules at an arbitrary time (and usually without annoying "waiting
> for refcount" at that)... You just seem to not understand how driver
> code works, thus the need of inventing your own schema.

Ok, let's try again - now with explanation,
since it looks like you did not even try to understand what I said.
If you will remove objects from ->remove() callback
you may end up with rmmod being stuck.
Explanation: each read still gets reference counter,
while in rmmod path there is a wait until it is zero.
If there are too many simultaneous reads - even
if each will put reference counter at the end, we still can have
non zero refcnt each time we check it in rmmod path.
That is why object must be removed from accessible paths
first, and only freed in ->remove() callback.

> BTW, I am looking at the connector code ATM and I am just amazed at
> all the weird refcounting stuff that is going on there. What is a single
> atomic_dec_and_test() call without checking the return value supposed to
> do again?

It has explicit barriers, which guarantee that
there will be no reordering of atomic vs. non-atomic
operations.

--
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Dmitry Torokhov-2
On Tuesday 26 April 2005 01:43, Evgeniy Polyakov wrote:

> On Mon, 2005-04-25 at 15:22 -0500, Dmitry Torokhov wrote:
> > On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > > While thinking about locking schema
> > > with respect to sysfs files I recalled,
> > > why I implemented such a logic -
> > > now one can _always_ remove _any_ module
> > > [corresponding object is removed from accessible
> > > paths and waits until all existing users are gone],
> > > which is very good - I really like it in networking model,
> > > while with whole device driver model
> > > if we will read device's file very quickly
> > > in several threads we may end up not unloading it at all.
> >
> > I am sorry, that is complete bull*. sysfs also allows removing
> > modules at an arbitrary time (and usually without annoying "waiting
> > for refcount" at that)... You just seem to not understand how driver
> > code works, thus the need of inventing your own schema.
>
> Ok, let's try again - now with explanation,
> since it looks like you did not even try to understand what I said.
> If you will remove objects from ->remove() callback
> you may end up with rmmod being stuck.
> Explanation: each read still gets reference counter,
> while in rmmod path there is a wait until it is zero.
> If there are too many simultaneous reads - even
> if each will put reference counter at the end, we still can have
> non zero refcnt each time we check it in rmmod path.
> That is why object must be removed from accessible paths
> first, and only freed in ->remove() callback.

Please try to read the code. device_unregister and kobject_unregister
do not require caller to wait for the last reference to drop, they rely
on availability of release method to clean up the object when last user
is gone. driver_unregister is blocking (like your family code) but
teardown takes no time. If driver is in use (attributes are open) then
module refcount is non-zero and instead of (possibly endless) "waiting for
refcount to drop" message you will get nice -EBUSY.

If you program so that you wait in module_exit for object release - you
get what you deserve.
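
The release-method pattern described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the driver core's code: struct obj, obj_create(), obj_get(), obj_put()
and released_count are made-up names, and C11 atomics stand in for the
kernel's atomic_t. It shows that the unregister path simply drops its own
reference and returns; whoever drops the last reference triggers the
release callback, so nobody has to loop waiting for the count to reach
zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an object with an embedded refcount. */
struct obj {
	atomic_int refcnt;
	void (*release)(struct obj *obj);	/* runs when the last reference drops */
};

static int released_count;	/* exists only so the sketch can be checked */

static void obj_release(struct obj *obj)
{
	released_count++;
	free(obj);
}

static struct obj *obj_create(void)
{
	struct obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	atomic_init(&obj->refcnt, 1);	/* the creator holds the first reference */
	obj->release = obj_release;
	return obj;
}

/* Take an extra reference, e.g. while a reader has an attribute open. */
static struct obj *obj_get(struct obj *obj)
{
	atomic_fetch_add(&obj->refcnt, 1);
	return obj;
}

/* Returns true if this call dropped the last reference and freed the object. */
static bool obj_put(struct obj *obj)
{
	if (atomic_fetch_sub(&obj->refcnt, 1) == 1) {
		obj->release(obj);
		return true;
	}
	return false;
}
```

In this sketch the "unregister" side is just an obj_put() of the creator's
reference; if a reader still holds one, the object outlives the unregister
call and is freed by the reader's final obj_put().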

> > BTW, I am looking at the connector code ATM and I am just amazed at
> > all the weird refcounting stuff that is going on there. What is a single
> > atomic_dec_and_test() call without checking the return value supposed to
> > do again?
>
> It has explicit barriers, which guarantee that
> there will be no reordering of atomic vs. non-atomic
> operations.

And you can't use explicit barriers - why exactly?

--
Dmitry

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Greg KH-2
In reply to this post by Dmitry Torokhov-4
On Mon, Apr 25, 2005 at 11:32:14AM -0500, Dmitry Torokhov wrote:

> On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > On Thu, 2005-04-21 at 09:31 -0500, Dmitry Torokhov wrote:
> > >
> > > OK, that is what I am saying. But why do you need that attribute with
> > > variable name and a bin attribute that is not really bin but just a
> > > dump for all kind of data (looks like debug one).
> >
> > bin attribute was created for lm_sensors scripts format - it only caches
> > read value.
> > I think there might be only 2 "must have" methods - read and write.
> > I plan to implement them using connector, so probably they will go away
> > completely.
> ...
> > > You will not be able to cram all 1-wire devices into unified
> > > interface. You will need to build classes on top of it and you might
> > > use connector (I am not sure) but not on w1 bus level.
> > > ...
> >
> > connector allows to have different objects inside one netlink group,
> > so it will use it in that way.
> > I think only two w1 methods must exist - read and write,
> > and they must follow protocol, defined in family driver.
>
> No, I think there should not be any "must have" methods on w1_bus
> level. What you really need (and this needs to be coordinated with
> other sensors people) is a "sensors" class hierarchy that will define
> classes like "temperature sensor", "fan", "vid", etc. Then your w1
> family drivers, when bound to a slave, will create needed class
> devices. i2c drivers will do the same, and your superio, and I'll be
> able to change i8k driver just for kicks. Then your userspace would not
> care what _bus_ a particular sensor is sitting on and will be
> presented with a unified interface.

Yes, that is the way to go, and is what a number of people are currently
working on implementing.
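
As a rough illustration of the bus-independent "sensors" class idea quoted
above, here is a minimal userspace sketch. Everything in it is
hypothetical: struct sensor, sensor_register(), sensor_read_first() and
the two toy "drivers" are invented for the example and are not an existing
kernel interface. The consumer asks for a sensor type and never learns
which bus the device sits on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sensor classes; the thread mentions temperature, fan and vid. */
enum sensor_type { SENSOR_TEMPERATURE, SENSOR_FAN, SENSOR_VID };

struct sensor {
	enum sensor_type type;
	const char *name;
	/* Bus-specific read; returns 0 on success, negative on error. */
	int (*read)(struct sensor *s, long *value);
	void *bus_data;		/* private to the bus driver */
};

#define MAX_SENSORS 8
static struct sensor *sensors[MAX_SENSORS];
static size_t nr_sensors;

/* Called by a bus driver (w1 family, i2c, ...) once it is bound to a device. */
static int sensor_register(struct sensor *s)
{
	assert(s && s->read);
	if (nr_sensors == MAX_SENSORS)
		return -1;
	sensors[nr_sensors++] = s;
	return 0;
}

/* Consumer side: read the first sensor of a given type, whatever its bus. */
static int sensor_read_first(enum sensor_type type, long *value)
{
	size_t i;

	for (i = 0; i < nr_sensors; i++)
		if (sensors[i]->type == type)
			return sensors[i]->read(sensors[i], value);
	return -1;	/* no such sensor registered */
}

/* Toy "w1" temperature driver: bus_data points at the cached reading. */
static int w1_therm_read(struct sensor *s, long *value)
{
	*value = *(long *)s->bus_data;
	return 0;
}

/* Toy "i2c" fan driver with the same contract. */
static int i2c_fan_read(struct sensor *s, long *value)
{
	*value = *(long *)s->bus_data;
	return 0;
}
```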

thanks,

greg k-h

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Greg KH-2
In reply to this post by Evgeniy Polyakov
On Tue, Apr 26, 2005 at 10:43:36AM +0400, Evgeniy Polyakov wrote:

> On Mon, 2005-04-25 at 15:22 -0500, Dmitry Torokhov wrote:
> > On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > > While thinking about locking schema
> > > with respect to sysfs files I recalled,
> > > why I implemented such a logic -
> > > now one can _always_ remove _any_ module
> > > [corresponding object is removed from accessible
> > > paths and waits until all existing users are gone],
> > > which is very good - I really like it in networking model,
> > > while with whole device driver model
> > > if we will read device's file very quickly
> > > in several threads we may end up not unloading it at all.
> >
> > I am sorry, that is complete bull*. sysfs also allows removing
> > modules at an arbitrary time (and usually without annoying "waiting
> > for refcount" at that)... You just seem to not understand how driver
> > code works, thus the need of inventing your own schema.
>
> Ok, let's try again - now with explanation,
> since it looks like you did not even try to understand what I said.
> If you will remove objects from ->remove() callback
> you may end up with rmmod being stuck.

Yes, and that is acceptable.  networking implemented their own locking
method to allow unloading of their drivers in such a manner.  No other
subsystem is going to do that kind of implementation, so Dmitry is
correct here.

thanks,

greg k-h

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Dmitry Torokhov-2
On Tue, 2005-04-26 at 01:50 -0500, Dmitry Torokhov wrote:

> On Tuesday 26 April 2005 01:43, Evgeniy Polyakov wrote:
> > On Mon, 2005-04-25 at 15:22 -0500, Dmitry Torokhov wrote:
> > > On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > > > While thinking about locking schema
> > > > with respect to sysfs files I recalled,
> > > > why I implemented such a logic -
> > > > now one can _always_ remove _any_ module
> > > > [corresponding object is removed from accessible
> > > > paths and waits until all existing users are gone],
> > > > which is very good - I really like it in networking model,
> > > > while with whole device driver model
> > > > if we will read device's file very quickly
> > > > in several threads we may end up not unloading it at all.
> > >
> > > I am sorry, that is complete bull*. sysfs also allows removing
> > > modules at an arbitrary time (and usually without annoying "waiting
> > > for refcount" at that)... You just seem to not understand how driver
> > > code works, thus the need of inventing your own schema.
> >
> > Ok, let's try again - now with explanation,
> > since it looks like you did not even try to understand what I said.
> > If you will remove objects from ->remove() callback
> > you may end up with rmmod being stuck.
> > Explanation: each read still gets reference counter,
> > while in rmmod path there is a wait until it is zero.
> > If there are too many simultaneous reads - even
> > if each will put reference counter at the end, we still can have
> > non zero refcnt each time we check it in rmmod path.
> > That is why object must be removed from accessible paths
> > first, and only freed in ->remove() callback.
>
> Please try to read the code. device_unregister and kobject_unregister
> do not require caller to wait for the last reference to drop, they rely
> on availability of release method to clean up the object when last user
> is gone. driver_unregister is blocking (like your family code) but
> teardown takes no time. If driver is in use (attributes are open) then
> module refcount is non-zero and instead of (possibly endless) "waiting for
> refcount to drop" message you will get nice -EBUSY.
>
> If you program so that you wait in module_exit for object release - you
> get what you deserve.

But we can also remove objects outside the rmmod path.
You pointed out the right example in a previous e-mail.

The "waiting for device..." message above is for debug only.

> > > BTW, I am looking at the connector code ATM and I am just amazed at
> > > all the weird refcounting stuff that is going on there. What is a single
> > > atomic_dec_and_test() call without checking the return value supposed to
> > > do again?
> >
> > It has explicit barriers, which guarantee that
> > there will be no reordering of atomic vs. non-atomic
> > operations.
>
> And you can't use explicit barriers - why exactly?

I used them - the code was as follows:
smp_mb__before_atomic_dec();
atomic_dec();
smp_mb__after_atomic_dec();

I think simple atomic_dec_and_test() or even atomic_dec_and_lock()
is better.
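
For readers following the barrier argument: the return value of a
decrement-and-test operation is what tells the caller that it just dropped
the last reference. The sketch below is a userspace analogue written with
C11 atomics, not kernel code; ref_dec_and_test(), struct entry and
entry_put() are made-up names, and memory_order_acq_rel is only an
approximation of the ordering the kernel primitive provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Decrement the counter and report whether it reached zero.
 * atomic_fetch_sub_explicit() returns the value *before* the decrement,
 * so a previous value of 1 means this caller dropped the last reference.
 */
static bool ref_dec_and_test(atomic_int *refcnt)
{
	return atomic_fetch_sub_explicit(refcnt, 1, memory_order_acq_rel) == 1;
}

struct entry {
	atomic_int refcnt;
	bool freed;	/* stands in for the real cleanup in this sketch */
};

/* Acting on the return value is the whole point of "dec and test". */
static void entry_put(struct entry *e)
{
	if (ref_dec_and_test(&e->refcnt))
		e->freed = true;
}
```

If the result is ignored, the call degenerates into a plain decrement and
nothing ever learns that the last reference went away.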

--
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Greg KH-2
On Tue, 2005-04-26 at 00:00 -0700, Greg KH wrote:

> On Tue, Apr 26, 2005 at 10:43:36AM +0400, Evgeniy Polyakov wrote:
> > On Mon, 2005-04-25 at 15:22 -0500, Dmitry Torokhov wrote:
> > > On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> > > > While thinking about locking schema
> > > > with respect to sysfs files I recalled,
> > > > why I implemented such a logic -
> > > > now one can _always_ remove _any_ module
> > > > [corresponding object is removed from accessible
> > > > paths and waits until all existing users are gone],
> > > > which is very good - I really like it in networking model,
> > > > while with whole device driver model
> > > > if we will read device's file very quickly
> > > > in several threads we may end up not unloading it at all.
> > >
> > > I am sorry, that is complete bull*. sysfs also allows removing
> > > modules at an arbitrary time (and usually without annoying "waiting
> > > for refcount" at that)... You just seem to not understand how driver
> > > code works, thus the need of inventing your own schema.
> >
> > Ok, let's try again - now with explanation,
> > since it looks like you did not even try to understand what I said.
> > If you will remove objects from ->remove() callback
> > you may end up with rmmod being stuck.
>
> Yes, and that is acceptable.  networking implemented their own locking
> method to allow unloading of their drivers in such a manner.  No other
> subsystem is going to do that kind of implementation, so Dmitry is
> correct here.

w1 does it too :)
Its locking was modeled on the network code.
And it _is_ a design goal to be able to remove objects at any time.
Ok, I can not say that it is exactly like networking,
since there is waiting in the rmmod path; it is very similar to virtual
devices like vlan.

> thanks,
>
> greg k-h
--
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Dmitry Torokhov-2
In reply to this post by Evgeniy Polyakov
On Tuesday 26 April 2005 02:06, Evgeniy Polyakov wrote:
> On Tue, 2005-04-26 at 01:50 -0500, Dmitry Torokhov wrote:
> >
> > If you program so that you wait in module_exit for object release - you
> > get what you deserve.
>
> But we can also remove objects outside the rmmod path.
> You pointed out the right example in a previous e-mail.

Right, and you need to be careful so that the thread does not hold any references
to the resource it tries to free. rmmod is just one of the common examples.
But you need to consider this scenario whether you are using the driver model or
your own separate refcount - the basic problem is still the same.

> The "waiting for device..." message above is for debug only.
>
> > > > BTW, I am looking at the connector code ATM and I am just amazed at
> > > > all the weird refcounting stuff that is going on there. What is a single
> > > > atomic_dec_and_test() call without checking the return value supposed to
> > > > do again?
> > >
> > > It has explicit barriers, which guarantee that
> > > there will be no reordering of atomic vs. non-atomic
> > > operations.
> >
> > And you can't use explicit barriers - why exactly?
>
> I used them - the code was as follows:
> smp_mb__before_atomic_dec();
> atomic_dec();
> smp_mb__after_atomic_dec();
>
> I think simple atomic_dec_and_test() or even atomic_dec_and_lock()
> is better.

This usually indicates that there is some kind of a problem. Consider
the following fragment:

> +static void cn_queue_wrapper(void *data)
> +{
> +       struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
> +
> +       atomic_inc_and_test(&cbq->cb->refcnt);
> +       cbq->cb->callback(cbq->cb->priv);
> +       atomic_dec_and_test(&cbq->cb->refcnt);
>

What exactly does this refcount protect? Can it be that other code decrements
the refcount and frees the object right when one CPU is entering this function?
If not, that means the cb structure is protected by some other means, so
why do we need to increment the refcount here and consider ordering?

Btw, the cb refcount can be completely removed, something like the patch below
(won't apply cleanly as I have some other stuff).

--
Dmitry

 drivers/connector/cn_queue.c |   85 +++++++++++--------------------------------
 include/linux/connector.h    |    2 -
 2 files changed, 23 insertions(+), 64 deletions(-)

Index: linux-2.6.11/drivers/connector/cn_queue.c
===================================================================
--- linux-2.6.11.orig/drivers/connector/cn_queue.c
+++ linux-2.6.11/drivers/connector/cn_queue.c
@@ -33,49 +33,12 @@
 
 static void cn_queue_wrapper(void *data)
 {
- struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
+ struct cn_callback_entry *cbq = data;
 
- atomic_inc_and_test(&cbq->cb->refcnt);
  cbq->cb->callback(cbq->cb->priv);
- atomic_dec_and_test(&cbq->cb->refcnt);
-
  cbq->destruct_data(cbq->ddata);
 }
 
-static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)
-{
- struct cn_callback_entry *cbq;
-
- cbq = kmalloc(sizeof(*cbq), GFP_KERNEL);
- if (!cbq) {
- printk(KERN_ERR "Failed to create new callback queue.\n");
- return NULL;
- }
-
- memset(cbq, 0, sizeof(*cbq));
-
- cbq->cb = cb;
-
- INIT_WORK(&cbq->work, &cn_queue_wrapper, cbq);
-
- return cbq;
-}
-
-static void cn_queue_free_callback(struct cn_callback_entry *cbq)
-{
- cancel_delayed_work(&cbq->work);
- flush_workqueue(cbq->pdev->cn_queue);
-
- while (atomic_read(&cbq->cb->refcnt)) {
- printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
-       cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
-
- msleep(1000);
- }
-
- kfree(cbq);
-}
-
 int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
 {
 #if 0
@@ -90,40 +53,37 @@ int cn_cb_equal(struct cb_id *i1, struct
 int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb)
 {
  struct cn_callback_entry *cbq, *__cbq;
- int found = 0;
+ int retval = 0;
 
- cbq = cn_queue_alloc_callback_entry(cb);
- if (!cbq)
+ cbq = kmalloc(sizeof(*cbq), GFP_KERNEL);
+ if (!cbq) {
+ printk(KERN_ERR "Failed to create new callback queue.\n");
  return -ENOMEM;
+ }
 
  atomic_inc(&dev->refcnt);
+
+ memset(cbq, 0, sizeof(*cbq));
+ INIT_WORK(&cbq->work, &cn_queue_wrapper, cbq);
+ cbq->cb = cb;
  cbq->pdev = dev;
+ cbq->nls = dev->nls;
+ cbq->seq = 0;
+ cbq->group = cbq->cb->id.idx;
 
  spin_lock_bh(&dev->queue_lock);
+
  list_for_each_entry(__cbq, &dev->queue_list, callback_entry) {
  if (cn_cb_equal(&__cbq->cb->id, &cb->id)) {
- found = 1;
- break;
+ retval = -EEXIST;
+ kfree(cbq);
+ goto out;
  }
  }
- if (!found) {
- atomic_set(&cbq->cb->refcnt, 1);
- list_add_tail(&cbq->callback_entry, &dev->queue_list);
- }
+ list_add_tail(&cbq->callback_entry, &dev->queue_list);
+ out:
  spin_unlock_bh(&dev->queue_lock);
-
- if (found) {
- atomic_dec(&dev->refcnt);
- atomic_set(&cbq->cb->refcnt, 0);
- cn_queue_free_callback(cbq);
- return -EINVAL;
- }
-
- cbq->nls = dev->nls;
- cbq->seq = 0;
- cbq->group = cbq->cb->id.idx;
-
- return 0;
+ return retval;
 }
 
 void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id)
@@ -142,8 +102,9 @@ void cn_queue_del_callback(struct cn_que
  spin_unlock_bh(&dev->queue_lock);
 
  if (found) {
- atomic_dec(&cbq->cb->refcnt);
- cn_queue_free_callback(cbq);
+ cancel_delayed_work(&cbq->work);
+ flush_workqueue(cbq->pdev->cn_queue);
+ kfree(cbq);
  atomic_dec_and_test(&dev->refcnt);
  }
 }
Index: linux-2.6.11/include/linux/connector.h
===================================================================
--- linux-2.6.11.orig/include/linux/connector.h
+++ linux-2.6.11/include/linux/connector.h
@@ -115,8 +115,6 @@ struct cn_callback
  struct cb_id id;
  void (* callback)(void *);
  void *priv;
-
- atomic_t refcnt;
 };
 
 struct cn_callback_entry

Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Dmitry Torokhov-4
On Mon, 2005-04-25 at 16:32 -0500, Dmitry Torokhov wrote:

> On 4/25/05, Evgeniy Polyakov <[hidden email]> wrote:
> >
> > How do you suppose to notify about alarm condition?
> > Not from bus layer?
> > Who does send "link is down" messages? It is not the same
> > as device is present and found, it like "w1 device has something to read".
> > For example w1 ds18s20 thermal sensor may send information
> > about "85 degree problem" - it is read when sensor has not
> > finished temperature conversion yet, how may non-bus layer know about it?
>
> Classes. And return -EAGAIN when reading is not ready (for some reason).

Device may be broken, btw, and return that value always.
How do we notify about such a condition?
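
One way to reconcile the two positions above is to return -EAGAIN while
the conversion is pending but give up with a hard error after a bounded
number of attempts. The sketch below is an assumption-laden illustration,
not w1 code: therm_read_once(), therm_read(), the retry limit and the
fake_dev stub are invented, and the only hardware fact relied upon is that
a DS18S20 reports +85 degrees C until its first conversion completes. Note
that a genuine 85 degree reading cannot be told apart from "not ready"
this way.

```c
#include <assert.h>
#include <errno.h>

#define THERM_POWER_ON_MDEG 85000	/* reported until a conversion completes */
#define THERM_MAX_RETRIES   3		/* assumption: arbitrary retry budget */

/* Hypothetical raw accessor: returns the reading in millidegrees C. */
typedef long (*raw_read_fn)(void *ctx);

/* One attempt: -EAGAIN if the sensor still shows its power-on value. */
static int therm_read_once(raw_read_fn raw_read, void *ctx, long *mdeg)
{
	long v = raw_read(ctx);

	if (v == THERM_POWER_ON_MDEG)
		return -EAGAIN;
	*mdeg = v;
	return 0;
}

/* Bounded retry: a device that never becomes ready is reported as -EIO. */
static int therm_read(raw_read_fn raw_read, void *ctx, long *mdeg)
{
	int i;

	for (i = 0; i < THERM_MAX_RETRIES; i++) {
		int err = therm_read_once(raw_read, ctx, mdeg);

		if (err != -EAGAIN)
			return err;
	}
	return -EIO;
}

/* Fake device for illustration: becomes ready after a number of reads. */
struct fake_dev {
	int reads_until_ready;
	long mdeg;
};

static long fake_raw_read(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->reads_until_ready > 0) {
		d->reads_until_ready--;
		return THERM_POWER_ON_MDEG;
	}
	return d->mdeg;
}
```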

> >
> > Your idea about classes over the various buses is good,
> > but unfortunately it is utopia, at least for now,
>
> It is not a utopia, it is Linux kernel and we do not need to get in
> unfinished solution.

That will require too many changes to say:
"hey, tomorrow we will have new generic w1/i2c/superio set".
I believe it can be done, it is definitely a good idea,
but let's fix existing bugs before changing the whole tree.

> > so let's create w1 core layer (which, btw, is not only bus,
> > which is managed by bus master driver, but also some logic over it,
> > one may call it w1 stack, stack can send such messages, can't it)?
> >
>
> Heh ;) Let's concentrate on sensors stack, please...

If we still have bugs there, I do not think it is a good idea
to move things ahead.
I repeat, I agree that a unified class hierarchy for sensor devices
is a very good idea, but it can not be implemented in a couple of days,
so I would postpone it until other bugs are fixed.

> > > >
> > > > As I said I have feature requests for ability to export w1 devices
> > > > outside w1 core.
> > > > Probably it is due to its private non-GPL usage, so it is not created,
> > > > but it is a useful feature actually and we can not know what will
> > > > happen in what context when we export master/slave devices.
> > >
> > > Look at your present w1_remove_master_device. It sleeps. So there is
> > > no need for a separate thread, callers must be able to sleep anyway.
> >
> > That is exactly why control thread exists - to manage sleepable operations!
>
> Read what I wrote again - if caller is sleeping on thread's completion
> there is really no need for a thread. The caller can do whatever the
> thread does. Only if the caller would send a request and continue would one
> need to set up a thread.

Why do you think the caller is in process context?
The control thread was created to process work that can not be done
in interrupt context, i.e. a command from an unknown context
is deferred to the control thread, which processes it in a safe
context. Using connector, for example, you may send the command REMOVE_*,
which will be received in BH context and will just set a flag that
the control thread will see.
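
The deferral described above (a non-sleeping context only sets a flag, the
control thread does the real work later) can be sketched as follows. This
is a single-threaded userspace illustration under assumptions: struct
master, the CMD_* bits, master_request() and master_process_pending() are
hypothetical names, and C11 atomics stand in for whatever the kernel code
actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical command bits; REMOVE_* in the text is left unspecified. */
enum {
	CMD_REMOVE_MASTER = 1u << 0,
	CMD_ADD_SLAVE     = 1u << 1,
};

struct master {
	atomic_uint pending;	/* bitmask of requested commands */
	bool removed;
	int nr_slaves;
};

/*
 * Safe to call from a context that must not sleep: it only sets a bit
 * and returns immediately.
 */
static void master_request(struct master *m, unsigned int cmd)
{
	atomic_fetch_or(&m->pending, cmd);
}

/*
 * One iteration of the control thread body, running where sleeping is
 * allowed. Atomically takes all pending bits and handles them.
 * Returns the number of commands handled.
 */
static int master_process_pending(struct master *m)
{
	unsigned int cmds = atomic_exchange(&m->pending, 0);
	int handled = 0;

	if (cmds & CMD_ADD_SLAVE) {
		m->nr_slaves++;
		handled++;
	}
	if (cmds & CMD_REMOVE_MASTER) {
		m->removed = true;	/* the sleeping teardown would go here */
		handled++;
	}
	return handled;
}
```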

> > > > w1 slaves can be found on the bus without search method reaction
> > > > implemented in its asic, btw.
> > > > And it is _very_ useful to add/remove slaves using an external command
> > > > rather than automatic detection in search methods.
> > >
> > > But the request for that will come from userspace which is perfectly
> > > able to sleep. You are over-engineering and making kernel code
> > > unnecessarily complex without thinking it through.
> >
> > Connector's requests come from BH context.
> > Only module unloading comes with good context  (in our case we do not
> > get read/write operations),
> > but I do not want to limit the system only to those kinds of events.
> >
>
> And you still have master's thread to manage slaves. Although I don't
> quite understand why you would need manual addition of slaves. Either
> they speak W1 protocol and will be added automatically, or they don't
> speak w1 and then they are not w1 devices.

There are w1 devices that do not respond to the search command
[actually there are devices that understand only one command at all],
ok, they are broken, but it does not matter.

> > > > > > I have feature requests for both adding/removing and exporting
> > > > > > master devices and slaves to the external world.
> > > > >
> > > > > External as in userspace? It (user thread) can wait just fine...
> > > >
> > > > Exporting them into other kernel modules.
> > > > We do not know in what context that structures will be used there.
> > >
> > > Why would other kernel modules be interested in raw access to w1_slaves?
> > > Care to give an example?
> >
> > Consider w1 battery slave device, which exports unified interface to
> > the userspace.
> > It requires access that can be obtained from different generic module
> > [like existing kernelspace/userspace protocol,
> > proper device files and so on],
> > which will implement only read/write interface.
> > Its read method will get_slave_by_something() and read its data
> > or do something else, then generic module will use that data.
> >
> > Generic battery monitor will not scan w1 bus for its devices,
> > since it does not even know about w1, it only understands reading/writing
> > operations.
> >
>
> And here you are thinking of classes again. Writing separate battery
> monitor applets for i2c, w1, superio and the rest is no less silly.
> You are trying to move them over to your connector code. Alternatively
> you could have moved them to class code and built netlink notification
> on top of that. This way you'd separate buses (physical interface)
> from userspace interfaces and allow use of different transports.

No, connector is just an example.

A class hierarchy for all sensor devices is a good idea,
but not too easy to implement.
I strongly agree with it and would like to port w1 to it.

Connector could be used to control those classes - just an example.

> > >
> > > Wrong level. You need to start with device implementing w1_bus_master
> > > (w1_bus_ops) to remove dangling data structures. Easiest way I think
> > > is to have the driver compiled as a module and remove it altogether - why
> > > keep it if you don't need master?
> >
> > 1. master can be removed by command. It is not in process' context.
> > Current thread can remove only all objects at once, but nevertheless
> > I do not want to limit it.
>
> You also need to remove the code controlling the physical device presented
> as w1_master. The request will go to a different system (module).

No need to remove the bus master device, it can be there with zero refcnt,
so when it calls w1_remove_master_device() it will not block.

> > 2. control thread can add/remove new slaves by request.
> > Usefulness of that ability was pointed out in my previous e-mail,
> > but you skipped that part.
>
> I am still missing the usefulness of it.

There are devices that can not be found using w1 search commands,
there are devices that only understand one command,
I agree they are broken, but they still can be easily supported
by w1 subsystem.

> > >
> > > The less code in kernel that produces data available elsewhere the better :)
> >
> > Is 3 lines of code for reading slaves' names too big a
> > price for not scanning the whole /sys/bus/w1/w1_master1/ directory?
> >
>
> How often do you use them? While debugging only?

Always.
With a long line it is very common to lose w1 slaves - that is why the TTL
attribute was created - its purpose is to integrate such events.
With a device like iButton it _IS_ the _only_ behaviour - the slave object
exists for only a couple of seconds.
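
A TTL of this kind can be pictured as a per-slave counter that is reset
whenever a bus search sees the slave and decremented whenever a search
misses it. The sketch below is only a guess at the mechanics, assuming
that model; struct slave, SLAVE_TTL and slave_update_ttl() are made-up
names and do not come from the w1 sources.

```c
#include <assert.h>
#include <stdbool.h>

#define SLAVE_TTL 3	/* assumption: searches a slave may miss before removal */

struct slave {
	int ttl;
	bool present;
};

/*
 * Called once per bus search for each known slave; `seen` says whether the
 * slave answered this time. Returns false once the slave has missed
 * SLAVE_TTL searches in a row and should be removed.
 */
static bool slave_update_ttl(struct slave *sl, bool seen)
{
	if (seen) {
		sl->ttl = SLAVE_TTL;	/* a single hit forgives earlier misses */
		return true;
	}
	if (--sl->ttl <= 0) {
		sl->present = false;
		return false;
	}
	return true;
}
```

With a counter like this a briefly lost slave on a long line survives a few
missed searches, while a removed iButton still disappears after a bounded
number of them.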

>
> > > > > > > w1-master-cleanup.patch
> > > > > > >    Clean-up master device implementation:
> > > > > > >    - get rid of separate refcount, rely on driver model to enforce
> > > > > > >      lifetime rules;
> > > > > > >    - use atomic to generate unique master IDs;
> > > > > > >    - drop unused fields.
> > > > > >
> > > > > > That patch is very broken.
> > > > > > I am completely against it:
> > > > > > 1. it breaks process logic - searching can be interrupted and stopped,
> > > > > > thread will exit on signals.
> > > > >
> > > > > Interrupted/stopped from userspace?
> > > >
> > > > Your loop waits only until interrupt happens - it can be delivered from
> > > > anywhere.
> > >
> > > No, only root can kill kernel thread so it is pretty safe. And hey, if
> > > a thread goes mad maybe it's a good thing that it can be killed.
> >
> > And if it exits - it breaks the logic - user can not know the state
> > of the master device when thread is exited, but module was not removed
> > by request.
> >
>
> I (as root) have a zillion ways to break the system. There is nothing
> new. The point is that an ordinary user can't do anything with that
> thread.
A signal can be sent not only at your request.

> > >
> > > device model is here to _use_ it. It already implements bunch of stuff
> > > you have to re-implement if you do it "your own way" and it is already
> > > debugged much better than your solution. And if there is a problem
> > > with driver core implementation - well, more people are looking at it
> > > and are more likely to discover a problem and offer a solution.
> > >
> > > I do not understand why you are against full integration with device
> > > model - it does simplify and unify the code.
> >
> > Because it is not needed here - and even if it could be integrated
> > more closely - your changes broke too many special design cases,
> > which is not acceptable.
> >
>
> Having too many special design cases in otherwise pretty simple bus
> indicates that there is something wrong with the design.
I see your point.

> > > > With such changes how to increment slave's usage counter? module_get
> > > > (w1_family)?
> > >
> > > Actually if you need it it would be get_device(&w1_slave->dev). And if
> > > you need to pin the family object you would call
> > > get_driver(&w1_family->driver). But I don't think you will need it.
> > > Actually, at some point I had w1_family_get() implemented as a wrapper
> > > to get_driver() but since it does not seem to be needed I dropped it.
> >
> > We can not do it in that way.
> > 1. low-level bus master code differs from what w1_master is.
> > 2. family itself is different from slave object.
> >
> > Having that we need to create w1_master/w1_family device/driver model
> > locking + w1_bus_master/w1_slave locking or move them to device/driver
> > model too. Why it is needed I still do not see, there is
> > proper locking schema, which is broken in the places where
> > existing model touches device/driver one, and it will be fixed.
> >
>
> You have exactly the same problem with master devices too. What
> escapes me is the desire to have 2 separate refcounting for the same
> object. Except for ability to introduce 2x more bugs.
Yep.
As I said in the previous e-mail - I do want to be able to remove objects at any time,
so it has its own locking scheme, but I also do want to use sysfs
objects, so there is another locking scheme - in the places where they
touch each other there is a problem and I will fix it,
probably by moving object freeing into the ->remove() callback.
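
To make the idea of freeing from a callback concrete, here is a minimal
userspace sketch, not w1 code: one reference count per object, and the
object is freed by a release callback when the last reference is dropped,
so nobody has to poll the counter before freeing. All demo_* names are
hypothetical and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slave object; not the real w1 structures. */
struct demo_obj {
	atomic_int refcnt;
	void (*release)(struct demo_obj *obj);
	int *released_flag;	/* lets the caller observe that release ran */
};

/* Runs exactly once, when the last reference goes away. */
static void demo_release(struct demo_obj *obj)
{
	if (obj->released_flag)
		*obj->released_flag = 1;
	free(obj);
}

/* Allocate an object that starts with one reference held by the caller. */
static struct demo_obj *demo_alloc(int *released_flag)
{
	struct demo_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	atomic_init(&obj->refcnt, 1);
	obj->release = demo_release;
	obj->released_flag = released_flag;
	return obj;
}

/* Take an extra reference, e.g. while an attribute file is open. */
static struct demo_obj *demo_get(struct demo_obj *obj)
{
	assert(obj != NULL);
	atomic_fetch_add(&obj->refcnt, 1);
	return obj;
}

/* Drop a reference; returns 1 if this call released the object. */
static int demo_put(struct demo_obj *obj)
{
	assert(obj != NULL);
	/* fetch_sub returns the old value: 1 means we held the last ref. */
	if (atomic_fetch_sub(&obj->refcnt, 1) == 1) {
		obj->release(obj);
		return 1;
	}
	return 0;
}
```

With this shape the remove path only drops its own reference; whoever
drops the last one, remover or reader, triggers the free.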

--
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Re: [RFC/PATCH 0/22] W1: sysfs, lifetime and other fixes

Evgeniy Polyakov
In reply to this post by Dmitry Torokhov-2
On Tue, 2005-04-26 at 02:16 -0500, Dmitry Torokhov wrote:

> > > > It has explicit barriers, which guarantee that
> > > > there will not be atomic operation vs. non atomic
> > > > reordering.
> > >
> > > And you can't use explicit barriers - why exactly?
> >
> > I used them - code was following:
> > smp_mb__before_atomic_dec();
> > atomic_dec();
> > smp_mb__after_atomic_dec();
> >
> > I think simple atomic_dec_and_test() or even atomic_dec_and_lock()
> > is better.
>
> This usually indicates that there is some kind of a problem. Consider the
> following fragment:
>
> > +static void cn_queue_wrapper(void *data)
> > +{
> > +       struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
> > +
> > +       atomic_inc_and_test(&cbq->cb->refcnt);
> > +       cbq->cb->callback(cbq->cb->priv);
> > +       atomic_dec_and_test(&cbq->cb->refcnt);
> >
>
> What exactly does this refcount protect? Can it be that other code decrements
> refcount and frees the object right when one CPU is entering this function?
> If not that means that cb structure is protected by some other means, so
> why do we need to increment refcount here and consider ordering?
It is not needed there. I pointed it out to Andrew when we discussed it
a couple of weeks ago, but forgot to remove it.

> Btw, cb refcount can be completely removed, something like the patch below
> (won't apply cleanly as I have some other stuff).

I will think of it some more, probably you are right,
it looks like flush_workqueue() is sufficient for that.
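
A rough single-threaded model of why flushing can replace the refcount:
once every queued work item has run, nothing in the queue still points at
the entry, so it can be freed right away. The demo_* names are
hypothetical, and the real flush_workqueue() waits for worker threads
rather than running the work inline as this sketch does.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_WORK 8

/* Hypothetical stand-in for a callback entry. */
struct demo_entry {
	int calls;	/* how many times the callback has run */
};

/* Toy work queue: an array of pending entry pointers. */
struct demo_queue {
	struct demo_entry *pending[DEMO_MAX_WORK];
	int count;
};

/* Queue one run of the entry's callback; returns -1 if the queue is full. */
static int demo_queue_work(struct demo_queue *q, struct demo_entry *e)
{
	assert(q != NULL && e != NULL);
	if (q->count >= DEMO_MAX_WORK)
		return -1;
	q->pending[q->count++] = e;
	return 0;
}

/* Run everything queued so far; afterwards the queue holds no pointers. */
static void demo_flush(struct demo_queue *q)
{
	for (int i = 0; i < q->count; i++) {
		q->pending[i]->calls++;
		q->pending[i] = NULL;
	}
	q->count = 0;
}

/*
 * Teardown order from the discussion: the entry is assumed to be
 * unlinked already (so no new work is queued), then flush, then free.
 * Returns how many times the callback ran before the entry went away.
 */
static int demo_del_entry(struct demo_queue *q, struct demo_entry *e)
{
	int calls;

	demo_flush(q);
	calls = e->calls;
	free(e);
	return calls;
}
```

The ordering is the whole point: freeing before the flush would leave the
queue holding a dangling pointer.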

> --
> Dmitry
>
>  drivers/connector/cn_queue.c |   85 +++++++++++--------------------------------
>  include/linux/connector.h    |    2 -
>  2 files changed, 23 insertions(+), 64 deletions(-)
>
> Index: linux-2.6.11/drivers/connector/cn_queue.c
> ===================================================================
> --- linux-2.6.11.orig/drivers/connector/cn_queue.c
> +++ linux-2.6.11/drivers/connector/cn_queue.c
> @@ -33,49 +33,12 @@
>  
>  static void cn_queue_wrapper(void *data)
>  {
> - struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
> + struct cn_callback_entry *cbq = data;
>  
> - atomic_inc_and_test(&cbq->cb->refcnt);
>   cbq->cb->callback(cbq->cb->priv);
> - atomic_dec_and_test(&cbq->cb->refcnt);
> -
>   cbq->destruct_data(cbq->ddata);
>  }
>  
> -static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)
> -{
> - struct cn_callback_entry *cbq;
> -
> - cbq = kmalloc(sizeof(*cbq), GFP_KERNEL);
> - if (!cbq) {
> - printk(KERN_ERR "Failed to create new callback queue.\n");
> - return NULL;
> - }
> -
> - memset(cbq, 0, sizeof(*cbq));
> -
> - cbq->cb = cb;
> -
> - INIT_WORK(&cbq->work, &cn_queue_wrapper, cbq);
> -
> - return cbq;
> -}
> -
> -static void cn_queue_free_callback(struct cn_callback_entry *cbq)
> -{
> - cancel_delayed_work(&cbq->work);
> - flush_workqueue(cbq->pdev->cn_queue);
> -
> - while (atomic_read(&cbq->cb->refcnt)) {
> - printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> -       cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
> -
> - msleep(1000);
> - }
> -
> - kfree(cbq);
> -}
> -
>  int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
>  {
>  #if 0
> @@ -90,40 +53,37 @@ int cn_cb_equal(struct cb_id *i1, struct
>  int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb)
>  {
>   struct cn_callback_entry *cbq, *__cbq;
> - int found = 0;
> + int retval = 0;
>  
> - cbq = cn_queue_alloc_callback_entry(cb);
> - if (!cbq)
> + cbq = kmalloc(sizeof(*cbq), GFP_KERNEL);
> + if (!cbq) {
> + printk(KERN_ERR "Failed to create new callback queue.\n");
>   return -ENOMEM;
> + }
>  
>   atomic_inc(&dev->refcnt);
> +
> + memset(cbq, 0, sizeof(*cbq));
> + INIT_WORK(&cbq->work, &cn_queue_wrapper, cbq);
> + cbq->cb = cb;
>   cbq->pdev = dev;
> + cbq->nls = dev->nls;
> + cbq->seq = 0;
> + cbq->group = cbq->cb->id.idx;
>  
>   spin_lock_bh(&dev->queue_lock);
> +
>   list_for_each_entry(__cbq, &dev->queue_list, callback_entry) {
>   if (cn_cb_equal(&__cbq->cb->id, &cb->id)) {
> - found = 1;
> - break;
> + retval = -EEXIST;
> + kfree(cbq);
> + goto out;
>   }
>   }
> - if (!found) {
> - atomic_set(&cbq->cb->refcnt, 1);
> - list_add_tail(&cbq->callback_entry, &dev->queue_list);
> - }
> + list_add_tail(&cbq->callback_entry, &dev->queue_list);
> + out:
>   spin_unlock_bh(&dev->queue_lock);
> -
> - if (found) {
> - atomic_dec(&dev->refcnt);
> - atomic_set(&cbq->cb->refcnt, 0);
> - cn_queue_free_callback(cbq);
> - return -EINVAL;
> - }
> -
> - cbq->nls = dev->nls;
> - cbq->seq = 0;
> - cbq->group = cbq->cb->id.idx;
> -
> - return 0;
> + return retval;
>  }
>  
>  void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id)
> @@ -142,8 +102,9 @@ void cn_queue_del_callback(struct cn_que
>   spin_unlock_bh(&dev->queue_lock);
>  
>   if (found) {
> - atomic_dec(&cbq->cb->refcnt);
> - cn_queue_free_callback(cbq);
> + cancel_delayed_work(&cbq->work);
> + flush_workqueue(cbq->pdev->cn_queue);
> + kfree(cbq);
>   atomic_dec_and_test(&dev->refcnt);
>   }
>  }
> Index: linux-2.6.11/include/linux/connector.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/connector.h
> +++ linux-2.6.11/include/linux/connector.h
> @@ -115,8 +115,6 @@ struct cn_callback
>   struct cb_id id;
>   void (* callback)(void *);
>   void *priv;
> -
> - atomic_t refcnt;
>  };
>  
>  struct cn_callback_entry
--
        Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski
