Observer
Registered: 01-19-2009

i2c driver hang?

We're running Linux on PowerPC. We have devices on I2C that are hot-swappable (power supplies, etc.). We're noticing that after several (20-30) attempts to access a nonexistent I2C device, the driver hangs. Note that we are not even trying to hot-swap the device; the device doesn't exist when I run my test. This is using the new sysfs driver. The effect on the application is that read() hangs. It is interruptible, so Ctrl-C gets me out of the application, but subsequent runs cause the same read() to hang again. I instrumented the driver a bit, and the last place I've seen it reach is the end of XIic_MasterSend or XIic_MasterRecv.

Core registers are:

  GIE (+1C)   0x8000
  IS  (+20)   0xD0
  IE  (+28)   0x27
  RST (+40)   0x00
  CTL (+100)  0x0D
  STS (+104)  0x40

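Clearly the interrupt status and enable registers are disjoint: 0xD0 & 0x27 = 0x00, so none of the pending status bits is enabled and the core has no reason to raise another interrupt.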

I've looked at the bus with an I2C analyzer, and the last transaction was ST ADDR NAK SP (start, address, NAK, stop), with no other errors.

Has anyone seen this before?  I haven't seen any bug fixes related to I2C. 

Xilinx Employee
Registered: 09-10-2008

I haven't seen that specifically, but I've heard of other issues with I2C on other devices.

It's not clear to me whether it's a hardware IP issue or a software driver issue.

Sorry, not much help yet.

Observer

I think I've made some progress in fixing this hang. I haven't hammered on it for days, but I can say that I haven't seen it hang in more than an hour of running (normally it dies within a minute or two).

First, I'd like to know why it retries a failed transaction 160 times; that seems excessive. Maybe someone wanted to wait out an EEPROM write in progress, during which the part doesn't ACK. That should probably be handled at a higher layer (the application).

In the original code (i2c-algo-xilinx.c) the retry code reads:

  Status = XIic_MasterRecv(&dev->Iic, pmsg->buf, pmsg->len);
  dev->Iic.Stats.TxErrors = 0;

and

  Status = XIic_MasterSend(&dev->Iic, pmsg->buf, pmsg->len);
  dev->Iic.Stats.TxErrors = 0;

I think that sometimes the interrupt arrives before TxErrors is cleared, so the error it reports is lost. I've changed the code to:

  dev->Iic.Stats.TxErrors = 0;
  Status = XIic_MasterRecv(&dev->Iic, pmsg->buf, pmsg->len);

and

  dev->Iic.Stats.TxErrors = 0;
  Status = XIic_MasterSend(&dev->Iic, pmsg->buf, pmsg->len);

You might want to consider rolling this into the distribution. It does look like the problem is gone, or at least the window has been made much, much smaller.

-Dick

Xilinx Employee

Thanks for the update. Keep us up to date on any other findings.

Yes, we'll try to roll this feedback in, but we want to wait until you've tested it long enough.

Yes, the number of retries is large; that was because of EEPROM operations.

Thanks.

Observer

John,

The fix above has definitely resolved the problem. You would also see this hang while an EEPROM write is in progress; any pass through that retry code could potentially cause a hang.

I also added a return -ENODEV after the printfs for when the retry count is exhausted. It's not good to report a successful read when no data was actually placed in the buffer; someone might try to use it.

-Dick

Xilinx Employee

Thanks, appreciate the feedback.

I'll try to queue that up for testing and incorporation.
