mini6410 misses external interrupt

cyberhippy
Hi All,
I am using a CAN mcp2515 controller attached to a MINI6410 running
emdebian. I have tried two Linux kernels, 2.6.36 and 2.6.28. Both exhibit
the same problem.

The mcp2515 uses an external interrupt pin, in my case IRQ_EINT(16), which
works about 200 times over 15 minutes before the system stops working.

I have done a number of things which lead to the conclusion that the
problem is in the Linux kernel. Tests I have done to date:

- inserted debug messages into the IRQ work handling routine to check if
all interrupts are cleared when the routine is exited. (No sign of a
problem here)

- monitored the interrupt line and the SPI CS line on an oscilloscope. All
communication between the mini6410 and the mcp2515 is displayed correctly
until the interrupt line goes low and no SPI transfers follow. The
interrupt line remains low until the socketcan interface is restarted.
Even though the interrupt line is low, it is still possible to send CAN
messages out onto the bus. Only receiving messages freezes.

- I have tried to register the interrupt as level triggered rather than
edge triggered (default), which resulted in a complete freeze of the system.

Has anyone out there dealt with a similar problem and solved it? Or has
anybody got an idea as to what might be wrong?

Thanks for any help in advance.

Dave McLaughlin
Are you sure that the MCP2515 is still generating the interrupts on
receive? Is it possible the device is going bus-off?

Can you check the registers for this condition?

This sounds more like a driver issue with the MCP2515 than with the
interrupts on the MINI6410.

Dave...
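
One way to check for the condition Dave asks about is to read the MCP2515
error flag register (EFLG, address 0x2D in the datasheet) and decode it. A
minimal sketch, assuming the raw byte has already been read over SPI by
some other means (the SPI read is not shown, and decode_eflg / is_bus_off
are hypothetical helper names, not part of any driver):

```python
# Bit positions of the MCP2515 EFLG register (address 0x2D), per the datasheet.
EFLG_BITS = {
    7: "RX1OVR",  # receive buffer 1 overflow
    6: "RX0OVR",  # receive buffer 0 overflow
    5: "TXBO",    # bus-off
    4: "TXEP",    # transmit error-passive
    3: "RXEP",    # receive error-passive
    2: "TXWAR",   # transmit error warning
    1: "RXWAR",   # receive error warning
    0: "EWARN",   # error warning
}


def decode_eflg(value):
    """Return the names of the flags set in a raw EFLG byte, highest bit first."""
    return [
        name
        for bit, name in sorted(EFLG_BITS.items(), reverse=True)
        if value & (1 << bit)
    ]


def is_bus_off(value):
    """True if the TXBO (bus-off) bit is set in a raw EFLG byte."""
    return bool(value & (1 << 5))


if __name__ == "__main__":
    # Hypothetical raw value, for illustration only.
    print(decode_eflg(0x20), is_bus_off(0x20))
```

A set RX0OVR or RX1OVR bit would point to receive overflow rather than
bus-off, so it is worth looking at the whole byte and not only TXBO.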

cyberhippy
Hi Dave,

At first I thought that this might be a driver issue, and I have tried
about 3 different versions of the driver. The fact is, though, that I
monitor the interrupt line and the chip select line on the MCP2515 using a
scope. For a working interrupt I see the interrupt line going low for less
than 400us. In that time I see a short low on the CS (reading the
interrupt flags) and a second, longer low on the CS (reading the RX
buffer). CS and the interrupt line go high together after the RX buffer
read.

On the last interrupt, the one that occurs when the system hangs, the
interrupt line goes low and I see no activity on the CS line at all. The
scope captures about 5ms after the interrupt went low.

I have done the same test in the driver interrupt handling routine by
inserting a counter and debug messages. The counter value and 

cat /proc/interrupts 

show the same value. 
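
The comparison against /proc/interrupts can be automated. A minimal sketch
of a parser that sums the per-CPU counts for one IRQ number; the sample
text below is made up for illustration (it is not output from the
MINI6410), and irq_count is a hypothetical helper name:

```python
def irq_count(proc_text, irq):
    """Sum the per-CPU counts for one IRQ number in /proc/interrupts text.

    Returns None if the IRQ number does not appear.
    """
    for line in proc_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # first non-numeric field is the chip/handler name
            total += int(field)
        return total
    return None


def read_irq_count(irq, path="/proc/interrupts"):
    """Read the live counter for one IRQ number (Linux only)."""
    with open(path) as f:
        return irq_count(f.read(), irq)


# Made-up sample in the usual /proc/interrupts layout.
SAMPLE = """\
           CPU0
 16:        204   chipname  mcp251x
 32:      51234   chipname  timer
"""

if __name__ == "__main__":
    print(irq_count(SAMPLE, 16))
```

Polling this value next to the driver's own counter shows whether the
kernel ever dispatched the interrupt that the scope sees stuck low.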

One more thing, the driver reads the interrupt flags a second time, which
causes CS to go low a second time after the interrupt went high again. 

The debug messages always say that the RX0 interrupt flag was set and that
all interrupts were cleared on exiting the interrupt routine.

The only driver issue that I can see here is that a higher layer of the
socketcan modules prevents the interrupt handler from being executed.

I have also double checked: the first drivers used a callback that dealt
with the actual interrupt, and the ISR just scheduled that callback. The
latest version got rid of the callback and deals with the interrupt in the
ISR. The behaviour that I observe is the same for all drivers.

Regards
Cyberhippy

adam watson
How are you getting on with this driver? I do CAN stuff all day long on
PIC24 normally.

cyberhippy
Hi adam,
I have no progress to report, just one more test which confirms that it
must be the Linux kernel that misses the interrupt. I tried regenerating
the missed interrupt, which unstuck the driver.
As the interrupt line is a data line I have done the unspeakable. When the
interrupt got stuck I shorted the low interrupt line momentarily to Vdd
and thus regenerated the falling edge of the interrupt. The driver picked
up this new interrupt and all works just fine, until it gets stuck again ;-(.
Having examined the interrupt code of the Linux kernel, I suspect that
this situation is related to the ethernet chip. Both the MCP2515 and the
ethernet chip use the same external interrupt bank, where the ethernet chip
has the higher priority. The interrupt handler iterates through all
interrupts in a bank and seems to stop once it finds one that reported that
the interrupt was handled. Thus it stops searching before the MCP2515
interrupt is handled (a drawback of edge triggered interrupts).
The way to prove that this is the case is to move the interrupt to another
bank, which requires physical rework. Alternatively one could fix the
kernel external interrupt handler (my preferred fix).
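
The suspicion above can be written down as a toy model. This sketch only
illustrates the hypothesis; it is not taken from the s3c64xx kernel source,
and all names in it ("eth", "can", the dispatch functions) are made up. It
also assumes that one pass over the bank acknowledges every latched edge,
which is exactly the part that would need checking in the real code:

```python
def dispatch_first_only(pending, priority_order):
    """Hypothesised behaviour: serve the highest-priority pending source, then stop."""
    for source in priority_order:
        if source in pending:
            return [source]
    return []


def dispatch_all(pending, priority_order):
    """Alternative: serve every pending source in the bank, in priority order."""
    return [source for source in priority_order if source in pending]


def run(dispatch):
    """Return (sources served, whether the CAN INT line is left stuck low)."""
    priority_order = ["eth", "can"]   # ethernet has the higher priority
    latched = {"eth", "can"}          # both edges arrived before the bank was serviced
    served = dispatch(latched, priority_order)
    # Assumption of this toy model: one pass acknowledges the whole bank, so
    # a latched edge that was not served is dropped and never seen again.
    latched.clear()
    can_line_stuck_low = "can" not in served
    return served, can_line_stuck_low


if __name__ == "__main__":
    print(run(dispatch_first_only))
    print(run(dispatch_all))
```

In the first case the MCP2515 flags are never cleared, so its INT line
stays low and no further falling edge can occur, which would match what
the scope shows.
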
I will implement a fix once I have some spare time, or once someone else
who knows more about s3c64xx external interrupt handling can point me in
the right direction to fix the kernel.
For now I use a cron job to restart the interface every 5 min.
Thanks for asking

cyberhippy
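
The cron workaround mentioned above could look roughly like the sketch
below. It assumes the interface is called can0, that iproute2 is installed
and that the bitrate was configured earlier and survives a down/up cycle;
none of this is stated in the post, and on an older userland
"ifconfig can0 down" / "ifconfig can0 up" may be needed instead:

```python
import subprocess


def restart_commands(ifname="can0"):
    """Build the commands that take a network interface down and up again."""
    return [
        ["ip", "link", "set", ifname, "down"],
        ["ip", "link", "set", ifname, "up"],
    ]


def restart_interface(ifname="can0", run=subprocess.check_call):
    """Run the restart commands in order; 'run' can be replaced for a dry run."""
    for cmd in restart_commands(ifname):
        run(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    restart_interface("can0", run=print)
```

Called with the default run argument and started from cron every five
minutes, this does the same as restarting the interface by hand.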

Sergej
Did you find a solution?
I have exactly the same problem.

Juergen Beisert
Maybe this issue is worth reporting on the LAKML[1] mailing list. If
the logic misses edge triggered interrupts you should use low level
triggered interrupts as a workaround.

[1] linux-arm-kernel@lists.infradead.org

Sergej
Hi Juergen,

I tried changing the interrupt from edge triggered to low level triggered.
In this case the system hangs.

Peter Wang
I have the same problem too. Any progress?

navid
Hi All

The point is that the INT line of the MCP251x is a level-triggered
interrupt. When the MCP251x needs the MCU's attention, it pulls the INT
line to 0. It is then the responsibility of the MCU (interrupt handler) to
clear the relevant flags in the MCP251x (CANINTF), which eventually makes
the MCP251x return INT to 1.
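
The relationship between CANINTF and the INT pin can be modelled in a few
lines. This is a sketch based on the MCP2515 datasheet (INT is asserted
while any flag in CANINTF is set and enabled in CANINTE; RX0IF and RX1IF
are bits 0 and 1); the function names are made up for illustration:

```python
RX0IF = 0x01  # CANINTF bit 0: receive buffer 0 full
RX1IF = 0x02  # CANINTF bit 1: receive buffer 1 full


def int_pin(canintf, caninte):
    """Return the INT pin level: 0 (asserted) while any enabled flag is set."""
    return 0 if (canintf & caninte) else 1


def bit_modify(reg, mask, data):
    """Model of the BIT MODIFY command: only the bits selected by mask change."""
    return (reg & ~mask & 0xFF) | (data & mask)


if __name__ == "__main__":
    enabled = RX0IF | RX1IF
    flags = RX0IF                            # one message received
    print(int_pin(flags, enabled))           # line is low
    flags = bit_modify(flags, RX0IF, 0x00)   # handler clears RX0IF
    print(int_pin(flags, enabled))           # line is high again
```

Note that if RX1IF becomes set before RX0IF is cleared, the pin never
returns to 1 in between, so an edge-triggered input sees no new falling
edge for the second message.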

The device driver for the MCP251x (mcp251x.c) registers an IRQ with the
"IRQF_TRIGGER_FALLING" flag, which means the interrupt handler is invoked
when there is a falling edge on the interrupt line.

Mixing two different kinds of triggering leads to problems which
eventually stop the communication between the MCP251x and the MCU. For
example, if by any chance the MCU misses one edge, it won't clear the
MCP251x's interrupt flags and the INT line remains low. But the MCU is
supposed to service this interrupt line as long as it is low.

As a solution to this problem I use "IRQF_TRIGGER_LOW | IRQF_ONESHOT"
for IRQ registration, and it works fine.

I also attached a patch file
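
The difference between the two trigger types can be shown with a small
user-space model. This is only a sketch of the reasoning above, not driver
code: time is a simple step counter, the device keeps INT low until its
flag is cleared, and "missed_edges" is a made-up way to inject one lost
falling edge:

```python
def simulate(mode, arrivals, missed_edges=()):
    """Count handler runs for a device whose INT line stays low until serviced.

    mode          -- "edge" (falling edge) or "level" (low level)
    arrivals      -- time steps at which the device flags a received message
    missed_edges  -- time steps at which the CPU fails to notice the falling edge
    """
    flag_set = False   # device-side interrupt flag; INT is low while it is set
    prev_line = 1
    handled = 0
    for t in range(max(arrivals) + 2):
        if t in arrivals:
            flag_set = True
        line = 0 if flag_set else 1
        if mode == "edge":
            # Fires only on a 1 -> 0 transition that the CPU actually notices.
            fire = prev_line == 1 and line == 0 and t not in missed_edges
        else:
            # Fires whenever the line is low, however it got there.
            fire = line == 0
        if fire:
            handled += 1
            flag_set = False   # handler clears the flag, INT returns high
            line = 1
        prev_line = line
    return handled


if __name__ == "__main__":
    arrivals = [1, 3, 5, 7]
    print(simulate("edge", arrivals))                      # no edge lost
    print(simulate("edge", arrivals, missed_edges={3}))    # stuck after one loss
    print(simulate("level", arrivals, missed_edges={3}))   # recovers
```

With one lost edge the edge-triggered case handles only the first message
and then stays stuck with INT low, while the level-triggered case handles
all four. IRQF_ONESHOT matters for the level-triggered case because it
keeps the line masked until the threaded handler has finished clearing the
flags over SPI.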