An ARM executable is started on an ARM hardware board running Linux. Only under certain conditions (for example, after tests had been running for some time, or during particular tests) the process was seen using 99.9% CPU. After that the board becomes unresponsive and cannot be reached in any way, so there seems to be no way to know what is going on. Well, not exactly.
To start with, when we first saw this problem we did not know what could be causing it, as the process in question was very complicated code: multi-threaded, using both TCP and UDP network sockets for control and streaming data messages, with multimedia streaming.
We connect the board to a PC via a serial cable (this board has a serial port) and start a terminal emulation program on the PC (HyperTerminal, Tera Term, any one will do). This gives a command prompt for the Linux running on the board.
Find the suspect:
In steps GDB:
Possible causes include deadlocks among threads, an unwanted infinite loop created by bad coding (signed/unsigned data type mismatches in condition checks), or a plain design bug in which the programmer assumed and relied on some behaviour, such as a variable taking a certain value, which just did not manifest, or vice versa.
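To make the signed/unsigned case concrete, here is a minimal C sketch of how such a mismatch in a loop condition can produce a loop that never ends and keeps one core busy. The function names and the `limit` safety cap are hypothetical, added only so the demonstration terminates; they are not taken from the code that was being debugged.

```c
#include <stdio.h>

/* Hypothetical demo: walk an array of `len` items backwards using an
 * UNSIGNED index and count the loop iterations. `limit` is a safety cap
 * that exists only so this sketch terminates. */
unsigned long walk_backwards_unsigned(unsigned int len, unsigned long limit)
{
    unsigned long iterations = 0;

    /* Bug: i is unsigned, so "i >= 0" is always true. When i is 0,
     * i-- wraps around to UINT_MAX and the loop never ends on its own. */
    for (unsigned int i = len - 1; i >= 0; i--) {
        if (++iterations >= limit)
            break;  /* without this cap the thread would spin forever */
    }
    return iterations;
}

/* Same loop with a signed index: i becomes -1 after the last item,
 * the condition fails, and the loop ends after `len` iterations. */
unsigned long walk_backwards_signed(int len, unsigned long limit)
{
    unsigned long iterations = 0;

    for (int i = len - 1; i >= 0; i--) {
        if (++iterations >= limit)
            break;
    }
    return iterations;
}
```

Called with `len = 4` and `limit = 1000`, the signed version stops after 4 iterations, while the unsigned version only stops because it hits the cap at 1000. In real code there is no such cap, so the thread would spin at full CPU. GCC can flag this kind of comparison when the code is built with `-Wextra`, which enables `-Wtype-limits` and, for C, `-Wsign-compare`.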