Monday, 26 August 2013

Using GDB, top and other tools to debug a process that has hung and is hogging 99.9% CPU

A process that crashes leaves a backtrace, so we at least know the symptom of the crash. A process that hogs 99.9% of the CPU and makes the system unusable leaves nothing of the sort (I wonder how the number comes out as 99.9%, but that's another post). Below is a first-hand account of the process I used, along with GDB on the target board, to analyse and pinpoint (well, close enough!) the culprit code.

An ARM executable is started on an ARM hardware board running Linux. Only under certain conditions (after the tests had run for some time, or after certain tests) was this process seen using 99.9% CPU. After that the board became unresponsive and we could not connect to it in any way, so there was no way to know what was going on. Well, not exactly.
To start with, when we first saw this problem we did not know what could be causing it, as the process in question was very complicated code: multi-threaded, using both TCP and UDP network sockets for control and streaming data messages, with multimedia streaming.
So it was really scary to think about how we were going to get to the bottom of this bug.

We connected the board to a PC via a serial cable (this board has a serial port) and started a terminal emulator on the PC (HyperTerminal, Tera Term, any one will do). This gives a command prompt for the Linux running on the board.

Find the suspect:
Executing the top command at the board prompt shows the processes currently running. Get the process ID of the process you want to debug.
top -H lists the individual threads of each process, if they are multi-threaded. In my case the process hogging the CPU was a complex multi-threaded program.
The top -H output in my case showed a thread MyThread4 running at 99.9% CPU.
Get the PID of the culprit process or thread. Let's say it is 666.
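
For reference, the per-thread figures that top -H displays come from the kernel's /proc filesystem. Below is a minimal Python sketch, assuming a Linux /proc layout and that Python happens to be available (on a small board it may not be). The function names are made up for illustration. It parses one /proc/<pid>/task/<tid>/stat line to get a thread's accumulated CPU ticks; sampling twice and comparing the totals shows which thread is busy.

```python
import os

def parse_stat_line(line):
    # Format: "tid (comm) state ppid ... utime stime ..."; comm may itself
    # contain spaces or parentheses, so split around the LAST ')'.
    head, _, tail = line.rpartition(")")
    tid_text, _, comm = head.partition("(")
    fields = tail.split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return int(tid_text), comm, fields[0], utime + stime

def thread_cpu_ticks(pid):
    # Hypothetical helper: map each thread id of `pid` to
    # (name, state, total CPU ticks so far).
    ticks = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        with open(os.path.join(task_dir, tid, "stat")) as f:
            tid_num, comm, state, total = parse_stat_line(f.read())
        ticks[tid_num] = (comm, state, total)
    return ticks
```

Calling thread_cpu_ticks(666) a second or so apart and subtracting the totals gives roughly the ranking that top -H shows.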

In steps GDB:
The Linux installation on the board needs to have gdb (client) installed. Attach gdb to the running process that is hogging the CPU:
gdb --pid=666
or
gdb -p 666

On attaching successfully, gdb stops the process and gives the gdb prompt.
The worst part was that, while I was doing all this, the terminal console was continuously printing debug messages that kept scrolling up the screen. I could not see the echo of the commands I was typing, so I had to be careful not to make typos, lest the command go bad.
Looking at the gdb backtrace does not help much in this case: it shows some library call inside libc or the like where the process was last stopped, and most likely no source for it is going to be available for you to debug.
That's when this gdb gold nugget comes in handy:

info threads
This shows all threads of the process, along with the function each thread was executing when gdb attached, stopped the process and took stock of things.
This makes life a lot easier, well, compared to the situation we started in.
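
When the console is too noisy to read, as it was here, one option is to run gdb non-interactively and send its output to a file. The sketch below is a small Python wrapper around gdb's -p, -batch and -ex options, assuming gdb is on the PATH and Python is available on the target. The helper names and the output path are made up for illustration.

```python
import subprocess

def gdb_thread_dump_command(pid):
    # -batch makes gdb run the -ex commands in order and then exit.
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "info threads",
        "-ex", "thread apply all bt",
    ]

def dump_threads(pid, out_path):
    # Hypothetical helper: capture gdb's output in a file instead of
    # on the scrolling console. Returns gdb's exit status.
    with open(out_path, "w") as out:
        return subprocess.call(gdb_thread_dump_command(pid),
                               stdout=out, stderr=subprocess.STDOUT)
```

For example, dump_threads(666, "/tmp/threads.txt") would leave the thread list and backtraces in a file that can be read at leisure.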
The output of info threads in my case was as follows (partial output). The thread MyThread4, which top had shown as the culprit, is the last entry.
20   Thread 0x43b00460 (LWP 1505) "MYThread1::Func1()" 0x403709c0 in select ()
   from /lib/libc.so.6
19   Thread 0x443e0460 (LWP 1506) " MyThread2::Func2()" 0x40351d44 in nanosleep ()
   from /lib/libc.so.6
18   Thread 0x44c41460 (LWP 1507) " MyThread3::Func1" 0x400757c6 in __libc_do_syscall () from /lib/libpthread.so.0
17   Thread 0x454a9460 (LWP 1508) "MyThread4::t" 0x00073080 in MyThread1Class1

For the thread MyThread4, the function it was inside when gdb stopped the process was a certain parse() function.
This gave us enough of a clue to go back and review the source code for possible bugs.
Technically, the cause could be any of these:
threads stuck waiting on each other in a busy loop (threads deadlocked on ordinary blocking locks would normally sleep rather than burn CPU), an unwanted infinite loop created by bad coding (signed/unsigned data type mismatches in condition checks), or a plain design bug in which the programmer assumed and relied on some behaviour, such as a variable taking a certain value, which just did not manifest, or vice versa.
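
To illustrate the signed/unsigned case: a countdown loop whose counter is unsigned can never see a negative value, so a condition like i >= 0 stays true forever. The sketch below imitates that in Python with ctypes.c_uint32, which wraps around the way a C unsigned int does. It is only an illustration, not the code from this project, and max_steps is an artificial cap so that the demo terminates.

```python
import ctypes

def count_down_unsigned(start, max_steps=10):
    # Imitates: for (unsigned i = start; i >= 0; i--) { ... }
    i = ctypes.c_uint32(start)
    seen = []
    # "i.value >= 0" is always true for an unsigned counter; only the
    # artificial max_steps cap stops this loop.
    while i.value >= 0 and len(seen) < max_steps:
        seen.append(i.value)
        i.value -= 1  # 0 - 1 wraps around to 4294967295
    return seen
```

count_down_unsigned(2, 5) walks 2, 1, 0 and then jumps to 4294967295 instead of stopping, which is the kind of loop that keeps one thread pinned at full CPU.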
Phew….

I just thanked the gdb developers profusely, for gdb had come to the rescue once again!

Sunday, 18 August 2013

String permutations - Solution

Here is an iterative solution to the problem asked here a few days back:
Find all permutations of a given string.

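def permutations_iter(word):
    # Take one character at a time and insert it at every position
    # of every permutation built so far.
    if not word:
        return [""]  # guard: pop() below would fail on an empty string
    stack = list(word)
    results = [stack.pop()]
    while stack:
        c = stack.pop()
        new_results = []
        for w in results:
            for i in range(len(w) + 1):
                new_results.append(w[:i] + c + w[i:])
        results = new_results
    return results

# Example test code
print(permutations_iter("ABC"))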

Thursday, 15 August 2013

Find all permutations of a string

Here is an interesting, apparently simple (on paper) problem.
           Write a program to find and print all permutations of a given string.

This is the right type of problem for recursion. And any recursive solution can, in theory, be implemented as an iterative (non-recursive) solution. I will post the solution here in a few days.
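
As a rough sketch of what the recursive approach could look like (one of several possible formulations, with a made-up function name): pick each character in turn as the first one, and permute whatever is left.

```python
def permutations_rec(word):
    # Base case: an empty or one-character string has one permutation.
    if len(word) <= 1:
        return [word]
    results = []
    for i, c in enumerate(word):
        rest = word[:i] + word[i + 1:]    # the string without character i
        for p in permutations_rec(rest):  # permute the remainder
            results.append(c + p)
    return results
```

Note that a string with repeated characters, such as "AAB", yields duplicate entries with this sketch; wrapping the result in set() removes them.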


Friday, 9 August 2013

Sequel... Life and Scope of Local and Global variables in Python programming language

Here is the explanation of how local and global variables behave in Python.

First things first: the puzzle asked here a few days back has the following solution.
The output of the Python program is a run-time error, a variable used before it is initialized:
Traceback (most recent call last):
  File "python_local_global_vars.py", line 10, in <module>
    test()
  File "python_local_global_vars.py", line 5, in test
    print "Global i =", i
UnboundLocalError: local variable 'i' referenced before assignment

Reasoning:
There is a global variable i initialized to 333. In the function test() there is also a local variable i, the for loop counter. When a name is referenced, Python looks for a local variable of that name first and, if none is found, for a global variable of that name. A local variable hides a global variable of the same name. Because test() assigns to i (in the for loop), Python treats i as local throughout the whole function body, but the local i only receives a value when the for loop starts, not before.
So when we try to print i before the loop, in the statement
print "Global i =", i
Python sees that i is a local name, so the global i is hidden, yet the local i has not been assigned a value at that point. We are reading an uninitialized local variable in that print statement, hence the Python run-time error.
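
A minimal sketch of the same effect and two ways around it. The function names are made up, and print is avoided so that the snippet runs on both Python 2 and Python 3.

```python
i = 333

def broken():
    try:
        seen = i             # fails: 'i' is local because of the loop below
    except UnboundLocalError as exc:
        seen = type(exc).__name__
    for i in range(3):
        pass
    return seen

def fixed_by_renaming():
    seen = i                 # nothing assigns to 'i' here, so this is the global
    for j in range(3):
        pass
    return seen

def fixed_with_global():
    global i                 # every use of 'i' now refers to the global
    seen = i
    for i in range(3):       # careful: this loop overwrites the global 'i'
        pass
    return seen
```

Renaming the loop variable is usually the safer fix, since the global declaration makes the loop modify the module-level i.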




Wednesday, 7 August 2013

Puzzle about Life and Scope of Local and Global variables in Python programming language

Here is a puzzle about the Python language, local and global variables, and where and when they are defined and initialized.

What do you think the output of this simple Python code snippet would be?
I will post the answer and its explanation in a while.

i = 333
def test():
    print "Global i =", i
    for i in range(10):
        print "Local i = ", i
test()

Monday, 5 August 2013

Solution to problem - Palindromic substrings of a string

Here is my solution to the problem asked earlier about palindromic substrings.
It is in Python: the function that solves the problem, plus application test code with the test cases used to test it.

# Input is a string
def num_of_palindromic_substrings(ipstr):
    slen = len(ipstr)
    cnt = 0
    # Try every substring of length 2 or more, from index x to index y
    for x in range(slen):
        for y in range(x + 1, slen):
            substr = ipstr[x:y + 1]
            rstr = substr[::-1]  # reversed copy of the substring
            if substr == rstr:
                cnt += 1
                print("Palindrome sub string number %d : %s" % (cnt, substr))
    return cnt

# Application test code
# Test cases
t1 = "abaabab"
t2 = "baababa"
t3 = "abbacada"
t4 = "aza"
t5 = "abba"
t6 = "abacaba"
t7 = "zaza"
t8 = "abcd"
t9 = "abacada"
t10 = 'ab3492a'
t11 = "level"
ret = num_of_palindromic_substrings(t11)
print("Total number of palindromic sub strings is: %d" % ret)
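
The solution above checks every substring, which takes time roughly cubic in the length of the string. As a possible alternative (not the method used above), the sketch below expands outwards from each possible centre and only counts, which is quadratic in the worst case. The function name is made up, and single characters are skipped to match the length-2-or-more convention of the code above.

```python
def count_palindromes_by_expansion(s):
    n = len(s)
    count = 0
    # 2n - 1 centres: each character (odd length) and each gap (even length)
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and s[left] == s[right]:
            if right > left:      # ignore single characters
                count += 1
            left -= 1
            right += 1
    return count
```

For the test string "level" this counts "eve" and "level", the same two palindromes the brute-force version finds.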

Sunday, 4 August 2013

Debugging a hanged/frozen network application

Recently I read a question on a forum asking how to debug a process (a network application) that keeps freezing:

A process receives network data via TCP/IP. After the process has been running for a while, while tracking network load, the application seems to get into a frozen state and stops getting data. Other processes in the system that use the network interfaces keep working properly. The application comes out of this hung state by itself after several minutes.
Without knowing the OS, the nature of the application (is it a chat client, an ssh or ftp client?), which networking libraries it uses, whether it is multi-threaded, and other such details, it is hard to advise any specific steps.
My answer was as below:

  1. top Check top to see how much of the resources (CPU, memory) your process is using, and whether its CPU usage is abnormally high.
  2. pstack This should show the stack frames of the process executing at the time of the problem.
  3. netstat Run this with the necessary options (tcp/udp) to check the state of the network sockets opened by your process.
  4. gcore -s -c This forces your process to dump core when the problem happens. Then analyze that core file using gdb, and use the where command at the gdb prompt to get the full backtrace of the process (which functions it was executing last, and the previous function calls).
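
On Linux specifically (an assumption, since the question did not name the OS), the socket information netstat reports for TCP is also exposed in /proc/net/tcp. The sketch below decodes the state and queue columns of one such line. The function name is made up. A receive queue that keeps growing on the application's socket suggests that data is arriving but the process is not reading it.

```python
# Hex state codes used in /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def parse_proc_net_tcp_line(line):
    # Columns: sl, local_address, rem_address, st, tx_queue:rx_queue, ...
    fields = line.split()
    local_port_hex = fields[1].split(":")[1]
    tx_hex, rx_hex = fields[4].split(":")
    return {
        "local_port": int(local_port_hex, 16),
        "state": TCP_STATES.get(fields[3].upper(), "UNKNOWN"),
        "tx_queue": int(tx_hex, 16),  # bytes waiting to be sent
        "rx_queue": int(rx_hex, 16),  # bytes received but not yet read
    }
```

Feeding it the data lines of /proc/net/tcp (skipping the header line) gives a quick view of which sockets are backing up.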

Friday, 2 August 2013

Find Palindromic sub strings of a string

Here's a good puzzle. Use any programming language to solve it.
Hint: there are faster things called interpreters; the slower ones are compilers.

The input is a string (character array) containing a sequence of alphanumeric characters, without any spaces or special characters.

Find the sub-strings of the given string that are palindromes.
Print those sub-strings and the total number of such sub-strings found.

Note: I added the alphanumeric-only condition myself because, when I was writing my Python solution, it had issues with non-ASCII characters in some of the test cases I constructed.

So, as you guessed, my solution, which I shall post later, is in Python.

A palindrome is any number or string that reads the same forwards and backwards.
e.g.
Strings which are palindromes :
   level ;  radar ; SOS ;