Wednesday, April 8, 2015

OpenMP, file tree walk and map-reduce example

Everything happens on Linux, and we will use GCC with its built-in OpenMP support and thread-local storage. We will see how easy it is to spread a workload across threads using OpenMP.
In the man pages we find a complete example for nftw, or file tree walk. That example prints to standard output just about everything it finds while walking directories. Instead of having it print everything, I made it count files:


To compile it I used:

gcc -O1 -o simpleftw simpleftw.c

In order to count files across multiple directories, we will pass a few directories in the argument list to the program and also turn our static counter variable into thread-local storage. We will use the OpenMP parallel for pragma combined with a reduction, which in our case is simple addition:


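That print inside the for loop should demonstrate that thread-local storage works as expected. The C99 switch is required because the loop variable i is declared inside the for statement; declared outside, it would still be private, since OpenMP makes the iteration variable of a parallel for private automatically. The switch for OpenMP is also required. To compile it I used: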

gcc -O1 -fopenmp -o dwalker dwalker.c -std=c99

Execution produces on my box:

./dwalker ~/Documents/ ~/Downloads/
Thread 1 3894
Thread 2 7784
There was 11678 files in ...

Everything works as expected.

Monday, April 6, 2015

C pointers, assembler and GDB

The idea here is to do some multitasking: find out what a C pointer looks like in assembler and use Linux system calls, GCC and GDB. We start with Hello World:


Execute man 2 write in a terminal to find out more about write; 1 is standard output and 12 is the number of bytes to be written. I saved it as hello.c, and this is how it was compiled:

gcc -O0 -m64 -save-temps -o hlo hello.c

GCC will emit the assembly code as a temp file; I like the 64-bit version and no optimizations. The content of the temp file is:



It pushes the base pointer onto the stack; there is a q at the end of push and bp is called rbp, so it must be 64-bit code. It stores the address of the string in memory, later loads it into rax, prepares the call to write and so on. Now we take that temp file and compile it with the -g switch so that we can use GDB.

gcc -g -O0 -o hlo hello.s
gdb hlo


We type start at the prompt, and here is the GDB session:


Label .LC0 is our string, and -8(%rbp) is where the pointer is. After info registers in its short form, restricted to rbp, I find out that rbp points to 0x7fffffffe360, so our pointer is at 0x7fffffffe360 - 8 = 0x7fffffffe358. No, 0x60 - 8 is not 0x52. After we issue next and line 17 is executed, we can see our string in memory; the /13c means we want 13 characters from that address.
Since write is a Linux system call, we can use strace to see what is going on.


After quite a few lines were replaced with three consecutive dots, we see that we managed to write all 13 characters to standard output.
Using system calls is so easy that one may even attempt writing to a real file:


We added more system calls. We applied common flags and modes in open and checked for all possible errors; after writing the string to the file we closed it. If we compile it using

gcc -O0 -m64 -save-temps -o hello2 hello.c

we get the binary and also the assembly code. With the help of strace we can follow the calls:


Executing cat deleteme.blog, we see that write really worked and strace didn't cheat. Examining the assembly code will be interesting to those who are curious how if works at a low level. Finally, every project manager will take note that from 08:13:04.407694 to 09:24:49.906151 I have written only 21 lines of code.

Thursday, April 2, 2015

Extending PostgreSQL with own C functions

That is the power of open source: if you are not happy with what PostgreSQL currently offers, you write your own extension in C. Compile the code with your functions into a shared library, install it, and they become available from PostgreSQL. OK, there are some rules and procedures. Once we are inside PostgreSQL we are using its types, interfaces and utilities. Let us do a Hello World example. To build the example you will need postgresql-server-dev-9.1, or whatever version you are using, installed on your Linux box.


As we can see, we are using the new V1 spec. That is a modified concat_text example from the PostgreSQL documentation; look for 35.9.4. Version 1 Calling Conventions. We have a load of different VAR macros. For example:

VARHDRSZ is the same as sizeof(int4), but it's considered good style to use the macro VARHDRSZ to refer to the size of the overhead for a variable-length type.

That is from the PostgreSQL documentation. SET_VARSIZE can be found in the include file postgres.h:


Unless you are on big endian. Going through the header file, one can also read more about the Datum and varlena datatypes in the comments. Then we have palloc, which corresponds to malloc, memcpy, which you already know, and the GET and RETURN macros. It is obvious that for writing extensions one needs to become familiar with PostgreSQL internals. Power without knowledge and responsibility exists only in fairy tales told by “software evangelists” at annual developers developers developers meetings.
Values passed in by reference by PostgreSQL may point directly into a disk buffer, so do not change them.
To build the shared library I used the following Makefile:


It is a rather long story just to get the location of pgxs, which is where the makefiles for building extensions live. It is not a trivial build, and using the provided mk files is the right way to do it. After that we can copy say_hello.so to some reasonable location or give the full path to it in the create function declaration.


PostgreSQL already allows Python through the untrusted language PL/Python. One can utilize the power of Python for functions or triggers without learning much about PostgreSQL internals. But again, if you need power and speed, you can use what PostgreSQL speaks internally, and that is C.

Wednesday, April 1, 2015

SQLite parameterized query in C

Still very angry at Discovery Health, but that is no reason to stop using C. To execute a parameterized query we prepare an SQL statement with one or more placeholders. A placeholder can be a question mark, alone or followed by a number, or a colon, dollar sign or at sign followed by an alphanumeric name. For example:

select title, full_name from jane where id = @id;

We prepare such a statement using sqlite3_prepare_v2, later we bind our parameter and finally we execute the query. To do the binding we use the appropriate function; there are a few of them:


All bind functions return SQLITE_OK if binding is successful or an error code if it fails. The first argument is the handle to the prepared statement. The second argument is the index of the parameter. The third argument is the value to be set. For blob and text we have a fourth argument, the size in bytes, and a fifth, the destructor function. Instead of a destructor function we can pass the constants SQLITE_STATIC (do nothing) or SQLITE_TRANSIENT (make a local copy and free it when done). To find out the index of a parameter we use this function:

int sqlite3_bind_parameter_index(sqlite3_stmt*, const char *zName);

We pass the prepared statement and the parameter name; it returns zero if there is no such parameter, or the parameter index if it exists. Even though we know that our parameter must have index one, we will still look it up to demonstrate how it is done. Here is the code:


The database should already be loaded with the required values from previous examples; if not, here is the SQL to create it:

CREATE TABLE jane ( id INTEGER PRIMARY KEY NOT NULL, title TEXT, full_name TEXT NOT NULL );
INSERT INTO jane VALUES(1,'Mr','Smith');
INSERT INTO jane VALUES(2,'Mrs','Doe');
INSERT INTO jane VALUES(3,'Mr','Doe');


After we build it using:

gcc -g test.c -lsqlite3 -o test

We execute test and see the following output:


We could also misspell the parameter name and rebuild to check whether error handling is working.

Tuesday, March 31, 2015

PostgreSQL, libpqxx and prepared statement example

As I promised, if people find the introductory article interesting, I will write more about PostgreSQL and libpqxx. But before I start with programming, a rant.
I am still looking for work and finding none. Today I went to Discovery Health, had a very pleasant 45-minute chat with their architect and BA, and got a promise that we would talk again. Later the guy from the employment agency who arranged the meeting calls and says that they do not want me since I do not have ANSI C in my CV?! Like somebody from management decided to override the decision of the interviewers. If ANSI C in the CV is a precondition, why did they waste my time and invite me for an interview? I wish them the same from their customers. BTW I am usually employed as a senior developer and not as a C developer or Perl developer.
Back to programming, this time C++; I have had enough of ANSI C for today. The environment is Linux, I am using the same PostgreSQL and libpqxx as last time, and to compile the example we will use:

g++ hello_prep.cxx -o hello_prep -I/usr/local/include/ -lpqxx -lpq


This time I used test092.cxx; run dpkg with the -L switch on libpqxx3-doc to see where it is. It tests passing a binary parameter to a prepared statement. The test macros are replaced with printing of the tested values to standard output, and setup of the connection/transaction is included. Here is the code:


After the connection is successfully obtained, transaction T is constructed. A temp table is created and, after confirmation that prepared statements are supported, testing goes on. We have prepare::declaration and prepare::invocation, both described in the reference generated with doxygen. Parameters are added in an iterative fashion that feels natural, like varargs from C, as they say in the documentation. The library is well designed and easy to use; documentation, tutorial and tests are supplied. Lengths and contents should match, and the test succeeds.

Sunday, March 29, 2015

SQLite C API another convenience routine example

Last time we presented a sqlite3_exec example. Besides sqlite3_exec we can use a wrapper around sqlite3_exec, and that is sqlite3_get_table. If the call is successful, we get an array of UTF-8 zero-terminated strings, and we have to free that array at the end. Here is the interface for sqlite3_get_table and sqlite3_free_table:


It is quite self-explanatory. The number of rows doesn't count the row of column names, so we need to add one to it. Cleanup is required in any case: if all is OK we need to call sqlite3_free_table, and if there is a failure we need to clean up the error message, like with sqlite3_exec. We will retrieve the content of the jane table from the sqlite3_exec example, where we inserted three rows into the table. Here is the code:


A very simple and very user-friendly API. If we build and execute the binary, we should get an error:

Get table failure: no such table: jane

Our table is in surogat.db; we repair the example, recompile, and we should see the table printout:


That is a legacy interface and its usage is not recommended, though it is very user friendly, and that is the reason I supplied the example.
As usual, the example is built on Linux using gcc and successfully tested. I didn't try a different OS or compiler.

Tests, pointers, arrays and GDB

While I was looking for work, actually I am still looking for work, they sent me to do some tests. Those are some “tech check” rubbish tests which test how much of the man pages you know by heart. Not whether you have the logic of a programmer and real working knowledge, but how well you have memorized help files. So let me explain how to deal with those tests and real-life problems in a sensible way. While an agile approach is very desirable in project management, memorizing help files is what the industry expects from programmers. Everything further happens on Linux, and we will do some debugging to find the answers.
Just about every book teaching C contains the story of how one can declare an array and access array elements via pointer arithmetic. Something like this:


The expression *arr1d+i is not really pointer arithmetic, since the dereference happens before the addition, and everybody who has worked in C longer than two weeks knows it, but it will also produce the desired result. I also omitted the array length, and gcc managed to read it from the initializer. Now we can declare some pointers and assign the address of our array to them.


An array behaves here as just a pointer to its first element, and we got the type and everything right. If we now take the address of the array, will we have a double pointer? Not really.


Produces this warning:

warning: initialization from incompatible pointer type [enabled by default]

Since we do not know what is wrong, that is, what type of pointer to array we are getting instead of a double pointer to integer, we will ask GDB. This is the code:



and we will save it as untitled.c and build using

gcc -g -Wall -o untitled untitled.c
untitled.c: In function ‘main’:
untitled.c:12:16: warning: initialization from incompatible pointer type [enabled by default]
untitled.c:13:7: warning: unused variable ‘p1d11’ [-Wunused-variable]
untitled.c:12:8: warning: unused variable ‘p1d12’ [-Wunused-variable]

This is together with the output. The -g switch means that we want debug info and -Wall that we want all warnings. Now we start an interactive session and ask GDB what we want to know:


It printed a few lines of messages about the license, where to report bugs and similar, and loaded symbols for untitled. At the (gdb) prompt we type start, and it starts and breaks on the first possible line. We try info locals and see that the array is not initialized yet, so we execute next. Now the array is initialized and we print it. Finally we ask it to print &arr1d, and we learn the type of our “double pointer”.


This is what the address of an array returns and how the “double pointer” should be declared; a really ugly question on some idiotic test.
Things are becoming more interesting with multidimensional arrays. For example:

int arr2d[][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}};

We cannot omit everything even if we are supplying an initializer; only the first square bracket may be empty. How about asking some questions? Start a GDB session and ask all you need to know:


Simple as that. There is one more question left: what happens with a double pointer initialized to the address of an array, and why is it not working? Again we are agile and write code:


That will execute and print *p = 1?! Start a GDB session and check what is happening:


The abbreviated print is p, and x will print the content of memory at some address. An array is not just a pointer; its size counts too. If we had used the ld format in printf, we would see slightly bigger output than just one ;-)
That would be such a lovely question for a test: what would the output be if we replace %d with %ld? Naturally it will be *p = 8589934593!
Ask yourself stupid questions for fun and for profit.