Tuesday, April 21, 2015

PostgreSQL and PL/Pythonu

Beside SQL PostgreSQL has support for creating functions in many other so called procedural languages. We just need to install them and they are available within database. Standard distribution includes those four languages PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. There are other externally maintained languages like Java, Scheme, PHP, Unix shell and so on. If that is not enough it is possible to write procedural language handler, it is open source and consequently fully customizable.
Hosted Python is very interesting option. It is quite powerful, versatile language and code is very concise. If it is executed within PostgreSQL it enjoys very fast communication avoiding overhead of DB calls which external programs suffer. Both Python 2 and Python 3 are supported. If you are using, for example, Ubuntu then postgresql-plpython-9.x is Python 2 and postgresql-plpython3-9.x is Python 3. It needs to be installed, server restarted and language created.

createlang plpythonu [dbname]

If we do not specify db, then it will be created on db to which are we currently connected. That u suffix have meaning that we are dealing with unrestricted/untrusted language. Power is given to us and it is our responsibility to use it to our benefit without causing damage. Once everything is in place we can write Hello World function in Python.


From psql we try it


Or if you prefer GUI, then use pgAdmin III start Query tool and execute it there.
To enjoy full comfort of Python one need to import modules. How do we use modules in PL/Pythonu? It is no different to ordinary Python code:


we should see response. If we see:


That is sign that httplib2 is not in path, we can place it in path or if we do not want to reload server we can append to function:


There is significant quantity of examples on Internet to help you further. With some knowledge of Python one can do really huge amount of work directly from PostgreSQL without overhead of DB connection. In role of mailer of monthly report PL/Pythonu function beats Java Application Server and even C/C++ application connecting from outside to DB.

Thursday, April 9, 2015

GCC Quad-Precision Math Library example

There is small PDF on GCC website with libquadmath documentation. It contains explanation and example how to convert string to __float128 and how to convert __float128 back to string. It took me some googling to find out how to compile example and to figure out what is width parameter. To compile it we need -lquadmath linker switch and width, fourth argument to quadmath_snprintf, is overall width of output, right anchored. Here is my version of mentioned example:


If we try to assign number without Q suffix then we will have ordinary double inside quad storage. The fourth argument to quadmath_snprintf will be ignored if it is smaller than length of string representation, no spaces will be added in front of number.
If we need to examine layout in memory, we can use modified example from last article. We do not need here -lquadmath linker switch to compile.


Interesting part from GDB session:


We compare it with output of program to see what we are looking at.
Examining tests from GCC source, we find out that all operators which are supported for other floats are supported on __float128, so no funny function calls to add or compare two numbers. Comfort of normal syntax and 34 significant digits. For example we do square root round trip:


That code produced -3.8518598887744717061119558851698546e-34 error on my box.

Double precision floats and GDB

They are not human readable in memory like integers, so if you are examining memory in GDB session it is not very likely that you will recognize what number is it. Their layout in memory is specified by IEEE 754 standard. They occupy 64 bits and there is one bit for sign, 11 bits for exponent and 52 bits significand. There is really plenty of read about floats on the web and I will not bother you with theory, significance of their discrete spectrum, rounding and similar. Just want to show you how to handle them during debug session.
I have seen in some lecture, can't remember exactly where, solution to examine memory layout via struct and union. Here is my interpretation of it:


In struct we have specified layout and union is helper so that we can see number in human readable representation. We compile it using GCC -g switch and we start GDB session to examine how layout looks like. Before that we execute program once and save output so that we can recognize our number, the second printf is of interest.

number = 3.14159265358979 sign = 0, exponent = 400, mantissa = 921FB54442D18

Here is debug session, rather most interesting parts:


After variable is initialized we queried address of it and then we asked for two units to be printed in hex format. Just to be on the safe side, let us print them 32bit by 32bit and 4 bytes by 4 bytes as well


Where we clearly have little-endian layout in memory, least significant byte at lowest address. The same number in register


Here we do not really have address so no endianness either. Maybe we could just use f format?

Wednesday, April 8, 2015

OpenMP, file tree walk and map-reduce example

All happens on Linux and we will use GCC and built in OpenMP support and also thread local storage. We will see how easy is to spread workload across threads using OpenMP.
In man pages we find complete example for nftw or file tree walk. That example will print to standard output about everything what it finds while walking directories. Instead of having it printing everything I made it count files:


To compile it I used:

gcc -O1 -o simpleftw simpleftw.c

In order to count files across multiple directories we will pass few directories in argument list to program and also turn our static variable counter into thread local storage. We will use pragma OpenMP parallel for directive combined with reduction which is in our case simple addition:


That print inside for loop should demonstrate that thread local storage works as expected. C99 switch is required, or we will have to specify that variable i is private. Also switch for OpenMP is required. To compile it I used:

gcc -O1 -fopenmp -o dwalker dwalker.c -std=c99

Execution produces on my box:

./dwalker ~/Documents/ ~/Downloads/
Thread 1 3894
Thread 2 7784
There was 11678 files in ...

Everything works as expected.

Monday, April 6, 2015

C pointers, assembler and GDB

Idea here is to do some multitasking, find out how C pointer looks like in assembler and use Linux system calls, GCC and GDB. We start with Hello World:


Execute man 2 write in terminal to find out more about write, 1 is standard output and 12 is number of bytes to be written. I saved it as hello.c and this is how it was compiled:

gcc -O0 -m64 -save-temps -o hlo hello.c

GCC will emit assembly code as temp, I like 64 bit version and no optimizations. Content of temp file is:



It pushes base pointer onto stack, there is q at the end of the push and bp is called rbp, so it must be 64 bit code. Loads string into memory, later into rax, prepares call to write and so on. Now we take that temp file and compile it with -g switch so that we can use GDB.

gcc -g -O0 -o hlo hello.s
gdb hlo


We type in start at prompt and here is the GDB session:


Label .LCO is our string, -8(%rbp) is where pointer is. After info registers in short form, I also restricted info to rbp, I find out that rbp points to  0x7fffffffe360 and at  0x7fffffffe360 - 8 = 0x7fffffffe358 is our pointer. No, 0x60 - 8 is not 0x52. After we send next instruction and line 17 is executed we can see our string in memory, slash 13 c is we want 13 characters from that address.
Since write is Linux system call, we can use strace to see what is going on.


After quite few lines were replaced with three consecutive dots we see that we managed to write all 13 characters to standard output.
Using system calls is so easy that one may even attempt writing to real file:


We added more system calls. We applied common flags and modes in open, checked for all possible errors, after writing string to file we have closed file. If we compile it using

gcc -O0 -m64 -save-temps -o hello2 hello.c

We have binary and also assembly code. With help of strace


Executing cat deleteme.blog we see that write really worked and strace didn't cheat. Examining assembly code will be interesting to those who are curious how if works on low level. Finally every project manager will take a note that from 08:13:04.407694 to 09:24:49.906151 I have written only 21 line of code.

Thursday, April 2, 2015

Extending PostgreSQL with own C functions

Power of open source, if you are not happy with what PostgreSQL currently offers, you write own extension in C. Compile code with your functions into shared library, install it and they will become available from PostgreSQL. OK there are some rules and procedures. Once we are inside PostgreSQL we are using its types, interfaces and utilities. Let us do Hello World example. To build example you will need postgresql-server-dev-9.1 or whatever version you are using, installed on your Linux box.


As we can see we are using new V1 spec. That is modified concat_text example from PostgreSQL documentation, look for 35.9.4. Version 1 Calling Conventions. We have load of different VAR macros. For example:

VARHDRSZ is the same as sizeof(int4), but it's considered good style to use the macro VARHDRSZ to refer to the size of the overhead for a variable-length type.

That is from PostgreSQL documentation. SET_VARSIZE we can find in includes postgres.h


Unless you are on big endian. Going through header file one also can read in comments more about Datum and varlena datatypes. Then we got palloc which corresponds to malloc, memcpy you already know and GET and RETURN macros. It is obvious that for writing extensions one needs to familiarize himself with PostgreSQL internals. Power without knowledge and responsibility exists only in fery tails told by “software evangelists” at annual developers developers developers meetings.
Variables passed around by PostgreSQL may be on the disc, do not change them.
To build shared library I used the following Makefile:


Rather long story to get location of pgxs. That pgxs is location of makefiles for building extensions. It is not trivial build and using provided mk files is right way to do it. After that we can copy say_hello.so to some reasonable location or give full path to it in create function declaration.


PostgreSQL already allows Python through untrusted language PL/Python. One can utilize power of Python for functions or triggers without learning much about PostgreSQL internals. But again if you need power and speed, you can use what PostgreSQL speaks internally and that is C.

Wednesday, April 1, 2015

SQLite parameterized query in C

Still very angry at Discovery Health but that is not reason to stop using C. To execute parameterized query we prepare SQL statement with one or more placeholders. Placeholder could be question mark, alone or followed by number, column, dollar sign or at sign followed by alphanumeric. For example:

select title, full_name from jane where id = @id;

We prepare such statement using sqlite3_prepare_v2, later we bind our parameter and finally execute query. To do binding we will use appropriate function, there is few of them:


All bind functions will return SQLITE_OK if binding is successful or error code if it fails. The first argument is handle to prepared statement. The second argument is index of parameter. The third argument is value to be set. For blob, text we have the fourth argument, size in bytes and the fifth – destructor function. Instead of destructor function we can pass constants SQLITE_STATIC - do nothing or SQLITE_TRANSIENT – make local copy and free it when done. To find out what is index of parameter we are using this function:

int sqlite3_bind_parameter_index(sqlite3_stmt*, const char *zName);

We pass prepared statement and parameter name it returns zero if there is no such parameter or parameter index if it exists. Even if we know that our parameter must have index one, we will still look for it to demonstrate how it is done. Here is the code:


Database should be loaded with required values in previous examples, if not here is sql to create it:

CREATE TABLE jane ( id INTEGER PRIMARY KEY NOT NULL, title TEXT, full_name TEXT NOT NULL )
INSERT INTO jane VALUES(1,'Mr','Smith');
INSERT INTO jane VALUES(2,'Mrs','Doe');
INSERT INTO jane VALUES(3,'Mr','Doe');


After we build it using:

gcc -g test.c -lsqlite3 -o test

We execute test and see the following output:


We could also misspell parameter name and rebuild to check is error handling working.