grumpy old programmer

Friday, May 15, 2015

Recursion and optimizations

As seen in last instalment calculating Fibonacci numbers using naïve implementation creates binary tree of calls and the same subtrees are calculated over and over again what has huge impact on performance. One of ways to improve performance is caching previous results and that is called memoization. In old Java we will do something like this:

private static final Map<Long, Long> memo = new HashMap<>();
static {
  memo.put(0L, 0L);
  memo.put(1L, 1L);
}
public static void main(String[] args) {
  System.out.println(fibonacci(0));
  System.out.println(fibonacci(43));
  System.out.println(fibonacci(92));
}
static long fibonacci(long n) {
  if (memo.get(n) == null) {
    memo.put(n, fibonacci(n - 1) + fibonacci(n - 2));
  }
  return memo.get(n);
}

We initialize map container with result for first two cases, to avoid if statements, and later if there is mapping for input, we return it and if there is no mapping for input we calculate it. Without help from memoization naïve implementation will choke and die on input 92. Using Java 8 it becomes even shorter, here is just fibonacci method, everything else is the same:

public static long fibonacci(long n) {
  return memo.computeIfAbsent(n, m -> fibonacci(m - 1) + fibonacci(m - 2));
}

Signature of computeIfAbsent is the following:

default V computeIfAbsent(K key,
      Function<? super K, ? extends V> mappingFunction)

where n is key and function is interface Function, we defined it recursively and compiler was not upset. Memoization brings speed but takes space.
How do we turn tail call version into lambda. We can do this for example:

static Function<Long, Long> y = new Function<Long, Long>() {
  private long tailRecursive(long a, long b, long n) {
    if (n > 0) {
      return tailRecursive(b, a + b, n - 1);
    }
    return a;
  }
  public Long apply(Long x) {
    return tailRecursive(0, 1, x);
  }
};
public static long fibonacciFunctional(long n) {
  return y.apply(n);
}

what does not look very functional and very Java 8 because it is anonymous class. Another more Java 8 option is declaring functional interface with sufficient number of arguments:

interface FewArgs<T, U, V, R> {
  public R apply(T t, U u, V v);
}

static FewArgs<Long, Long, Long, Long> tailRecursive;

static {
  tailRecursive = (a, b, n) -> {
    if (n > 0)
      return tailRecursive.apply(b, a + b, n - 1);
    return a;
  };
}

We also can not initialize it in one go, declare class field and initialize it, it must be done from some of host class methods.
Remaining optimization ideas are strictly related to Fibonacci numbers and not applicable to most other recursions. There is not so obvious way to generate Fibonacci numbers:

| 1 1 | n      | F(n+1) F(n)   |
| 1 0 |    =   | F(n)   F(n-1) |

If we rise matrix on the left, lets call it A, to power of n-1 its A[0][0] is Fn. We can prove it using induction. For n=1 our claim obviously holds, now

| F(n+1) F(n)   |   | 1 1 |
| F(n)   F(n-1) | x | 1 0 | =

|F(n+1)+F(n) F(n+1)+0|   |F(n+2) F(n+1)|
|F(n)+F(n−1) F(n)+0 | = |F(n+1) F(n) |

Instead of doing chain multiplication we can do squaring and reduce number of multiplications to log(n) multiplications.

| F(n+1) F(n)   |2
| F(n)   F(n-1) | =

| F(n+1)^2+F(n)^2         F(n+1)*F(n)+F(n)*F(n−1)|
| F(n)*F(n+1)+F(n−1)*F(n) F(n)^2+F(n−1)^2        |

From where one can pull more interesting relations.
Here is illustrative implementation of this idea:

private static long square(long n) {
  if(n==0) 
    return n;
  long test = n-1;
  ArrayList<long> arr = new ArrayList<>();
  long[] A = new long[]{
    1, 1,
    1, 0
  };
  long[] result = new long[4];
  System.arraycopy(A, 0, result, 0, 4);
  while(test>1){
    if(test%2!=0)
      arr.add(1L);
    else
      arr.add(0L);
    test/=2;
  }
  for(int i=arr.size()-1;i>-1;i--){
    result=multiply(result,result);
    if(arr.get(i)==1)
      result=multiply(result,A);
  }
  return result[0];
}
private static long[] multiply(long[] a, long[] b) {
  long[] result = new long[]{
    a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
    a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]
  };
  return result;
}
public static void main(String[] arg) {
  for (int i = 0; i < 93; i++) {
    System.out.println("F(" + i + ") = " + square(i));
  }
}

That ArrayList is mapper, consumes some additional space but makes things easier to debug.

Thursday, May 14, 2015

Recursion

Got recently homework to do as part of interview process, already described it here. After providing them with working solution, thoroughly tested, they decided not to speak with me. My conclusion is that was not so bad outcome.
While searching web for clues I discovered this Find all paths from source to destination.
Guy submits his code, peers do review, everything sounds nice.
I will refrain from reviewing particular solution but after looking at it I decided to write about recursion. My impression is that young generations of programmers are putting great effort into mastering frameworks and new features of programming languages but in the same time somehow missing basics. It also coincides with my exploration of new functional features of Java 8.
Recursion is when method, in Java, or function, in C, calls itself during execution. Execution of caller is suspended until callee finishes, then it proceeds. So, there must be some natural way for recursive calls to stop propagating themselves into infinity. If we are not traversing some kind of tree structure, or marking vertices as visited when traversing graph, we typically passing variable to control depth of recursion. Trivial example of recursion is calculation of factorial:

n! = n*(n-1)*(n-2)*...*2*1

It is defined for positive integers. Here is natural and naive implementation together with test:

public static void main(String[] args) {
  System.out.println("9! = "+factorial(9));
}
static long factorial(long n){
  if(n<0)
    return 0;
  if(n<2)
    return 1;
  return n*factorial(n-1);
}

I could throw exception on negative n and assert is 9! equal to 362880. What stops recursion here from proceeding forever is that if statement in combination with decreasing n in recursive calls. Now in order to visualize how execution looks like we will add some debugging code.

public static void main(String[] args) {
  System.out.println("9! = " + factorial(9, 9));
}

static long factorial(long n, long m) {
  for (int i = 0; i < m - n; i++) {
    System.out.print("-");
  }
  System.out.println(n);
  if (n < 0)
    return 0;
  if (n < 2)
    return 1;
  long result = n * factorial(n - 1, m);
  for (int i = 0; i < m - n; i++) {
    System.out.print("-");
  }
  System.out.println(n);
  return result;
}

Code now prints current value of n and number of dashes is how deep we are into recursion. We can see how stack of frames is growing in debugger as well but this is nicer. Output of execution is:

9
-8
--7
---6
----5
-----4
------3
-------2
--------1
-------2
------3
-----4
----5
---6
--7
-8
9
9! = 362880

As expected it behaves in linear fashion. It can be much more interesting than linear, for that we will calculate Fibonacci numbers. In the case of Fibonacci numbers we have the following recursive definition:

Fn = Fn-1 + Fn-2

Trivial and naive implementation looks like this:

public static void main(String[] args) {
  System.out.println("F42 = " + fibonacci(42));
}

static long fibonacci(long n) {
  if (n < 1)
    return 0;
  if (n < 2)
    return 1;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

Execution of this code will take about second or two. Using 42 as function argument illustrates point that naïve implementation is not really most efficient. Mechanics of stopping recursive calls is identical to one in the case of factorial. Let us insert debugging code and see what is going on.

static int indent = 0;

public static void main(String[] args) {
  System.out.println("F5 = " + fibonacci(5, 5));
}

static long fibonacci(long n, long m) {
  for (int i = 0; i < m - n + indent; i++) {
    System.out.print("-");
  }
  System.out.println(n);
  if (n < 1)
    return 0;
  if (n < 2)
    return 1;
  long result = fibonacci(n - 1, m);
  for (int i = 0; i < m - n + indent; i++) {
    System.out.print("-");
  }
  System.out.println(n);
  indent--;
  result = result + fibonacci(n - 2, m);
  indent++;
  for (int i = 0; i < m - n + indent; i++) {
    System.out.print("-");
  }
  System.out.println(n);
  return result;
}

For reasonably sized argument of 2 we have this output:

2
-1
2
-0
2
F2 = 1

Argument 2 is decreased and recursive call is made with argument 1. When it returns to level 0 with result 1 next recursive call is made with argument 0. The second call also returns with result 0 to level 0 and we have 1+0=1 as result.
We try now 3 as argument:

3
-2
--1
-2
--0
-2
3
-1
3
F3 = 2

We start on level 0 with 3-1 call, then we have pattern from last run elevated for one level up, return to level 0 with result 1 and finally 3-2 recursive call and its return with result 1 to level 0. We can make conclusion that recursive calls are building binary tree. We can try argument 4 to recognize patterns for 3 and 2 and so on.
We could achieve the same on the factorial if we were splitting chain multiplication in half, I was writing about that here.
Recursive call does not have to return anything, work can be performed on one of function arguments. For example we can reverse array by swapping elements.

public static void main(String[] args) {
  String r = "Hello World!"; 
  char[] arr = r.toCharArray();
  reverse(arr, r.length()-1, 0);
  System.out.println(arr);
}

public static void reverse(char[] array, int hi, int lo) {
  if(lo>hi)
    return;
  char tmp = array[hi];
  array[hi] = array[lo];
  array[lo] = tmp;
  reverse(array, hi - 1, lo + 1);
}

Here recursion stops when lo is bigger than hi.

Optimization

We can improve on naïve implementation using tail call elimination. We write method in such way that return statement is pure function call and that allows compiler to perform optimization and practically discard frames without placing them onto the stack. For example factorial can be rewritten like this:

static long tailRecursive(long acc, long n) {
  if (n == 0)
    return acc;
  return tailRecursive(n * acc, n - 1);
}

We provide 1 as initial value for accumulator, n is the same as before. For Fibonacci we will need two accumulators:

static long tailRecursive(long a, long b, long n) {
  if (n > 0)
    return tailRecursive(b, a + b, n - 1);
  return a;
}

Initial values are a = 0, b = 1 and n as before. From this form is quite easy to switch to iterative form.

static long notRecursive(long n) {
  if(n == 0)
    return 0;
  long a = 0;
  long b = 1;
  long c;
  do {
    c = b;
    b = a + b;
    a = c;
    --n;
  } while (n > 0);
  return a;
}

About other forms of optimization like memoization, repeated squaring and lambdas, next time.

Sunday, May 10, 2015

Scheduling and Johnson's Algorithm

There is actually paper available on the web where Johnson's scheduling algorithm is nicely described. You can find it here.
So that is not all-pairs shortest paths but scheduling one. For those who are lazy to read it, here is the story. We have programming task paired with testing task. We have one programmer and one tester, tester can test only what is coded, must wait for programmer to finish. In which order jobs should be executed so that whole bunch is finished in shortest time?
Now Johnson's algorithm in plain English:

For each task T1, T2, ..., Tn, determine the P1 and P2 times.
Establish two queues, Q1 at the beginning of the schedule and Q2 at the end of the schedule.
Examine all the P1 and P2 times, and determine the smallest.
If the smallest time is P1, then insert the corresponding task at the end of Q1. Otherwise, the smallest time is P2, and the corresponding task is inserted at the beginning of Q2. In case of a tie between a P1 and P2 time, use the P1 time.
Delete that task from further consideration.
Repeat steps 3 – 5 until all tasks have been assigned.

Very simple. Here is the code:

import java.util.ArrayDeque;
import java.util.Deque;

public class JavaApplication18 {

  public static void main(String[] args) {
    Tasks[] t = new Tasks[6];
    for (int i = 0; i < 6; i++) {
      t[i] = new Tasks();
    }
    t[0].programming = 15;
    t[0].testing = 4;
    t[1].programming = 22;
    t[1].testing = 10;
    t[2].programming = 3;
    t[2].testing = 6;
    t[3].programming = 12;
    t[3].testing = 8;
    t[4].programming = 9;
    t[4].testing = 6;
    t[5].programming = 12;
    t[5].testing = 9;
    executeAll(6, t);
  }

  static void executeAll(int n, Tasks[] t) {
    int[] order = getSchedule(n, t);
    System.out.println("order of execution is:");
    for (int i = 0; i < n; i++) {
      System.out.println((char)(65+order[i]) + " " + t[order[i]].programming +
        " -> " + t[order[i]].testing);
    }
    System.out.println("Complete duration is " + calculateDuration(n, order, t));
  }

  static int calculateDuration(int n, int[] order, Tasks[] t) {
    int max = 0, programmingTime = 0;
    int testingTime = 0, carry;
    for (int i = 0; i < n; i++) {
      programmingTime += t[order[i]].programming;
      carry = 0;
      if (testingTime > programmingTime)
        carry = testingTime - programmingTime;
      testingTime = programmingTime + t[order[i]].testing + carry;
      if (testingTime > max)
        max = testingTime;
    }
    return testingTime;
  }

  static int[] getSchedule(int n, Tasks[] t) {
    int[] order = new int[n];
    int counter = 0, taskIndex = -1, min;
    Deque<integer> programmingDeque = new ArrayDeque<>();
    Deque<integer> testingDeque = new ArrayDeque<>();
    for (int j = 0; j < n; j++) {
      min = 2 ^ 16;
      kind task = null;
      for (int i = 0; i < n; i++) {
        if (!t[i].removed) {
          if (t[i].programming < min) {
            min = t[i].programming;
            task = kind.programming;
            taskIndex = i;
          }
          if (t[i].testing < min) {
            min = t[i].testing;
            task = kind.testing;
            taskIndex = i;
          }
        }
      }
      if (task == kind.programming) {
        programmingDeque.addLast(taskIndex);
        t[taskIndex].removed = true;
      }
      if (task == kind.testing) {
        testingDeque.addFirst(taskIndex);
        t[taskIndex].removed = true;
      }
    }

    while (programmingDeque.size() > 0) {
      order[counter++] = programmingDeque.removeFirst();
    }
    while (testingDeque.size() > 0) {
      order[counter++] = testingDeque.removeFirst();
    }
    return order;
  }

  enum kind {
    programming, testing
  };

  static class Tasks {

    public int programming;
    public int testing;
    public boolean removed;
  }
}

I think that code is self-explanatory and I will not bother you with explanation.
I also had a look at C++ code written by riteshkumargupta here it is slight modification of problem mentioned in paper. If you prefer C++ to Java take a look.

Monday, May 4, 2015

Find all paths from vertex to vertex in directed, weighted graph

Why I am doing homework? I have applied for a job at some company. They did some kind of phone interview and decided to give me “task” before they make up their mind will they ever talk to me about job again. OK, that evening I open Firefox and find in mailbox my tasks. That was few NP hard problems and I can pick and choose. Time for execution is limited, to make everything more interesting. Like reality show but you do not eat snails. Going quickly through them I realized that I am dealing with person who thinks that he or she is lecturer at university and that I do not have clue how to solve any of tasks.
BTW my educational background is physicist. Studied technical physics at school of electrical engineering at University of Belgrade. Never finished. As programmer I am self-educated. While I didn’t get university diploma I have learned to use literature.
Natural choice was the first task, it must be more important than others, that is the reason why it is the first. Task was something about finding all paths from vertex A to vertex B on weighted directed multigraph with cycles. Never encountered graphs before, only trees and they are part of the family. OK, time to google hard and yandex hard. After whole day of searching I got two clues. The first one is complete solution to generate all possible paths up to some length on Perl Monks and the other one is notes from some lecturing by Gunnar. Both clues claiming that adjacency matrix is a way to go and tool of choice is modified Floyd-Warshall. Learned quickly what I need and assembled working program based on degenerated matrix multiplication. After three days almost without sleep, submitted solution and went to bed. Next morning I figured out much simpler solution.
Now friendly warning, what follows is theory by somebody who studied graph theory for one whole day.

Adjacency matrix

Graph is one with directed edges, one way edges, multiple edges between pair of vertices may exist and there may be edge starting and finishing at the same vertex. Aadjacency matrix gives metric on graph space, who is neighbour to whom. Where you can go from some place.

     A    B    C    …    Z
A    1    0    0        1

From vertex A we can travel to A or Z in one move, to B and C we can't go in one move. Now comes curious tourist with question how far is B from A. If we start multiplying adjacency matrix with itself we will find out where we can go in two or more moves. Number of moves corresponds to power of adjacency matrix and if after rising it to power of 4 we see something like this:

     A    B    C    …    Z
A    0    2    1        4

meaning of it is we can't reach A in 4 moves, there is only one path to C but 2 paths to B and 4 paths to Z. Now to find out how many different paths exist from A to B where length of path is up to 4 moves one sums adjacencyMatrix[0][1] for powers from 1 to 4. Those are not shortest pats those are all possible paths including loops and cycles. Good reason to keep binary adjacency matrix around even if we got weights for edges.
So how do we go about finding those two or more moves paths, translating them in chain of edges? Since adjacency matrix is metric on that space we can use it to discover paths. For example where we can go from A in two moves.

     A    B    C    D
A    0    0    1    1
B    1    0    1    0
C    0    1    0    1
D    1    1    0    0

We can reach C or D in single move, from C we can get to B and D in single move and from D we can get to A and B in one move. So paths are ACB, ACD, ADA and ADB. That should be enough to write code for pathfinder.

Java implementation

From some vertex we can go where adjacency matrix says we can go and we move to next vertex. For that next vertex, we are in different row but we do the same thing. Looks like case for recursion. When frames start offloading from the stack they will bring us story where repeated recursive calls were taking them. We just need to send story back. If we do not limit depth of recursion it will never stop. BTW replace depth-1 with depth-- and --depth if you are not sure how suffix and prefix are different.

public ArrayList<string> findPath(int start, int end, int depth) {
  ArrayList<string> result = new ArrayList<>();
  ArrayList<string> tmp = new ArrayList<>();
  for (int i = 0; i < adjacencyMatrix.length; i++) {
    if (adjacencyMatrix[start][i] != 0) {
      if (i == end) {
        result.add(start + "," + end);
      }
      if(depth>1)
        tmp.addAll(findPath(i, end, depth-1));
    }
  }
  if(tmp.size()>0)
    for (String s : tmp) {
      result.add(start + "," + s);
    }
  return result;
}

We have built binary adjacencyMatrix and we are traversing graph. If we find target we store it but we proceed looking for target via calling all vertices which are reachable in single move. With limitation that depth of traversal is bigger than 1 and on each new call we are using reduced incoming depth. When function call comes back we append prefix to all paths. If you print content of that string array you will get something like:

2,3,2,3,2,3
2,3,2,4,1,2,3
2,3,4,1,2,3

It is the same format as used by blokhead from PerlMonks in his solution and you can nicely compare output of your code. Do not forget to make adjacency matrix match :P
What about weights? Instead of having binary adjacency matrix we replace 1 with corresponding edge, actually weight of that edge. I will use integers for weights.

A B C D
A 0 0 3 5

Slightly modified traversal code:

public HashMap<String, Integer> findWeightedPath(int start, int end, int depth) {
  HashMap<String, Integer> result = new HashMap<>();
  for (int i = 0; i < adjacencyMatrix.length; i++) {
    if (adjacencyMatrix[start][i] != 0) {
      if (i == end) {
        result.put(start + "," + end, adjacencyMatrix[start][i]);
      }
      if (depth > 1) {
        HashMap<String, Integer> tmp = findWeightedPath(i, end, depth - 1);
        if (tmp.size() > 0) {
          for (String s : tmp.keySet()) {
            result.put(start + "," + s, 
                tmp.get(s) + adjacencyMatrix[start][i]);
          }
        }
      }
    }
  }
  return result;
}

When you print content of that HashMap it looks like this:

2,3,2,3 = 23
2,3,4,1,2,3 = 33
2,3,2,4,1,2,3 = 35

It still stops on depth. If you need it to stop on weights sum then send data upstream. Don’t go for matrix multiplication if you do not have to, check other options. Nobody is going to ask you to find all paths between all vertices, to find all paths between two nodes recursive traversal looks much better than matrix multiplication.

Tuesday, April 21, 2015

PostgreSQL and PL/Pythonu

Beside SQL PostgreSQL has support for creating functions in many other so called procedural languages. We just need to install them and they are available within database. Standard distribution includes those four languages PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. There are other externally maintained languages like Java, Scheme, PHP, Unix shell and so on. If that is not enough it is possible to write procedural language handler, it is open source and consequently fully customizable.
Hosted Python is very interesting option. It is quite powerful, versatile language and code is very concise. If it is executed within PostgreSQL it enjoys very fast communication avoiding overhead of DB calls which external programs suffer. Both Python 2 and Python 3 are supported. If you are using, for example, Ubuntu then postgresql-plpython-9.x is Python 2 and postgresql-plpython3-9.x is Python 3. It needs to be installed, server restarted and language created.

createlang plpythonu [dbname]

If we do not specify db, then it will be created on db to which are we currently connected. That u suffix have meaning that we are dealing with unrestricted/untrusted language. Power is given to us and it is our responsibility to use it to our benefit without causing damage. Once everything is in place we can write Hello World function in Python.

CREATE OR REPLACE FUNCTION lumberjack()
  RETURNS text AS
$BODY$
  a = "I'm a lumberjack, and I'm okay. "
  return a + "I sleep all night and I work all day."
$BODY$
  LANGUAGE plpythonu

From psql we try it

testpgdb=# select lumberjack(); 
                              lumberjack                               
----------------------------------------------------------------------- 
 I'm a lumberjack, and I'm okay. I sleep all night and I work all day. 
(1 row) 

Or if you prefer GUI, then use pgAdmin III start Query tool and execute it there.
To enjoy full comfort of Python one need to import modules. How do we use modules in PL/Pythonu? It is no different to ordinary Python code:

create or replace function get_response()
 returns text as
$BODY$
import httplib2
h = httplib2.Http(".cache")
resp, content = h.request("http://example.org/", "GET")
return resp
$BODY$ language 'plpythonu';

select get_response();

we should see response. If we see:

ERROR:  ImportError: No module named httplib2
CONTEXT:  Traceback (most recent call last):
  PL/Python function "get_response", line 2, in <module>
    import httplib2
PL/Python function "get_response"

That is sign that httplib2 is not in path, we can place it in path or if we do not want to reload server we can append to function:

from sys import path
path.append('/path to httplib2')

There is significant quantity of examples on Internet to help you further. With some knowledge of Python one can do really huge amount of work directly from PostgreSQL without overhead of DB connection. In role of mailer of monthly report PL/Pythonu function beats Java Application Server and even C/C++ application connecting from outside to DB.

Thursday, April 9, 2015

GCC Quad-Precision Math Library example

There is small PDF on GCC website with libquadmath documentation. It contains explanation and example how to convert string to __float128 and how to convert __float128 back to string. It took me some googling to find out how to compile example and to figure out what is width parameter. To compile it we need -lquadmath linker switch and width, fourth argument to quadmath_snprintf, is overall width of output, right anchored. Here is my version of mentioned example:

#include <quadmath.h>
#include <stdlib.h>
#include <stdio.h>

int main(){
  __float128 pi = strtoflt128 ("3.1415926535897932384626433832795029", NULL);
  __float128 dpi = 3.1415926535897932384626433832795;
  __float128 qpi = 3.1415926535897932384626433832795029Q;
  char buf[128];
  int n = quadmath_snprintf (buf, sizeof buf, "%*.34Qf",10, pi);
  if ((size_t) n < sizeof buf)
    printf ("%s\n", buf);
  n = quadmath_snprintf (buf, sizeof buf, "%*.34Qf", 40, dpi);
  if ((size_t) n < sizeof buf)
    printf ("%s\n", buf);
  n = quadmath_snprintf (buf, sizeof buf, "%*.34Qf", 45, qpi);
  if ((size_t) n < sizeof buf)
    printf ("%s\n", buf);
  return 0;
}

If we try to assign number without Q suffix then we will have ordinary double inside quad storage. The fourth argument to quadmath_snprintf will be ignored if it is smaller than length of string representation, no spaces will be added in front of number.
If we need to examine layout in memory, we can use modified example from last article. We do not need here -lquadmath linker switch to compile.

#include <stdio.h>
#include <quadmath.h>
#include <stdlib.h>

typedef struct quad_storage {
  unsigned long mant_low:64;
  unsigned long mant_high:48;
  unsigned int exponent:15;
  unsigned int sign:1;
} Quad;

typedef union quad_helper {
  __float128 number;
  Quad storage;
} Helper;

int main(){
  Helper quad = {3.1415926535897932384626433832795029Q};
  printf("sign = %u, exponent = %u, mantissa_h = %lu, mantissa_l = %lu\n", 
  quad.storage.sign, quad.storage.exponent, quad.storage.mant_high, quad.storage.mant_low);
  printf("sign = %X, exponent = %X, mantissa_h = %lX, mantissa_l = %lX\n", 
  quad.storage.sign, quad.storage.exponent, quad.storage.mant_high, quad.storage.mant_low);
  return 0;
}

Interesting part from GDB session:

(gdb) print &pi
$1 = (__float128 *) 0x7fffffffe650
(gdb) x/4 0x7fffffffe650
0x7fffffffe650: 0xc51701b8 0x8469898c 0xb54442d1 0x4000921f

We compare it with output of program to see what we are looking at.
Examining tests from GCC source, we find out that all operators which are supported for other floats are supported on __float128, so no funny function calls to add or compare two numbers. Comfort of normal syntax and 34 significant digits. For example we do square root round trip:

dpi = 2.0;
qpi = sqrtq(dpi);
n = quadmath_snprintf (buf, sizeof buf, "%*.34Qe", 45, dpi-qpi*qpi);
if ((size_t) n < sizeof buf)
  printf ("%s\n", buf);

That code produced -3.8518598887744717061119558851698546e-34 error on my box.

Double precision floats and GDB

They are not human readable in memory like integers, so if you are examining memory in GDB session it is not very likely that you will recognize what number is it. Their layout in memory is specified by IEEE 754 standard. They occupy 64 bits and there is one bit for sign, 11 bits for exponent and 52 bits significand. There is really plenty of read about floats on the web and I will not bother you with theory, significance of their discrete spectrum, rounding and similar. Just want to show you how to handle them during debug session.
I have seen in some lecture, can't remember exactly where, solution to examine memory layout via struct and union. Here is my interpretation of it:

#include <stdio.h>

typedef struct double_storage {
 unsigned long mantissa:52;
 unsigned int exponent:11;
 unsigned int sign:1;
} Double;

typedef union double_helper {
 double number;
 Double storage;
} Helper;

int main(){
 Helper dbl = {3.141592653589793};
 printf("number = %16.15lf sign = %u, exponent = %u, mantissa = %lu\n", 
  dbl.number, dbl.storage.sign, dbl.storage.exponent, dbl.storage.mantissa);
 printf("number = %16.15g sign = %X, exponent = %X, mantissa = %lX\n", 
  dbl.number, dbl.storage.sign, dbl.storage.exponent, dbl.storage.mantissa);
 return 0;
}

In struct we have specified layout and union is helper so that we can see number in human readable representation. We compile it using GCC -g switch and we start GDB session to examine how layout looks like. Before that we execute program once and save output so that we can recognize our number, the second printf is of interest.

number = 3.14159265358979 sign = 0, exponent = 400, mantissa = 921FB54442D18

Here is debug session, rather most interesting parts:

(gdb) print &dbl
$1 = (Helper *) 0x7fffffffe700
(gdb) x/2x 0x7fffffffe700
0x7fffffffe700: 0x54442d18 0x400921fb

After variable is initialized we queried address of it and then we asked for two units to be printed in hex format. Just to be on the safe side, let us print them 32bit by 32bit and 4 bytes by 4 bytes as well

(gdb) x/x 0x7fffffffe704
0x7fffffffe704: 0x400921fb
(gdb) x/4b 0x7fffffffe704
0x7fffffffe704: 0xfb 0x21 0x09 0x40
(gdb) x/x 0x7fffffffe700
0x7fffffffe700: 0x54442d18
(gdb) x/4b 0x7fffffffe700
0x7fffffffe700: 0x18 0x2d 0x44 0x54

Where we clearly have little-endian layout in memory, least significant byte at lowest address. The same number in register

(gdb) print/x $rax
$4 = 0x400921fb54442d18
(gdb) print/x $eax
$5 = 0x54442d18
(gdb) print/x $ax
$6 = 0x2d18
(gdb) print/x $al
$7 = 0x18

Here we do not really have address so no endianness either. Maybe we could just use f format?

(gdb) x/f 0x7fffffffe700
0x7fffffffe700: 3.1415926535897931