SLEEF Documentation - Quad-precision math library reference

Table of contents

Introduction

As of version 3.6, SLEEF includes a quad-precision math library. This library includes various functions for computing with IEEE 754 quadruple-precision(QP) floating-point numbers. If the compiler natively supports IEEE 754 QP FP type, numbers in this natively supported QP data type can be directly passed to the library functions. Otherwise, the library defines a data type for retaining numbers in IEEE 754 QP data type.

Many of the functions are fully vectorized to maximize the throughput of computation. For these functions, two vector registers are combined to retain a vector of QP FP numbers, where one of these registers holds the upper part of QP numbers, and the other register holds the lower part. Dedicated functions are provided to load to or store from variables in vector QP data types.

Data types

Sleef_quad

Description

Sleef_quad is a data type for retaining a single QP FP number. If the compiler natively supports IEEE 754 quad precision FP type, then Sleef_quad is an alias of that data type.

typedef __float128 Sleef_quad;

Otherwise, the library defines a 128-bit data type for retaining a number in IEEE 754 QP data format.

Convenience macros and constants

SLEEF_Q - Macro to define any QP FP constant

Synopsis

#include <sleefquad.h>

Sleef_quad SLEEF_Q( int64_t upper, uint64_t lower, int e );

Description

This is a macro for defining any QP FP constant. This macro can be used to represent a QP FP constant on a compiler that does not support __float128 data type. Unlike casting functions, compilers can optimize the code so that an immediate QP FP value is available at runtime without carrying out computation for casting. You can convert any QP FP value to this representation using qutil utility program.


Conversion and output functions

Sleef_strtoq - convert ASCII string to QP FP number

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_strtoq( const char * str, char ** endptr );

Link with -lsleefquad.

Description

This is a QP version of the strtod function. It converts the given string to a QP FP number. It supports converting from both decimal and hexadecimal numbers.


Sleef_printf, Sleef_fprintf, Sleef_snprintf, Sleef_vprintf, Sleef_vfprintf, Sleef_vsnprintf - Format various data into string and output

Synopsis

#include <sleefquad.h>

int Sleef_printf( const char * fmt, ... );
int Sleef_fprintf( FILE * fp, const char * fmt, ... );
int Sleef_snprintf( char * str, size_t size, const char * fmt, ... );
int Sleef_vprintf( const char * fmt, va_list ap );
int Sleef_vfprintf( FILE * fp, const char * fmt, va_list ap );
int Sleef_vsnprintf( char * str, size_t size, const char * fmt, va_list ap );

Link with -lsleefquad.

Description

These functions are equivalent to the corresponding printf functions in the C standard, except for the following extensions. These functions support converting from QP FP numbers using Q and P modifiers with a, e, f and g conversions. With Q modifier, an immediate QP FP value can be passed to these functions, while a QP FP value can be passed via a pointer with P modifier. These functions only supports the format strings defined in the C standard. Beware of this bug when you use this function with clang.


Sleef_registerPrintfHook, Sleef_unregisterPrintfHook - Register and unregister C library hooks to printf family funcions in the GNU C Library

Synopsis

#include <sleefquad.h>

void Sleef_registerPrintfHook( void);
void Sleef_unregisterPrintfHook( void);

Link with -lsleefquad.

Description

These functions are defined only on systems with Glibc. Glibc has a functionality for installing C library hooks to extend conversion capability of its printf-familiy functions. Sleef_registerPrintfHook installs C library hooks to add support for the Q and P modifiers. Sleef_unregisterPrintfHook uninstalls the C library hooks.


Sleef_cast_to_doubleq1_purec, Sleef_cast_to_doubleq1_purecfma - Convert QP number to double-precision number

Synopsis

#include <sleefquad.h>

double Sleef_cast_to_doubleq1_purec( Sleef_quad a );
double Sleef_cast_to_doubleq1_purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions convert a QP FP value to a double-precision value.


Sleef_cast_from_doubleq1_purec, Sleef_cast_from_doubleq1_purecfma - Convert double-precision number to QP number

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_cast_from_doubleq1_purec( double a );
Sleef_quad Sleef_cast_from_doubleq1_purecfma( double a );

Link with -lsleefquad.

Description

These functions convert a double-precision value to a QP FP value.


Sleef_cast_to_int64q1_purec, Sleef_cast_to_int64q1_purecfma - Convert QP number to 64-bit signed integer

Synopsis

#include <sleefquad.h>

int64_t Sleef_cast_to_int64q1_purec( Sleef_quad a );
int64_t Sleef_cast_to_int64q1_purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions convert a QP FP value to a 64-bit signed integer.


Sleef_cast_from_int64q1_purec, Sleef_cast_from_int64q1_purecfma - Convert 64-bit signed integer to QP number

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_cast_from_int64q1_purec( int64_t a );
Sleef_quad Sleef_cast_from_int64q1_purecfma( int64_t a );

Link with -lsleefquad.

Description

These functions convert a 64-bit signed integer to a QP FP value.


Sleef_cast_to_uint64q1_purec, Sleef_cast_to_uint64q1_purecfma - Convert QP number to 64-bit unsigned integer

Synopsis

#include <sleefquad.h>

uint64_t Sleef_cast_to_uint64q1_purec( Sleef_quad a );
uint64_t Sleef_cast_to_uint64q1_purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions convert a QP FP value to a 64-bit signed integer.


Sleef_cast_from_uint64q1_purec, Sleef_cast_from_uint64q1_purecfma - Convert 64-bit unsigned integer to QP number

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_cast_from_uint64q1_purec( uint64_t a );
Sleef_quad Sleef_cast_from_uint64q1_purecfma( uint64_t a );

Link with -lsleefquad.

Description

These functions convert a 64-bit unsigned integer to a QP FP value.

Comparison functions

QP comparison functions

Synopsis

#include <sleefquad.h>

int32_t Sleef_icmpltq1_purec( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpleq1_purec( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpgtq1_purec( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpgeq1_purec( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpeqq1_purec( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpneq1_purec( Sleef_quad a, Sleef_quad b );

int32_t Sleef_icmpltq1_purecfma( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpleq1_purecfma( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpgtq1_purecfma( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpgeq1_purecfma( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpeqq1_purecfma( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpneq1_purecfma( Sleef_quad a, Sleef_quad b );

Link with -lsleefquad.

Description

These functions compare two QP FP values. The lt, le, gt, ge, eq and ne functions return 1 if and only if a is less than, less or equal, greather than, greater or equal, equal and not equal to b, respectively. Otherwise, 0 is returned.


Sleef_icmpq1_purec, Sleef_icmpq1_purecfma - QP comparison

Synopsis

#include <sleefquad.h>

int32_t Sleef_icmpq1_purec( Sleef_quad a, Sleef_quad b );
int32_t Sleef_icmpq1_purecfma( Sleef_quad a, Sleef_quad b );

Link with -lsleefquad.

Description

These functions compare two QP FP values. If a is greater than b, they return 1. If a is less than b, they return -1. Otherwise, they return 0. If either argument is NaN, they return 0.


Sleef_iunordq1_purec, Sleef_iunordq1_purecfma - Check orderedness

Synopsis

#include <sleefquad.h>

int32_t Sleef_iunordq1_purec( Sleef_quad a, Sleef_quad b );
int32_t Sleef_iunordq1_purecfma( Sleef_quad a, Sleef_quad b );

Link with -lsleefquad.

Description

These functions return 1 if either argument is NaN, otherwise 0.

Math functions

QP functions for basic arithmetic operations

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_addq1_u05purec( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_subq1_u05purec( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_mulq1_u05purec( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_divq1_u05purec( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_negq1_purec( Sleef_quad a );

Sleef_quad Sleef_addq1_u05purecfma( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_subq1_u05purecfma( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_mulq1_u05purecfma( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_divq1_u05purecfma( Sleef_quad a, Sleef_quad b );
Sleef_quad Sleef_negq1_purec( Sleef_quad a );

Link with -lsleefquad.

Description

The add, sub, mul and div functions perform addition, subtraction, multiplication and division of two QP FP values. The error bound of these functions are 0.5000000001 ULP. The neg functions return -a.


Sleef_sqrtq1_u05purec, Sleef_sqrtq1_u05purecfma - square root functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_sqrtq1_u05purec( Sleef_quad a );
Sleef_quad Sleef_sqrtq1_u05purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return nonnegative square root of a value in a. The error bound of the returned value is 0.5000000001 ULP.


Sleef_sinq1_u10purec, Sleef_sinq1_u10purecfma - sine functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_sinq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_sinq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions evaluate the sine function of a value in a. The error bound of the returned value is 1.0 ULP.


Sleef_cosq1_u10purec, Sleef_cosq1_u10purecfma - cosine functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_cosq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_cosq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions evaluate the cosine function of a value in a. The error bound of the returned value is 1.0 ULP.


Sleef_tanq1_u10purec, Sleef_tanq1_u10purecfma - tangent functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_tanq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_tanq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions evaluate the tangent function of a value in a. The error bound of the returned value is 1.0 ULP.


Sleef_asinq1_u10purec, Sleef_asinq1_u10purecfma - arc sine functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_asinq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_asinq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions evaluate the arc sine function of a value in a. The error bound of the returned value is 1.0 ULP.


Sleef_acosq1_u10purec, Sleef_acosq1_u10purecfma - arc cosine functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_acosq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_acosq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions evaluate the arc cosine function of a value in a. The error bound of the returned value is 1.0 ULP.


Sleef_atanq1_u10purec, Sleef_atanq1_u10purecfma - arc tangent functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_atanq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_atanq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions evaluate the arc tangent function of a value in a. The error bound of the returned value is 1.0 ULP.


Sleef_expq1_u10purec, Sleef_expq1_u10purecfma - base-e exponential functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_expq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_expq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return the value of e raised to a. The error bound of the returned value is 1.0 ULP.


Sleef_exp2q1_u10purec, Sleef_exp2q1_u10purecfma - base-2 exponential functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_exp2q1_u10purec( Sleef_quad a );
Sleef_quad Sleef_exp2q1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return 2 raised to a. The error bound of the returned value is 1.0 ULP.


Sleef_exp10q1_u10purec, Sleef_exp10q1_u10purecfma - base-10 exponentail

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_exp10q1_u10purec( Sleef_quad a );
Sleef_quad Sleef_exp10q1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return 10 raised to a. The error bound of the returned value is 1.0 ULP.


Sleef_expm1q1_u10purec, Sleef_expm1q1_u10purecfma - base-e exponential functions minus 1

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_expm1q1_u10purec( Sleef_quad a );
Sleef_quad Sleef_expm1q1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return the value one less than e raised to a. The error bound of the returned value is 1.0 ULP.


Sleef_logq1_u10purec, Sleef_logq1_u10purecfma - natural logarithmic functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_logq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_logq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return the natural logarithm of a. The error bound of the returned value is 1.0 ULP.


Sleef_log2q1_u10purec, Sleef_log2q1_u10purecfma - base-2 logarithmic functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_log2q1_u10purec( Sleef_quad a );
Sleef_quad Sleef_log2q1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return the base-2 logarithm of a. The error bound of the returned value is 1.0 ULP.


Sleef_log10q1_u10purec, Sleef_log10q1_u10purecfma - base-10 logarithmic functions

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_log10q1_u10purec( Sleef_quad a );
Sleef_quad Sleef_log10q1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return the base-10 logarithm of a. The error bound of the returned value is 1.0 ULP.


Sleef_log1pq1_u10purec, Sleef_log1pq1_u10purecfma - logarithm of one plus argument

Synopsis

#include <sleefquad.h>

Sleef_quad Sleef_log1pq1_u10purec( Sleef_quad a );
Sleef_quad Sleef_log1pq1_u10purecfma( Sleef_quad a );

Link with -lsleefquad.

Description

These functions return the natural logarithm of ( 1 + a ). The error bound of the returned value is 1.0 ULP.

Tutorial

I would like to show an example of how the vectorized QP functions can be used. Below is a source code for computing π with Machin's formula on x86 GNU systems. This formula has two terms with arc tangent which can be independently calculated. In this example, these two terms are computed using vector functions. __float128 data type is defined on x86, and you can use literals in this type to initialize the variables. Variables q0, q1, q2 and q3 retain two QP FP values each. At line 5, 8 and 11, QP values in arrays are loaded into these variables with Sleef_loadq2_sse2 function. Vector operations are carried out from line 14 to 16, and then each element of the vector variable q3 is extracted with Sleef_getq2_sse2 function and subtracted to obtain the result at line 18. Note that the QP subtract function in the standard library is called to do this subtraction. This result is output to the console using Sleef_printf function at line 22. Note that this source code can be built with clang, on which libquadmath is not available.

#include <sleefquad.h>

int main(int argc, char **argv) {
  __float128 a0[] = { 5, 239 };
  Sleef_quadx2 q0 = Sleef_loadq2_sse2(a0);

  __float128 a1[] = { 1, 1 };
  Sleef_quadx2 q1 = Sleef_loadq2_sse2(a1);

  __float128 a2[] = { 16, 4 };
  Sleef_quadx2 q2 = Sleef_loadq2_sse2(a2);

  Sleef_quadx2 q3;
  q3 = Sleef_divq2_u05sse2(q1, q0);
  q3 = Sleef_atanq2_u10sse2(q3);
  q3 = Sleef_mulq2_u05sse2(q3, q2);

  __float128 pi = Sleef_getq2_sse2(q3, 0) - Sleef_getq2_sse2(q3, 1);

  Sleef_printf("%.40Pg\n", &pi);
}

Fig. 4.1: Example source code for x86 computers


__float128 data type is not defined on MSVC, and thus we cannot use literals of this type to initialize QP variables. Sleef provides various conversion functions for this purpose. In the following source code, Sleef_cast_from_doubleq1_purec function is used to initialize q0 from line 4 to 5. Sleef_strtoq function is used to initialize q1 at line 7. Here, Sleef_splatq2_sse2 function is a function for setting the specified QP FP value to all elements in a vector. Obtaining QP FP constants by calling these functions may waste some CPU time. From line 9 and 12, SLEEF_Q macro is used to intialize q2. SLEEF_Q is a macro for defining a QP FP constant, with which an immediate value is available at runtime and there is no wasting of CPU time. You can convert any QP FP value to this representation using qutil utility program. At line 20, Sleef_subq1_u05purec function is used to subtract a scalar value from another to obtain the result.

#include <sleefquad.h>

int main(int argc, char **argv) {
  Sleef_quad a0[] = { Sleef_cast_from_doubleq1_purec(5), Sleef_cast_from_doubleq1_purec(239) };
  Sleef_quadx2 q0 = Sleef_loadq2_sse2(a0);

  Sleef_quadx2 q1 = Sleef_splatq2_sse2(Sleef_strtoq("1.0", NULL));

  Sleef_quadx2 q2 = Sleef_loadq2_sse2((Sleef_quad[]) {
      SLEEF_Q(+0x1000000000000LL, 0x0000000000000000ULL, 4), // 16.0
      SLEEF_Q(+0x1000000000000LL, 0x0000000000000000ULL, 2), // 4.0
  });

  Sleef_quadx2 q3;

  q3 = Sleef_divq2_u05sse2(q1, q0);
  q3 = Sleef_atanq2_u10sse2(q3);
  q3 = Sleef_mulq2_u05sse2(q3, q2);

  Sleef_quad pi = Sleef_subq1_u05purec(Sleef_getq2_sse2(q3, 0), Sleef_getq2_sse2(q3, 1));

  Sleef_printf("%.40Pg\n", &pi);
}

Fig. 4.2: Example source code for MSVC