
The has_type relation is good but doesn't give us an executable algorithm (it is a relation, not an algorithm).
It is, however, syntax directed: exactly one typing rule applies to each term form (unique typing), so we can translate it into a function!

Comparing Types

First we need to check equality of types.
Here it is very simple; for System F it would be much messier, requiring a locally nameless representation or alpha renaming:

Fixpoint eqb_ty (T1 T2 : ty) : bool :=
  match T1, T2 with
  | Bool, Bool ⇒
      true
  | Arrow T11 T12, Arrow T21 T22 ⇒
      andb (eqb_ty T11 T21) (eqb_ty T12 T22)
  | _, _ ⇒
      false
  end.

Then we need a refl lemma and a reflection lemma. More precisely, we "define equality by computation"; the converse direction is easily proved with refl:

Lemma eqb_ty_refl : ∀T1,
eqb_ty T1 T1 = true.

Lemma eqb_ty__eq : ∀T1 T2,
eqb_ty T1 T2 = true → T1 = T2.
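
A minimal proof sketch for both (assuming the definitions above; andb_true_iff comes from Coq.Bool.Bool):

From Coq Require Import Bool.Bool.

Lemma eqb_ty_refl : forall T1,
  eqb_ty T1 T1 = true.
Proof.
  intros T1. induction T1 as [| S1 IH1 S2 IH2]; simpl.
  - reflexivity.
  - rewrite IH1, IH2. reflexivity.
Qed.

Lemma eqb_ty__eq : forall T1 T2,
  eqb_ty T1 T2 = true -> T1 = T2.
Proof.
  intros T1. induction T1 as [| S1 IH1 S2 IH2];
    intros T2 H; destruct T2; simpl in H; try discriminate.
  - reflexivity.                          (* Bool, Bool *)
  - apply andb_true_iff in H as [H1 H2].  (* Arrow, Arrow *)
    apply IH1 in H1. apply IH2 in H2. subst. reflexivity.
Qed.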

The Typechecker

The typechecker is directly syntax directed; the only annoying part is all the pattern matching on option:

Fixpoint type_check (Gamma : context) (t : tm) : option ty :=
  match t with
  | var x =>
      Gamma x
  | abs x T11 t12 =>
      match type_check (update Gamma x T11) t12 with  (** <-- corresponds to the rule for t12 **)
      | Some T12 => Some (Arrow T11 T12)
      | _ => None
      end
  | app t1 t2 =>
      match type_check Gamma t1, type_check Gamma t2 with
      | Some (Arrow T11 T12), Some T2 =>
          if eqb_ty T11 T2 then Some T12 else None  (** eqb_ty defined earlier **)
      | _,_ => None
      end
  ...

In class a detail about eqb_ty came up (a mistake I used to make often in ML/Haskell too):
can pattern matching let us use the same binding twice to imply that the two values must be equal?

(** instead of this **)
| Some (Arrow T11 T12),Some T2 => if eqb_ty T11 T2 then ...

(** can we do this? **)
| Some (Arrow T T' ),Some T => ...

The answer is NO, because this would demand a decidable equality.
What I'm curious about is whether a typeclass could bake this feature in, especially given that Coq functions are total anyway.

Digression: Improving the Notation

Here we can define our own monadic notation in the style of Haskell's do notation:

Notation " x <- e1 ;; e2" := (match e1 with
| Some x ⇒ e2
| None ⇒ None
end)
(right associativity, at level 60).

Notation " 'return' e "
:= (Some e) (at level 60).

Notation " 'fail' "
:= None.

It looks a bit nicer, anyway:

Fixpoint type_check (Gamma : context) (t : tm) : option ty :=
  match t with
  | var x ⇒
      Gamma x
  | abs x T11 t12 ⇒
      T12 <- type_check (update Gamma x T11) t12 ;;
      return (Arrow T11 T12)
  | app t1 t2 ⇒
      T1 <- type_check Gamma t1 ;;
      T2 <- type_check Gamma t2 ;;
      match T1 with
      | Arrow T11 T12 ⇒ if eqb_ty T11 T2 then return T12 else fail
      | _ ⇒ fail
      end
  ...
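
A quick sanity check (a sketch assuming the SF Stlc definitions, where variables are strings and empty is the empty context; type_check_example is just an illustrative name):

Example type_check_example :
  type_check empty (abs "x" Bool (var "x")) = Some (Arrow Bool Bool).
Proof. reflexivity. Qed.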

Properties

Finally we need to verify that the algorithm is correct.
Both soundness and completeness here are stated with respect to the correspondence between the typechecking function and the typing relation's inference rules:

Theorem type_checking_sound : ∀Gamma t T,
type_check Gamma t = Some T → has_type Gamma t T.

Theorem type_checking_complete : ∀Gamma t T,
has_type Gamma t T → type_check Gamma t = Some T.

Exercise

Write a typechecker for StlcE from MoreStlc.v, then prove soundness / completeness (using some pretty mega tactics along the way).

(** this doesn't work yet **)
| fst p =>
    (Prod T1 T2) <- type_check Gamma p ;;


(** it has to be written like this... I suspect because of how the notation desugars?
    And you really do have to provide a fallback case to satisfy the exhaustiveness check **)
| fst p =>
    Tp <- type_check Gamma p ;;
    match Tp with
    | (Prod T1 T2) => return T1
    | _ => fail
    end.

Extra Exercise (Prof.Mtf)

I believe this part of the exercise was added by Prof. Fluet (it is not found in the SF website version).

Write an interpreter (stepf) for the operational semantics of MoreStlc.v, then prove soundness / completeness…

step vs. stepf

First we define valuef, the functional version of the value relation;
then we define stepf, the functional version of the step relation.

Taking pure STLC as an example:

Inductive step : tm -> tm -> Prop :=
  | ST_AppAbs : forall x T11 t12 v2,
      value v2 ->
      (app (abs x T11 t12) v2) --> [x:=v2]t12
  | ST_App1 : forall t1 t1' t2,
      t1 --> t1' ->
      (app t1 t2) --> (app t1' t2)
  | ST_App2 : forall v1 t2 t2',
      value v1 ->
      t2 --> t2' ->
      (app v1 t2) --> (app v1 t2')
Fixpoint stepf (t : tm) : option tm :=
  match t with
  | var x => None         (* We only define step for closed terms *)
  | abs x1 T1 t2 => None  (* Abstraction is a value *)
  | app t1 t2 =>
      match stepf t1, stepf t2, t1 with
      | Some t1', _       , _           => Some (app t1' t2)
      | None    , Some t2', _           => assert (valuef t1) (Some (app t1 t2'))  (* otherwise [t1] is a normal form *)
      | None    , None    , abs x T t11 => assert (valuef t2) (Some ([x:=t2]t11))  (* otherwise [t1], [t2] are normal forms *)
      | _       , _       , _           => None
      end
  ...

Definition assert (b : bool) (a : option tm) : option tm :=
  if b then a else None.
  1. The relation's rules are applied implicitly, whenever one is applicable.
    For the function, we have to specify the order of matching by hand.

  2. stepf t1 = None only says that t1 is a normal form; it is not necessarily a value, it may also be stuck, so we need the extra assertion (returning failure when it does not hold).
    The dynamics are orthogonal to the statics: after typechecking we get progress, but we do not have it yet at this point.
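
A tiny sanity check of stepf on a pure-STLC term (a sketch assuming string-named variables as in SF's Stlc, and that valuef holds for abstractions; stepf_example is just an illustrative name):

Example stepf_example :
  stepf (app (abs "x" Bool (var "x")) (abs "y" Bool (var "y")))
  = Some (abs "y" Bool (var "y")).
Proof. reflexivity. Qed.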

Soundness

Theorem sound_stepf : forall t t',
stepf t = Some t' -> t --> t'.

The proof uses a given, rather over-the-top piece of automation…

Still, it helped me find several inconsistencies between stepf and step:

  • The valuef checks that the three substitution cases rely on cannot be omitted.
  • How should valuef handle pair?
    In the end I changed value p -> in step into value v1 -> value v2 ->,
    because the valuef v1 && valuef v2 that valuef (pair v1 v2) unfolds to is awkward to work with.
    But the bottom line is: the two must be consistent! This is where you really feel the rigor of Formal Methods.

Completeness

Found that the pair implementation was missing 2 cases… and only then did I discover the valuef pair problem in the Soundness automation.

Extra (Mentioned)


Church Style vs. Curry Style
Rice’s Theorem

CakeML

  • prove correctness of ML lang compiler
  • latest paper on verifying GC

Adding Records

t ::=                                Terms:
    | {i1=t1, ..., in=tn}                record
    | t.i                                projection
    | ...

v ::=                                Values:
    | {i1=v1, ..., in=vn}                record value
    | ...

T ::=                                Types:
    | {i1:T1, ..., in:Tn}                record type
    | ...

Formalizing Records

Hux: this chapter is very similar to TAPL ch. 13 (References).
But under a "formal verification" mindset it is more interesting and practical, and it pushes you to really think it through!

computational effects - “side effects” of computation - impure features

  • assign to mutable variables (reference cells, arrays, mutable record fields, etc.)
  • perform input and output to files, displays, or network connections;
  • make non-local transfers of control via exceptions, jumps, or continuations;
  • engage in inter-process synchronization and communication

The main extension will be dealing explicitly with a

  • store (or heap), and
  • pointers (or references) that name store locations, i.e. addresses

interesting refinement: type preservation

Definition

forms of assignments:

  • rare : Gallina - No
  • some : ML family - Explicit reference and dereference
  • most : C family - Implicit …

For formal study, use ML’s model.

Syntax

Types & Terms

T ::=
    | Nat
    | Unit
    | T → T
    | Ref T

t ::=                    Terms
    | ...
    | ref t                  allocation
    | !t                     dereference
    | t := t                 assignment
    | l                      location
Inductive ty : Type :=
| Nat : ty
| Unit : ty
| Arrow : ty → ty → ty
| Ref : ty → ty.

Inductive tm : Type :=
(* STLC with numbers: *)
...
(* New terms: *)
| unit : tm
| ref : tm → tm
| deref : tm → tm
| assign : tm → tm → tm
| loc : nat → tm. (** a location l is represented by wrapping a nat as loc **)

Typing

                       Gamma |- t1 : T1
                   ------------------------                         (T_Ref)
                   Gamma |- ref t1 : Ref T1

                    Gamma |- t1 : Ref T11
                    ---------------------                         (T_Deref)
                      Gamma |- !t1 : T11

                    Gamma |- t1 : Ref T11
                      Gamma |- t2 : T11
                   ------------------------                      (T_Assign)
                   Gamma |- t1 := t2 : Unit

Values and Substitution

Inductive value : tm → Prop :=
...
| v_unit : value unit
| v_loc : ∀l, value (loc l). (* <-- note this is a Π (l:nat) . value (loc l) *)
Fixpoint subst (x:string) (s:tm) (t:tm) : tm :=
match t with
...
| unit ⇒ t
| ref t1 ⇒ ref (subst x s t1)
| deref t1 ⇒ deref (subst x s t1)
| assign t1 t2 ⇒ assign (subst x s t1) (subst x s t2)
| loc _ ⇒ t
end.

Pragmatics

Side Effects and Sequencing

r:=succ(!r); !r

can be desugared to

(\x:Unit. !r) (r:=succ(!r)).

then we can write some “imperative programming”

r:=succ(!r); 
r:=succ(!r); 
r:=succ(!r); 
!r

References and Aliasing

a shared reference brings shared state

let r = ref 5 in
let s = r in
s := 82;
(!r)+1

Shared State

thunks as methods


let c = ref 0 in
let incc = \_:Unit. (c := succ (!c); !c) in
let decc = \_:Unit. (c := pred (!c); !c) in (
incc unit;
incc unit; -- in real PL: the concrete syntax is `incc()`
decc unit
)

Objects

constructor and encapsulation!


newcounter =
    \_:Unit.                 -- adding `(self, init_val)` would make it more "real"
    let c = ref 0 in         -- private and only accessible via the closure (privileged methods)
    let incc = \_:Unit. (c := succ (!c); !c) in
    let decc = \_:Unit. (c := pred (!c); !c) in
    { i=incc,
      d=decc }               -- return a "record", or "struct", or "object"!

References to Compound Types (e.g. Function Type)

Previously, we used a closure to represent a map, with functional update.
Whether the "array" here really counts as an array is debatable: it does provide indexing, but lookup is clearly O(n), so it is questionable whether it even counts as random access.
It does not update the data in place; it is just a map wrapped in a ref (the only thing gained is that it can now be shared).
Perhaps list (ref nat) could also express an array? Either way it is O(n), a linear search every time, so it makes little difference…


newarray = \_:Unit. ref (\n:Nat. 0)
lookup   = \a:NatArray. \n:Nat. (!a) n
update   = \a:NatArray. \m:Nat. \v:Nat.
               let oldf = !a in
               a := (\n:Nat. if equal m n then v else oldf n);

Null References

nullptr!

Deref a nullptr:

  • exception in Java/C#
  • insecure in C/C++ ← violates memory safety!!

type Option T = Unit + T
type Nullable T = Option (Ref T)

Why is the Option on the outside?
Think about C: nullptr is a single special constant location, playing the role of Unit (the None data constructor) here.

Garbage Collection

last issue: store de-allocation

Without GC it is extremely difficult to achieve type safety: if a primitive for "explicit deallocation" is provided,
one can easily create dangling references, i.e. references to deleted locations.

One type-unsafe example: (pseudo code)


a : Ref Nat = ref 1; -- alloc loc 0
free(a); -- free loc 0
b : Ref Bool = ref True; -- alloc loc 0

a := !a + 1 -- BOOM!

Operational Semantics

Locations

what should be the values of type Ref T?

ref allocates some memory/storage!

The run-time store is essentially a big array of bytes;
different datatypes need to allocate regions of different sizes.

We instead think of the store as an array of values, abstracting away the different sizes of different values.
We use the word location here to keep from modeling pointer arithmetic, which most type systems cannot track:

"location n holds a float" doesn't tell you anything about location n+4.

Stores

We define replace as a Fixpoint since it is computational and easier to work with. The consequence is that it has to be total.

Reduction

Typing

typing context:

Definition context := partial_map ty.

Store typings

Why not just make the context a map of pairs?
Because we don't want to complicate the dynamics of the language,
and the store typing is only used for typechecking.

The Typing Relation

Properties

Well-Typed Stores

Extending Store Typings

Preservation, Finally

Substitution Lemma

Assignment Preserves Store Typing

Weakening for Stores

Preservation!

Progress

References and Nontermination

Inductive ty : Type :=
(* record types *)
| RNil : ty
| RCons : string → ty → ty → ty.

we need a type constructor to identify records…

Inductive tm : Type :=
  | rproj ...                  (* ...? isn't this one a record term as well? *)
  (* record terms *)
  | rnil : tm
  | rcons : string → tm → tm → tm.

So a record is encoded as a list (a small sketch below)…
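
A small sketch of that encoding (example_record and the labels are made up for illustration):

(* The record {i1=t1, i2=t2} becomes nested rcons cells ending in rnil. *)
Definition example_record (t1 t2 : tm) : tm :=
  rcons "i1" t1 (rcons "i2" t2 rnil).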


For records, can the compiler reorder the fields? (cf. SML vs. OCaml)




From PLF Require Import LibTactics.

LibTactics vs. SSReflect (another tactics package)

  • designed for PL proofs vs. for math proofs
  • builds on traditional tactics vs. rethinks them, so it is harder to pick up

Tactics for Naming and Performing Inversion

introv

Theorem ceval_deterministic: ∀c st st1 st2,
st =[ c ]⇒ st1 →
st =[ c ]⇒ st2 →
st1 = st2.
intros c st st1 st2 E1 E2. (* previously, to name the hypotheses you had to spell everything out *)
introv E1 E2.              (* now the forall-bound parts can be omitted *)

inverts

(* before... needed subst and clear *)
- inversion H. subst. inversion H2. subst.
(* now... *)
- inverts H. inverts H2.


(* you can also put the inverted facts in goal position and name them yourself with intro! *)
inverts E2 as.

Tactics for N-ary Connectives

Because Coq encodes conjunctions and disjunctions using the binary constructors ∧ and ∨,
these tactics make it easier to work with N-ary logical connectives…

splits

n-ary conjunction

n-ary split

branch

n-ary disjunction

faster destruct?
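
Tiny examples of both (a sketch assuming LibTactics is imported as above):

Goal 1 = 1 /\ 2 = 2 /\ 3 = 3.
Proof. splits; reflexivity. Qed.

Goal 1 = 2 \/ 2 = 2 \/ 3 = 4.
Proof. branch 2. reflexivity. Qed.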

Tactics for Working with Equality

asserts_rewrite and cuts_rewrite

substs

better subst - not fail on circular eq

fequals

vs f_equal?

applys_eq

variant of eapply

Some Convenient Shorthands

unfolds

better unfold

false and tryfalse

better exfalso

gen

shorthand for generalize dependent, multiple arg.

(* old *)
intros Gamma x U v t S Htypt Htypv.
generalize dependent S. generalize dependent Gamma.

(* new...so nice!!! *)
introv Htypt Htypv. gen S Gamma.

admits, admit_rewrite and admit_goal

wrappers around admit

sort

proof context more readable

vars -> top
hypotheses -> bottom

Tactics for Advanced Lemma Instantiation

Working on lets

Working on applys, forwards and specializes

four proof-search tactics: auto, eauto, iauto and jauto.


How Proof Search Works

Search Depth

Backtracking

Adding Hints

Integration of Automation in Tactics


Example Proofs



Decision Procedures

Omega

Ring

Congruence

Integers, the whole numbers from elementary mathematics, are the most common and
fundamental numbers used in computers. They are represented as
fixed-point numbers, in contrast to floating-point numbers in the machine.
Today we are going to learn a whole bunch of ways to encode them.

There are mainly two properties that distinguish integer representations:

  1. Size, the number of bits used.
    Usually a power of 2, e.g. 8-bit, 16-bit, 32-bit, 64-bit.

  2. Signed or unsigned.
    There are also multiple schemes for encoding signed integers.

We are also gonna use the below terminologies throughout the post:

  • MSB: Most Significant Bit
  • LSB: Least Significant Bit

Prerequisite - printf Recap

We will quickly recap the integer-related subset of printf usage.
Basically, we use format specifiers to interpolate values into strings:

Format Specifier

%[flags][width][.precision][length]specifier

  • specifier
    • d, i : signed decimal
    • u : unsigned decimal
    • c : char
    • p: pointer addr
    • x / X : lower/upper unsigned hex
  • length
    • l : long (at least 32)
    • ll : long long (at least 64)
    • h : short (usually 16)
    • hh : short short (usually 8)
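
A few of these combinations in action (a minimal sketch):

#include <stdio.h>

int main(void) {
    long big = 1234567890L;
    unsigned int u = 42;
    short s = -7;

    printf("%ld\n", big);        // length 'l' + specifier 'd'
    printf("%u\n", u);           // unsigned decimal
    printf("%hd\n", s);          // length 'h' + specifier 'd'
    printf("%x / %X\n", u, u);   // 2a / 2A
    printf("%p\n", (void *)&u);  // pointer address
    return 0;
}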
#include <iostream>
using namespace std;

int main() {
    cout << "Size of int = " << sizeof(int) << endl;
    cout << "Size of long = " << sizeof(long) << endl;
    cout << "Size of long long = " << sizeof(long long);
}

Output with a 32-bit gcc compiler: 4 4 8
Output with a 64-bit gcc compiler: 4 8 8

inttypes.h from C99

Also in cppreference.com

// signed int (d or i)
#define PRId8 "hhd"
#define PRId16 "hd"
#define PRId32 "ld"
#define PRId64 "lld"

// unsigned int (u)
#define PRIu8 "hhu"
#define PRIu16 "hu"
#define PRIu32 "lu"
#define PRIu64 "llu"

// unsigned hex (x)
#define PRIx8 "hhx"
#define PRIx16 "hx"
#define PRIx32 "lx"
#define PRIx64 "llx"

// uintptr_t (64 bit machine word len)
#define PRIxPTR "llx"

Unsigned Integers

The conversion between unsigned integers and binary is trivial.
Here, we can write 8 bits (i.e. a byte) as a pair of hex digits, e.g.
255 == 0xff == 0b11111111.

#include <stdint.h>    // uintN_t
#include <inttypes.h> // PRI macros

uint8_t u8 = 255;
printf("0x%02" PRIx8 "\n", u8); // 0xff
printf( "%" PRId8 "\n", u8); // 255

Signed Integers

Signed integers are more complicated. We need to cut those bits to halves
to represent both positive and negative integers somehow.

There are four well-known schemas to encode it, according to
signed number representation of wikipedia.

Sign magnitude 原码

It’s also called “sign and magnitude”. From the name we can see how straightforward it is:
it’s basically put one bit (often the MSB) as the sign bit to represent sign and the remaining bits indicating
the magnitude (or absolute value), e.g.

  binary     | sign-magn |  unsigned
 ------------|-----------|------------
  0 000 0000 |    +0     |      0
  0 111 1111 |   127     |    127
  ...
  1 000 0000 |    -0     |    128
  1 111 1111 |  -127     |    255

It was used in early computers (e.g. the IBM 7090) and is now mainly used for the
significand part of floating-point numbers.

Pros:

  • simple and natural for humans

Cons:

  • 2 ways to represent zero (+0 and -0)
  • not as good for machines
    • add/sub/cmp require knowing the sign
      • complicates CPU ALU design; potentially more cycles

Ones’ complement 反码

It forms a negative integer by applying a bitwise NOT,
i.e. taking the complement of its positive counterpart.

  binary    |  1s comp  |  unsigned
 -----------|-----------|------------
  0000 0000 |      0    |      0
  0000 0001 |      1    |      1
  ...
  0111 1111 |    127    |    127
  1000 0000 |   -127    |    128
  ...
  1111 1110 |     -1    |    254
  1111 1111 |     -0    |    255

N.B. the sign can still be read off the MSB.

It’s referred to as ones’ complement because the negative can be formed
by subtracting the positive from ones: 1111 1111 (-0)

  1111 1111        -0
- 0111 1111       127
---------------------
  1000 0000      -127

The benefit of this complement nature is that addition becomes simple,
except that we need an end-around carry: the resulting carry is added
back in to get the correct result.

  0111 1111       127
+ 1000 0001      -126
---------------------
1 0000 0000         0
          1        +1   <- add the carry "1" back
---------------------
  0000 0001         1
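
The same end-around-carry addition simulated in C (a minimal sketch; ones_add is a hypothetical helper):

#include <stdint.h>
#include <stdio.h>

// Hypothetical 8-bit ones'-complement addition: add, then fold the carry back in.
static uint8_t ones_add(uint8_t a, uint8_t b) {
    uint16_t sum = (uint16_t)a + b;
    if (sum > 0xff) sum = (sum & 0xff) + 1;  // end-around carry
    return (uint8_t)sum;
}

int main(void) {
    // 127 is 0x7f; -126 in ones' complement is 0x81 (bitwise NOT of 126 = 0x7e).
    printf("0x%02x\n", (unsigned)ones_add(0x7f, 0x81));  // 0x01, i.e. 1
    return 0;
}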

Pros:

  • arithmetic on machines is fast

Cons:

  • still 2 zeros!

Two's complement 补码

Most current architectures adopt this, including x86, MIPS, ARM, etc.
It differs from ones' complement by one.

  binary    |  2s comp  |  unsigned
 -----------|-----------|------------
  0000 0000 |      0    |      0
  0000 0001 |      1    |      1
  ...
  0111 1111 |    127    |    127
  1000 0000 |   -128    |    128
  1000 0001 |   -127    |    129
  ...
  1111 1110 |     -2    |    254
  1111 1111 |     -1    |    255

N.B. the sign can still be read off the MSB.

It's referred to as two's complement because a negative value can be formed
by subtracting its positive counterpart from 2 ** N (which is congruent to 0000 0000 (+0) modulo 2 ** N),
where N is the number of bits.

E.g., for a uint8_t, the sum of any number and its two's complement is
256 (1 0000 0000):

  1 0000 0000      256  = 2 ** 8
-   0111 1111      127
---------------------
    1000 0001     -127

Because of this, arithmetic becomes much easier. For any number x, e.g. 127,
we can get its two's complement by:

  1. ~x => 1000 0000, bitwise NOT (same as ones' complement)
  2. +1 => 1000 0001, add 1 (the "one" by which it differs from ones' complement)
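
A quick check of those two steps in C (a minimal sketch; the cast of an out-of-range value back to int8_t is implementation-defined before C23, but behaves as expected on common two's-complement platforms):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t x = 127;
    int8_t neg = (int8_t)(~x + 1);       // flip all bits, then add 1
    printf("%d %d\n", neg, (int8_t)-x);  // -127 -127
    printf("%d\n", (uint8_t)neg);        // 129, i.e. 256 - 127 when read as unsigned
    return 0;
}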

Cons:

  • badly named?

Pros:

  • fast machine arithmetic
  • only 1 zero!
  • the minimum negative value is -128

Offset binary 移码

It’s also called excess-K (偏移 K) or biased representation, where K is
the biasing value (the new 0), e.g. in excess-128:

  binary    |  K = 128  |  unsigned
 -----------|-----------|------------
  0000 0000 | -128 (-K) |      0
  0000 0001 |   -127    |      1
  ...
  0111 1111 |     -1    |    127
  1000 0000 |      0    |    128 (K)
  1000 0001 |      1    |    129
  ...
  1111 1111 |    127    |    255

It is now mainly used for the exponent part of floating-point numbers.
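
A tiny illustration of excess-128 encoding/decoding in C (a sketch; enc_excess128 and dec_excess128 are hypothetical helpers):

#include <stdint.h>
#include <stdio.h>

#define K 128  // the bias

static uint8_t enc_excess128(int v)     { return (uint8_t)(v + K); }  // v in [-128, 127]
static int     dec_excess128(uint8_t s) { return (int)s - K; }

int main(void) {
    printf("%d\n", enc_excess128(-128)); // 0
    printf("%d\n", enc_excess128(0));    // 128
    printf("%d\n", dec_excess128(255));  // 127
    return 0;
}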

Type Conversion & Printf

This might be a little bit off topic, but I want to note down what I observed
from experimenting. Basically, printf does not perform an implicit type
conversion; it merely interprets the bit patterns of your arguments the way you
tell it to.

  • UB! stands for undefined behavior
uint8_t u8 = 0b10000000; // 128
int8_t s8 = 0b10000000; // -128

printf("%"PRIu8 "\n", u8); // 128
printf("%"PRId8 "\n", u8); // 128 (UB! but somehow it's got right)
printf("%"PRId8 "\n", (int8_t)u8); // -128

printf("%"PRId8 "\n", s8); // -128
printf("%"PRIu8 "\n", s8); // 4294967168 (UB!)
printf("%"PRId8 "\n", (uint8_t)s8); // 128

printf("%"PRIxPTR "\n", s8); // ffffff80
printf("%"PRIxPTR "\n", (uintptr_t)s8); // ffffffffffffff80

Char & ASCII

Traditionally, char is represented in the computer as 8 bits as well. And
really, ASCII is only defined between 0 and 127 and requires just 7 bits.
(8-bit Extended ASCII was never well popularized or supported.)

Extensions such as Unicode make things more complicated nowadays, but we'll
leave that for future posts dedicated to char and string representation.

So how is a char different from a byte?

Well, the answer is: whether a char is a signed char (backed by int8_t)
or an unsigned char (backed by uint8_t) is… implementation-defined.
And most systems made it signed, since most types (e.g. int) are signed
by default.

N.B. int is defined by the standard to be equivalent to signed int. This is
not the case for char.
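
A small demo of that implementation-defined signedness (a sketch; the exact output depends on whether plain char is signed on your platform):

#include <stdio.h>

int main(void) {
    char c = (char)0x80;                 // implementation-defined: often -128 (signed char)
    unsigned char uc = 0x80;             // always 128
    signed char sc = (signed char)0x80;  // -128 on two's-complement platforms
    printf("%d %d %d\n", c, uc, sc);     // e.g. "-128 128 -128" where char is signed
    return 0;
}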

That's why you often see typedefs such as:

1
2
typedef unsigned char Byte_t;
typedef uint8_t byte_t;

to emphasize that a byte should just be plain, unsigned bits.

References

In the last episode we talked about the data representation of integers, a kind
of fixed-point number. Today we're going to learn about floating-point numbers.

Floating-point numbers are used to approximate real numbers. Because everything
in a computer is ultimately just a finite sequence of bits, the representation
of floating-point numbers has to trade off range against precision.

Due to their computational complexity, CPUs also have a dedicated set of
instructions to accelerate floating-point arithmetic.

Terminologies

The terminology of floating-point numbers comes from
scientific notation,
where a real number can be written like this:

1.2345 = 12345 × 10 ** -4
         ^^^^^   ^^    ^^
    significand  base  exponent
  • significand, or mantissa, 有效数字, 尾数
  • base, or radix 底数
  • exponent, 幂

So where is the floating point? It's the . in 1.2345. Imagine that the dot
can float to the left by one place to give the representation .12345.

The dot is called the radix point: to us it looks like a decimal point,
but in the computer it is really a binary point.

Now it becomes clear that, to represent a floating-point number in computers,
we will simply assign some bits for significand and some for exponent, and
potentially a bit for sign and that’s it.

IEEE-754 32-bits Single-Precision Floats 单精度浮点数

It was called single back in IEEE 754-1985 and is now called binary32 in the
relatively new IEEE 754-2008 standard.

 sign | exponent (8 bits) |     fraction (23 bits)
   0  |    011 1111 1     | 000 0000 0000 0000 0000 0000
  31  | 30 ........... 23 | 22 ....................... 0
  • The sign part takes 1 bit to indicate the sign of the float (0 for +
    and 1 for -). This is the same treatment as in sign magnitude.
  • The exponent part takes 8 bits and uses an offset-binary (biased) form to represent a signed integer.
    It's a variant form, since it takes out the -127 (all 0s) for zero and the +128
    (all 1s) for non-numbers; thus it only ranges over [-126, 127] instead of
    [-127, 128]. It then chooses the zero offset of 127 within these 254 values (like
    using 128 in excess-128), a.k.a. the exponent bias in the standard.
  • The fraction part takes 23 bits, with an implicit leading bit 1, and
    represents the actual significand with a total precision of 24 bits.

Don't be confused about why it's called fraction instead of significand!
It's because the 23 bits in the representation really do represent the
fraction part of the actual significand in the scientific notation.

The floating-point version of “scientific notation” is more like:

         (implicit leading 1)
sign ×  1.fraction × 2 ^ exponent
          (base-2)       (base-2)

So what number does the above bits represent?

S    F      ×  2^E  =  R
+    1.(0)  ×  2^0  =  1

Aha! It's the real number 1!
Recall that E = 0b0111 1111 means 0 because of the biased representation (127 - 127 = 0)!

We will add more non-trivial examples later.
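
Before that, here is a small C sketch that pulls the three fields out of a float's bit pattern, using memcpy for the reinterpretation (the pointer and union tricks below achieve the same thing):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 1.0f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          // reinterpret the 32 bits of the float

    uint32_t sign     = bits >> 31;          // 1 bit
    uint32_t exponent = (bits >> 23) & 0xff; // 8 bits, biased by 127
    uint32_t fraction = bits & 0x7fffff;     // 23 bits

    printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06x\n",
           (unsigned)sign, (unsigned)exponent, (int)exponent - 127, (unsigned)fraction);
    // For 1.0f: sign=0 exponent=127 (unbiased 0) fraction=0x000000
    return 0;
}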

Demoing Floats in C/C++

Writing sample code that converts between binary (in hex) and floats is not
as straightforward as it is for integers. Luckily, there are still some hacks
to do it:

C - Unsafe Cast

We unsafely cast a pointer to reinterpret the same bits:

float f1 = 0x3f800000; // C doesn't have a floating literal taking hex.
printf("%f \n", f1); // 1065353216.000000 (???)

uint32_t u2 = 0x3f800000;
float* f2 = (float*)&u2; // unsafe cast
printf("%f \n", *f2); // 1.000000

C - Union Trick

Oh, I really enjoyed this one… a union in C is not only an untagged union; its members also
share the exact same chunk of memory. So we are doing the same reinterpretation,
but in a more structured and technically fancier way.

#include <stdint.h>
#include <inttypes.h>
#include <math.h>

float pi = (float)M_PI;
union {
    float f;
    uint32_t u;
} f2u = { .f = pi };  // we store the data as a float

printf("pi : %f\n   : 0x%" PRIx32 "\n", pi, f2u.u);  // but read it back as uint32_t

// output:
// pi : 3.141593
//    : 0x40490fdb

N.B. this trick is well-known as type punning:

In computer science, type punning is a common term for any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.

C++ - reinterpret_cast

C++ does provide such type punning in the standard language:

uint32_t u = 0x40490fdb;
float a = *reinterpret_cast<float*>(&u);
std::cout << a; // 3.14159

N.B. it still needs to be a conversion between pointers;
see https://en.cppreference.com/w/cpp/language/reinterpret_cast.

Besides, C++17 does add a floating-point literal that can be written in hex, but it
works in a different way, using an explicit radix point (and binary exponent) in the hex:

float f = 0x1.2p3;  // 0x1.2 (= 1.125) × 2^3
std::cout << f; // 9

Let's try the other direction:

#include <iostream>
#include <limits>     // std::numeric_limits
#include <cstdio>
#include <stdint.h>
#include <inttypes.h>

int main() {
    double qNan = std::numeric_limits<double>::quiet_NaN();
    printf("0x%" PRIx64 "\n", *reinterpret_cast<uint64_t*>(&qNan));
    // 0x7ff8000000000000, the canonical qNaN!
}

Representation of Non-Numbers

There are more in the IEEE-754!

Real numbers don't satisfy the closure property
the way integers do. Notably, the set of real numbers is NOT closed under
division! It can produce non-number results such as infinity (e.g. 1/0)
and NaN (Not-a-Number) (e.g. taking
the square root of a negative number).

It would be algebraically ideal if the set of floating-point numbers were
closed under all floating-point arithmetic. That would make many people's lives
easier. So the IEEE made it so! Non-number values are squeezed in.

We will also include the two zeros (+0/-0) in the comparison here,
since they are also special, being the only two that demand a 0x00 exponent:

             binary                |    hex    |
--------------------------------------------------------
0 00000000 00000000000000000000000 = 0000 0000 = +0
1 00000000 00000000000000000000000 = 8000 0000 = −0

0 11111111 00000000000000000000000 = 7f80 0000 = +infinity
1 11111111 00000000000000000000000 = ff80 0000 = −infinity

_ 11111111 10000000000000000000000 = _fc0 0000 = qNaN (canonical)
_ 11111111 00000000000000000000001 = _f80 0001 = sNaN (one of them)
 sign | exponent (8 bits) | fraction (23 bits)
   0  |        00         |  0 ... 0            = +0
   1  |        00         |  0 ... 0            = -0
   0  |        FF         |  0 ... 0            = +infinity
   1  |        FF         |  0 ... 0            = -infinity
   _  |        FF         |  1 ... 0            = qNaN (canonical)
   _  |        FF         |  0 ... 1            = sNaN (one of them)

The exact encodings of qNaN and sNaN were not specified by IEEE 754 and are implemented
differently on different processors. Luckily, both the x86 and ARM families use the
"most significant bit of the fraction" to indicate whether it is quiet.

More on NaN

If we look carefully into the IEEE 754-2008 spec, on page 35, §6.2.1, it
actually defines anything with exponent FF that is not an infinity (infinities
have all fraction bits set to 0) to be a NaN!

All binary NaN bit strings have all the bits of the biased exponent field E set to 1 (see 3.4). A quiet NaN bit string should be encoded with the first bit (d1) of the trailing significand field T being 1. A signaling NaN bit string should be encoded with the first bit of the trailing significand field being 0.

That implies we actually have 2 ** 24 - 2 NaNs in a 32-bit float!
The 24 comes from the 1 sign bit plus the 23 fraction bits, and the 2 excluded
are +/- infinity.

The contiguous 22 bits inside the fraction look like quite a waste, and there
would be as many as 51 of them in a double! We will see how to make them useful
in later episodes (spoiler: they are known as the NaN payload).

It's also worth noting how odd it is that IEEE chose to use the MSB of the fraction
instead of the sign bit to distinguish quiet from signaling NaNs:

It seems strange to me that the bit which signifies whether or not the NaN is signaling is the top bit of the mantissa rather than the sign bit; perhaps something about how floating point pipelines are implemented makes it less natural to use the sign bit to decide whether or not to raise a signal.
https://anniecherkaev.com/the-secret-life-of-nan

I guess it might be something related to the CPU pipeline? I don’t know yet.

Equality of NaNs and Zeros.

The spec defines comparisons involving NaN to return an unordered result, which
means every comparison operation except != (i.e. >=, <=, >, <, =) between a
NaN and any other floating-point number returns false.

No surprise that most (if not all) languages implement this behaviour, e.g.
in JavaScript:

NaN !== NaN   // true
NaN === NaN // false
NaN > 1 // false
NaN < 1 // false

Positive and negative zeros, however, are defined to be equal!

+0 === -0  // true, using the traditional JS equality
Object.is(+0, -0) // false, using the "SameValue" equality

In C++, we can tell them apart by looking at the sign bit:

#include <cmath>    // signbit
#include <iostream> // cout
using std::cout;

cout << (+0.0f == -0.0f);    // 1
cout << std::signbit(-0.0f); // 1
cout << std::signbit(+0.0f); // 0

IEEE-754 64-bits Double-Precision Floats

Now, the 64-bit version of floating-point numbers, known as double, is just a
matter of scale:

 sign | exponent (11 bits) |     fraction (52 bits)
   63 | 62 ............ 52 | 51 ....................... 0

IEEE-754-2008 16-bits Short Floats

The 2008 edition of IEEE 754 also standardizes the 16-bit short float (half precision), which is
in neither the C nor the C++ standard, though compiler extensions might provide it.

It looks like:

1 sign bit | 5 exponent bits | 10 fraction bits
S E E E E E M M M M M M M M M M

References

This series of notes combines

  • my notes from reading Software Foundations and (where applicable) watching the Coq Intensive sessions;
  • gotchas from my independent study and from discussions in Prof. Fluet's class.

The .v code is a gorgeous example of literate programming and the compiled .html website is full-fledged,
so these notes are intentionally NOT self-contained and only focus on things I found essential or interesting.

These notes are also meant to be very personal and may mix English with Chinese (you can laugh, lol).
So yeah, don't expect them to be well organized or well written.
I post them on the blog mainly for my own reference.

The quotes come either from the book or from things someone said (possibly including me).

Data and Functions

Custom Notation

Notation "x && y" := (andb x y).
Notation "x || y" := (orb x y).

can go pretty far with unicode char…

making things infix

Notation "x + y" := (plus x y)
(at level 50, left associativity)
: nat_scope.
Notation "x - y" := (minus x y)
(at level 50, left associativity)
: nat_scope.
Notation "x * y" := (mult x y)
(at level 40, left associativity)
: nat_scope.

Why 40 and 50? To make sure there is still room for precedence levels in between…

though no known PL uses real numbers for precedence.

Data Constructor with arguments

there are 2 ways to define them:

Inductive color : Type :=
  | black
  | white
  | primary (p : rgb).        (* ADT style: name the argument, useful in proofs *)
(* or, equivalently: *)
(*| primary : rgb -> color.      GADT style, dependent type *)

Syntax for arguments having the same type

As a notational convenience, if two or more arguments have the same type, they can be written together

Inductive nybble : Type :=
| bits (b0 b1 b2 b3 : bit).
Fixpoint mult (n m : nat)         : nat := 
Fixpoint plus (n : nat) (m : nat) : nat :=

Fixpoint and Structural Recursion

This requirement is a fundamental feature of Coq’s design: In particular, it guarantees that every function that can be defined in Coq will terminate on all inputs.

However, Coq’s “decreasing analysis” is not very sophisticated. E.g.

Fixpoint evenb (n:nat) : bool :=
match n with
| O => true
| S O => false
| n => evenb (pred (pred n))
end.

will result in an error that basically complains "this structure is not shrinking":

Error:
Recursive definition of evenb is ill-formed.

evenb : nat -> bool
n : nat
n0 : nat
n1 : nat

Recursive call to evenb has principal argument equal to
"Nat.pred (Nat.pred n)" instead of one of the following variables: "n0" "n1".

Recursive definition is:
"fun n : nat =>
match n with
| 0 => true
| 1 => false
| S (S _) => evenb (Nat.pred (Nat.pred n))
end".

N.B. the n0 and n1 are sub-terms of n where n = S (S _).

So we have to make the sub-structure explicit to indicate the structure is obviously shrinking:

Fixpoint evenb (n:nat) : bool :=
match n with
| O => true
| S O => false
| S (S n') => evenb n'
end.

Now Coq knows this Fixpoint is performing structural recursion on the 1st argument, which guarantees termination since the structure is decreasing:

evenb is defined
evenb is recursively defined (decreasing on 1st argument)

Proof by Case Analysis

Theorem plus_1_neq_0_firsttry : ∀n : nat,
(n + 1) =? 0 = false.
Proof.
intros n.
simpl. (* does nothing! *)
Abort.

simpl. does nothing here, since both + and =? pattern match on their first argument, and that argument still contains the unknown n.

so we have to destruct n as 2 cases: nullary O and unary S n'.

intros n. destruct n as [ (* O *) | (* S *) n'] eqn:E.
  • the intro pattern as [ | n'] names the new bindings.
  • eqn:E records the destructed equation as E in the premises of the proof. It can be elided when not explicitly used, but is useful to keep for the sake of documentation as well.
subgoal 1

n : nat
E : n = 0 (* case 1, n is [O] a.k.a. [0] *)
============================
(0 + 1 =? 0) = false


subgoal 2

n, n' : nat
E : n = S n' (* case 2, n is [S n'] *)
============================
(S n' + 1 =? 0) = false

If there is no need to specify any names, we can omit the as clause or simply write as [|] or as [].
In fact, any as clause can be omitted and Coq will fill in variable names auto-magically.

A small caveat on intro

intros x y. destruct y as [ | y ] eqn:E.

By doing this, the name y is shadowed. It is usually better to use, say, y' here instead.

Qed

standing for the Latin phrase "Quod Erat Demonstrandum", meaning "that which was to be demonstrated".
