Sloppy conversion from lecture slides + extra info

Notation

This note uses for negation. SAT literature often uses bar notation instead, which is more compact but can look like a different variable.

Practical SAT Solving

SAT solvers work on formulas in normal form because arbitrary tree-structured formulas are hard to simplify globally (changes only affect subtrees).

Why CNF over DNF?

Both CNF and DNF can represent any propositional formula, but they have complementary strengths:

DNF (Disjunctive Normal Form): OR of ANDs, e.g.,
Satisfiability is trivial (just check if any conjunction is consistent), but converting to DNF can be exponential.

CNF (Conjunctive Normal Form): AND of ORs, e.g.,
Refutability is easy to show (find an empty clause), and conversion can be done in polynomial time. CNF is the standard input format for SAT solvers (DIMACS format).


Transformation to CNF

Approach 1: Multiplication

Produces an equivalent formula. Three steps:

  1. Eliminate derived connectives:

  2. Convert to Negation Normal Form (NNF): push negations inward using De Morgan’s laws, eliminate double negation. Result: only , , and literals.

  3. Distribute over to get CNF.

Distributivity of OR over AND

The core rule:

Symmetrically:

Why it works: For the left side to be true, either is true (making both clauses on the right true), or both and are true (again satisfying both clauses). The equivalence holds in both directions.

The goal is to push all s inside all s. When a disjunction sits above a conjunction, distribute. Repeat until every is at the innermost level (forming clauses) and every is at the outermost level (conjoining clauses).

Exponential blowup

Distributivity can cause exponential growth. For example, transforming to CNF produces clauses.


Approach 2: Tseitin Encoding

Produces an equisatisfiable formula: same satisfiability status, but not logically equivalent (introduces new variables). Polynomial complexity.

The idea: instead of distributing (which explodes), introduce a fresh variable for each subformula and constrain it to equal that subformula’s value.

Process:

  1. Label each non-literal subformula with a fresh variable

  2. For each labeled subformula, write a definition
    The children are either literals or labels of subformulas (not the subformulas themselves).

  3. AND all definitions together, then convert to CNF using Approach 1

  4. AND the root label (asserting the whole formula must be true)

Why no blowup: Each definition has the form where is a binary connective, or for negation. That’s at most 3 variables per definition. Converting a 3-variable equivalence to CNF produces a fixed small number of clauses (see below), no matter how large the original formula.

Converting definitions to CNF

An equivalence means . Convert each implication separately, then AND together.

Conjunction:

  • : if then both and
  • : if both and then
  • Result: — 3 clauses, 3 variables

Disjunction:

  • Result: — 3 clauses, 3 variables

Negation:

  • Result: — 2 clauses, 2 variables

Implication: — same as , use disjunction pattern

Worked example

Transform

The syntax tree:

  		→  (root)
  	  /   \
  	 ¬	 ∧
  	 |	/ \
  	↔	¬   e
  	   / \   |
  	  a   b  ∧
  		/ \
  	   c   d
  	```
  	
  	 *Step 1: Label non-literal subformulas bottom-up*
  	
  	 Start at the deepest nodes and work toward the root. Each non-literal subformula gets a fresh variable:
  	
  	 Deepest level:
  	 - $v_1$ for $(c \land d)$
  	 - $v_2$ for $(a \leftrightarrow b)$
  	
  	 Next level up:
  	 - $v_3$ for $\neg v_1$ (the negation of $c \land d$)
  	 - $v_4$ for $\neg v_2$ (the negation of $a \leftrightarrow b$)
  	
  	 Next level:
  	 - $v_5$ for $(v_3 \land e)$ (the right subtree of $\to$)
  	
  	 Root:
  	 - $v_6$ for $(v_4 \to v_5)$ (the whole formula)
  	
  	 *Step 2: Write a definition for each label*
  	
  	 Each definition says "this variable equals this subformula":
  	 - $v_1 \leftrightarrow (c \land d)$
  	 - $v_2 \leftrightarrow (a \leftrightarrow b)$
  	 - $v_3 \leftrightarrow \neg v_1$
  	 - $v_4 \leftrightarrow \neg v_2$
  	 - $v_5 \leftrightarrow (v_3 \land e)$
  	 - $v_6 \leftrightarrow (v_4 \to v_5)$
  	
  	 Notice: each definition has at most 3 variables (the label + up to 2 children).
  	
  	 *Step 3: Convert each definition to CNF*
  	
  	 $v_1 \leftrightarrow (c \land d)$ — conjunction pattern:
  	 $(\neg v_1 \lor c) \land (\neg v_1 \lor d) \land (v_1 \lor \neg c \lor \neg d)$
  	
  	 $v_3 \leftrightarrow \neg v_1$ — negation pattern:
  	 $(v_1 \lor v_3) \land (\neg v_1 \lor \neg v_3)$
  	
  	 $v_5 \leftrightarrow (v_3 \land e)$ — conjunction pattern:
  	 $(\neg v_5 \lor v_3) \land (\neg v_5 \lor e) \land (v_5 \lor \neg v_3 \lor \neg e)$
  	
  	 $v_6 \leftrightarrow (v_4 \to v_5)$ — implication is $(\neg v_4 \lor v_5)$, use disjunction pattern:
  	 $(\neg v_6 \lor \neg v_4 \lor v_5) \land (v_6 \lor v_4) \land (v_6 \lor \neg v_5)$
  	
  	 (Similarly for $v_2, v_4$)
  	
  	 *Step 4: AND all clauses together, plus assert the root*
  	
  	 Final CNF: (all clauses from step 3) $\land\; v_6$
  	
  	 The unit clause $(v_6)$ forces the root to be true, which propagates constraints through all the definitions.

Why "equisatisfiable" not "equivalent"

The CNF has extra variables () that don’t exist in the original formula. If the original is satisfiable, we can extend any satisfying assignment to the new variables (just compute what each subformula evaluates to). If the CNF is satisfiable, restricting to the original variables satisfies the original formula. But the formulas aren’t logically equivalent because they have different variable sets.


DPLL Algorithm

DPLL (Davis-Putnam-Logemann-Loveland, 1962) is the foundation of modern SAT solvers. It operates on CNF and combines:

  • Binary Constraint Propagation (BCP) for deterministic inference
  • Branching with backtracking for search

Binary Constraint Propagation (BCP)

Given a CNF formula containing a unit clause (a clause with exactly one literal), :

  1. Removes all clauses containing (they’re satisfied)

  2. Removes from all remaining clauses (it contributes nothing)

    Apply repeatedly until no unit clauses remain.

BCP outcomes:

  • Empty CNF ( = ): formula is satisfiable under current assignments
  • Empty clause ( = ): formula is unsatisfiable under current assignments (conflict)
  • Neither: BCP reaches fixpoint, branching required

BCP example

  1. Unit clause : propagate

  2. Unit clause : propagate

  3. Unit clause : propagate

    Satisfiable with .

DPLL Pseudocode

function DPLL(φ):
	while true:
		φ = BCP(φ)
		if φ == ⊤: return SAT
		if φ == ⊥:
			if stack.empty(): return UNSAT
			(l, φ) = stack.pop()	// backtrack
			φ = φ ∧ l			   // try other branch
		else:
			select literal l in φ
			stack.push(¬l, φ)		// save backtrack point
			φ = φ ∧ l			   // branch on l

The algorithm explores assignments depth-first, using BCP to prune the search space.


Modern SAT Solvers

DPLL alone isn’t enough. State-of-the-art solvers (Lingeling, CaDiCaL, MiniSAT, Glucose) add:

  • CDCL (Conflict-Driven Clause Learning): when a conflict occurs, analyze it to learn a new clause that prevents similar conflicts, then backjump non-chronologically
  • Variable selection heuristics: VSIDS and variants prioritize variables involved in recent conflicts
  • Restart strategies: periodically restart search to escape bad parts of the search tree
  • Phase saving: remember polarities from previous assignments
  • Two-watched literals: lazy data structure for efficient BCP
  • Preprocessing/inprocessing: simplify formula before and during search

DPLL variants extend to other logics: QBF (quantified boolean formulas), SMT (satisfiability modulo theories).


Practice Problems