Sloppy conversion from lecture slides + extra info
Notation
This note uses for negation. SAT literature often uses bar notation instead, which is more compact but can look like a different variable.
Practical SAT Solving
SAT solvers work on formulas in normal form because arbitrary tree-structured formulas are hard to simplify globally (changes only affect subtrees).
Why CNF over DNF?
Both CNF and DNF can represent any propositional formula, but they have complementary strengths:
DNF (Disjunctive Normal Form): OR of ANDs, e.g.,
Satisfiability is trivial (just check if any conjunction is consistent), but converting to DNF can be exponential.
CNF (Conjunctive Normal Form): AND of ORs, e.g.,
Refutability is easy to show (find an empty clause), and conversion can be done in polynomial time. CNF is the standard input format for SAT solvers (DIMACS format).
Transformation to CNF
Approach 1: Multiplication
Produces an equivalent formula. Three steps:
-
Eliminate derived connectives:
-
Convert to Negation Normal Form (NNF): push negations inward using De Morgan’s laws, eliminate double negation. Result: only , , and literals.
-
Distribute over to get CNF.
Distributivity of OR over AND
The core rule:
Symmetrically:
Why it works: For the left side to be true, either is true (making both clauses on the right true), or both and are true (again satisfying both clauses). The equivalence holds in both directions.
The goal is to push all s inside all s. When a disjunction sits above a conjunction, distribute. Repeat until every is at the innermost level (forming clauses) and every is at the outermost level (conjoining clauses).
Distributivity step-by-step
Transform to CNF.
This is a disjunction of two conjunctions. We need to “multiply out” like , but with as multiplication and as addition.
Step 1: Treat as where
Step 2: Distribute in the left conjunct:
Step 3: Distribute in the right conjunct:
Result:
Four clauses from two 2-literal conjunctions: clauses.
General pattern
When distributing :
Each must pair with each to form a clause .
Result: clauses:
Example:
clauses:
Chained disjunctions
For :
First combine the first two terms, then combine with the third.
Step 1:
Step 2: Now OR this 4-clause conjunction with . Each of the 4 clauses distributes with both and :
In general, yields clauses.
Exponential blowup
Distributivity can cause exponential growth. For example, transforming to CNF produces clauses.
Worked example
Transform
Step 1a (remove equivalences):
Step 1b (remove implications):
Step 2 (NNF via De Morgan + double negation):
Step 3 (distribute over ):
We have where:
, , ,Applying the pattern: each element of the left conjunction pairs with each element of the right:
Substituting back and flattening nested ORs:
Approach 2: Tseitin Encoding
Produces an equisatisfiable formula: same satisfiability status, but not logically equivalent (introduces new variables). Polynomial complexity.
The idea: instead of distributing (which explodes), introduce a fresh variable for each subformula and constrain it to equal that subformula’s value.
Process:
-
Label each non-literal subformula with a fresh variable
-
For each labeled subformula, write a definition
The children are either literals or labels of subformulas (not the subformulas themselves). -
AND all definitions together, then convert to CNF using Approach 1
-
AND the root label (asserting the whole formula must be true)
Why no blowup: Each definition has the form where is a binary connective, or for negation. That’s at most 3 variables per definition. Converting a 3-variable equivalence to CNF produces a fixed small number of clauses (see below), no matter how large the original formula.
Converting definitions to CNF
An equivalence means . Convert each implication separately, then AND together.
Conjunction:
- : if then both and
- : if both and then
- Result: — 3 clauses, 3 variables
Disjunction:
- Result: — 3 clauses, 3 variables
Negation:
- Result: — 2 clauses, 2 variables
Implication: — same as , use disjunction pattern
Worked example
Transform
The syntax tree:
→ (root) / \ ¬ ∧ | / \ ↔ ¬ e / \ | a b ∧ / \ c d ``` *Step 1: Label non-literal subformulas bottom-up* Start at the deepest nodes and work toward the root. Each non-literal subformula gets a fresh variable: Deepest level: - $v_1$ for $(c \land d)$ - $v_2$ for $(a \leftrightarrow b)$ Next level up: - $v_3$ for $\neg v_1$ (the negation of $c \land d$) - $v_4$ for $\neg v_2$ (the negation of $a \leftrightarrow b$) Next level: - $v_5$ for $(v_3 \land e)$ (the right subtree of $\to$) Root: - $v_6$ for $(v_4 \to v_5)$ (the whole formula) *Step 2: Write a definition for each label* Each definition says "this variable equals this subformula": - $v_1 \leftrightarrow (c \land d)$ - $v_2 \leftrightarrow (a \leftrightarrow b)$ - $v_3 \leftrightarrow \neg v_1$ - $v_4 \leftrightarrow \neg v_2$ - $v_5 \leftrightarrow (v_3 \land e)$ - $v_6 \leftrightarrow (v_4 \to v_5)$ Notice: each definition has at most 3 variables (the label + up to 2 children). *Step 3: Convert each definition to CNF* $v_1 \leftrightarrow (c \land d)$ — conjunction pattern: $(\neg v_1 \lor c) \land (\neg v_1 \lor d) \land (v_1 \lor \neg c \lor \neg d)$ $v_3 \leftrightarrow \neg v_1$ — negation pattern: $(v_1 \lor v_3) \land (\neg v_1 \lor \neg v_3)$ $v_5 \leftrightarrow (v_3 \land e)$ — conjunction pattern: $(\neg v_5 \lor v_3) \land (\neg v_5 \lor e) \land (v_5 \lor \neg v_3 \lor \neg e)$ $v_6 \leftrightarrow (v_4 \to v_5)$ — implication is $(\neg v_4 \lor v_5)$, use disjunction pattern: $(\neg v_6 \lor \neg v_4 \lor v_5) \land (v_6 \lor v_4) \land (v_6 \lor \neg v_5)$ (Similarly for $v_2, v_4$) *Step 4: AND all clauses together, plus assert the root* Final CNF: (all clauses from step 3) $\land\; v_6$ The unit clause $(v_6)$ forces the root to be true, which propagates constraints through all the definitions.
Why "equisatisfiable" not "equivalent"
The CNF has extra variables () that don’t exist in the original formula. If the original is satisfiable, we can extend any satisfying assignment to the new variables (just compute what each subformula evaluates to). If the CNF is satisfiable, restricting to the original variables satisfies the original formula. But the formulas aren’t logically equivalent because they have different variable sets.
DPLL Algorithm
DPLL (Davis-Putnam-Logemann-Loveland, 1962) is the foundation of modern SAT solvers. It operates on CNF and combines:
- Binary Constraint Propagation (BCP) for deterministic inference
- Branching with backtracking for search
Binary Constraint Propagation (BCP)
Given a CNF formula containing a unit clause (a clause with exactly one literal), :
Removes all clauses containing (they’re satisfied)
Removes from all remaining clauses (it contributes nothing)
Apply repeatedly until no unit clauses remain.
BCP outcomes:
- Empty CNF ( = ): formula is satisfiable under current assignments
- Empty clause ( = ): formula is unsatisfiable under current assignments (conflict)
- Neither: BCP reaches fixpoint, branching required
BCP example
Unit clause : propagate
Unit clause : propagate
Unit clause : propagate
Satisfiable with .
DPLL Pseudocode
function DPLL(φ):
while true:
φ = BCP(φ)
if φ == ⊤: return SAT
if φ == ⊥:
if stack.empty(): return UNSAT
(l, φ) = stack.pop() // backtrack
φ = φ ∧ l // try other branch
else:
select literal l in φ
stack.push(¬l, φ) // save backtrack point
φ = φ ∧ l // branch on l
The algorithm explores assignments depth-first, using BCP to prune the search space.
Modern SAT Solvers
DPLL alone isn’t enough. State-of-the-art solvers (Lingeling, CaDiCaL, MiniSAT, Glucose) add:
- CDCL (Conflict-Driven Clause Learning): when a conflict occurs, analyze it to learn a new clause that prevents similar conflicts, then backjump non-chronologically
- Variable selection heuristics: VSIDS and variants prioritize variables involved in recent conflicts
- Restart strategies: periodically restart search to escape bad parts of the search tree
- Phase saving: remember polarities from previous assignments
- Two-watched literals: lazy data structure for efficient BCP
- Preprocessing/inprocessing: simplify formula before and during search
DPLL variants extend to other logics: QBF (quantified boolean formulas), SMT (satisfiability modulo theories).
Practice Problems
Exercise: BCP on complex formula
Unit clauses: and
Conflict detected.
Exercise: BCP leading to UNSAT
Unsatisfiable.
Exercise: BCP with multiple propagations
Unsatisfiable.