Quantitative Discriminant Rule
- general form: $\forall X,\ \text{target\_class}(X) \Leftrightarrow \text{condition}_1(X)\,[t:w_1,\, d:\omega_1] \vee \cdots \vee \text{condition}_n(X)\,[t:w_n,\, d:\omega_n]$
- Discriminant rule: sufficient condition of the target class
- d-weight $=\frac{\text{count}(q_a\in C_{\text{target}})}{\sum_{i=1}^N\text{count}(q_a\in C_i)}$, where $q_a$ is a disjunct (condition) of the rule
- d-weight: the discriminability of each disjunct in the rule; the closer to 100%, the more exclusively the disjunct identifies the target class (important)
- Characteristic rule: necessary condition of the target class
- t-weight: typicality of each disjunct within the target class; $\sum_i t_i = 100\%$ (worked sketch below)
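A minimal sketch of how both weights fall out of per-class counts; the two conditions and all counts below are hypothetical:

```python
# Hypothetical per-class counts: counts[condition][class] = number of tuples
# of that class satisfying the condition (the disjunct q_a).
counts = {
    "cond1": {"target": 80, "other": 20},
    "cond2": {"target": 20, "other": 180},
}

target_total = sum(c["target"] for c in counts.values())  # 100 target tuples

for cond, by_class in counts.items():
    t_weight = by_class["target"] / target_total            # share of target class covered
    d_weight = by_class["target"] / sum(by_class.values())  # discriminability of the disjunct
    print(f"{cond}: t-weight={t_weight:.0%}, d-weight={d_weight:.0%}")
# cond1: t-weight=80%, d-weight=80%; cond2: t-weight=20%, d-weight=10%
# t-weights sum to 100%; cond1 is the far more discriminative disjunct
```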
Association
- Association rules: $A\Rightarrow B$ [support, confidence] (toy computation below)
- Support
$$\text{support}(A\Rightarrow B)=P(A,B)=\frac{\text{\# of tuples containing both }A\text{ and }B}{\text{total \# of tuples}}$$
- Confidence (t-weight)
$$\text{confidence}(A\Rightarrow B)=P(B\mid A)=\frac{\text{\# of tuples containing both }A\text{ and }B}{\text{\# of tuples containing }A}$$
- strong rule: support $\geq$ min_sup and confidence $\geq$ min_conf
- $k$-itemset: an itemset containing $k$ items
- frequency: support count (the number of tuples containing the itemset)
- frequent itemset: an itemset whose support $\geq$ min_sup
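A toy computation of both measures, assuming tuples are plain Python sets (the five-transaction DB and item names are made up):

```python
# Hypothetical transaction DB; each tuple is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

A, B = {"bread"}, {"milk"}
sup = support(A | B)                # P(A, B) = 3/5
conf = support(A | B) / support(A)  # P(B | A) = 3/4
print(f"support={sup:.2f}, confidence={conf:.2f}")  # support=0.60, confidence=0.75
```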
Apriori
- Two-step process
- find all frequent itemsets
- generate strong association rules (easy)
- INSIGHT: All non-empty subsets of a frequent itemset must also be frequent
- frequent $k$-itemset: $L_k$
- Join: $C_{k+1} = L_k \Join L_k = \{A\cup B \mid A, B\in L_k,\ |A\cap B| = k-1\}$
- Prune: delete candidates in $C_{k+1}$ that have an infrequent $k$-subset (by the insight above), then scan the DB to count support and obtain $L_{k+1}$
- keep the items of each itemset in sorted order, so the join only pairs itemsets that agree on their first $k-1$ items (sketch below)
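A minimal sketch of the join and prune steps, assuming itemsets are `frozenset`s; a production implementation would keep items sorted and join only pairs sharing a $(k-1)$-prefix:

```python
from itertools import combinations

def apriori_gen(L_k, k):
    """Generate candidate (k+1)-itemsets from the frequent k-itemsets L_k."""
    L_k = {frozenset(s) for s in L_k}
    # Join: union of two k-itemsets that overlap in exactly k-1 items.
    candidates = {a | b for a in L_k for b in L_k if len(a & b) == k - 1}
    # Prune: every k-subset of a surviving candidate must itself be in L_k.
    return {c for c in candidates
            if all(frozenset(s) in L_k for s in combinations(c, k))}

L2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
print(apriori_gen(L2, 2))
# Only {A, B, C} survives: {A, B, D} is pruned because {A, D} is not frequent,
# {B, C, D} because {C, D} is not frequent.
```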
Apriori variants
- AIS: for each tuple, expand $L_k$ by adding other items contained in the tuple to generate $C_{k+1}$
- AprioriTid: compute support counts from $\overline{C_k}$, a transformed table storing for each transaction ID the candidate itemsets it contains, instead of rescanning the raw DB
- DHP (Direct Hashing and Pruning)
- INSIGHT: the processes in the initial iterations of Apriori dominate the total execution cost
- Knowledge 1: any member of a candidate frequent itemset must be hashed into a bucket whose count $\geq$ min_sup (hash-filter sketch below)
- Knowledge 2: any tuple useful in determining $L_{k+1}$ must contain at least $k+1$ candidate itemsets of $C_k$
- Knowledge 3: for an item in a tuple to be useful in determining $L_{k+1}$, it must appear in at least $k$ candidate itemsets of $C_k$
- Knowledge 4: for an item in a tuple to be useful in determining $L_{k+1}$, it must appear in at least one $(k+1)$-itemset whose $k$-subsets are all candidate frequent $k$-itemsets
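A minimal sketch of the Knowledge 1 hash filter for 2-itemsets; the toy DB, min_sup, and bucket count are hypothetical, and collisions only ever over-count, so the filter never discards a truly frequent pair:

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "B", "C"}, {"A", "D"}]
min_sup, n_buckets = 2, 7

# While scanning for 1-itemsets, hash every 2-itemset of each tuple into a bucket.
buckets = [0] * n_buckets
for t in transactions:
    for pair in combinations(sorted(t), 2):
        buckets[hash(pair) % n_buckets] += 1

def may_be_frequent(pair):
    """A pair can enter C_2 only if its bucket reached min_sup (Knowledge 1)."""
    return buckets[hash(tuple(sorted(pair))) % n_buckets] >= min_sup

print(may_be_frequent(("B", "C")))  # True: {B, C} alone occurs 3 times
```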
Improvements
- partitioning: an itemset frequent in the whole DB must be frequent in at least one partition; mine local frequent itemsets per partition (scan 1), then verify their global counts (scan 2)
- sampling: mine a random sample with a lowered min_sup, then verify the result on the full DB
- transaction reduction: a tuple containing no frequent $k$-itemset cannot contain any frequent $(k+1)$-itemset, so drop it from later scans (sketch below)
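A minimal sketch of transaction reduction under hypothetical inputs: a tuple with no frequent $k$-itemset can be skipped in all later scans:

```python
from itertools import combinations

def reduce_db(transactions, L_k, k):
    """Keep only tuples that still contain at least one frequent k-itemset."""
    L_k = {frozenset(s) for s in L_k}
    return [t for t in transactions
            if any(frozenset(c) in L_k for c in combinations(t, k))]

db = [{"A", "B"}, {"C"}, {"A", "B", "C"}]
print(reduce_db(db, L_k=[{"A", "B"}], k=2))
# {'C'} is dropped: it contains no frequent 2-itemset, so it cannot
# contribute to any frequent 3-itemset.
```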
FP-growth
- Mining from FP-tree
- FP-tree construction
- scan DB once, find frequent 1-itemsets
- sort frequent items in frequency-descending order to get the F-list
- scan DB again; insert each tuple's frequent items, sorted in F-list order, into the FP-tree (construction sketch below)
- for each frequent item, in reverse F-list order, construct its conditional pattern base and then its conditional FP-tree
- repeat on each newly created conditional FP-tree until the tree is empty or reduces to a single path
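A minimal sketch of the two-scan FP-tree construction (the recursive mining of conditional pattern bases is omitted for brevity); `Node` and `build_fp_tree` are illustrative names, and the toy DB echoes the classic {f, c, a, m, p} example:

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Scan 1: count items, keep frequent ones sorted by descending count (F-list).
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    f_list = sorted((i for i in freq if freq[i] >= min_sup), key=lambda i: -freq[i])
    rank = {item: r for r, item in enumerate(f_list)}
    # Scan 2: insert each tuple's frequent items in F-list order, sharing
    # prefixes; the header table keeps the node-links for each item.
    root, header = Node(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header, f_list

db = [{"f", "a", "c", "m", "p"}, {"f", "a", "c", "b", "m"}, {"f", "b"},
      {"c", "b", "p"}, {"f", "a", "c", "m", "p"}]
root, header, f_list = build_fp_tree(db, min_sup=3)
print(f_list)  # frequency-descending, ties broken arbitrarily, e.g. ['f', 'c', 'a', 'b', 'm', 'p']
```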
Others
- closed frequent itemset: $X$ is frequent and there exists no proper super-itemset $Y$ such that $Y$ has the same support count as $X$ in $S$
- maximal frequent itemset: $X$ is frequent and there exists no proper super-itemset $Y$ such that $Y$ is frequent in $S$ (every maximal frequent itemset is closed; brute-force sketch below)
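A brute-force sketch on a hypothetical toy DB that makes the distinction concrete (exponential; for illustration only):

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "B"}, {"A", "C"}]
min_sup = 2
items = sorted(set().union(*transactions))

def count(s):
    return sum(set(s) <= t for t in transactions)

frequent = {frozenset(s): count(s)
            for r in range(1, len(items) + 1)
            for s in combinations(items, r)
            if count(s) >= min_sup}

# Closed: no proper superset with the same support count.
closed = {s for s, c in frequent.items()
          if not any(s < y and frequent[y] == c for y in frequent)}
# Maximal: no frequent proper superset at all.
maximal = {s for s in frequent if not any(s < y for y in frequent)}
print("closed: ", closed)   # {A}, {A,B}, {A,C}
print("maximal:", maximal)  # {A,B}, {A,C} -- maximal is a subset of closed
```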
- Multilevel association rules: Rules generated from association rule mining with concept hierarchies
- same min_sup for all levels
- level-by-level independent
- level-cross filtering by $k$-itemsets
- level-cross filtering by single item
- Controlled level-cross filtering by single item
- Cross-level association rules
- Quantitative Association rules
- Step 1. Binning: partition the range of each quantitative attribute into intervals
- Step 2. Finding frequent predicate sets
- Step 3. Clustering association rules
- Distance-based association rules
- Criticism: strong association rules may not be interesting (a rule can clear min_sup and min_conf even when $A$ and $B$ are negatively correlated)
- Correlation analysis
- Lift: $\frac{P(A,B)}{P(A)P(B)}$; $>1$ positively correlated, $=1$ independent, $<1$ negatively correlated (toy computation below)
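A toy lift computation with hypothetical counts, showing how a rule can look strong yet be negatively correlated:

```python
n = 1000      # total transactions (hypothetical counts throughout)
n_a = 600     # transactions containing A
n_b = 750     # transactions containing B
n_ab = 400    # transactions containing both A and B

confidence = n_ab / n_a                      # 0.67 -- looks like a strong rule
lift = (n_ab / n) / ((n_a / n) * (n_b / n))  # P(A,B) / (P(A) P(B))
print(f"confidence={confidence:.2f}, lift={lift:.2f}")
# lift = 0.89 < 1: A and B are negatively correlated despite the high confidence
```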