Quantitative Discriminant Rule
- general form: $\forall X,\ \text{target\_class}(X) \Leftrightarrow \text{condition}_1(X)\,[t:w_1,\, d:\omega_1] \vee \cdots \vee \text{condition}_n(X)\,[t:w_n,\, d:\omega_n]$
- Discriminant rule: sufficient condition of the target class
- d-weight $=\frac{\text{count}(q_a\in C_{\text{target}})}{\sum_{i=1}^N\text{count}(q_a\in C_i)}$, where $q_a$ is a disjunct (condition) of the rule
- d-weight: the discriminability of each disjunct in the rule; the closer to 100%, the more exclusively the disjunct identifies the target class (important)
- Characteristic rule: necessary condition of the target class
- t-weight: typicality of each disjunct within the target class; $\sum_i t_i = 100\%$ (worked sketch below)
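A minimal sketch of how both weights fall out of per-class counts; the two conditions and all counts below are hypothetical:

```python
# Hypothetical per-class counts: counts[condition][class] = number of tuples
# of that class satisfying the condition (the disjunct q_a).
counts = {
    "cond1": {"target": 80, "other": 20},
    "cond2": {"target": 20, "other": 180},
}

target_total = sum(c["target"] for c in counts.values())  # 100 target tuples

for cond, by_class in counts.items():
    t_weight = by_class["target"] / target_total            # share of target class covered
    d_weight = by_class["target"] / sum(by_class.values())  # discriminability of the disjunct
    print(f"{cond}: t-weight={t_weight:.0%}, d-weight={d_weight:.0%}")
# cond1: t-weight=80%, d-weight=80%; cond2: t-weight=20%, d-weight=10%
# t-weights sum to 100%; cond1 is the far more discriminative disjunct
```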
Association
- Association rules: $A\Rightarrow B$ [support, confidence] (toy computation below)
- Support
$$\text{support}(A\Rightarrow B)=P(A,B)=\frac{\text{\# of tuples containing both }A\text{ and }B}{\text{total \# of tuples}}$$
- Confidence (t-weight)
$$\text{confidence}(A\Rightarrow B)=P(B\mid A)=\frac{\text{\# of tuples containing both }A\text{ and }B}{\text{\# of tuples containing }A}$$
- strong rule: support $\geq$ min_sup and confidence $\geq$ min_conf
- $k$-itemset: an itemset containing $k$ items
- frequency: support count (the number of tuples containing the itemset)
- frequent itemset: an itemset whose support $\geq$ min_sup
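A toy computation of both measures, assuming tuples are plain Python sets (the five-transaction DB and item names are made up):

```python
# Hypothetical transaction DB; each tuple is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

A, B = {"bread"}, {"milk"}
sup = support(A | B)                # P(A, B) = 3/5
conf = support(A | B) / support(A)  # P(B | A) = 3/4
print(f"support={sup:.2f}, confidence={conf:.2f}")  # support=0.60, confidence=0.75
```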
Apriori
- Two-step process
- find all frequent itemsets
- generate strong association rules (easy)
- INSIGHT: All non-empty subsets of a frequent itemset must also be frequent
- frequent $k$-itemset: $L_k$
- Join: $C_{k+1} = L_k \Join L_k = \{A\cup B \mid A, B\in L_k,\ |A\cap B| = k-1\}$
- Prune: delete candidates in $C_{k+1}$ that have an infrequent $k$-subset (by the insight above), then scan the DB to count support and obtain $L_{k+1}$
- keep the items of each itemset in sorted order, so the join only pairs itemsets that agree on their first $k-1$ items (sketch below)
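A minimal sketch of the join and prune steps, assuming itemsets are `frozenset`s; a production implementation would keep items sorted and join only pairs sharing a $(k-1)$-prefix:

```python
from itertools import combinations

def apriori_gen(L_k, k):
    """Generate candidate (k+1)-itemsets from the frequent k-itemsets L_k."""
    L_k = {frozenset(s) for s in L_k}
    # Join: union of two k-itemsets that overlap in exactly k-1 items.
    candidates = {a | b for a in L_k for b in L_k if len(a & b) == k - 1}
    # Prune: every k-subset of a surviving candidate must itself be in L_k.
    return {c for c in candidates
            if all(frozenset(s) in L_k for s in combinations(c, k))}

L2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
print(apriori_gen(L2, 2))
# Only {A, B, C} survives: {A, B, D} is pruned because {A, D} is not frequent,
# {B, C, D} because {C, D} is not frequent.
```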
Apriori variants
- AIS: for each tuple, expand $L_k$ by adding other items contained in the tuple to generate $C_{k+1}$
- AprioriTid: compute support counts from $\overline{C_k}$, a transformed table storing for each transaction ID the candidate itemsets it contains, instead of rescanning the raw DB
- DHP (Direct Hashing and Pruning)
- INSIGHT: the processes in the initial iterations of Apriori dominate the total execution cost
- Knowledge 1: any member of a candidate frequent itemset must be hashed into a bucket whose count $\geq$ min_sup (hash-filter sketch below)
- Knowledge 2: any tuple useful in determining $L_{k+1}$ must contain at least $k+1$ candidate itemsets of $C_k$
- Knowledge 3: for an item in a tuple to be useful in determining $L_{k+1}$, it must appear in at least $k$ candidate itemsets of $C_k$
- Knowledge 4: for an item in a tuple to be useful in determining $L_{k+1}$, it must appear in at least one $(k+1)$-itemset whose $k$-subsets are all candidate frequent $k$-itemsets
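A minimal sketch of the Knowledge 1 hash filter for 2-itemsets; the toy DB, min_sup, and bucket count are hypothetical, and collisions only ever over-count, so the filter never discards a truly frequent pair:

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "B", "C"}, {"A", "D"}]
min_sup, n_buckets = 2, 7

# While scanning for 1-itemsets, hash every 2-itemset of each tuple into a bucket.
buckets = [0] * n_buckets
for t in transactions:
    for pair in combinations(sorted(t), 2):
        buckets[hash(pair) % n_buckets] += 1

def may_be_frequent(pair):
    """A pair can enter C_2 only if its bucket reached min_sup (Knowledge 1)."""
    return buckets[hash(tuple(sorted(pair))) % n_buckets] >= min_sup

print(may_be_frequent(("B", "C")))  # True: {B, C} alone occurs 3 times
```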
Improvements
- partitioning: an itemset frequent in the whole DB must be frequent in at least one partition; mine local frequent itemsets per partition (scan 1), then verify their global counts (scan 2)
- sampling: mine a random sample with a lowered min_sup, then verify the result on the full DB
- transaction reduction: a tuple containing no frequent $k$-itemset cannot contain any frequent $(k+1)$-itemset, so drop it from later scans (sketch below)
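A minimal sketch of transaction reduction under hypothetical inputs: a tuple with no frequent $k$-itemset can be skipped in all later scans:

```python
from itertools import combinations

def reduce_db(transactions, L_k, k):
    """Keep only tuples that still contain at least one frequent k-itemset."""
    L_k = {frozenset(s) for s in L_k}
    return [t for t in transactions
            if any(frozenset(c) in L_k for c in combinations(t, k))]

db = [{"A", "B"}, {"C"}, {"A", "B", "C"}]
print(reduce_db(db, L_k=[{"A", "B"}], k=2))
# {'C'} is dropped: it contains no frequent 2-itemset, so it cannot
# contribute to any frequent 3-itemset.
```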
FP-growth
- Mining from FP-tree
- FP-tree construction
- scan DB once, find frequent 1-itemsets
- sort frequent items in frequency-descending order to get the F-list
- scan DB again; insert each tuple's frequent items, sorted in F-list order, into the FP-tree (construction sketch below)
- for each frequent item, in reverse F-list order, construct its conditional pattern base and then its conditional FP-tree
- repeat on each newly created conditional FP-tree until the tree is empty or reduces to a single path
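A minimal sketch of the two-scan FP-tree construction (the recursive mining of conditional pattern bases is omitted for brevity); `Node` and `build_fp_tree` are illustrative names, and the toy DB echoes the classic {f, c, a, m, p} example:

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Scan 1: count items, keep frequent ones sorted by descending count (F-list).
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    f_list = sorted((i for i in freq if freq[i] >= min_sup), key=lambda i: -freq[i])
    rank = {item: r for r, item in enumerate(f_list)}
    # Scan 2: insert each tuple's frequent items in F-list order, sharing
    # prefixes; the header table keeps the node-links for each item.
    root, header = Node(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header, f_list

db = [{"f", "a", "c", "m", "p"}, {"f", "a", "c", "b", "m"}, {"f", "b"},
      {"c", "b", "p"}, {"f", "a", "c", "m", "p"}]
root, header, f_list = build_fp_tree(db, min_sup=3)
print(f_list)  # frequency-descending, ties broken arbitrarily, e.g. ['f', 'c', 'a', 'b', 'm', 'p']
```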
Others
- closed frequent itemset: $X$ is frequent and there exists no proper super-itemset $Y$ such that $Y$ has the same support count as $X$ in $S$
- maximal frequent itemset: $X$ is frequent and there exists no proper super-itemset $Y$ such that $Y$ is frequent in $S$ (every maximal frequent itemset is closed; brute-force sketch below)
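A brute-force sketch on a hypothetical toy DB that makes the distinction concrete (exponential; for illustration only):

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "B"}, {"A", "C"}]
min_sup = 2
items = sorted(set().union(*transactions))

def count(s):
    return sum(set(s) <= t for t in transactions)

frequent = {frozenset(s): count(s)
            for r in range(1, len(items) + 1)
            for s in combinations(items, r)
            if count(s) >= min_sup}

# Closed: no proper superset with the same support count.
closed = {s for s, c in frequent.items()
          if not any(s < y and frequent[y] == c for y in frequent)}
# Maximal: no frequent proper superset at all.
maximal = {s for s in frequent if not any(s < y for y in frequent)}
print("closed: ", closed)   # {A}, {A,B}, {A,C}
print("maximal:", maximal)  # {A,B}, {A,C} -- maximal is a subset of closed
```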
- Multilevel association rules: Rules generated from association rule mining with concept hierarchies
- same min_sup for all levels
- level-by-level independent
- level-cross filtering by $k$-itemsets
- level-cross filtering by single item
- Controlled level-cross filtering by single item
- Cross-level association rules
- Quantitative Association rules
- Step 1. Binning: partition the range of each quantitative attribute into intervals
- Step 2. Finding frequent predicate sets
- Step 3. Clustering association rules
- Distance-based association rules
- Criticism: strong association rules may not be interesting (a rule can clear min_sup and min_conf even when $A$ and $B$ are negatively correlated)
- Correlation analysis
- Lift: $\frac{P(A,B)}{P(A)P(B)}$; $>1$ positively correlated, $=1$ independent, $<1$ negatively correlated (toy computation below)
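A toy lift computation with hypothetical counts, showing how a rule can look strong yet be negatively correlated:

```python
n = 1000      # total transactions (hypothetical counts throughout)
n_a = 600     # transactions containing A
n_b = 750     # transactions containing B
n_ab = 400    # transactions containing both A and B

confidence = n_ab / n_a                      # 0.67 -- looks like a strong rule
lift = (n_ab / n) / ((n_a / n) * (n_b / n))  # P(A,B) / (P(A) P(B))
print(f"confidence={confidence:.2f}, lift={lift:.2f}")
# lift = 0.89 < 1: A and B are negatively correlated despite the high confidence
```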