Simple Instructions

For programming questions, please contact lliu@cs.kent.edu.
For other questions, please contact jin@cs.kent.edu or yxiang@cs.kent.edu.

In these instructions, text in parentheses () is added for explanation.

What is contained in the source code:
1. Folder datasets: chess.dat, connect.data, pumsb.dat, retail.data and T40I10D100K.dat.
2. Folder Mafia: the source code for the Mafia software, which can also be downloaded from "http://himalaya-tools.sourceforge.net/Mafia/".
3. Folder src: our source code.

How to run the source code:
1. Extract "source_code.zip".
2. Compile the source code by typing "make" in the "/src" directory.
3. Compile Apriori by typing "make" in the "/apriori/apriori/src" directory, and copy the executable file "apriori" into the "/src" directory.
   Note: newer versions of the Apriori software may be incompatible with our program.
4. Install Mafia by typing "./configure" and then "make" in the "/Mafia" directory.
5. Run the Python script for each dataset. For example, for chess.dat:
   ./chess.py 1
   or
   ./chess.py 2
   (1 and 2 select the "exact method" and the "with-false-positive method", respectively.)

How to check the result (chess.dat, for example):

Final results (in the "/src/result" directory):
1. Parameters and statistics for all support levels are in "/src/result/chess/statistics.dat".
2. Bicliques: e.g. for alpha=0.7, all bicliques are recorded in the file "/src/result/chess/biclique/0.7.txt".
   Format: ":" separates multipliers and "," separates itemsets.

The complete running record:
The file /src/result/biclique_chess.dat contains the bicliques constructed during the whole running process. The format is:

... ... ...
biclique[# of the biclique], cost[cost of this biclique, i.e. its number of itemsets], MFI_covered[how many maximal frequent itemsets have been covered]
A (the left multiplier) contains: 1 2 ; 3 4; (";" separates itemsets)
B (the right multiplier) contains: 5 6 7;
(1, 2, 3, 4, 5, 6, 7 stand for items in the dataset.)
... ... ...
PARAMETERS:
ALPHA 0.02 (support for the mfi);
BETA 0.0001 (support for fi_mfi);
GAMMA 0.001 (support for bipartite graph);
MEHTHOD 2 (exact or with-false-positive method);
CONSTANT 0 (weight in the algorithm).
TOTAL MFI : 2015 (number of maximal frequent itemsets at this support);
COVERED MFI: 2015 (how many mfi have been covered);
TOTAL LENGTH of COVERED MFI: 3689 (how many items are in the covered mfi).
TOTAL BIC : 12 (total number of bicliques);
TOTAL COST: 1595 (the total cost of the bicliques);
TOTAL LENGTH of ALL SETs: 2676 (how many items are in those bicliques).
DISTINCT ITEMS in BIC: 1561 (how many distinct items are in those bicliques).
3 SECONDs to find the bicliques!
... ... ...

All other files in /src/result record intermediate results.

Future work:
Our Python scripts assign the beta and delta (i.e. gamma) parameters for each dataset under different alpha values. Lowering beta and delta for a dataset with support alpha may make the run take (much) longer, but the result will be more concise (i.e. have a smaller cost). In the future, we will develop an algorithm that automatically determines the best beta and delta values for a dataset with support alpha, so that the algorithm runs in a reasonable time while yielding as concise a representation as possible.
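
Reading the biclique files (illustrative sketch):
The per-alpha biclique files described under "How to check the result" use ":" between the two multipliers and "," between itemsets. The following Python sketch shows one way such a line might be read. It is not part of the distributed source code. The function names parse_biclique and biclique_cost are hypothetical, and the sketch assumes one biclique per line, integer item identifiers, whitespace between the items of an itemset, and that the cost counts the itemsets of both multipliers together.

```python
# Hypothetical helper, not part of the distributed source code.
# Assumed layout: one biclique per line, e.g. "1 2,3 4:5 6 7",
# with the items inside an itemset separated by whitespace.


def parse_biclique(line):
    """Split one biclique line into (left, right) lists of itemsets."""
    # ":" separates the left multiplier from the right multiplier.
    left_text, right_text = line.strip().split(":")

    def itemsets(text):
        # "," separates itemsets; empty pieces are skipped.
        return [[int(item) for item in part.split()]
                for part in text.split(",") if part.strip()]

    return itemsets(left_text), itemsets(right_text)


def biclique_cost(left, right):
    """Assumed cost: the number of itemsets in both multipliers."""
    return len(left) + len(right)


if __name__ == "__main__":
    left, right = parse_biclique("1 2,3 4:5 6 7")
    print(left, right, biclique_cost(left, right))
```

For the example line, the left multiplier holds the itemsets {1, 2} and {3, 4}, the right multiplier holds {5, 6, 7}, and the cost under the stated assumption is 3.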