1. Using Entropy-Related Measures in Categorical Data Visualization, Jamal Alsakran, Xiaoke Huang, Ye Zhao, Jing Yang, Karl Fast, Proceedings of IEEE Pacific Visualization, pages 81-88, March, 2014, IEEE (PDF)(PPT)(Bibtex).
Abstract: A wide variety of real-world applications generate massive high-dimensional categorical datasets. These datasets contain categorical variables whose values comprise a set of discrete categories. Visually exploring these datasets for insights is of great interest and importance. However, their discrete nature often confounds the direct application of existing multidimensional visualization techniques. We propose using entropy-related measures as a means of harnessing this discreteness to generate more effective visualizations. Entropy, mutual information, and joint entropy are applied to understanding categorical data facts, managing visualization layouts, and promoting visual analytics. A set of visualization enhancements is developed with word cloud, treemap, scatter plot matrix, and parallel coordinates, in particular for categorical data. We conducted a user study to assess the benefits of using entropy-related measures in knowledge discovery over parallel coordinates. A new quantitative method uses ribbon size distribution to evaluate parallel coordinates visualizations. Moreover, we propose entropy-weighted similarities to improve categorical data clustering. A new TabularCluster visualization is designed to depict categorical cluster characteristics, which can promote data comparison and understanding in analytical tasks.
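For readers unfamiliar with the measures named in this abstract, the following is a minimal sketch of Shannon entropy, joint entropy, and mutual information for categorical variables, using the standard textbook definitions. It is not the authors' implementation; the function names and the toy columns (`color`, `shape`, `size`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a sequence of category labels."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def joint_entropy(xs, ys):
    """Entropy of the paired variable (x, y)."""
    return entropy(list(zip(xs, ys)))

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)

# Hypothetical toy data: 'shape' is fully determined by 'color',
# while 'size' is independent of 'color'.
color = ["red", "red", "blue", "blue"]
shape = ["circle", "circle", "square", "square"]
size = ["small", "large", "small", "large"]

print(entropy(color))                    # 1 bit: two equally likely categories
print(mutual_information(color, shape))  # high: the variables move together
print(mutual_information(color, size))   # zero: the variables are independent
```

A pair of variables with high mutual information is a natural candidate for adjacent placement in a layout such as parallel coordinates, which is one way such measures could guide a visualization; the paper itself should be consulted for the exact usage.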
2. Visualizing Clusters in Parallel Coordinates for Visual Knowledge Discovery, Yang Xiang, David Fuhry, Ruoming Jin, Ye Zhao, and Kun Huang, The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), May 2012 (PDF)(Bibtex).
Abstract: Parallel coordinates is frequently used to visualize multi-dimensional data. In this paper, we are interested in how to effectively visualize clusters of multi-dimensional data in parallel coordinates for the purpose of facilitating knowledge discovery. In particular, we would like to efficiently find a good order of coordinates for different emphases on visual knowledge discovery. To solve this problem, we link it to the metric-space Hamiltonian path problem by defining the cost between every pair of coordinates as the number of inter-cluster or intra-cluster crossings. This definition connects to various efficient solutions and leads to very fast algorithms. In addition, to better observe cluster interactions, we also propose to shape clusters smoothly by an energy reduction model which provides both macro and micro views of clusters.
3. Tile-based parallel coordinates and its application in financial visualization, Jamal Alsakran, Ye Zhao and Xinlei Zhao, Proceedings of the SPIE conference on Visualization and Data Analysis, Volume 7530, pages 753003-12, San Jose, CA, 2010, SPIE. (PDF)(PPT)(Bibtex)
This work and its visual contents have been cited and used in two finance books:
Handbook of Financial Data and Risk Information II (Cambridge University Press)
Financial Analysis and Risk Management: Data Governance, Analytics and Life Cycle Management (Springer)
Abstract: In this paper, we first propose tile-based parallel coordinates, where the plotting area is divided into rectangular tiles. Each tile stores an intersection density that counts the total number of polylines intersecting with that tile. Consequently, the intersection density is mapped to optical attributes, such as color and opacity, by interactive transfer functions. The method visualizes the polylines efficiently and informatively in accordance with the density distribution, and thus reduces visual clutter and promotes knowledge discovery. The interactivity of our method allows the user to instantaneously manipulate the tile distribution and the transfer functions. Specifically, the classic parallel coordinates rendering is a special case of our method when each tile represents only one pixel. A case study on a real-world data set, U.S. stock mutual fund data of year 2006, is presented to show the capability of our method in visually analyzing financial data. The presented visual analysis is conducted by an expert in the domain of finance. Our method gains the support of professionals in the finance field; they embrace it as a potential investment analysis tool for mutual fund managers, financial planners, and investors.
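The tile idea described in this abstract can be sketched as follows. This is an illustrative approximation under stated assumptions, not the paper's implementation: values are assumed to be normalized to [0, 1], axes are assumed to be evenly spaced, tile hits are found by sampling points along each segment rather than by exact line-rectangle intersection, and the name `tile_density` and its parameters are hypothetical.

```python
def tile_density(rows, n_tiles_x, n_tiles_y, samples_per_segment=64):
    """Count, for each tile, how many polylines pass through it.

    rows: list of records; each record is a list of values in [0, 1],
          one value per parallel axis (at least two axes).
    Returns a grid density[ty][tx] of polyline counts.
    """
    n_axes = len(rows[0])
    density = [[0] * n_tiles_x for _ in range(n_tiles_y)]
    for row in rows:
        touched = set()  # tiles hit by this polyline, counted once each
        for a in range(n_axes - 1):
            # Axis a sits at x = a / (n_axes - 1) in the unit plotting area.
            x0, x1 = a / (n_axes - 1), (a + 1) / (n_axes - 1)
            y0, y1 = row[a], row[a + 1]
            # Approximation: sample points along the segment.
            for s in range(samples_per_segment + 1):
                t = s / samples_per_segment
                x = x0 + t * (x1 - x0)
                y = y0 + t * (y1 - y0)
                tx = min(int(x * n_tiles_x), n_tiles_x - 1)
                ty = min(int(y * n_tiles_y), n_tiles_y - 1)
                touched.add((tx, ty))
        for tx, ty in touched:
            density[ty][tx] += 1
    return density

# Hypothetical toy data: three records on two axes.
rows = [[0.1, 0.1], [0.2, 0.2], [0.9, 0.9]]
grid = tile_density(rows, n_tiles_x=2, n_tiles_y=2)
print(grid)
```

The resulting counts could then be passed through a transfer function, for example a normalization to [0, 1] used as opacity, which corresponds to the mapping from density to optical attributes that the abstract mentions. Shrinking the tiles toward single pixels approaches an ordinary polyline rendering.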
4. Visual Analysis of Mutual Fund Performance, Jamal Alsakran, Ye Zhao, Xinlei Zhao, Proceedings of the 13th International Conference on Information Visualization (IV09), pages 252-259, Barcelona, Spain, July, 2009, IEEE Computer Society (PDF)(Bibtex)
Abstract: Mutual funds are probably the most important investment instruments for investors. Their performance is mainly affected by their characteristics, such as asset size, turnover and fee structure. It is thus among investors' highest priorities to understand the relation between fund performance and these properties. Typically, financial researchers use the linear regression technique to statistically assess the relation from massive fund performance data. Unfortunately, the prevailing methods may be confounded by the existence of non-linearity and outliers, and give conflicting conclusions. In this paper, we propose a visualization-based method to improve mutual fund analysis, where a new visual analytical tool, the density-based distribution map, is applied. The new visual representations greatly help to understand the critical relations, reveal the deficiency of current analytical algorithms, and support mutual fund selection. The tool is used to perform an expert financial analysis, and to establish a fund selection strategy from a real-world database of US stock funds. Our method gains the admiration and support of professionals in the finance field; they embrace it as a potential investment analysis tool for mutual fund managers, financial planners, and investors.