Saturday, February 10, 2007

Multidimensional Scaling

Multidimensional scaling is an exploratory technique used to visualize proximities in a low-dimensional space. Interpretation of the dimensions can lead to an understanding of the processes underlying the perceived nearness of entities. Furthermore, it is possible to incorporate individual or group differences into the solution.
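
As a concrete illustration, here is a minimal Matlab sketch of classical (metric) MDS. The function name classicalMDS, the dissimilarity matrix D, and the target dimension dim are illustrative assumptions rather than anything from this post; the Statistics Toolbox function cmdscale provides a full implementation.

function Y = classicalMDS(D, dim)
% D:   n-by-n matrix of pairwise dissimilarities (assumed symmetric)
% dim: number of dimensions for the embedding
n = size(D, 1);
J = eye(n) - ones(n)/n;                    % centering matrix
B = -0.5 * J * (D.^2) * J;                 % double-centered squared dissimilarities
[V, L] = eig((B + B')/2);                  % symmetrize before the eigendecomposition
[lambda, order] = sort(diag(L), 'descend');
V = V(:, order);
lambda = max(lambda(1:dim), 0);            % clip negative eigenvalues from non-Euclidean input
Y = V(:, 1:dim) * diag(sqrt(lambda));      % rows of Y are the low-dimensional coordinates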

Fisher's linear discriminant

Fisher's linear discriminant is a classification method that projects high-dimensional data onto a line and performs classification in this one-dimensional space. The projection maximizes the distance between the means of the two classes while minimizing the variance within each class. This defines the Fisher criterion, which is maximized over all linear projections w:

J(w) = (m_1 - m_2)^2 / (s_1^2 + s_2^2),

where m_i denotes the projected mean of class i, s_i^2 denotes the projected within-class variance, and the subscripts index the two classes. In signal theory, this criterion is also known as the signal-to-interference ratio. Maximizing it yields a closed-form solution that involves the inverse of a covariance-like (within-class scatter) matrix. The method has strong parallels to linear perceptrons; the classification threshold is learned by optimizing a cost function on the training set.
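
A minimal Matlab sketch of that closed-form solution follows. The matrices X1 and X2 (columns are samples of the two classes) and the midpoint threshold rule are illustrative assumptions, not details taken from this post.

% X1: d-by-n1 samples of class 1, X2: d-by-n2 samples of class 2 (assumed layout)
n1 = size(X1, 2);  n2 = size(X2, 2);
m1 = mean(X1, 2);  m2 = mean(X2, 2);
S1 = (X1 - m1*ones(1,n1)) * (X1 - m1*ones(1,n1))';   % scatter of class 1
S2 = (X2 - m2*ones(1,n2)) * (X2 - m2*ones(1,n2))';   % scatter of class 2
Sw = S1 + S2;                                         % within-class scatter matrix
w  = Sw \ (m1 - m2);                                  % direction maximizing the Fisher criterion
w  = w / norm(w);
theta = w' * (m1 + m2) / 2;    % one simple threshold: midpoint of the projected means
% a new sample x is assigned to class 1 if w'*x > theta, and to class 2 otherwise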

Principal component analysis

Principal component analysis (PCA) involves a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
Objectives of principal component analysis:

  1. To discover or to reduce the dimensionality of the data set.
  2. To identify new meaningful underlying variables.

Matlab code of PCA:

function [patterns, targets, UW, m, W] = PCA(patterns, targets, dimension)
[r, c] = size(patterns);
if (r < dimension), dimension = r; end
% Calculate the covariance matrix and the PCA matrices
m = mean(patterns')';
S = (patterns - m*ones(1,c)) * (patterns - m*ones(1,c))';
[V, D] = eig(S);
W = V(:, r-dimension+1:r)';    % eigenvectors corresponding to the largest eigenvalues
U = S*W'*inv(W*S*W');
% Calculate the new patterns
UW = U*W;
patterns = W*patterns;
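
A hypothetical call (the variable names below are illustrative, not from the post): project 10-dimensional samples, stored as the columns of X, onto their first 3 principal components.

X      = randn(10, 200);                      % 200 samples with 10 features each
labels = ones(1, 200);                        % dummy targets, passed through unchanged
[Xp, labels, UW, m, W] = PCA(X, labels, 3);   % Xp is 3-by-200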

Friday, February 09, 2007

LaTeX Templates

LaTeX Sites

  1. CTeX: the premier Chinese TeX site (BBS@CTeX)
  2. ChinaTex
  3. TeX@SMTH (the TeX board of the Shuimu Community)
  4. Chinese TeX and Mathematics Websites Exchange Meeting (East China Normal University), 2004
  5. LaTeX Editorial Office (LaTeX编辑部)

Some Books about LaTeX

  1. Chen Zhijie et al., Introduction to LaTeX and Advanced Usage (LaTeX入门与提高, 2nd Edition), 2006
  2. Guo Li, Zhang Linbo, and Ge Xiangyang, User Manual for the CCT Chinese and Foreign-Language Scientific Laser Phototypesetting System, 1993
  3. Deng Jiansong, LaTeX2ε Guide to Scientific Typesetting, Science Press, 2001
  4. Donald E. Knuth, The TeXbook, 1984
  5. L. Lamport, LaTeX: A Document Preparation System (2nd Edition), 1994
  6. M. Goossens et al., The LaTeX Companion (2nd Edition), 2004
  7. H. Kopka and P. W. Daly, Guide to LaTeX (4th Edition), 2003
  8. G. Gratzer, Math Into LaTeX (3rd Edition), 2000

LaTeX – A document preparation system

LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents.
Today I tried to write my paper with LaTeX; it is my first time using the tool. It works very well for preparing theses and papers. I am enjoying the holiday! I can do what I like. Cheers!

Tuesday, February 06, 2007

The Curse of Dimensionality

[Image posted via Picasa: the curse of dimensionality]

Manifold learning

A manifold is a topological space that is locally Euclidean: every point has a neighborhood that looks like an open subset of Euclidean space. For example, the surface of a sphere is a two-dimensional manifold embedded in three-dimensional space.

Face recognition and semi-supervised learning

When a large number of unlabeled samples is available, these semi-supervised methods may outperform traditional supervised learning algorithms such as Support Vector Machines and regression [Belkin et al., 2004]. However, in some applications such as face recognition, unlabeled samples may not be available, so these semi-supervised learning methods cannot be applied.

Manifold-based semi-supervised learning

Geometrically motivated approaches to data analysis in high-dimensional spaces have been shown to be effective in discovering the geometrical structure of the underlying manifold. Examples include ISOMAP [Tenenbaum et al., 2000], Laplacian Eigenmap [Belkin and Niyogi, 2001], and Locally Linear Embedding [Roweis and Saul, 2000]. However, they are unsupervised in nature and fail to discover the discriminant structure in the data. In the meantime, manifold-based semi-supervised learning has attracted considerable attention [Zhou et al., 2003], [Belkin et al., 2004]. These methods make use of both labeled and unlabeled samples: the labeled samples are used to discover the discriminant structure, while the unlabeled samples are used to discover the geometrical structure.
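
To make the geometric side concrete, here is a minimal Matlab sketch of the unsupervised Laplacian Eigenmap embedding mentioned above. The function name, the k-nearest-neighbour graph construction, and the heat-kernel width sigma are illustrative assumptions, not details copied from the cited papers.

function Y = laplacianEigenmap(X, k, sigma, dim)
% X: d-by-n data matrix (columns are samples); Y: dim-by-n embedding
n  = size(X, 2);
D2 = repmat(sum(X.^2, 1)', 1, n) + repmat(sum(X.^2, 1), n, 1) - 2*(X'*X);  % squared distances
[sortedD2, idx] = sort(D2, 2);
W = zeros(n);
for i = 1:n
    j = idx(i, 2:k+1);                        % k nearest neighbours (skip the point itself)
    W(i, j) = exp(-D2(i, j) / (2*sigma^2));   % heat-kernel edge weights
end
W  = max(W, W');                              % symmetrize the adjacency matrix
Dg = diag(sum(W, 2));
L  = Dg - W;                                  % graph Laplacian
[V, E] = eig(L, Dg);                          % generalized eigenproblem L*f = lambda*Dg*f
[evals, order] = sort(diag(E));
Y = V(:, order(2:dim+1))';                    % drop the constant eigenvector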

Prior knowledge in kernel methods

How can prior knowledge be incorporated into kernel methods? It is an interesting question in the area of machine learning.

Sunday, February 04, 2007

Semi-Supervised Nonlinear Dimensionality Reduction

Prior information can be obtained from experts on the subject of interest and/or by performing experiments. For example, in moving object tracking, the coordinates of the object in certain frames can be determined manually and used as prior information. Xin Yang et al. consider prior information in the form of on-manifold coordinates of certain data samples, covering both exact and inexact prior information. They call the new algorithms Semi-Supervised LLE (SS-LLE), Semi-Supervised ISOMAP (SS-ISOMAP), and Semi-Supervised LTSA (SS-LTSA). Assuming the prior information has a physical meaning, the semi-supervised algorithms yield global low-dimensional coordinates that bear the same physical meaning.

Nonlinear Dimensionality Reduction

Traditionally, multidimensional scaling (MDS) (Hastie et al., 2001) and principal component analysis (PCA) (Hastie et al., 2001) have been used for dimensionality reduction. MDS and PCA perform well if the input data lie on or are close to a linear subspace, but are not designed to discover nonlinear structures, and often fail to do so. Weinberger et al. (Weinberger et al., 2005) proposed using semi-definite programming and kernel matrix factorization to maximize the variance in feature space while preserving the distances and angles between nearest neighbors.
Classical methods such as LLE, ISOMAP, and LTSA are all unsupervised learning algorithms; that is, they assume no prior information on the input data. Furthermore, these algorithms do not always yield low-dimensional coordinates that bear any physical meaning.

Something about semi-supervised learning

There has been much work on applying multiple-instance (MI) learning to content-based image retrieval (CBIR), where the goal is to rank all images in a known repository using a small labeled data set. Recently, a method called MISSL (Multiple-Instance Semi-Supervised Learning) was proposed that transforms any MI problem into an input for a graph-based single-instance semi-supervised learning method.
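
As a rough illustration of the graph-based single-instance step, here is a minimal Matlab sketch of label propagation in the spirit of Zhou et al. (2003). The affinity matrix W, the label matrix Y, and the parameter alpha are illustrative assumptions, not details taken from the MISSL paper.

% W: n-by-n symmetric affinity matrix with zero diagonal (assumed given)
% Y: n-by-c label matrix, Y(i,j) = 1 if sample i is labeled with class j, 0 otherwise
% alpha: propagation parameter in (0, 1)
Dhalf = diag(1 ./ sqrt(sum(W, 2)));        % D^(-1/2) for normalization
S = Dhalf * W * Dhalf;                     % symmetrically normalized affinity matrix
F = (eye(size(W, 1)) - alpha * S) \ Y;     % closed-form solution of the propagation
[score, predicted] = max(F, [], 2);        % predicted class for every sample, labeled and unlabeled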

Everything for the thesis proposal

The pressure of the upcoming thesis proposal is really considerable! Manifolds, semi-supervised learning, HMMs, unlabeled data... my head is spinning. Keep working hard; I hope to make a qualitative breakthrough as soon as possible!

Yi Jian Mei (一剪梅)

[Image posted via Picasa: Yi Jian Mei]