Library: | xplore |
See also: | sort cumsum paf diff |
Quantlet: | discrete | |
Description: | Reduces a matrix to its distinct rows and gives the number of replications of each row in the original dataset. If an optional second matrix y is given, the rows of y that share the same row of x are summed. |
Usage: | {xr,yr} = discrete(x{,y}) | |
Input: | ||
x | n x p matrix, the data matrix to reduce; in regression usually the design matrix. The matrix may be numeric or string; for a string matrix, no y may be given. | |
y | optional, n x q matrix; in regression usually the observations of the dependent variable. Not allowed if x is a string matrix. | |
Output: | ||
xr | m x p matrix, reduced data matrix (sorted). | |
yr | m x 1 vector or m x (q+1) matrix. The first column contains the number of replications of each row of xr. If y was given, the remaining q columns of yr contain the sums of the rows of y that share the same row of x. |
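The input/output contract described above can be illustrated outside XploRe. The following is a minimal sketch in plain Python, not the quantlet's implementation: the name `discrete_sketch` is hypothetical, and ascending lexicographic order of the reduced rows is an assumption (the help text only says that xr is sorted).

```python
def discrete_sketch(x, y=None):
    """Reduce x (list of rows) to its distinct rows.

    Returns (xr, yr): xr holds the distinct rows in sorted order,
    yr holds per distinct row the replication count followed by
    the column sums of the matching rows of y (if y is given).
    """
    q = len(y[0]) if y is not None else 0
    groups = {}
    for i, row in enumerate(x):
        # One accumulator per distinct row: [count, sum_1, ..., sum_q]
        acc = groups.setdefault(tuple(row), [0] + [0.0] * q)
        acc[0] += 1
        for j in range(q):
            acc[j + 1] += y[i][j]
    keys = sorted(groups)  # assumed ordering: ascending, lexicographic
    xr = [list(k) for k in keys]
    yr = [groups[k] for k in keys]
    return xr, yr


# Six observations with three distinct design rows.
x = [[1, 0], [0, 1], [1, 0], [1, 1], [0, 1], [1, 0]]
y = [[2.0], [1.0], [3.0], [5.0], [4.0], [1.0]]
xr, yr = discrete_sketch(x, y)
print(xr)
print(yr)
```

Each row of `yr` in this sketch mirrors the layout described for the quantlet's output: the count first, then the q sums. Calling `discrete_sketch(x)` without `y` yields only the count column.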
library("xplore")
n=100
b=1|2
x=ceil(normal(n,rows(b)))
y=x*b + normal(n)
; --------------------------------------
; data reduction
; --------------------------------------
{xr,yr}=discrete(x,y)
r =yr[,1]
yr=yr[,2]
rows(r)
; --------------------------------------
; descriptive statistics of x
; --------------------------------------
meanxr = sum(r.*xr)/sum(r)
varxr = sum(r.*(xr-meanxr)^2)/(sum(r)-1)
mean(x)'~meanxr'
var(x)'~varxr'
; --------------------------------------
; linear regression
; --------------------------------------
b=inv(x'*x)*x'*y
br=inv(xr'*diag(r)*xr)*xr'*yr
b~br
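The matrices x and y, each with 100 rows, are reduced to xr (the distinct rows of x) and yr (the replication counts together with the sums of y over identical rows of x). The example then splits yr into r, the number of replications, and the sums of y. The mean and variance of x coincide with the mean and variance of xr weighted by r. The least squares coefficients b from regressing y on x coincide with br, computed from the reduced data; this is equivalent to regressing the group means yr/r on xr with weights r.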
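Why the full-data and reduced-data results agree can be seen from a short derivation. The indicator matrix G below is notation introduced here for illustration only; it does not appear in the quantlet. Let G be the n x m matrix with G_{ij} = 1 if row i of x equals row j of xr, and 0 otherwise. Here X, X_r, y and y_r denote x, xr, y and the sums of y from the example.

```latex
\begin{align*}
X &= G X_r, \qquad G^\top G = \operatorname{diag}(r), \qquad G^\top y = y_r, \\
X^\top X &= X_r^\top G^\top G X_r = X_r^\top \operatorname{diag}(r)\, X_r, \\
X^\top y &= X_r^\top G^\top y = X_r^\top y_r, \\
b &= (X^\top X)^{-1} X^\top y
   = \bigl(X_r^\top \operatorname{diag}(r)\, X_r\bigr)^{-1} X_r^\top y_r = b_r, \\
\bar{x} &= \frac{1}{n}\, \mathbf{1}_n^\top X
         = \frac{1}{n}\, \mathbf{1}_n^\top G X_r
         = \frac{r^\top X_r}{\sum_{j=1}^{m} r_j}.
\end{align*}
```

The last line uses that the column sums of G are the replication counts r and that these counts add up to n.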