Library: xplore
See also: sort cumsum paf diff

Quantlet: discrete
Description: Reduces a matrix to its distinct rows and gives the number of replications of each row in the original dataset. If an optional second matrix y is given, the rows of y that belong to the same row of x are summed up accordingly.

Usage: {xr,yr} = discrete(x{,y})
Input:
x n x p matrix, the data matrix to reduce; in regression, usually the design matrix. The matrix may be numeric or string. For a string matrix, no y can be given.
y optional, n x q matrix; in regression, usually the observations of the dependent variable. Not possible if x is a string matrix.
Output:
xr m x p matrix, reduced data matrix (sorted).
yr m x 1 vector (if no y was given) or m x (q+1) matrix (if y was given). The first column contains the number of replications of each row of xr. If y was given, the other q columns of yr contain the sums of the y-rows that share the same x-row.
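
The input/output contract above can be illustrated outside XploRe. The following is a minimal Python sketch, not the quantlet's implementation: the function name discrete_sketch is hypothetical, rows are plain Python lists, and the sort order of xr is assumed to be lexicographic by row.

```python
from collections import defaultdict

def discrete_sketch(x, y=None):
    """Hypothetical stand-in for discrete(x{,y}) on lists of rows."""
    q = len(y[0]) if y else 0                 # number of y columns (0 if no y)
    counts = defaultdict(int)                 # replications per distinct x-row
    sums = defaultdict(lambda: [0.0] * q)     # column sums of y per distinct x-row
    for i, row in enumerate(x):
        key = tuple(row)
        counts[key] += 1
        for j in range(q):
            sums[key][j] += y[i][j]
    keys = sorted(counts)                     # assumption: lexicographic row order
    xr = [list(k) for k in keys]
    yr = [[counts[k]] + sums[k] for k in keys]  # count first, then the q sums
    return xr, yr

x = [[1, 2], [0, 1], [1, 2], [0, 1], [1, 2]]
y = [[10.0], [1.0], [20.0], [2.0], [30.0]]
xr, yr = discrete_sketch(x, y)
print(xr)  # [[0, 1], [1, 2]]
print(yr)  # [[2, 3.0], [3, 60.0]]
```

Called without y, the sketch returns only the replication counts, matching the m x 1 case of yr.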

Example:
library("xplore")
n=100
b=1|2
x=ceil(normal(n,rows(b)))
y=x*b + normal(n)
; --------------------------------------
;  data reduction
; --------------------------------------
{xr,yr}=discrete(x,y)
r =yr[,1]
yr=yr[,2]
rows(r)
; --------------------------------------
;  descriptive statistics of x
; --------------------------------------
meanxr = sum(r.*xr)/sum(r)
varxr  = sum(r.*(xr-meanxr)^2)/(sum(r)-1)
mean(x)'~meanxr'
var(x)'~varxr'
; --------------------------------------
;  linear regression
; --------------------------------------
b=inv(x'*x)*x'*y
br=inv(xr'*diag(r)*xr)*xr'*yr
b~br

Result:
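The matrices x and y with 100 rows are reduced to a matrix xr
(containing the distinct rows of x) and yr (the sums of y over
identical rows of x).
r gives the number of replications. The mean and variance
of x coincide with the r-weighted mean and variance of xr.
The linear regression of y on x coincides with the regression
computed from the reduced data via xr'*diag(r)*xr and xr'*yr.
Since yr holds sums rather than means, this is equivalent to a
weighted regression of the group means yr/r on xr with weights r.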
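
The identities stated in this result can be checked on a small data set. The sketch below is in Python rather than XploRe and uses hypothetical data with a single regressor and no intercept, so that the regression reduces to a ratio of sums. It is only meant to show why the reduced data reproduce the full-data statistics.

```python
import math
from collections import Counter, defaultdict

# Hypothetical data: one discrete regressor x and a response y
x = [1, 0, 1, 2, 0, 1, 2, 2, 1, -1]
y = [1.5, 0.2, 0.9, 2.1, -0.3, 1.1, 1.8, 2.4, 0.7, -1.2]
n = len(x)

# Reduction: distinct values xr, replications r, sums of y per distinct value
r = Counter(x)
ysum = defaultdict(float)
for xi, yi in zip(x, y):
    ysum[xi] += yi
xr = sorted(r)

# Mean and variance from the full data and from the reduced data
mean_x = sum(x) / n
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
mean_xr = sum(r[v] * v for v in xr) / n
var_xr = sum(r[v] * (v - mean_xr) ** 2 for v in xr) / (n - 1)

# Regression through the origin: b = (x'x)^{-1} x'y,
# reduced form: br = (xr' diag(r) xr)^{-1} xr' yr
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
br = sum(v * ysum[v] for v in xr) / sum(r[v] * v * v for v in xr)

print(math.isclose(mean_x, mean_xr), math.isclose(var_x, var_xr), math.isclose(b, br))
```

The comparison uses math.isclose because the sums are accumulated in a different order in the two computations, which can change the last floating-point digits.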



Author: T. Koetter, M. Mueller, 19970325
(C) MD*TECH Method and Data Technologies, 05.02.2006