proc iml;
IML ready
> reset log print;
> x = 12.3;
X    1 row    1 col    (numeric)
    12.3
> quit;
Exiting IML

> x = 12.3;
X    1 row    1 col    (numeric)
    12.3
> y = {57};
Y    1 row    1 col    (numeric)
    57
> name = 'Bob';
NAME    1 row    1 col    (character, size 3)
    Bob

> a = { 2 4, 3 1};
A    2 rows    2 cols    (numeric)
    2 4
    3 1
> b = { 4 5, 0 1};
B    2 rows    2 cols    (numeric)
    4 5
    0 1
aa={'a' 'b' 'c', 'd' 'e' 'f'};   /* a 2 x 3 char matrix */
b = { 1 2 3 4 5 };               /* row vector */
c = { 1, 2, 3, 4, 5};            /* column vector */

> index = 1:5;
INDEX    1 row    5 cols    (numeric)
    1 2 3 4 5
> col = (4:6)`;
COL    3 rows    1 col    (numeric)
    4
    5
    6
> rindex= 5:1;
RINDEX    1 row    5 cols    (numeric)
    5 4 3 2 1
> vars = 'XX1': 'XX7';
VARS    1 row    7 cols    (character, size 3)
    XX1 XX2 XX3 XX4 XX5 XX6 XX7

> series= do(12,72,12);
SERIES    1 row    6 cols    (numeric)
    12 24 36 48 60 72
a=I(6); /* a 6x6 identity matrix */
a=j(5,5,0);   /* a 5x5 matrix of zeroes */
b=j(6,1);     /* a 6x1 column vector of 1's */

> d = diag( {1 2 4} );
D    3 rows    3 cols    (numeric)
    1 0 0
    0 2 0
    0 0 4

> d = diag( {1 2, 3 4} );
D    2 rows    2 cols    (numeric)
    1 0
    0 4
a = { 2 4, 3 1};
b = { 4 5, 0 1};
sum = a + b;
SUM    2 rows    2 cols    (numeric)
    6 9
    3 2
diff = a - b;
DIFF    2 rows    2 cols    (numeric)
    -2 -1
     3  0

times = a # b;
TIMES    2 rows    2 cols    (numeric)
    8 20
    0  1
prod = a * b;
PROD    2 rows    2 cols    (numeric)
     8 14
    12 16
a[1,2]=0;      /* changes element [1,2] to zero */
a[1,]=0;       /* changes first row values to 0 */
a[1,1:3]=0;    /* 1:3 creates a vector of values from 1 to 3, so this changes the values in the first row, columns 1-3, to zero */
ind={2 3};
b=a[ind,ind];  /* set b equal to rows 2,3 and cols 2,3 of a */
b=a[+,];    /* b is the column sums of a */
b=a[##,];   /* b is the column sums of squares of a */
b=a[,:];    /* when used alone as an index, the : operator gives the mean, so b will be a vector containing the mean of each row */
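As a concrete check of the reduction operators above, the following sketch applies them to a small matrix. The matrix m and the result names are illustrative assumptions, not part of the original example; the values in the comments follow from the arithmetic.

```sas
proc iml;
m = {1 2,
     3 4};            /* hypothetical 2 x 2 example matrix */
colsum  = m[+,];      /* column sums: {4 6} */
colssq  = m[##,];     /* column sums of squares: {10 20} */
rowmean = m[,:];      /* row means: {1.5, 3.5} */
print colsum colssq rowmean;
quit;
```

A reduction operator in the row position collapses the rows (giving one value per column), and one in the column position collapses the columns (giving one value per row).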
Note: IML functions will often perform the same task more quickly and efficiently than using index operators as above. For example:
b=a[+,+];
c=sum(a);   /* b and c will have the same value, but the sum function will be more efficient */
matrix2=LOC(matrix1=value);

For example:

a={1 . 1 1, 2 2 2 2, 3 3 3 3, 4 4 4 4};
notmiss=loc( a[,2] ^= .);   /* notmiss holds the row numbers in which the second column of a is not missing */
newa=a[notmiss,];           /* newa contains the rows of a with no missing element in the second column, so newa = {2 2 2 2, 3 3 3 3, 4 4 4 4} */

Even more efficient (but harder to follow) is to bypass the intermediate step of creating the vector notmiss:
newa = a[ loc(a[,2]^=.), ];
a={1 . 1 1, 2 2 2 2, 3 3 3 3, 4 4 4 4};
b=a+a;   /* b equals {2 . 2 2, 4 4 4 4, 6 6 6 6, 8 8 8 8} */
c=a#a;   /* c equals {1 . 1 1, 4 4 4 4, 9 9 9 9, 16 16 16 16} */
coffee = { 4 2 2 3 2, 3 3 1 2 2, 2 1 0 4 5 };
COFFEE    3 rows    5 cols    (numeric)
    4 2 2 3 2
    3 3 1 2 2
    2 1 0 4 5
days = { Mon Tue Wed Thu Fri };
DAYS    1 row    5 cols    (character, size 3)
    MON TUE WED THU FRI
names = { 'Lenny', 'Linda', 'Sue'};
NAMES    3 rows    1 col    (character, size 5)
    Lenny
    Linda
    Sue
print coffee[r=names c=days];

COFFEE   MON  TUE  WED  THU  FRI
Lenny      4    2    2    3    2
Linda      3    3    1    2    2
Sue        2    1    0    4    5
daycost = .50 # coffee;
DAYCOST    3 rows    5 cols    (numeric)
      2     1     1   1.5     1
    1.5   1.5   0.5     1     1
      1   0.5     0     2   2.5
ones = j(5,1);
weektot = daycost * ones;
WEEKTOT    3 rows    1 col    (numeric)
    6.5
    5.5
    6
weektot = daycost[,+];
WEEKTOT    3 rows    1 col    (numeric)
    6.5
    5.5
    6
daytot = daycost[+,];
DAYTOT    1 row    5 cols    (numeric)
    4.5 3 1.5 4.5 4.5
total = daycost[+,+];
TOTAL    1 row    1 col    (numeric)
    18
print coffee[r=names c=days] weektot[format=dollar7.2] ,
      daytot[c=days f=dollar8.2] ' ' total[f=dollar7.2];

COFFEE   MON  TUE  WED  THU  FRI   WEEKTOT
Lenny      4    2    2    3    2     $6.50
Linda      3    3    1    2    2     $5.50
Sue        2    1    0    4    5     $6.00

DAYTOT     MON     TUE     WED     THU     FRI     TOTAL
         $4.50   $3.00   $1.50   $4.50   $4.50    $18.00
use psy303.fitness;
read all into mat[rowname=name];

The rowname option reads the variable name from the dataset, creating a character vector to be used as row labels.
For output to a SAS dataset, use the create and append statements, as in
*-- Output results to data set out ;
xys = yhat || res || weight;
cname = {"_YHAT_" "_RESID_" "_WEIGHT_" };
create out from xys [ colname=cname ];
append from xys;

This creates the SAS dataset WORK.OUT, containing three variables whose names are specified by the vector cname.
USE libref.dataset (dataset options);
EDIT libref.dataset (dataset options);
Note: IML can have only one input and one output dataset at a time. The EDIT statement will assign one dataset as the current input and current output dataset. The USE statement will assign the dataset as the current input dataset without changing the current output dataset.
READ <range> <VAR operand> <WHERE (expression)> <INTO name <[rowname=variable colname=matrix]>>
WHERE (var1^=.)
Note: the WHERE clause does not 'override' the range specification. If range is not specified (default is current), WHERE clause will only evaluate current observation.
use psy303.fitness; read all into mat[rowname=name];
use psy303.fitness; read all into mat[rowname=name] where sex='M';
use data1; read all var {x1 x2 x3} into x;
keepobs=(1:10)#10; READ point keepobs into x[rowname=id colname=coln];
CREATE libref.dataset <VAR operand>;
CREATE libref.dataset <FROM matrix <[r=vector1 c=vector2]>>;
Note: The CREATE statement does not put data into the dataset but only defines the structure of the dataset. The VAR clause and the FROM clause are mutually exclusive.
APPEND <VAR operand>;
OR
APPEND <FROM matrix <[r=vector1]>>;

The VAR clause and the FROM clause operate as they do in the CREATE statement. Note that the FROM clause does not have a c= option, since no data need actually be read from the colname vector. When the VAR clause has been used on the CREATE statement, it need not be specified on the APPEND statement.
Note: The VAR clause and the FROM clause are mutually exclusive. The APPEND statement will always output to the current output dataset, whether that dataset has been specified via the CREATE or the EDIT statement.
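The division of labour between CREATE and APPEND described above can be sketched with the VAR form. The dataset name work.demo and the vectors x and y are hypothetical; the CLOSE statement is not covered in this section and is included only to finish the dataset cleanly.

```sas
proc iml;
x = {1, 2, 3};                 /* hypothetical column vectors */
y = {10, 20, 30};
create work.demo var {x y};    /* defines variables X and Y; writes no observations */
append;                        /* writes three observations, reusing the VAR list from CREATE */
close work.demo;               /* closes the output dataset */
quit;
```

Because the VAR clause was given on the CREATE statement, the APPEND statement needs no clause of its own.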
Note: It is possible to use external files (i.e. not SAS datasets) in IML. In terms of syntax, this is very similar to reading and writing these files in the SAS data step, but it is usually easier to do this in a SAS data step.
This section focusses on IML programming features, namely iterative and conditional processing. IML programming can take place in 'open' code or within modules (compiled programs). If the program is only used once then open code is generally preferable. If the program is used often, whether in one session or across sessions, the module format is probably preferred. Modules may be stored permanently in compiled form.
IF expression THEN statement1;
ELSE IF expression THEN statement2;
Note: IML uses the symbol | for OR and the symbol & for AND. It will not accept the words as alternatives for logical operators as in the data step.
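A minimal sketch of compound conditions written with these symbols; the variables x and y and the printed messages are illustrative assumptions.

```sas
proc iml;
x = 3;
y = 7;
if x=3 & y>5 then print 'both conditions are true';         /* & is AND */
if x=4 | y=7 then print 'at least one condition is true';   /* | is OR  */
quit;
```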
x=3;
if x=3 then print 'x=' x;
else if x=4 then print 'x is 4';
else print 'x is bad';
    x= 3

x=4;
if x=3 then print 'x=' x;
else if x=4 then print 'x is 4';
else print 'x is bad';
    x is 4

x=5;
if x=3 then print 'x=' x;
else if x=4 then print 'x is 4';
else print 'x is bad';
    x is bad
DO variable = start TO stop <BY increment>;
e.g.,
do i=1 to 100 by 10; ... end;
do j=1 to 10; ... end;

DO WHILE (expression);
e.g.,
count=1;
do while (count<5); ... end;

DO UNTIL (expression);
e.g.,
do until (count<5); ... end;
Note: the DO WHILE loop is evaluated at the top, meaning that if count was 10 in this example, the loop would not execute. The DO UNTIL loop is evaluated at the bottom, meaning that it will always execute at least once. In the above example, if count equals 1 to start, the DO loop will still execute once even though count is less than 5 to start with.
reset name;
x=1;
do while (x<2);
   print x;
   x=x+1;
end;
    X
    1

x=3;
do while (x<2);   /* note this loop does not execute */
   print x;
   x=x+1;
end;

do until (x<4);
   print '** do until loop executes although X is less than 4', x;
   x=x-1;
end;
    ** do until loop executes although X is less than 4
    X
    3
START module-name <(argument1, argument2,...)>;
IML statements;
FINISH;

To run a program module:
RUN module-name <(argument1, argument2, ...)>;
start rmiss(mat1, mat2, miss);
   if nrow(miss)=0 then miss={.};
   badpos=loc(mat1=miss);
   print badpos;    /* positions of missing values in row-major order */
   badrow=ceil(badpos/ncol(mat1));
   print badrow;    /* badrow holds the rows with at least one missing value */
   keeprow=remove(1:nrow(mat1),badrow);
   print keeprow;   /* 1:nrow(mat1) creates a vector of values from 1 to the number of rows of mat1; the badrow numbers are then removed from this vector */
   mat2=mat1[keeprow,];
   print mat2;      /* mat2 is the subset of mat1 containing only rows with no missing values */
finish;

The RMISS module is used as follows:
x={1 . 1 1, 2 2 2 2, 3 . 3 ., 4 4 4 4};
run rmiss(x,y,miss);

BADPOS
    2 10 12
BADROW
    1 3 3
KEEPROW
    2 4
MAT2
    2 2 2 2
    4 4 4 4
START module-name <(argument1, argument2, ...)>;
IML statements;
RETURN (matrix);
FINISH;

To use an assignment module:
mat1=module-name <(argument1, argument2, ...)>;
*-- Define a length function (LEN);
start len(X);
   ssq = X[##,];
   return (sqrt( ssq ));
finish;
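A short usage sketch for the LEN function follows. The module definition is repeated so the example is self-contained; the inputs v and m are hypothetical, and the values in the comments follow from the column sums of squares.

```sas
proc iml;
start len(X);
   ssq = X[##,];           /* column sums of squares */
   return (sqrt( ssq ));
finish;

v = {3, 4};                /* hypothetical column vector */
l = len(v);                /* sqrt(9 + 16) = 5 */
m = {3 0,
     4 2};                 /* hypothetical matrix: one length per column */
lm = len(m);               /* {5 2} */
print l lm;
quit;
```

Because the function reduces over rows, a matrix argument yields a row vector holding the length of each column.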
Note: It is not possible to directly assign default values for module arguments. It seems to be completely impossible for function modules. For an example of how this can be done in a program module, see the RMISS module example.
a=10; b=20; c=30;   /* A,B,C are all global */
start mod1;         /* module uses global table */
   p=a+b;           /* p is global */
   c=40;            /* c already global */
finish;
run mod1;
print a b c p;      /* note c changed to 40 */

    A   B   C   P
    10  20  40  30

When a module is defined with arguments, a local symbol table is created. This symbol table is temporary and is unique to the module. These modules will have access only to specified arguments from the global symbol table. If modules are nested, the local symbol table of the 'parent' module acts as the global symbol table for the called module. If matrix C exists in the local table and the global table, the global value of C will not be affected by operations on the local value of C (unless global C was specified as the argument corresponding to local C).
start mod2(a,b);   /* module with args creates local table */
   p=2*(a+b);      /* p is local */
   b=50;           /* b is local */
finish;
run mod2(a,c);     /* note that b (global) remains the same. Since C (global) is passed as b (local) and b is changed in the module, C (global) is changed. Note that p also remains the same. */
print a b c p;

    A   B   C   P
    10  20  50  30
IML storage catalogs are useful for saving large intermediate results for later use when memory is a concern. Also, these catalogs are necessary for having access to IML matrices and modules after an IML session is completed.
STORE a b c;                /* stores matrices A, B and C */
STORE module=mod1;          /* stores module MOD1 */
STORE module=(mod1 mod2);   /* stores modules MOD1 and MOD2 */
STORE;                      /* stores EVERYTHING */
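Stored items are brought back with the LOAD statement, which is not covered above but mirrors the STORE syntax. The sketch below assumes that the matrices and modules named here were stored earlier in the storage catalog currently in use.

```sas
proc iml;
load a b c;                /* restores matrices A, B and C */
load module=mod1;          /* restores module MOD1 */
load module=(mod1 mod2);   /* restores modules MOD1 and MOD2 */
load;                      /* restores everything in the storage catalog */
quit;
```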