SAS Macros

Variable names to macro

Reference: https://www.listendata.com/2015/06/sas-macro-getting-variable-names-in.html

Get all variable names except the ID variable

proc sql noprint;
select name into : vars separated by " "
from dictionary.columns
where LIBNAME = upcase("work")
and MEMNAME = upcase("predata")
and upcase(name) ne upcase("id");
quit;

Remark:

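  1. ne means "not equal".
  2. The macro variable vars contains the selected variable names, separated by spaces.
  3. LIBNAME and MEMNAME are stored in uppercase in dictionary.columns, hence the upcase() calls.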

Processing variable names in a macro variable

Add a suffix to each name:

%macro px;
%let newvars=;
%do i=1 %to %sysfunc(countw(&vars));
   %let name = %scan(&vars, &i, %str( ));
   %let newvars = &newvars &name._1;
%end;
%put newvars=&newvars;
%mend;

Remark:

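  1. %scan(&vars, &i, %str( )) returns the i-th name in vars.
  2. In %let newvars= &newvars &name._1; don't forget the dot in ._1. The dot marks the end of the macro variable reference &name; without it, SAS would look for a macro variable called name_1.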
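
To cross-check what the macro should print, the same transformation can be sketched outside SAS. This is only an illustration in Python; the function name add_suffix and the sample variable names are made up for this note.

```python
def add_suffix(vars_string: str, suffix: str = "_1") -> str:
    """Append `suffix` to every blank-separated name, like the %do loop above."""
    return " ".join(name + suffix for name in vars_string.split())


# Hypothetical content of the macro variable vars.
print(add_suffix("age income height"))  # age_1 income_1 height_1
```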

Notes for Xilinx Vitis

General

  1. The Vitis Serial Terminal can send input to cin and receive output from cout. However, it only appears in the Debug perspective, regardless of whether the project is launched with Debug or Run.

lscript.ld

  1. Set the heap and stack sizes. 0x10000000 = 256 MB and 0x19000000 = 400 MB. If they are set too small, initializing a large vector<> may fail with a std::bad_alloc error. With the Eigen library, MatrixXd allocates its data on the heap, whereas a fixed-size Matrix<double, N, N> stores its data inline, so a local variable of that type uses stack memory.
  2. Set ps7_ddr_0.
  3. "lscript" stands for "linker script". It sets parameters such as the heap and stack sizes, and the linker uses it when producing the ELF file.
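
The sizes in item 1 are byte counts written in hexadecimal. As a quick check of the arithmetic, the Python sketch below converts them to MiB and estimates the storage of a dense double matrix. The helper name to_mib and the matrix size N = 1000 are assumptions made for this note, not values from any linker script.

```python
def to_mib(size_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024)


print(to_mib(0x10000000))  # 256.0
print(to_mib(0x19000000))  # 400.0

# Hypothetical N x N matrix of doubles (8 bytes each), to compare with the limits above.
N = 1000
matrix_bytes = N * N * 8
print(matrix_bytes)  # 8000000
```

A fixed-size matrix of that size would need about 7.6 MiB of stack when declared as a local variable, which is why the stack size matters for Matrix<double, N, N>.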

C++ Project Settings

  1. Sometimes, after right-clicking the project title and opening C++ Build Settings, there is no place to add include paths. In that case, right-click a source file and open Properties to add them.
  2. To add flags such as -std=c++17, go to Project title – C++ Build Settings – C/C++ Build – Settings – Tool Settings.
  3. One system project can contain multiple application projects. Right-click a system project and choose Add project… .

Launch Configuration

  1. Check Main – Performance Analysis to view CPU usage.

SAS PROC OPTMODEL Examples

Integer programming

proc optmodel;
	num n=544, t1=26842838.69, t2=-9536251.37;
	var x{1..n} binary;
	num v1{1..n}, v2{1..n};
	/* The name literals are data set columns meaning "first dimension" and "second dimension". */
	read data dt into [_N_] v1='第一维度'n v2='第二维度'n;

	min k=sum{i in 1..n} x[i];

	con a: t1-1.5<=sum{i in 1..n} v1[i]*x[i]<=t1+1.5;
	con b: t2-1.5<=sum{i in 1..n} v2[i]*x[i]<=t2+1.5;

	solve;
quit;
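
The model above selects the fewest rows whose two column sums both land within 1.5 of their targets. To make that concrete, here is a brute-force Python sketch on made-up toy data. The function name smallest_subset and all numbers are hypothetical; exhaustive search is only feasible for a handful of rows, which is why the real problem with n = 544 needs a solver.

```python
from itertools import combinations


def smallest_subset(v1, v2, t1, t2, tol=1.5):
    """Return the first smallest index tuple whose sums hit both targets within tol."""
    n = len(v1)
    for k in range(n + 1):                      # try subset sizes 0, 1, 2, ...
        for idx in combinations(range(n), k):
            s1 = sum(v1[i] for i in idx)
            s2 = sum(v2[i] for i in idx)
            if abs(s1 - t1) <= tol and abs(s2 - t2) <= tol:
                return idx
    return None                                 # no feasible subset exists


# Made-up toy data (not from the data set dt).
v1 = [10.0, 20.0, 30.0, 40.0]
v2 = [-1.0, -2.0, -3.0, -4.0]
print(smallest_subset(v1, v2, t1=50.0, t2=-5.0))  # (0, 3)
```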

Continuous

Example 1

Minimize the Rastrigin function. Solution: x = [0,\cdots,0], where z = 0.

proc optmodel;
number D init 1000;
var x{1..D} >=-32 <=32;
min z=sum{ i in 1..D} (x[i]**2-10*cos(2*constant('pi')*x[i])+10);
solve with lso/ maxgen=2000;
create data answer from [i] x z;
quit;
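
Notes:

  1. solve with lso: the LSO (local search optimization) solver is based on a genetic algorithm (GA).
  2. DON'T forget the parentheses after sum{}: sum{} x+y is parsed as (sum{} x)+y, which is very different from sum{} (x+y).
  3. In create data answer from [i] x z; [i] supplies the index for x, and z is the scalar objective value written alongside it.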
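
The objective and the parenthesis pitfall can both be checked numerically. The Python sketch below is only an illustration; the function name rastrigin and the sample values are chosen for this note. It evaluates the objective at the all-zero solution and compares the two readings of a sum with a trailing constant.

```python
import math


def rastrigin(x):
    """Objective z from Example 1: sum of x_i^2 - 10*cos(2*pi*x_i) + 10."""
    return sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) + 10 for xi in x)


# Every term vanishes at x = 0, so the all-zero vector gives z = 0.
print(rastrigin([0.0] * 1000))  # 0.0

# Parentheses matter: adding 10 inside the sum versus once after the sum.
x = [1.0, 2.0, 3.0]
inside = sum(xi ** 2 + 10 for xi in x)    # sum{} (x+y)
outside = sum(xi ** 2 for xi in x) + 10   # (sum{} x)+y
print(inside, outside)  # 44.0 24.0
```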

Example 2

Shifted function:

proc optmodel;
number D init 1000;
number xt{i in 1..D} init i/5;
var x{i in 1..D} init i/10;
min z=sum{ i in 1..D} ((x[i]-xt[i])**2-10*cos(2*constant('pi')*(x[i]-xt[i]))+10);
solve with lso/ maxgen=2000 primalin;
create data answer from [i] x z xt;
quit;

Notes:

  1. primalin tells the LSO solver to use the current (initial) values of x as a starting point in its initial population.

Example 3

Use the result of LSO as the starting point for NLP.

proc optmodel;
number D init 1000;
number xt{i in 1..D} init i/5;
var x{i in 1..D} init i/10;
min z=sum{ i in 1..D} ((x[i]-xt[i])**2-10*cos(2*constant('pi')*(x[i]-xt[i]))+10);
solve with lso/ maxgen=2000 primalin;
create data answer from [i] x z xt;
quit;

proc optmodel;
number D init 1000;
number xt{i in 1..D} init i/5;
var x{1..D};
read data answer into [i] x;
min z=sum{ i in 1..D} ((x[i]-xt[i])**2-10*cos(2*constant('pi')*(x[i]-xt[i]))+10);
solve with nlp;
create data answer_nlp from [i] x z;
quit;
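
The idea of Example 3, a global search followed by a local refinement from its result, can be sketched in Python. This is not what SAS runs internally: a plain random search stands in for LSO and fixed-step gradient descent stands in for the NLP solver. The dimension D = 5, the step sizes, and all function names are assumptions chosen for this illustration.

```python
import math
import random


def shifted_rastrigin(x, xt):
    """Objective z from Examples 2 and 3: the Rastrigin function of (x - xt)."""
    return sum((a - b) ** 2 - 10 * math.cos(2 * math.pi * (a - b)) + 10
               for a, b in zip(x, xt))


def gradient(x, xt):
    """Analytic gradient of shifted_rastrigin with respect to x."""
    return [2 * (a - b) + 20 * math.pi * math.sin(2 * math.pi * (a - b))
            for a, b in zip(x, xt)]


def random_search(xt, start, iters, step, rng):
    """Stage 1 stand-in for LSO: keep a random perturbation only if it lowers z."""
    best, best_val = list(start), shifted_rastrigin(start, xt)
    for _ in range(iters):
        cand = [b + rng.gauss(0.0, step) for b in best]
        val = shifted_rastrigin(cand, xt)
        if val < best_val:
            best, best_val = cand, val
    return best


def gradient_descent(xt, start, iters, lr):
    """Stage 2 stand-in for NLP: fixed-step gradient descent from the given start."""
    x = list(start)
    for _ in range(iters):
        x = [a - lr * g for a, g in zip(x, gradient(x, xt))]
    return x


D = 5  # small hypothetical dimension; the SAS examples use D = 1000
xt = [i / 5 for i in range(1, D + 1)]
x0 = [i / 10 for i in range(1, D + 1)]  # same start as `var x{i in 1..D} init i/10`

rng = random.Random(0)
x1 = random_search(xt, x0, iters=2000, step=0.3, rng=rng)
# lr = 0.002 is below 1 / (2 + 40 * pi**2), so a step cannot increase z
# (up to floating-point rounding).
x2 = gradient_descent(xt, x1, iters=200, lr=0.002)

f0, f1, f2 = (shifted_rastrigin(x, xt) for x in (x0, x1, x2))
print(f0, f1, f2)  # objective at the start, after stage 1, after stage 2
```

The second stage only polishes the point it is given, so the quality of the final answer still depends on how close the first stage gets to the global minimum at x = xt.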

ggplot2 Simple Tricks

Grid plots

library(gridExtra)
grid.arrange(p1,p2,p3,p4,p5,p6,ncol=2)

#Similar:
#Note: order is column-wise.
m=marrangeGrob(list(p1,p2,p3,p4,p5,p6),nrow=3,ncol=2)
ggsave("xx.pdf", m)

#Similar:
p=marrangeGrob(list(p1,p2,p3,p4,p5,p6),nrow=3,ncol=2,top=NULL)
#Set pointsize for high DPI png.
png("../result/random forest - regression - density.png",height=16.9, width=18,units="cm",res=600 , pointsize=10)
print(p)
dev.off()

Grouped density plot

library(ggplot2)
library(plyr)  # provides ddply() and summarise()

#mu is group statistics.
#"y" is target variable and "type" is group variable.
mu <- ddply(dfy, "type", summarise, grp.mean = mean(y))

#alpha = 0.4: set alpha for transparency.
#geom_vline(): plot vertical lines.
p <- ggplot(dfy, aes(x = y, color = type)) +
    geom_density(alpha = 0.4) +
    geom_vline(
      data = mu,
      aes(xintercept = grp.mean, color = type),
      linetype = "dashed") +
    ggtitle(title) +
    xlim(-1, 1) +
    ylab("Density") +
    xlab("x")

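Some Stata Issues

  1. If a file shows garbled characters, unicode translate with the three commonly used Chinese encodings (gb18030, gb2312, gbk) may fail. In practice, the conversion can fail even when the encoding is correct, apparently because of version issues. In that case, consider exporting the data to CSV and converting the encoding to UTF8-BOM in Notepad++. For variable labels, run describe , replace and export the result to a CSV file. The data and the variable labels are then stored separately, so one can write a program that generates a do-file which attaches the labels to the variables with la var.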
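
The last step, generating a do-file that re-attaches the labels, can be sketched in Python. The sketch assumes the exported label CSV has columns called name and varlab (the variable names that describe , replace is expected to create; adjust them if your export differs). The function name make_label_dofile and the sample rows are made up for this note.

```python
import csv
import io

OPEN_QUOTE = '`"'    # Stata compound double quotes, safe if a label contains "
CLOSE_QUOTE = "\"'"


def make_label_dofile(csv_text: str) -> str:
    """Build do-file text with one `label variable` command per labelled variable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        name = row["name"].strip()       # assumed column name
        label = row["varlab"].strip()    # assumed column name
        if name and label:               # skip variables without a label
            lines.append("label variable " + name + " "
                         + OPEN_QUOTE + label + CLOSE_QUOTE)
    return "\n".join(lines) + "\n"


# Made-up sample of an exported label file.
sample = "name,varlab\nid,Respondent ID\nage,Age in years\nx1,\n"
print(make_label_dofile(sample), end="")
```

For a real file, read the text with open(path, encoding="utf-8-sig") so that the BOM added by Notepad++ is dropped, and write the returned string to a .do file.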

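General Goals of Scientific Research

Science can be viewed along the divide between natural and social science, or along the divide between theory and application. This note takes the latter view to examine the general goals of scientific research.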

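At the theoretical level, the goal is first of all to propose a new theory, or to use existing tools, to explain the cause of some phenomenon; this amounts to examining causality. Sometimes theory lags behind practice: a method works well in practice but we do not know why, so a theory is proposed to explain it. This also belongs to the theoretical level.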

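At the applied level, the goal of scientific research is to use existing theory to solve a specific problem in a particular setting, for example applying artificial intelligence to chip placement or designing a recommender system. Solving problems in this sense overlaps with explaining the causes of phenomena at the theoretical level, but this note takes the view that applied problems generally carry a purpose. A question such as why the sky is blue carries no purpose, whereas the question of how to raise production efficiency does.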

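Theoretical knowledge and practical knowledge are very different. In the ideal case, if everything could be fully theorized, one could simply follow the theory. But the real world is complex, and some knowledge cannot be brought into theory; it can only be acquired through practice, or internalized as a person's experience and intuition.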

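Theoretical research usually starts from axioms about a set of concepts and studies the relations among those concepts. Applied research usually starts from simplification: real-world objects are simplified and abstracted into concepts, theoretical derivations are carried out, and the theoretical results are finally mapped back to real-world objects. Alternatively, it proceeds by repeated experiments, in which case little theory may be involved. Theorizing practical knowledge also requires simplifying the real world, which overlaps with the method of applied research; this overlap corresponds to the overlap noted earlier between solving problems and explaining phenomena.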

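Another important method in applied research is approximation. Sometimes a purely theoretical construction is hard to realize directly and must be approximated by some method; different approximation methods correspond to different scientific schools. Another basis for distinguishing schools is the starting point from which a theory is built: for the same object, different starting points yield different theories. These theories may or may not be equivalent, but in general a few of them are easier to implement and simpler to describe than the others, and so become mainstream.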

Mathematics itself is not a science, but mathematical research is mainly research at the theoretical level.