Take a look at the following simple C function:

double triplesum(double a, double b, double c) {
  return a + b + c; 
}

Compile it into a shared library (e.g. gcc -shared -fPIC -o triplesum.so triplesum.c) and use it from Python:

>>> from ctypes import *
>>> lib1 = cdll.LoadLibrary('./triplesum.so')
>>> lib1.triplesum.argtypes = [c_double, c_double, c_double]
>>> lib1.triplesum.restype = c_double
>>> lib1.triplesum(1.2, 3.4, 2.3)
6.8999999999999995
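The trailing digits, by the way, are not a ctypes artifact; a quick check in plain Python (no C involved) shows the same IEEE-754 double rounding:

```python
# Pure-Python check: summing the same doubles gives the identical
# not-quite-6.9 value, so the rounding comes from IEEE-754 arithmetic,
# not from crossing the C boundary.
s = 1.2 + 3.4 + 2.3
print(s)
```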

In R:
C code:

void triplesum(double* a, double* b, double* c, double* d) {
  *d =  *a + *b + *c; 
}

Compile it by

 $ R CMD SHLIB forR.c

R code:

> dyn.load("forR.so")
> a <- 1.2; b <- 3.4; c <- 2.3; d <- 0
> .C("triplesum", a, b, c, d)[[4]]
[1] 6.9

It’s hard to say which one is more convenient, but Python has the edge here because the C code can be used unmodified.

R‘s mechanism feels like writing a Fortran subroutine, only in C syntax.

R beats Python?

November 14, 2010

For people who majored in statistics and talk about R every day, this ego-boosting article is of course a must-read.

As for those who speak more than one statistical computing language, something else may surprise them even more: Python, one of the two programming languages of Google's anointed AppEngine, has never made it into the world's smuggest paper, the New York Times; not a single report.

Does that mean the Times' reporters are simply not up to it, or, well, has R already (at least in statistical computing) left Python completely behind?

Math Stat 20100829-1

August 26, 2010

The linear programming problem mentioned before: for an unknown $p$-vector $\boldsymbol x$ and a given cost vector $\boldsymbol c$,
minimize \[ \boldsymbol c^T\boldsymbol x\]
subject to
\[ \mathbf A_1\boldsymbol x \leq \boldsymbol b_1\]
\[ \mathbf A_2\boldsymbol x \geq \boldsymbol b_2\]
and
\[ x_i \geq 0, i = 1,\ldots, p\]
thus we have
\[ \left(\begin{array}{c}\mathbf A_1 \\ -\mathbf A_2 \\ \mathbf A_3 \end{array}\right)\boldsymbol x = \mathbf A\boldsymbol x \leq \boldsymbol b = \left(\begin{array}{c}\mathbf b_1 \\ -\mathbf b_2 \\ \mathbf b_3\end{array}\right)\]
where $\mathbf A_3 = -\mathbf I_p$, $\boldsymbol b_3 = 0\cdot\boldsymbol 1_p$
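The block-matrix reformulation above can be sketched directly with NumPy (the A1, A2, b1, b2 below are made-up toy numbers, just to show the stacking):

```python
import numpy as np

# Toy instance with p = 2 variables: one <= row, one >= row.
A1 = np.array([[1.0, 1.0]]); b1 = np.array([4.0])   # x1 + x2 <= 4
A2 = np.array([[1.0, 0.0]]); b2 = np.array([1.0])   # x1 >= 1
p = 2

# Flip the >= rows and append -I for the x_i >= 0 constraints,
# exactly as in the block matrix above.
A = np.vstack([A1, -A2, -np.eye(p)])
b = np.concatenate([b1, -b2, np.zeros(p)])
print(A)
print(b)
```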

Using Python’s CVXOPT package:

from numpy import array
from cvxopt import matrix
from cvxopt import solvers

A = matrix([[1.0, 1.0, 0.0, 0.0, 0.0, 0.0], 
            [0.0, 0.0, 1.0, 1.0, 0.0, 0.0], 
            [0.0, 0.0, 0.0, 0.0, 1.0, 1.0], 
            [-1.0, 0.0, -1.0, 0.0, -1.0, 0.0], 
            [0.0, -1.0, 0.0, -1.0, 0.0, -1.0],
            [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            [0.0, -1.0, 0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, -1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, -1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0, -1.0, 0.0],
            [0.0, 0.0, 0.0, 0.0, 0.0, -1.0]]).T

b = matrix([[350.0, 250.0, 600.0, -500.0, -650.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]) 

c = matrix([[1.4, 2.7, 1.8, 2.5, 2.0, 3.2]])

sol = solvers.lp(c, A, b)

print(array(sol['x']).round(1))

and the results:

     pcost       dcost       gap    pres   dres   k/t
 0:  2.3333e+03  2.2344e+03  5e+03  3e-01  8e-01  1e+00
 1:  2.7258e+03  2.7257e+03  3e+02  3e-02  7e-02  8e+00
 2:  2.7083e+03  2.7085e+03  4e+01  4e-03  1e-02  1e+00
 3:  2.6954e+03  2.6954e+03  2e+00  2e-04  6e-04  1e-01
 4:  2.6950e+03  2.6950e+03  2e-02  2e-06  6e-06  1e-03
 5:  2.6950e+03  2.6950e+03  2e-04  2e-08  6e-08  1e-05
Optimal solution found.
[[ 350.]
 [   0.]
 [   0.]
 [ 250.]
 [ 150.]
 [ 400.]]
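As a sanity check, the reported solution can be verified by hand against the five structural rows of A and the pcost value (the matrices are transcribed from the script above):

```python
import numpy as np

# The optimizer's solution, rounded as printed above.
c = np.array([1.4, 2.7, 1.8, 2.5, 2.0, 3.2])
x = np.array([350.0, 0.0, 0.0, 250.0, 150.0, 400.0])

# The five structural constraint rows (the remaining rows of the
# original A just encode x_i >= 0).
A = np.array([[ 1.0,  1.0,  0.0,  0.0,  0.0,  0.0],
              [ 0.0,  0.0,  1.0,  1.0,  0.0,  0.0],
              [ 0.0,  0.0,  0.0,  0.0,  1.0,  1.0],
              [-1.0,  0.0, -1.0,  0.0, -1.0,  0.0],
              [ 0.0, -1.0,  0.0, -1.0,  0.0, -1.0]])
b = np.array([350.0, 250.0, 600.0, -500.0, -650.0])

# Feasibility and objective value (should match pcost ~ 2.6950e+03).
print(np.all(A @ x <= b + 1e-9) and np.all(x >= 0))
print(c @ x)
```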

The following Python code:

import timeit
from scipy import *
from scipy.linalg import *
from scipy.stats.distributions import *

def svdTime():
    return svd(norm.rvs(size=(1000, 1000)))

t = timeit.Timer('svdTime()', 'from __main__ import svdTime')
print(t.timeit(number=1))

gives:

gong-yi@21:17 ~ $ python test.py 
8.96158790588

And that was on an Intel Core i5! R does the same SVD in about 2 seconds and Octave in about 1 second; the SciPy developers should take note.

You can use the following steps to make GotoBLAS2 (the fastest BLAS I have ever used) available and usable on your Debian/Ubuntu system:

  1. Compile GotoBLAS2. Make sure you have configured USE_PTHREAD=1, and disable the library's profiling function (a personal preference; I found it a little annoying).
  2. ar x libgoto2.a
  3. gfortran -shared -lpthread -lm -Wl,-soname=libblas.so.3gf -o libblas.so.3gf.0 *.o
  4. Install the library somewhere on the dynamic linker's search path. Here I assume you install it under /usr/lib/gotoblas2
  5. sudo cp libblas.so.3gf.0 /usr/lib/gotoblas2/.
  6. Edit /etc/ld.so.conf.d/00-gotoblas2.conf and add /usr/lib/gotoblas2 to it.
  7. sudo ldconfig
  8. Finished.

Now you can test it, for example with an SVD: $\mathbf M = \mathbf U\mathbf D\mathbf V^T$.
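One simple benchmark (a sketch; timings obviously vary by machine) is to repeat the 1000x1000 SVD from the earlier post and verify the factorization:

```python
import time
import numpy as np

# Random 1000x1000 matrix, as in the earlier timing test.
m = np.random.standard_normal((1000, 1000))

t0 = time.time()
u, d, vt = np.linalg.svd(m)          # M = U D V^T
print('SVD took %.2f s' % (time.time() - t0))

# Reconstruction check: U diag(d) V^T should give back M.
print(np.allclose(u @ np.diag(d) @ vt, m))
```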

iPython in emacs

January 16, 2010

It sucks: to get non-ASCII text through, you have to decode it explicitly:

In [4]: kk = unicode("天生我材必有用", "utf-8")

In [6]: kk
Out[6]: u'\u5929\u751f\u6211\u6750\u5fc5\u6709\u7528'

In [7]: print kk
------> print(kk)
天生我材必有用
;; General function keywords highlight
(defface font-lock-my-functions-face
   '((((class color) (min-colors 88) (background light)) (:foreground "RoyalBlue4"  )))
   "Font Lock mode face used to highlight function names."
   :group 'font-lock-faces)

(defvar font-lock-my-functions-face 'font-lock-my-functions-face
   "Face name to use for functions.")

(defvar my-functions-name-regexp
  "\\s\"?\\(\\(\\sw\\|\\s_\\)+\\(<-\\)?\\)\\s\"?*\\s-*(" 
   "Regexp for function names")

(defvar my-functions-color  
   (list
    (cons my-functions-name-regexp '(1 font-lock-my-functions-face keep)))
   "Defines highlighting pattern for functions")

(font-lock-add-keywords 'c-mode
   (list (cons my-functions-name-regexp 'font-lock-my-functions-face)))
(font-lock-add-keywords 'python-mode
   (list (cons my-functions-name-regexp 'font-lock-my-functions-face)))

(setq ess-R-mode-font-lock-keywords
       (append  ess-R-mode-font-lock-keywords    my-functions-color))
(setq inferior-ess-R-font-lock-keywords
       (append inferior-ess-R-font-lock-keywords my-functions-color)) 

The (nonlinear) Lotka-Volterra ODE system describing the prey-predator relationship:
\[ \frac{dx}{dt} = x(\alpha - \beta y) \]
\[ \frac{dy}{dt} = -y(\gamma - \delta x) \]
Using the cookbook example from scipy.org:
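The cookbook listing is not reproduced here, but a minimal sketch along the same lines with scipy.integrate.odeint (the parameter values are illustrative, not the cookbook's) would be:

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative parameters (not the cookbook's values).
alpha, beta, gamma, delta = 1.0, 0.1, 1.5, 0.075

def dX_dt(X, t):
    # Standard Lotka-Volterra form: prey x grows, predator y decays,
    # coupled through the interaction terms.
    x, y = X
    return [x * (alpha - beta * y),
            -y * (gamma - delta * x)]

t = np.linspace(0, 15, 1000)
X = odeint(dX_dt, [10.0, 5.0], t)   # columns: prey, predator over time
```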



Another approach is the statistical one:
\[ y_t = F(x_t) + \epsilon_t\]
What is the problem? The problem is that the statistical approach is static.

Note that the statistical approach typically just uses some simple function, e.g. a polynomial:
\[ y_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + \cdots + \beta_k x_t^k + \epsilon_t \]
which leads to a problem: this model assumes that $y$ cannot in turn explain $x$.
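For concreteness, the degree-$k$ polynomial model can be fitted with numpy.polyfit; here is a sketch on synthetic data (the beta values below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
# True (made-up) model: y = 1.0 + 0.5 x - 0.8 x^2 + noise
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0.0, 0.1, size=x.size)

# polyfit returns coefficients highest degree first: [b2, b1, b0]
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(round(b0, 2), round(b1, 2), round(b2, 2))
```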

CEDET 1.0pre6 is killing me

January 2, 2010

I installed CEDET 1.0pre6. Writing C didn't get any easier, but Python support did break. Sigh.........

As follows:

from scipy import *
from scipy.linalg import *
from numpy.random import *
from scipy.stats.distributions import *

def qCondMVnor(P, rsNormVec, cpar, meanVec, covMatx):
    # Quantile of the conditional normal distribution of component
    # cpar given the remaining components of rsNormVec.
    # covMatx is assumed to be a numpy matrix, so * is a matrix product.
    x2 = delete(rsNormVec, cpar)
    mu2 = delete(meanVec, cpar)
    mu1 = meanVec[cpar]
    cov22 = delete(delete(covMatx, s_[cpar], axis=0), s_[cpar], axis=1)
    cov12 = delete(covMatx[cpar, :], cpar, axis=1)  # row cpar: cross-covariances
    cov21 = delete(covMatx[:, cpar], cpar, axis=0)  # column cpar: cross-covariances
    sd = float(sqrt(covMatx[cpar, cpar] - cov12 * inv(cov22) * cov21))
    mean = float(mu1 + cov12 * inv(cov22) * matrix(x2 - mu2).reshape(-1, 1))
    return norm.ppf(P, loc=mean, scale=sd)
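The conditional mean and standard deviation such a function is meant to compute can be checked independently on a small made-up example (all numbers below are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Made-up 3-variate normal; condition component 0 on components 1, 2.
mean = np.array([0.0, 1.0, 2.0])
cov = np.array([[2.0, 0.5, 0.3],
                [0.5, 1.0, 0.2],
                [0.3, 0.2, 1.5]])
i, rest = 0, [1, 2]
x2 = np.array([1.2, 1.8])            # observed values of components 1, 2

# Conditional moments: mu_1 + S12 S22^{-1} (x2 - mu_2), etc.
cov22 = cov[np.ix_(rest, rest)]
cov12 = cov[i, rest]
cmean = mean[i] + cov12 @ np.linalg.solve(cov22, x2 - mean[rest])
cvar = cov[i, i] - cov12 @ np.linalg.solve(cov22, cov12)

# 97.5% quantile of the conditional distribution.
print(norm.ppf(0.975, loc=cmean, scale=np.sqrt(cvar)))
```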