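偶像切尔斯基: Questioning Agile: TDD

I think it is time for the industry to take test environments seriously. Ideally, languages would have built-in support for testing: some language features designed for the production environment should be switched off under test and enabled in production. Many compilers have long supported separate Release and Debug builds, also out of concern for code quality. Now that unit testing has been shown beyond doubt to be more effective than debugging, it is time to move with the times, add support for a Test build, and gradually retire support for Debug.

But what I think the industry should take seriously is not built-in support for testing but type inference. To keep programs from going wrong, no pile of error-detection mechanisms, however large, will solve the problem; you should write programs that cannot go wrong. And the debugger itself is of marginal value anyway: when the exception stack trace cannot help you, the debugger cannot help you either.

(Write programs that don't go wrong, the way Erlang does. Write programs that don't go wrong, the way Haskell does.)

This holds especially in the age of parallel computing. Even leaving parallel computing aside, a type system is a serious way to keep programs from going wrong. Languages that are quick to write in and have type inference deserve serious consideration.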

Dear all,

I'm glad to announce that Stomperl 0.0.2 is out. You can now check it out at http://stomperl.googlecode.com/svn/tags/0.0.2/ .

Since the first preview version [http://gigix.thoughtworkers.org/2007/12/12/announcement-stomperl-0-0-1], we've made some notable progress. The most significant item is support for message queuing. Message destinations in Stomperl 0.0.1 could only be topics. Now they can be queues as well: a destination whose name starts with "queue^" behaves as a queue. (For the difference between a topic and a queue, see "Enterprise Integration Patterns" [http://www.enterpriseintegrationpatterns.com/].)
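
The topic/queue distinction can be sketched in a few lines. This is a minimal illustration in Python, not Stomperl's Erlang code: the Broker class, its round-robin choice of queue consumer, and the list-based inboxes are all assumptions made for the example; only the "queue^" naming rule comes from the announcement.

```python
from collections import defaultdict
from itertools import count

QUEUE_PREFIX = "queue^"  # naming rule taken from the announcement


class Broker:
    """Hypothetical in-memory broker; subscriber inboxes are plain lists."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # destination -> list of inboxes
        self._turn = defaultdict(count)       # per-queue round-robin counter (assumed policy)

    def subscribe(self, destination, inbox):
        self.subscribers[destination].append(inbox)

    def send(self, destination, message):
        inboxes = self.subscribers[destination]
        if not inboxes:
            return  # this sketch simply drops messages nobody is subscribed to
        if destination.startswith(QUEUE_PREFIX):
            # Queue: exactly one subscriber receives each message.
            index = next(self._turn[destination]) % len(inboxes)
            inboxes[index].append(message)
        else:
            # Topic: every subscriber receives its own copy.
            for inbox in inboxes:
                inbox.append(message)


broker = Broker()
alice, bob = [], []
for destination in ("news", "queue^jobs"):
    broker.subscribe(destination, alice)
    broker.subscribe(destination, bob)

broker.send("news", "hello")        # topic: both receive it
broker.send("queue^jobs", "job-1")  # queue: one consumer only
broker.send("queue^jobs", "job-2")  # queue: the other consumer
```

With two subscribers on each destination, the topic message lands in both inboxes while the two queue messages are split between them.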

Furthermore, Stomperl 0.0.2 supports transactions: the messages in a transaction are sent all-or-nothing. It also supports ACK and ERROR frames. In fact, it now supports every command listed in the protocol. Although we don't have an official compatibility test suite yet, I feel it's fairly safe to say that Stomperl is a 100% Stomp-compatible broker.
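
One way to picture all-or-nothing delivery is to buffer the sends of a transaction and release them only on commit. The sketch below is in Python and is purely illustrative: TransactionBuffer and its deliver callback are hypothetical names, and nothing here is taken from Stomperl's source. The method names merely mirror the BEGIN, COMMIT and ABORT commands of the Stomp protocol.

```python
class TransactionBuffer:
    """Hypothetical helper: holds transactional sends until commit."""

    def __init__(self, deliver):
        self._deliver = deliver  # callable(destination, message), assumed
        self._pending = {}       # transaction id -> buffered (destination, message)

    def begin(self, tx):
        self._pending[tx] = []

    def send(self, destination, message, tx=None):
        if tx is None:
            # Outside a transaction: deliver immediately.
            self._deliver(destination, message)
        else:
            # Inside a transaction: hold the message back.
            self._pending[tx].append((destination, message))

    def commit(self, tx):
        # Release everything buffered for this transaction, in order.
        for destination, message in self._pending.pop(tx):
            self._deliver(destination, message)

    def abort(self, tx):
        # Drop everything buffered for this transaction.
        self._pending.pop(tx)


delivered = []
buffer = TransactionBuffer(lambda destination, message: delivered.append((destination, message)))

buffer.begin("tx1")
buffer.send("news", "a", tx="tx1")
buffer.send("news", "b", tx="tx1")
buffer.abort("tx1")    # nothing from tx1 is delivered

buffer.begin("tx2")
buffer.send("news", "c", tx="tx2")
buffer.commit("tx2")   # "c" is delivered now
```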

What's next, then? I'd like to investigate other Stomp brokers (as well as clients) and run some performance benchmarks. Besides that, I expect there are defects to fix and some housecleaning to do. As always, any suggestions and feedback would be highly appreciated.

Announcement: Stomperl 0.0.1

December 12th, 2007

Dear all,

Stomperl 0.0.1 (the first preview release) is out.

Stomperl [http://code.google.com/p/stomperl/] is an implementation of a Stomp [http://stomp.codehaus.org/] broker in Erlang. That means performance, scalability, reliability and elegance in concurrent programming are our goals. And since Stomp is simple enough, it's a good starting point for learning Erlang/OTP programming.

Version 0.0.1 is the first preview release. Its main purpose is to gather feedback from the community. So far it supports the basic elements of the Stomp protocol: CONNECT, DISCONNECT, SUBSCRIBE, UNSUBSCRIBE, SEND and RECEIPT. It passes acceptance tests built with both Java and Perl clients. That's why I consider it "usable" and decided to announce it.

To give Stomperl a try (NOTE: EUnit later than 2.0 beta is required):

  1. Check it out with Subversion: svn checkout http://stomperl.googlecode.com/svn/tags/0.0.1/ stomperl-0.0.1
  2. Kick off the broker: make startup
  3. Now you can connect to the broker at port 61613. However I suggest you run the test suite first: make test
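
For anyone who wants to poke at port 61613 by hand, it helps to know what a frame looks like on the wire. The following is a minimal sketch in Python, assuming the Stomp 1.0 frame layout (a command line, header lines of the form key:value, a blank line, the body, then a NUL byte). The function names encode_frame and decode_frame are made up for this example; actually writing the bytes to a socket is left out.

```python
def encode_frame(command, headers=None, body=""):
    """Serialize one frame: command, headers, blank line, body, NUL."""
    lines = [command]
    lines += [f"{key}:{value}" for key, value in (headers or {}).items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def decode_frame(data):
    """Parse the bytes of a single frame back into (command, headers, body)."""
    text = data.decode("utf-8").rstrip("\x00")
    head, _, body = text.partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


connect = encode_frame("CONNECT")
send = encode_frame("SEND", {"destination": "news"}, "hello")
print(decode_frame(send))  # ('SEND', {'destination': 'news'}, 'hello')
```

The decoder assumes exactly one complete frame per call; a real client would also have to split the byte stream on the NUL terminator.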

What's next? I expect Stomperl to support the full Stomp protocol in version 0.0.2, along with better test coverage. We will do more acceptance, compatibility and performance testing in the future. But first, any suggestions and feedback would be highly appreciated.

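Handling State in Erlang Programs

December 7th, 2007

A digression to begin with: in my own experience, the biggest problem with many frustrating programmers is not that they are unfamiliar with the language and its libraries, not that they don't understand algorithms, and not that they can't use the tools. It is that they have no concept of how information flows through a program: they cannot work out which information belongs where, where it comes from, what transformations it goes through, and where it ends up. Sequential programming languages (C, C++, Java, C#, ...) make this worse to a large degree. In less-than-pretty code you often see abused static methods and variables, which in the end are just a continuation of inappropriate global variables, and which in the end come down to not having worked out which information should appear where.

Programming exercises in Erlang help a lot here. There are no global variables, and a variable cannot be changed once it is bound. So some common bad smells simply never appear, and some common refactorings simply never get used. Still, the world is stateful after all. A Stomp server, for example, has to remember which client subscribed to which channel. So when you think seriously about "which information belongs where", the several options Erlang offers become quite interesting.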

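Parameter passing. Only when you think seriously do you discover that a lot of information is actually easy to come by. The simplest and most common way to control a function's behavior is to change the arguments passed to it. If the function needs a new piece of state, perhaps that means giving it one more parameter.

Process dictionary. Calling put and get stores and retrieves information in a dictionary that has one instance per process. For example, random keeps a variable named random_seed in the dictionary and uses it to generate pseudo-random numbers.

ETS. Also a two-dimensional table, but the information in ETS can be accessed by all processes. For example, Stomperl needs to record which client subscribes to which mailer process; obviously every process listening on a socket needs this subscription information in order to dispatch messages correctly. So the subscription information should be kept (at least) in ETS.

DETS. ETS exists only in memory, which means two things: first, the data disappears when the program ends; second, the data can only be shared within one node. The DETS API is similar to that of ETS, but it is file-based, so persistent storage and sharing across nodes both come with the territory. Note that data stored in ETS and DETS must be tuples.

Mnesia. This is a real database: fully featured, and still soft real-time.

The order of the five approaches above is not random. Consider the earlier ones first, and move to a later one only when there is a clear reason why an earlier one cannot meet the need. This is hard on the brain and sometimes discouraging. But a program that has been thought through is better than one written without thinking, and discovering your own mistake is better than making one without knowing it.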
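
The first option, parameter passing, is not specific to Erlang. Here is a minimal sketch of the idea in Python, not Stomperl's code: the message shape (kind, client, channel) and the names handle and run are assumptions for illustration. The subscription table is never stored in a global; each call receives the current state and returns the next one.

```python
from functools import reduce


def handle(subscriptions, message):
    """Return a new subscription table; the one passed in is left untouched."""
    kind, client, channel = message
    current = subscriptions.get(channel, frozenset())
    if kind == "subscribe":
        updated = current | {client}
    elif kind == "unsubscribe":
        updated = current - {client}
    else:
        raise ValueError(f"unknown message kind: {kind!r}")
    return {**subscriptions, channel: updated}


def run(messages, initial=None):
    """Thread the state through every message, one call at a time."""
    return reduce(handle, messages, initial or {})


final_state = run([
    ("subscribe", "alice", "news"),
    ("subscribe", "bob", "news"),
    ("unsubscribe", "alice", "news"),
])
print(final_state)  # {'news': frozenset({'bob'})}
```

Because the state only ever travels as an argument and a return value, it is always clear where the information comes from and where it goes, which is the point of trying this option first.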

Stomperl: Stomp with Erlang

December 7th, 2007

Stomperl is an attempt to build something not entirely trivial (in this case, a Stomp server) with Erlang. To kick it off, I stole the server architecture from here and here. Currently it doesn't even support the full protocol: only the CONNECT, SUBSCRIBE and SEND commands are supported. But anyway, it's moving forward and I'm learning from it.

To make it run:

  1. make test, which hopefully succeeds.
  2. make start, then you'll get an Erlang console.
  3. In the Erlang console, tcp_server_sup:start_server().
  4. In another shell console, make acceptance, which hopefully succeeds.

I created an extremely simple acceptance test with Gozirra.

What's next? Well, I suppose I'll implement the full protocol and fix some defects. As a newbie to Erlang, I have made and am still making stupid mistakes. You're welcome to be stupid together with me.

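Original: Programming Distributed Erlang Applications: Pitfalls and Recipes

Chinese translation: 编写分布式的Erlang程序:陷阱和对策

To develop more reliable distributed systems and algorithms on top of the Erlang runtime system, we studied the distributed part of the Erlang programming language. With Erlang, turning a program that runs on a single node into a fully distributed application (running on several nodes) is trivially easy: you only need to change the calls to the spawn function so that they create processes on different nodes. Even so, the Erlang language and API still have some dark corners that can cause problems once distribution is introduced. In this paper we present several such pitfalls: places where the semantics of inter-process communication differ significantly depending on whether the processes run on the same node. We also provide some guidelines for writing safe distributed systems.

For more Erlang documents, see the Erlang Documentation Project (Erlang文档计划).

Original: Extended Process Registry for Erlang

Chinese translation: 扩展Erlang的进程注册表

The built-in process registry has long proven to be an extremely useful feature of the Erlang language. It lets developers easily provide named services: users can use these services without knowing the process identifier (PID) of the serving process.

But the current process registry has its limitations: a process name must be an atom (structured data is not supported), each process can register under only one name, and there is no efficient mechanism for searching and iterating.

In product development at IMS Gateways, part of Ericsson, we often needed to maintain a mapping table in order to find the process responsible for handling a call based on various properties. We noticed a common pattern in this (a kind of index table), and from there began developing an extended process registry.

At first the idea did not immediately show its value, and it was not even clear how much convenience it offered in practice. But as development went on, programmers used the extended process registry more and more, which significantly reduced the amount of code and made the implementation more consistent. In addition, the extended process registry provides a powerful debugging mechanism that makes effective debugging possible among tens of thousands of processes.

This paper describes the extended process registry and critiques it, and on that basis proposes a new implementation that is more consistent, more efficient, and supports a global namespace.

For the full text, see the Erlang Documentation Project (Erlang文档计划).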

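Half a Day of Searching, Starting from Py2Erl

October 24th, 2007

(An email I posted to ECUG yesterday; on reflection, I decided to put it on my own blog as well.)

This morning I tried to build a petstore with ErlyWeb and was eventually defeated. CaoYuan's blog helped a great deal:
http://blogtrader.net/page/dcaoyuan/entry/from_rails_to_erlyweb_part2

Conclusion: in development convenience, ErlyWeb is a long way behind Rails. In particular, too few tools are available for views, and too many things have to be built from scratch. Using it for a web front end not only looks like shooting mosquitoes with an anti-aircraft gun, it is also quite laborious. Not a sound choice.

At noon I wrote this news item for InfoQ, and in the course of it read through the "Py2Erl" slides, which got me interested.
InfoQ news item: http://www.infoq.com/cn/news/2007/10/cn-erlounge-ii
Slides: http://www.erlang.org.cn/ecug/071013-erlparty2/071014-py2erl/

I found Stackless Python and wrote a small program. Fun, and sound.
Stackless Python: http://www.stackless.com/
A small program, copied: http://gigix.thoughtworkers.org/2007/10/23/is-stackless-python-the-way
Someone has done a benchmark; the results are passable, I suppose:
http://muharem.wordpress.com/2007/07/31/erlang-vs-stackless-python-a-...

Functional programming is no longer much of a selling point. What attracts me most about Erlang is "that" way of modeling concurrent programs. From Stackless I learned that this model is called the Actors Model, and that it has been around for years.
C2's explanation: http://c2.com/cgi/wiki?ActorsModel
This paper is a very good read: http://www.cypherpunks.to/erights/history/actors/AIM-410.pdf
This one is a good read too, just a bit too abstruse: http://www.cypherpunks.to/erights/history/actors/AIM-691.pdf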

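All right... what about Ruby? On with the manual searching... As for what the Ruby (and/or Python) community can get from Erlang, switching over outright is not very likely; it is mainly (1) learning from their advanced ideas and (2) mixed-language programming. Ruby's efforts in this area include Ruby-Erlang bridging and its own implementations of the Actors Model.
Erlectricity is a bridge: http://code.google.com/p/erlectricity/
Rebar is another bridge, less mature: http://rubyisawesome.com/2007/4/30/calling-erlang-from-ruby-teaser
Omnibus implements the Actors Model and is also quite immature: http://groups.google.com/group/ruby-talk-google/browse_frm/thread/ec4...

I played with Omnibus for a while. The syntax you end up writing is a step down. I looked at the source code and there is nothing magical in it: it just wraps Thread. The future concept, however, is implemented in native C code.
I put together another small program: http://gigix.thoughtworkers.org/2007/10/23/is-concurrent-ruby-better
What is a future? http://www.ps.uni-sb.de/alice/manual/futures.html

I hear Ruby 1.9 will add something called Fiber. Once you throw away the syntactic sugar, it is basically the same thing as Omnibus...
http://www.infoq.com/news/2007/08/ruby-1-9-fibers

There is also a discussion around Ruby's threading model. What effect will the GIL have on concurrent programming? I haven't thought about it seriously.
http://www.infoq.com/news/2007/05/ruby-threading-futures

That's all.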

--
Jeff Xiong
Software Journeyman - http://gigix.thoughtworkers.org
Open Source Contributor - http://rubyworks.rubyforge.org
Technical Evangelist - http://www.infoq.com/cn/

Is Stackless Python THE Way?

October 23rd, 2007

Code with Stackless Python

#
# pingpong_stackless.py
#

import stackless

ping_channel = stackless.channel()
pong_channel = stackless.channel()

def ping():
    while ping_channel.receive():  # blocks until a message arrives
        print("PING")
        pong_channel.send("from ping")

def pong():
    while pong_channel.receive():  # blocks until a message arrives
        print("PONG")
        ping_channel.send("from pong")

stackless.tasklet(ping)()
stackless.tasklet(pong)()

# we need to 'prime' the game by sending a start message
# if not, both tasklets will block
stackless.tasklet(ping_channel.send)('startup')

stackless.run()

And it runs…forever.

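Li Jian (李剑) raised a good question in a news item on the InfoQ China site:

In the broad sense, the word "faith" covers too much. Broken down, it falls into at least two kinds. One is where you have no first-hand experience, and simply hear many technical experts and agile advocates promote and praise TDD, and so fall into boundless fervor for this "spell"; this can be called blind following. The other is where you have rich experience using TDD in real development and have felt its benefits deeply, and so you keep using TDD unwaveringly and work to spread it. So which kind does your feeling about agile and TDD belong to?

In fact James Coplien's complaint, "we are told 'you are only an expert if you do TDD'..., but not told why we should believe it", is very often exactly the result of the misunderstood good intentions of consulting companies like ThoughtWorks: we really do know that TDD (or some other agile practice) will help in that situation; but to produce evidence to prove it, or to spell out exactly how to do it, we no longer have that much (free) time and energy. Sometimes saying only half of something does cause misunderstanding, but I still think more information is better than less.

Also, a certain comrade told a client in a letter that Erlang will be the next prince of web development. But the example he gave is really not attractive. First, using Erlang for this kind of thing looks like shooting mosquitoes with a cannon however you view it. Second, this SlideAware looks like an over-designed Web 2.0 product however you view it; personally, at least, I still prefer Google Docs. So the matter remains tricky: how many websites truly need "nine nines" of reliability? Under what circumstances does it pay to raise the development (and maintenance) difficulty of the whole site for the sake of a perfect implementation of one critical part? And how far is ErlyWeb from Rails?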

Do not program "defensively"

September 25th, 2007

(From Erlang Programming Rules )

A defensive program is one where the programmer does not “trust” the input data to the part of the system they are programming. In general one should not test input data to functions for correctness. Most of the code in the system should be written with the assumption that the input data to the function in question is correct. Only a small part of the code should actually perform any checking of the data. This is usually done when data “enters” the system for the first time. Once data has been checked as it enters the system, it should thereafter be assumed correct.

Example:

%% Args: Option is all|normal
get_server_usage_info(Option, AsciiPid) ->
  Pid = list_to_pid(AsciiPid),
  case Option of
    all -> get_all_info(Pid);
    normal -> get_normal_info(Pid)
  end.
The function will crash if Option is neither normal nor all, and it should do that. The caller is responsible for supplying correct input.
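
The same rule carries over to other languages. Below is a minimal sketch in Python, loosely mirroring the Erlang example above: handle_request and the "option pid" text format are hypothetical, invented here to stand for the point where data enters the system. Only that boundary function validates; the inner function trusts its arguments and fails loudly if they are wrong.

```python
def get_all_info(pid):
    return {"pid": pid, "detail": "all"}


def get_normal_info(pid):
    return {"pid": pid, "detail": "normal"}


_HANDLERS = {"all": get_all_info, "normal": get_normal_info}


def get_server_usage_info(option, pid):
    # No checking here: an unknown option raises KeyError, on purpose.
    return _HANDLERS[option](pid)


def handle_request(raw):
    """Boundary (hypothetical): the one place where untrusted text is checked."""
    option, _, pid_text = raw.strip().partition(" ")
    if option not in _HANDLERS or not pid_text.isdigit():
        raise ValueError(f"malformed request: {raw!r}")
    return get_server_usage_info(option, int(pid_text))


print(handle_request("all 42"))  # {'pid': 42, 'detail': 'all'}
```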

[Book excerpt] Erlang's Interface Libraries

June 18th, 2007

However, several libraries included in the Erlang distribution simplify the job of interfacing Erlang to external programs; these include the following:

http://www.erlang.org/doc/pdf/erl_interface.pdf

Erl interface (ei) is a set of C routines and macros for encoding and decoding the Erlang external format. On the Erlang side, an Erlang program uses term_to_binary to serialize an Erlang term, and on the C side the routines in ei can be used to unpack this binary. ei can also be used to construct a binary, which the Erlang side can unpack with binary_to_term.

http://www.erlang.org/doc/pdf/ic.pdf

The Erlang IDL Compiler (ic). The ic application is an Erlang implementation of an OMG IDL compiler.

http://www.erlang.org/doc/pdf/jinterface.pdf

Jinterface is a set of tools for interfacing Java to Erlang. It provides a full mapping of Erlang types to Java objects, encoding and decoding Erlang terms, linking to Erlang processes, and so on, as well as a wide range of additional features.

Many programming languages allow code in foreign languages to be linked into the application executable. In Erlang, we don’t allow this for reasons of safety. If we were to link an external program into the Erlang executable, then a mistake in the external program could easily crash the Erlang system. For this reason, all foreign language code must be run outside the Erlang system in an external operating system process. The Erlang system and the external process communicate through a byte stream.

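Still Learning Erlang...

June 15th, 2007

I have read Programming Erlang up to chapter 9, error handling: linking several processes together for monitoring, error recovery, or fallback. I remember once seeing a really impressive line: Erlang is not reliable; Java is reliable, Erlang just rocks.

But the question remains: when do you actually need process management that "rocks" like this? When do you need such clear process modeling? High-performance servers do, of course. But a high-performance server is not something everyone picks up to play with. Ruby on Rails became popular because anyone can use it to build a website and then dream of a startup. What about Erlang? What does it bring to mind?

Ring benchmark: create N processes and arrange them in a ring; pass a message around the ring M times, for a total of N*M message passes. Record the time taken for different values of N and M.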

+ erl -noshell -s ring benchmark 3000 1000 -s init stop
Benchmark starting: ring with 3000 nodes, send message around 1000 times
Benchmark done
Time=750000 (827000) microseconds

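Second question: write a similar program in another language you know, and compare the results.

First, completing three million inter-process message passes in 0.8 seconds says enough. Second (and more important), I very much doubt I could write a program with the same functionality in another language: the Erlang program is 41 lines in total.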
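
For comparison, here is one hedged attempt at the second question in Python. It is a sketch under clear assumptions, not an equivalent of the Erlang program: each ring node is an OS thread with a queue.Queue as its mailbox, so it will not scale to thousands of nodes the way Erlang processes do, and no timing figures are claimed for it. The function name ring_benchmark and the countdown-token scheme are choices made for this example.

```python
import queue
import threading
import time


def ring_benchmark(n, m):
    """Pass a token around a ring of n threads m times (n * m sends in total)."""
    inboxes = [queue.Queue() for _ in range(n)]
    sent = [0] * n  # each thread only ever writes its own slot

    def node(i):
        inbox, next_inbox = inboxes[i], inboxes[(i + 1) % n]
        while True:
            remaining = inbox.get()  # blocks until the token arrives
            if remaining == 0:
                next_inbox.put(0)    # pass the stop signal on, then exit
                return
            sent[i] += 1
            next_inbox.put(remaining - 1)

    threads = [threading.Thread(target=node, args=(i,)) for i in range(n)]
    for thread in threads:
        thread.start()

    started = time.perf_counter()
    inboxes[0].put(n * m)  # the token counts down the sends still to make
    for thread in threads:
        thread.join()
    return time.perf_counter() - started, sent


elapsed, sent = ring_benchmark(30, 100)
print(f"{sum(sent)} sends in {elapsed * 1e6:.0f} microseconds")
```

The token carries the number of sends still to be made; when it reaches zero it doubles as the shutdown signal and travels once more around the ring so every thread exits.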