Live Strong

Body temperature 37℃ · Rebirth

JRobin sucks

December 25th, 2007

  RRDtool is the open source industry standard, high-performance data logging and graphing system for time series data. Because of its popularity, it has many language bindings and ports. One of them is JRobin, a Java port.

 

  JRobin may well be a good pure-Java RRDtool library with decent performance and usability, but I can't understand why it uses a different RRD format that is incompatible with the official one. What's worse, JRobin does provide a converter, but it only converts RRDtool files to JRobin's own format; there is no way to go in the other direction.

 

  Recently I got several JRobin RRD files and wanted to use them in my Ruby application. Unfortunately, because of JRobin's incompatible format, my attempts failed. So I went through the source code to see how the JRobin converter works, and after some digging I found out how clumsy the whole arrangement is. Let's see how to get an RRDtool file from a JRobin file.

 

  Below is a very simple sample script (it runs under JRuby).

include Java
require 'jrobin-1.5.8.jar'
import org.jrobin.core.RrdDb;

name = $*.shift
rrdDb = RrdDb.new(name)
rrdDb.dumpXml("#{name}.xml")
`rrdtool restore -r -f #{name}.xml #{name}`

  That's all. Everything works, and I get the expected RRDtool file. As the code shows, we just dump the JRobin file to an XML file and then restore that XML file into an RRDtool file. So if this works, why should JRobin use an incompatible format? Why can't JRobin use the official RRDtool format? God only knows, and I have to say that "JRobin sucks".
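  A slightly more defensive variant of the restore step could build the rrdtool command as an argument list, so that file names containing spaces or shell metacharacters are not interpreted by the shell. This is only a sketch: `restore_command` is a hypothetical helper of mine, not part of JRobin or RRDtool, and the file names are made up.

```ruby
# Hypothetical helper: build the argument list for `rrdtool restore`.
# -r enables range checking and -f overwrites an existing destination,
# the same flags used in the script above.
def restore_command(xml_path, rrd_path)
  ["rrdtool", "restore", "-r", "-f", xml_path, rrd_path]
end

cmd = restore_command("traffic.jrb.xml", "traffic.rrd")
puts cmd.join(" ")

# To actually run it (requires rrdtool on the PATH):
#   system(*cmd) or raise "rrdtool restore failed"
```

  Writing the result to a separate path also keeps the original JRobin file around in case the restore fails.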

 

[Update] Two important points explain the reason:

  • RRD4J RRD files are portable, RRDTool files are not. Try to copy a RRDTool file from Linux to Windows platform and fetch data from it. It does not work! But with RRD4J you are free to create your RRD files on Solaris and transfer them to Windows or Linux platform for further processing. It works! That is why I had to define my own file format which is different from the format used in RRDTool - there is no point in creating portable Java application backed by non-portable data files.
  • RRD4J uses the same XML format for RRD dump as RRDTool. You can dump your RRD4J file to an XML file which can be imported by RRDTool. And vice versa.

[RTranslate] Google translate ruby client api 0.2

November 16th, 2007

After I announced the package on ruby-talk, Konrad gave me a good suggestion: add a command line tool. What a wonderful idea; it is much more convenient to use. So I took a little time to add it, :)

 

The version is now 0.2, and you can run it as a command from the shell.

 

Example:

sishen@sishen:~$ rtranslate -f en -t zh-CN "test translation"
测试翻译

Really cool. You can get the latest gem at http://rtranslate.googlecode.com. There is also a wiki page listing the supported languages.


[Ann] Google translate ruby client api 0.1 release.

November 16th, 2007

I really like the Google Translate service. It's very useful, especially for someone who is not a native English speaker, such as me. However, I really hate having to open a browser tab just to translate a few words. There's no doubt that it is time-consuming.
 Programmers are lazy, :)

 

So I wrote a Ruby client API for the Google Translate service, and now I can get the answer directly from the terminal. The world is much more beautiful. lol~

 

Install

sudo gem install rtranslate

Usage

sishen@lifegoo:~$ irb -rubygems
irb(main):001:0> require 'rtranslate'
=> true
irb(main):002:0> $KCODE = 'u'
=> "u"
irb(main):003:0> result = Translate.t("china", Language::ENGLISH, Language::CHINESE_SIMPLIFIED)
=> "中国"
irb(main):004:0> result = Translate.t("china", Language::ENGLISH, Language::JAPANESE)
=> "中華人民共和国"
irb(main):005:0> result = Translate.t("china", Language::ENGLISH, "zh-TW")
=> "中國"

It's really easy to use, I think. The source is hosted at http://rtranslate.googlecode.com. Enjoy it.

 

Test with rails and postgresql

November 1st, 2007

Lately I have been working on testing a Rails application backed by PostgreSQL. I ran into so many problems that it is easy to end up in a mess. If you are in the same situation, this article is for you. I hope it's helpful, :)

  • Do you need PostgreSQL features that ActiveRecord doesn't support, such as views?
  • Do you need PostgreSQL contrib functions, such as fuzzystrmatch or tsearch2?
  • Do you have to run as a normal user instead of a superuser in the PostgreSQL environment?

Unfortunately, my answer to all of these questions is "YES!". All right, to be honest, I should admit that the last answer is "No, but I prefer to".

 

Basically, there are two ways to dump a database: db:schema:dump and db:structure:dump. db:schema:dump will "Create a db/schema.rb file that can be portably used against any DB supported by AR", while db:structure:dump will "Dump the database structure to a SQL file". Because AR's schema dumper doesn't support views, I have to use db:structure:dump. Remember to edit config/environment.rb and set "config.active_record.schema_format = :sql"; otherwise the default, db:schema:dump, is used.

 

Contrib functions are extensions to PostgreSQL, and they are installed into a database. Now think about testing. A clean way to test is to keep the tests independent, so before the tests run Rails tries to provide a clean database, which means it runs dropdb and then createdb. That's the problem: after dropdb/createdb we have to recreate the contrib functions, and that requires superuser privileges.

 

If you are a superuser, everything is much easier. You can just run db:test:purge -> db:structure:dump -> db:test:clone_structure, and all will be OK. But what should you do when you can't get superuser privileges?

 

Here is my solution. First, create the database and install the required contrib functions once. Then:

  1. Rewrite the task db:test:purge. Instead of dropdb/createdb, it migrates down to VERSION 0 and then drops the schema_info table.
  2. Rewrite the task db:structure:dump so that it strips the SQL that requires superuser privileges.

 

You can read the code below; pay particular attention to the code under the "postgresql" adapter branch.

Rake::TaskManager.class_eval do
  def delete_task(task_name)
    @tasks.delete(task_name.to_s)
  end
  Rake.application.delete_task("db:test:purge")
  Rake.application.delete_task("db:structure:dump")
end

namespace :db do
  namespace :structure do
    desc "Dump the database structure to a SQL file"
    task :dump => :environment do
      abcs = ActiveRecord::Base.configurations
      case abcs[RAILS_ENV]["adapter"]
      when "postgresql"
        ENV['PGHOST']     = abcs[RAILS_ENV]["host"] if abcs[RAILS_ENV]["host"]
        ENV['PGPORT']     = abcs[RAILS_ENV]["port"].to_s if abcs[RAILS_ENV]["port"]
        ENV['PGPASSWORD'] = abcs[RAILS_ENV]["password"].to_s if abcs[RAILS_ENV]["password"]
        search_path = abcs[RAILS_ENV]["schema_search_path"]
        search_path = "--schema=#{search_path}" if search_path
        File.open("db/#{RAILS_ENV}_structure.sql", "w+") do |f|
          schema = `pg_dump -i -U "#{abcs[RAILS_ENV]["username"]}" -s -x -O #{search_path} #{abcs[RAILS_ENV]["database"]}`
          raise "Error dumping database" if $?.exitstatus == 1
          strip = false
          result = ""
          schema.each_line do |l|
            if strip
              if l =~ /.*;$/
                strip = false
              end
            else
              if l =~ /^COMMENT ON SCHEMA public.*$/
                # skip: a normal user cannot replay the comment on the public schema
              elsif l =~ /^CREATE FUNCTION.*$/
                strip = true
              else
                result << l
              end
            end
          end
          f.write result
        end

      else
        raise "Task not supported by '#{abcs[RAILS_ENV]["adapter"]}'"
      end

      if ActiveRecord::Base.connection.supports_migrations?
        File.open("db/#{RAILS_ENV}_structure.sql", "a") { |f| f << ActiveRecord::Base.connection.dump_schema_information }
      end
    end
  end

  namespace :test do
    desc "Empty the test database"
    task :purge => :environment do
      abcs = ActiveRecord::Base.configurations
      case abcs["test"]["adapter"]
      when "postgresql"
        ActiveRecord::Base.establish_connection(:test)
        ActiveRecord::Migrator.migrate("db/migrate/", 0)
        ActiveRecord::Base.connection.execute("drop table schema_info");

      else
        raise "Task not supported by '#{abcs["test"]["adapter"]}'"
      end
    end
  end
end 
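
To make the stripping step easier to try outside of Rake, here is the same idea as a small standalone function. It is a sketch, not the task itself: `strip_privileged_sql` and the sample dump are made up for this example, and it assumes that every skipped statement ends on a line whose last character is ";". Unlike the task above, it also stops skipping when a CREATE FUNCTION statement fits on a single line.

```ruby
# Drops statements that a normal user may not be allowed to replay:
# the "COMMENT ON SCHEMA public" line and every CREATE FUNCTION statement.
def strip_privileged_sql(schema)
  skipping = false
  schema.each_line.reject do |line|
    if skipping
      # Keep skipping until the statement's terminating ";" is seen.
      skipping = false if line =~ /;\s*$/
      true
    elsif line =~ /^COMMENT ON SCHEMA public/
      true
    elsif line =~ /^CREATE FUNCTION/
      # Multi-line statement: skip the following lines as well.
      skipping = (line !~ /;\s*$/)
      true
    else
      false
    end
  end.join
end

# Made-up sample in the style of a pg_dump schema dump.
dump = <<'SQL'
COMMENT ON SCHEMA public IS 'Standard public schema';
CREATE FUNCTION levenshtein(text, text) RETURNS integer
    AS '$libdir/fuzzystrmatch', 'levenshtein'
    LANGUAGE c IMMUTABLE STRICT;
CREATE TABLE users (
    id integer NOT NULL
);
SQL

cleaned = strip_privileged_sql(dump)
puts cleaned
```

A function body that itself contains lines ending in ";" would end the skipping too early, so this only suits simple definitions such as the contrib functions.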

 

 

Introduction to rchardet, a universal character detector in ruby

October 17th, 2007

Have you ever been frustrated by garbled text on the internet?

Have you ever been frustrated by annoying encoding problems?

Have you ever had to extract text from a website that doesn't declare its encoding?

 

Yes, most of us run into these problems now and then. It's a pity that the world is not made of UTF-8, :(

 

Thanks to modern browsers such as Firefox, these headaches are less frequent. Why? Because they have a built-in library that auto-detects the text encoding of a web page. It's really amazing, :)

 

If you need some background, be sure to read the paper "A composite approach to language/encoding detection". It presents three auto-detection methods for determining the encoding of documents without an explicit charset declaration: 1) the coding scheme method, 2) character distribution, and 3) 2-char sequence distribution.
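
To give a rough feel for the first idea, the coding scheme method, here is a minimal sketch. It is not how rchardet is implemented: instead of the per-encoding state machines described in the paper, it leans on Ruby's built-in `String#valid_encoding?` (Ruby 1.9 or later; `String#b` needs 2.0) to rule out encodings in which the bytes are illegal. `plausible_encodings` and the candidate list are names I made up for this example.

```ruby
# Returns the candidate encodings in which the raw bytes form a legal sequence.
def plausible_encodings(bytes, candidates = ["UTF-8", "EUC-JP"])
  candidates.select do |name|
    # Reinterpret a copy of the bytes under the candidate encoding and
    # ask Ruby whether every byte sequence is well-formed.
    bytes.dup.force_encoding(name).valid_encoding?
  end
end

p plausible_encodings("\xA4\xCF".b)      # legal as EUC-JP, illegal as UTF-8
p plausible_encodings("\xE4\xB8\xAD".b)  # the three UTF-8 bytes of one hanzi
```

A validity check like this can only eliminate candidates. Many byte strings are legal in several encodings at once, which is why the paper adds the two distribution-based methods on top.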

 

This library also has many ports, such as python-chardet for Python and rchardet for Ruby. Now let me introduce the basic usage of rchardet, a universal character detector in Ruby.

 

Installation:

Really easy.

$ gem install rchardet

Basic Usage:

$ irb -rubygems
irb(main):001:0> require 'rchardet'
=> true
irb(main):002:0> CharDet.detect("\xA4\xCF")
=> {"encoding"=>"EUC-JP", "confidence"=>0.99}
irb(main):003:0> CharDet.detect("中国")
=> {"encoding"=>"utf-8", "confidence"=>0.7525}

Advanced Usage:

$ irb -rubygems
irb(main):001:0> require 'open-uri'
=> true
irb(main):002:0> require 'rchardet'
=> true
irb(main):003:0> rawdata = URI.parse('http://nextlib.lifegoo.com').read
=> xxx
irb(main):004:0> CharDet.detect(rawdata)
=> {"encoding"=>"utf-8", "confidence"=>0.99}

There are also other libraries available, for example charguess. You can read its documentation yourself if you are interested.

 

 

[Postgresql] levenshtein distance with multibyte string support

August 26th, 2007

I won't explain what the Levenshtein distance means. If you need some background, check here: levenshtein distance.

PostgreSQL has built-in support for Levenshtein distance, written by Joe Conway.

Let's look at the results first.

postgres=> select length('阿');
 length
--------
      1
(1 row)

postgres=> select levenshtein('阿', '俄');
 levenshtein
-------------
           3
(1 row)

Do you notice anything strange? The length of '阿' is 1, which is right; it means PostgreSQL handles Unicode/UTF-8 well. But the levenshtein result isn't so friendly. From the output above, it is clear that the function works byte by byte, so it is only correct for ASCII characters; each of these two characters takes three bytes in UTF-8, which is why the result is 3. I think this goes against PostgreSQL's design principles.
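
The difference between counting bytes and counting characters can be reproduced outside the database. The sketch below is plain Ruby, not the PostgreSQL C code and not my patch: `levenshtein` is a textbook dynamic-programming implementation written for this example, and it works on any two arrays, so the same function can be fed either bytes or characters.

```ruby
# Classic Levenshtein distance between two arrays, one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # distances from the empty prefix of a
  a.each_with_index do |x, i|
    curr = [i + 1]                     # deleting the first i + 1 items of a
    b.each_with_index do |y, j|
      cost = x == y ? 0 : 1
      curr << [prev[j + 1] + 1,        # deletion
               curr[j] + 1,            # insertion
               prev[j] + cost].min     # substitution (or match)
    end
    prev = curr
  end
  prev.last
end

puts levenshtein("阿".bytes, "俄".bytes)   # 3, byte by byte
puts levenshtein("阿".chars, "俄".chars)   # 1, character by character
```

The byte-wise call gives the same 3 as the unpatched function, and the character-wise call gives the 1 that a multibyte-aware version should return.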

I hacked the code and added support for multibyte strings. It works very well in my environment.

postgres=> select levenshtein('阿');
 levenshtein
-------------
           1
(1 row)

postgres=> select levenshtein('阿', '俄');
 levenshtein
-------------
           1
(1 row)

You can download the patch here. Or, if you are a Debian user, I rebuilt the contrib package for version 8.1.9; you can get it here.

If you have any problems, please contact me (yedingding@gmail.com). Thanks.

[Rails] add_column and the default value problem with the postgresql adapter

August 25th, 2007

A few days ago, I ran into a problem with an add_column migration on the postgresql adapter. When I added a column with a default value to a table, I found that existing rows did not get the default value at all; the column was null instead. However, when I tested with the mysql adapter, everything worked as I expected. Very strange, isn't it?

For example, if you run the migration below, you will find that the default value is not set on the existing records in the users table.

class AddUserStatus < ActiveRecord::Migration
  def self.up
        add_column :users, :status, :integer, :default => 1
  end

  def self.down
        remove_column :users, :status
  end
end

The problem is there, and now it's time to fight it. Open source gives us a chance, :)

Let's see what the add_column method does for us.

#activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb
      def add_column(table_name, column_name, type, options = {})
        add_column_sql = "ALTER TABLE #{table_name} ADD #{quote_column_name(column_name)} #{type_to_sql(type, options[:limit], options[:precision], options[:scale])}"
        add_column_options!(add_column_sql, options)
        execute(add_column_sql)
      end

      def add_column_options!(sql, options) #:nodoc:
        sql << " DEFAULT #{quote(options[:default], options[:column])}" if options_include_default?(options)
        sql << " NOT NULL" if options[:null] == false
      end

Everything seems fine here. Unfortunately, the postgresql adapter redefines the method, so let's go on.

#activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb
      # Adds a column to a table.
      def add_column(table_name, column_name, type, options = {})
        default = options[:default]
        notnull = options[:null] == false

        quoted_column_name = quote_column_name(column_name)

        # Add the column.
        execute("ALTER TABLE #{table_name} ADD COLUMN #{quoted_column_name} #{type_to_sql(type, options[:limit])}")
        # Set optional default. If not null, update nulls to the new default.
        if options_include_default?(options)
          change_column_default(table_name, column_name, default)
          if notnull
            execute("UPDATE #{table_name} SET #{quoted_column_name}=#{quote(default, options[:column])} WHERE #{quoted_column_name} IS NULL")
          end
        end

        if notnull
          execute("ALTER TABLE #{table_name} ALTER #{quoted_column_name} SET NOT NULL")
        end
      end

      # Changes the default value of a table column.
      def change_column_default(table_name, column_name, default)
        execute "ALTER TABLE #{table_name} ALTER COLUMN #{quote_column_name(column_name)} SET DEFAULT #{quote(default)}"
      end

That's the problem. The default value is set through the method "change_column_default", and the "ALTER TABLE" section of the PostgreSQL manual says the following:

SET/DROP DEFAULT
    These forms set or remove the default value for a column. The default values only apply to subsequent INSERT commands; they do not cause rows already in the table to change. Defaults may also be created for views, in which case they are inserted into INSERT statements on the view before the view’s ON INSERT rule is applied.

Very clear, isn't it? But we have still left something out, namely:

          if notnull
            execute("UPDATE #{table_name} SET #{quoted_column_name}=#{quote(default, options[:column])} WHERE #{quoted_column_name} IS NULL")
          end

In other words, when the column is declared NOT NULL, the adapter updates the whole table and sets the column to the default value wherever it is currently null.
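
To see the two cases side by side, the sketch below lists the statements that end up being executed with and without :null => false. It is not the adapter code: `add_column_statements` is a hypothetical helper that only mirrors the order of the execute calls quoted above, with quoting and type conversion left out.

```ruby
# Hypothetical helper mirroring the PostgreSQL adapter's add_column flow.
def add_column_statements(table, column, sql_type, options = {})
  statements = ["ALTER TABLE #{table} ADD COLUMN #{column} #{sql_type}"]
  not_null = options[:null] == false

  if options.key?(:default)
    default = options[:default]
    statements << "ALTER TABLE #{table} ALTER COLUMN #{column} SET DEFAULT #{default}"
    # Existing rows are back-filled only when the column is NOT NULL.
    statements << "UPDATE #{table} SET #{column}=#{default} WHERE #{column} IS NULL" if not_null
  end

  statements << "ALTER TABLE #{table} ALTER #{column} SET NOT NULL" if not_null
  statements
end

puts add_column_statements("users", "status", "integer", :default => 1)
puts "--"
puts add_column_statements("users", "status", "integer", :default => 1, :null => false)
```

Only the second call produces an UPDATE, so only in that case do the existing rows receive the default.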

So the solution is:

class AddUserStatus < ActiveRecord::Migration
  def self.up
        add_column :users, :status, :integer, :default => 1, :null => false
  end

  def self.down
        remove_column :users, :status
  end
end

 

However, I don't think this is a perfect design, because the behavior is inconsistent among the adapters. When I commented out the method and let the postgresql adapter use the default add_column from the abstract adapter, the whole test suite still passed. So I'm really confused by that. Whether this is a design dilemma, I have no idea.

Feel free to leave your thoughts, :)

[Ruby-GetText] Avoid the annoying error when ‘rake updatepo’: “undefined method ‘untranslate_all?’ for Foo:Class”

August 2nd, 2007

Gettext is a great tool for translating application user interfaces into different languages. Masao Mutoh wrote Ruby-GetText-Package so that Ruby developers can do L10N and I18N. It's really great. Also, "Using Gettext To Translate Your Rails Application" by Sascha Ebach is a great tutorial on Ruby-GetText-Package and Ruby on Rails.

 

After upgrading from 1.9.0 to 1.10.0, I hit an annoying error when executing `rake updatepo`: "undefined method 'untranslate_all?' for Foo:Class", raised for any model class. Unfortunately, I couldn't find any useful tips about the problem on Google.

 

Thanks to open source, I could look directly through the source code. After some effort, I finally found the cause of the problem.

The relevant file is lib/gettext/parser/active_record.rb. The code is:

 

            begin
              ENV["RAILS_ENV"] = @config[:db_mode]
              require 'config/boot.rb'
              require 'config/environment.rb'
#             require 'app/controllers/application.rb'
            rescue LoadError
              require_rails 'rubygems'
              if Kernel.respond_to? :gem
                gem 'activerecord'
              else
                require_gem 'activerecord'
              end
              require_rails 'active_record'
              require_rails 'gettext/active_record'
            end

Look at the commented-out line; that is the difference from version 1.9.0. Most tutorials tell users to require 'gettext/rails' in application.rb. That is,

 

require 'gettext/rails'

class ApplicationController < ActionController::Base
    init_gettext "myapp"
end

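That's the problem! Since the parser no longer loads application.rb, 'gettext/rails' is never required when `rake updatepo` runs.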

 

So the solution is easy: put the line "require 'gettext/rails'" into the file config/environment.rb.

[rubygem] A library to convert Chinese to pinyin

August 1st, 2007

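A while ago I wrote a Ruby class that converts Chinese characters to pinyin. Feel free to use it if you find it useful, :).

The principle is simple. Implementations in all sorts of languages can be found online nowadays, and they all work in much the same way. In short: first convert the Chinese characters to GB2312 encoding, then look them up in the GB2312 Character Coding Table to get the corresponding pinyin.

Main code: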

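      def to_pinyin(chinese)
        # Decode the UTF-8 string into an array of Unicode code points.
        codepoints = chinese.unpack("U*")

        pinyin = codepoints.inject("") do |result, codepoint|
          if isChineseUnicode(codepoint)
            result + to_Pronunciation(codepoint)
          else
            # todo: other unicode
            # Re-encode the code point as UTF-8 so non-ASCII input is kept as-is.
            result + [codepoint].pack("U")
          end
        end
        pinyin.strip
      end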

 
As the code above shows, the whole process is chinese -> Unicode number -> iconv (ucs-2 -> gb2312) -> pinyin table. For the details, read the code yourself; it's very simple, :)
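
For readers who just want to see the lookup idea in isolation, here is a tiny sketch of that pipeline. It is not the gem's code: it uses `String#encode` (Ruby 1.9 or later) in place of iconv, and `PINYIN_TABLE`, `gb2312_code` and `to_pinyin_sketch` are names invented for this example. The table holds only two entries; a real one would cover the whole GB2312 character set.

```ruby
# Hypothetical two-entry table keyed by the two-byte GB2312 code.
PINYIN_TABLE = { 0xD6D0 => "zhong", 0xB9FA => "guo" }

# GB2312 code of a single hanzi as one integer (high byte, then low byte).
# Raises Encoding::UndefinedConversionError for characters outside GB2312.
def gb2312_code(char)
  high, low = char.encode("GB2312").bytes
  (high << 8) | low
end

def to_pinyin_sketch(text)
  text.each_char.map do |char|
    next char if char.ascii_only?                # pass ASCII through unchanged
    PINYIN_TABLE.fetch(gb2312_code(char), char)  # unknown hanzi stay as they are
  end.join
end

puts to_pinyin_sketch("中国")   # zhongguo
```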

Test Case:

$:.unshift "../lib"

require 'test/unit'
require 'pinyin'

class ChineseToPinyinTest < Test::Unit::TestCase
  def test_pinyin
    assert_equal("zhong", Chinese::Pronunciation.to_pinyin("中"))
    assert_equal("ss", Chinese::Pronunciation.to_pinyin("ss"))
    assert_equal("zhongguo", Chinese::Pronunciation.to_pinyin("中国"))
    assert_equal('womenenglishzifu', Chinese::Pronunciation.to_pinyin('我们english字符'))
    assert_equal('zhonghuarenmingongheguo', Chinese::Pronunciation.to_pinyin('中华人民共和国'))
  end
end

Let’s run it:

$ rake
/usr/bin/ruby1.8 -Ilib:lib "/var/lib/gems/1.8/gems/rake-0.7.3/lib/rake/rake_test_loader.rb" "tests/ts_pinyin.rb"
Loaded suite /var/lib/gems/1.8/gems/rake-0.7.3/lib/rake/rake_test_loader
Started
.
Finished in 0.001338 seconds.

1 tests, 7 assertions, 0 failures, 0 errors

README:

= Pinyin
Turn chinese to pronunciation

This class enables Ruby programs to turn Chinese hanzi into their
pronunciation, that is, pinyin. It supports the GB2312 character
set coding table.

== Usage

require "pinyin"

pinyin = Chinese::Pronunciation.to_pinyin("xxx")

== CHANGES
v0.1.0 July 18
* initial release

Copyright (c) 2007 Dingding Ye <yedingding@gmail.com>
Distributed under MIT License

 

You can download the gem here. I hope it's helpful.

[GM Script] Show the Baidu MP3 download link

July 10th, 2007

GreaseMonkey is really evil, :) It makes our lives easier and more comfortable.

Userscripts.org has a script that shows the direct Baidu MP3 download link. Of course it's useful, especially for me. It's a pity that it doesn't work any more, so I made some changes based on it, and also on this one.

 

ChangeLog:

  1. Process the DOM to get the URL, instead of using XPath.
  2. Use GM_xmlhttpRequest to fetch the URL.

 

You can see the effect in the picture below, :)

[Image: greasemonkey script for baidu mp3 link show]

 

Here I'd like to give Allen Su a bit of support, ^_^

You can download the source here.