2018-04-26

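NULL checks in SQL statements

While writing a SQL filter today I needed to exclude rows with an empty URL, and without thinking wrote it like this:

select * from record where url != null

Rows that should have matched clearly existed, yet the query returned nothing. What went wrong?
A quick search showed that SQL treats NULL in a special way.

NULL represents an unknown value and cannot be meaningfully compared with a value of any type. Any such comparison evaluates to NULL rather than to True or False. Because a WHERE clause keeps only rows whose condition evaluates to true, a NULL result filters the row out.

So how do you test for it? Use IS NULL or IS NOT NULL.
The statement above just needs to be changed to:

select * from record where url is not null

The MySQL manual gives a few typical cases that help make this concrete: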

mysql> SELECT 1 IS NULL, 1 IS NOT NULL;
+-----------+---------------+
| 1 IS NULL | 1 IS NOT NULL |
+-----------+---------------+
|         0 |             1 |
+-----------+---------------+

mysql> SELECT 1 = NULL, 1 <> NULL, 1 < NULL, 1 > NULL;
+----------+-----------+----------+----------+
| 1 = NULL | 1 <> NULL | 1 < NULL | 1 > NULL |
+----------+-----------+----------+----------+
|     NULL |      NULL |     NULL |     NULL |
+----------+-----------+----------+----------+

mysql> SELECT 0 IS NULL, 0 IS NOT NULL, '' IS NULL, '' IS NOT NULL;
+-----------+---------------+------------+----------------+
| 0 IS NULL | 0 IS NOT NULL | '' IS NULL | '' IS NOT NULL |
+-----------+---------------+------------+----------------+
|         0 |             1 |          0 |              1 |
+-----------+---------------+------------+----------------+

For the full explanation, see the MySQL manual section "Working with NULL Values".
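
The behaviour above can be reproduced without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in; SQLite is assumed here only because it follows the same NULL comparison rule, and the sample rows are made up for illustration. The table name record and the column url come from the query in this post.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO record (id, url) VALUES (?, ?)",
    [
        (1, "http://example.com/a"),  # normal value
        (2, None),                    # NULL
        (3, ""),                      # empty string, which is not NULL
    ],
)


def ids(sql):
    """Run a query and return the matching ids as a list."""
    return [row[0] for row in conn.execute(sql)]


# Comparing with NULL yields NULL, so the WHERE clause never matches.
wrong = ids("SELECT id FROM record WHERE url != NULL ORDER BY id")

# IS NOT NULL is the correct test; the empty string still counts as a value.
right = ids("SELECT id FROM record WHERE url IS NOT NULL ORDER BY id")

# A bare comparison with NULL comes back as NULL (None in Python).
cmp_value = conn.execute("SELECT 1 = NULL").fetchone()[0]

print(wrong, right, cmp_value)
```

Note that row 3 is returned by the IS NOT NULL query: an empty string is a real value, which matches the '' IS NOT NULL case in the manual's examples.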

2018-04-14

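Lua pcall and xpcall functions

Lua applications rarely need error handling in ordinary code. Sometimes, though, you do not want a failure on a non-critical path (one that has retry or fallback handling), such as a module call, a function call, or file I/O, to raise an error and abort execution. In those cases the call can be wrapped so that the error is caught.

Error catching in Lua is built on the pcall and xpcall functions.

The pcall function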

Summary
Calls a function in protected mode
Prototype
ok, result [ , result2 ...] = pcall (f, arg1, arg2, ...)

Description
Calls function f with the supplied arguments in protected mode. Catches errors and returns:

On success:
true
function result(s) - may be more than one

On failure:
false
error message

A simple example:

--- Sum three numbers
function sum(a,b,c)
    local d = a + b + c
    return d
end

local e = sum(10, 20, 30)
print ("e:", e)
local h = sum("ten", "forty", "nine")
print ("h:", h)

--output:
e:      60
lua: src/pcall_test.lua:12: attempt to perform arithmetic on local 'a' (a string value)
stack traceback:
        src/pcall_test.lua:12: in function 'sum'
        src/pcall_test.lua:26: in main chunk
        [C]: in ?

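As the code above shows, when sum received string input it could not handle, it raised an error and the program was aborted.

To catch this kind of error, handle it, and keep the program running, call the function like this: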

local f, vrf = pcall(sum, "ten", "twenty", "thirty")
if f then
    print(vrf)
else
    print("failed to call sum function:" .. vrf)
end

--output:
failed to call sum function:src/pcall_test.lua:12: attempt to perform arithmetic on local 'a' (a string value)

The xpcall function

Summary
Calls a function with a custom error handler

Prototype
ok, result = xpcall (f, err)

If an error occurs in f it is caught and the error-handler 'err' is called. Then xpcall returns false, and whatever the error handler returned.

If there is no error in f, then xpcall returns true, followed by the function results from f.

A simple example:

local function err_handle(x)
    print("err_handle info:" .. x)
end

local f, res = xpcall(function ()
    return sum(10, 20, "a")
end , err_handle)
print(f, res)

--output:
err_handle info:src/pcall_test.lua:12: attempt to perform arithmetic on local 'c' (a string value)
false   nil

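The err_handle above is a user-defined error handler; because it returns nothing, res is nil. A function from the debug library can be passed instead, for example debug.traceback: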

local f, res = xpcall(function ()
    return sum(10, 20, "a")
end , debug.traceback)
print(f, res)

-- output:
false   src/pcall_test.lua:12: attempt to perform arithmetic on local 'c' (a string value)
stack traceback:
        src/pcall_test.lua:12: in function <src/pcall_test.lua:11>
        (...tail calls...)
        [C]: in function 'xpcall'
        src/pcall_test.lua:41: in main chunk
        [C]: in ?

2018-04-12

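Some tips on using Lua in OpenResty

1. nginx uses a multi-worker process model, so apart from shared memory dictionaries, which are shared by all workers, every other piece of data exists once per worker. This holds both for global variables created in init_by_lua and for state variables inside Lua modules.
2. Updating a Lua variable during a request only changes the state of the nginx worker process handling that request; other workers are unaffected (unless only one nginx worker is configured).
3. Each nginx worker process has its own Lua VM. These independent Lua VM copies are forked from the Lua VM of the nginx master process. init_by_lua runs in the master process's Lua VM, before the workers are forked.
4. Keep the latest data in a shared memory dictionary, and let each worker track the data it is actually using through Lua module variables or global variables created in init_by_lua (each worker has to keep comparing against the shared memory data and updating its own copy).
5. For more on points 1 to 4 above, see:
   - google group: usage of global variables in init_by_lua
   - google group: sharing global variables across requests within a worker
6. Using lua_code_cache
   - With lua_code_cache off, every request is handled by its own Lua VM. Lua data (such as a module variable) changed by request A is therefore not visible to request B, even if only one worker is configured.
   - The upside of turning lua_code_cache off: changes to pure Lua files (those not parsed as part of the nginx configuration) take effect immediately, without restarting nginx.
   - With lua_code_cache on, all requests handled by the same worker share the data of one Lua VM. Lua data (such as a module variable) changed by request A is therefore visible to request B when the same worker handles both.
   - Enabling lua_code_cache is strongly recommended in production; otherwise the performance cost is significant.
   - For more, see here
7. Sharing Lua variables
   - Avoid global variables where possible
   - If you need shared state, use module variables
   - If module variables are not enough, use shared memory or a distributed cache
   - For more, see lua-variable-scope and 变量的共享范围 (the sharing scope of variables)
   - Data Sharing within an Nginx Worker
   - data-sharing-in-openresty
8. Do not use module-level local variables or module attributes to hold any per-request data. Otherwise, with lua_code_cache enabled, requests interfere with each other and race on the data, producing unpredictable failures.
   - 关于 OPENRESTY 的两三事 (a few things about OpenResty)

A minimal configuration demonstrating variable sharing: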
-- share.lua
local _M={}
local data = {}

function _M.get_value(key)
    return data[key]
end
function _M.set_value(key,value)
    data[key] = value
end

return _M

### server.conf
server {
    listen 8081;
    server_name 127.0.0.1;

    ### Request A sets the shared module variable
    location = /1 {
        content_by_lua_block {
            local share = require('share')
            share.set_value('a','b')
            ngx.say(share.get_value('a'))
        }
    }

    ### Request B reads the shared variable
    location = /2 {
        content_by_lua_block {
            local share = require('share')
            ngx.say(share.get_value('a'))
        }
    }
}

-- init.lua
package.path = "/usr/local/Cellar/openresty/1.13.6.1/lualib/?.lua;/usr/local/etc/openresty/lua/?.lua;;";

### Excerpt from the main nginx configuration file
http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - [$time_local] "$request" $status '
    ' "$http_referer" "$http_user_agent" "$http_x_forwarded_for" ';

    access_log  logs/access.log  main;
    sendfile        on;
    keepalive_timeout  60;
    include server.conf;
    lua_code_cache on;
    init_by_lua_file lua/init.lua;
}

2018-04-08

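Git: squashing multiple commits on a branch

Projects with several contributors inevitably have to deal with merging code.
While implementing a feature, someone may commit to their own branch many times, for example:

commit 1: add guava cache
commit 2: use redis
commit 3: add unit test
commit 4: fix bug

They then open a merge request. If it is merged as-is, the main branch gains four new commit records.
To keep the main branch history clean (especially if untidy history bothers you), the commits on the branch can be squashed into one before merging.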

# Start developing a new feature
$ git checkout -b new-feature master
# Change some code
$ git commit -a -m "add guava cache"
# Rework the implementation
$ git commit -a -m "use redis"
$ git commit -a -m "add unit test"
$ git commit -a -m "fix bug"

# Urgent fix: change something directly on master
$ git checkout master
# Change some code
$ git commit -a -m "fix typo"

# Start the interactive rebase
$ git checkout new-feature
$ git rebase -i master

This opens the interactive editor with content like the following:

pick 12618c4 add guava cache
pick hde761d use redis
pick iau76h1 add unit test
pick l98ax6d fix bug
 
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

As the Commands list above describes, pick can now be replaced with squash.
For example, change it to:

pick 12618c4 add guava cache
squash hde761d use redis
squash iau76h1 add unit test
squash l98ax6d fix bug

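Save and exit; Git then asks for the message of the combined commit. After that, merge the branch into the main branch and push, and the main branch ends up with a single commit record for the feature.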

2018-04-07

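A roundup of tools for personal sites

Comments

For comments you can use Disqus: after registering and logging in, follow the Universal Code install instructions.

Statistics

For site page views, per-article page views, site UV, and so on, you can use 不蒜子 (Busuanzi).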

2018-03-20

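Couchbase client 2.x: custom compression support

Couchbase client 2.x stores objects mainly as Documents in JSON format.
To support the N1QL query feature, none of the Document types defined in the client support data compression, with the exception of LegacyDocument.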

/**
 * This document is fully compatible with Java SDK 1.* stored documents.
 *
 * It is not compatible with other SDKs. It should be used to interact with legacy documents and code, but it is
 * recommended to switch to the unifying document types (Json* and String) if possible to guarantee better
 * interoperability in the future.
 *
 * @author Michael Nitschinger
 * @since 2.0
 */
public class LegacyDocument extends AbstractDocument<Object> {
    ...
}

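As the Javadoc above explains, LegacyDocument is compatible with version 1.x of the Java SDK.
LegacyDocument therefore has to support data compression, which it implements through LegacyTranscoder.

LegacyTranscoder has two methods: doEncode and doDecode.
As the names suggest, doEncode implements serialization (encoding) and doDecode implements deserialization (decoding).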

public class LegacyTranscoder extends AbstractTranscoder<LegacyDocument, Object> {

    public static final int DEFAULT_COMPRESSION_THRESHOLD = 16384;
    
    ...

    private final int compressionThreshold;
    public LegacyTranscoder(int compressionThreshold) {
            this.compressionThreshold = compressionThreshold;
    }
    ...

    @Override
    protected Tuple2<ByteBuf, Integer> doEncode(LegacyDocument document)
        throws Exception {

        int flags = 0;
        Object content = document.content();

        boolean isJson = false;
        ByteBuf encoded;
        if (content instanceof String) {
            String c = (String) content;
            isJson = isJsonObject(c);
            encoded = TranscoderUtils.encodeStringAsUtf8(c);
        } else {
            encoded = Unpooled.buffer();

            if (content instanceof Long) {
                flags |= SPECIAL_LONG;
                encoded.writeBytes(encodeNum((Long) content, 8));
            } else if (content instanceof Integer) {
                flags |= SPECIAL_INT;
                encoded.writeBytes(encodeNum((Integer) content, 4));
            } else if (content instanceof Boolean) {
                flags |= SPECIAL_BOOLEAN;
                boolean b = (Boolean) content;
                encoded = Unpooled.buffer().writeByte(b ? '1' : '0');
            } else if (content instanceof Date) {
                flags |= SPECIAL_DATE;
                encoded.writeBytes(encodeNum(((Date) content).getTime(), 8));
            } else if (content instanceof Byte) {
                flags |= SPECIAL_BYTE;
                encoded.writeByte((Byte) content);
            } else if (content instanceof Float) {
                flags |= SPECIAL_FLOAT;
                encoded.writeBytes(encodeNum(Float.floatToRawIntBits((Float) content), 4));
            } else if (content instanceof Double) {
                flags |= SPECIAL_DOUBLE;
                encoded.writeBytes(encodeNum(Double.doubleToRawLongBits((Double) content), 8));
            } else if (content instanceof byte[]) {
                flags |= SPECIAL_BYTEARRAY;
                encoded.writeBytes((byte[]) content);
            } else {
                flags |= SERIALIZED;
                encoded.writeBytes(serialize(content));
            }
        }

        if (!isJson && encoded.readableBytes() >= compressionThreshold) {
            byte[] compressed = compress(encoded.copy().array());
            if (compressed.length < encoded.array().length) {
                encoded.clear().writeBytes(compressed);
                flags |= COMPRESSED;
            }
        }

        return Tuple.create(encoded, flags);
    }

    ...
}

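One thing to note:
By default, doEncode does not compress strings in JSON format.
As the code above shows, when the content is a string it first checks whether the string is JSON; if so, isJson is set to true and the compression logic further down is skipped.

So, to compress JSON strings, one option is to use LegacyDocument with a rewritten LegacyTranscoder whose doEncode drops the JSON-string check.

Then, when calling CLUSTER.openBucket, use an overload whose signature takes a transcoders parameter, like the one below, to pass in the custom transcoder.

Bucket openBucket(String name, List<Transcoder<? extends Document, ?>> transcoders);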

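Another point:
LegacyTranscoder sets a default compression threshold of 16 KB (16384 bytes), meaning content is compressed only once it reaches that size. For some use cases this threshold is too high.
The compressionThreshold field is private, so to adjust the threshold the options are:

1. Extend LegacyTranscoder and pass the desired threshold to the LegacyTranscoder(int compressionThreshold) constructor shown above; the field is private and final, so it cannot be assigned directly in the subclass.
2. As with the compression logic, extend AbstractTranscoder directly and reimplement LegacyTranscoder.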
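
The two rules discussed in this post (skip JSON strings, and compress only at or above a threshold and only when the result is smaller) can be sketched outside the SDK. The following Python sketch is an illustration under stated assumptions, not the SDK's code: the names encode, is_json_object, and skip_json and the value of the COMPRESSED flag are hypothetical, gzip is assumed as the compression format, and the JSON check is a rough stand-in for the transcoder's own.

```python
import gzip
import json

DEFAULT_COMPRESSION_THRESHOLD = 16384  # same default as the quoted LegacyTranscoder
COMPRESSED = 0x02  # illustrative flag bit, not taken from the SDK


def is_json_object(text):
    """Rough stand-in for the transcoder's JSON check (an assumption)."""
    try:
        return isinstance(json.loads(text), (dict, list))
    except ValueError:
        return False


def encode(content, threshold=DEFAULT_COMPRESSION_THRESHOLD, skip_json=True):
    """Return (payload, flags), compressing only when it pays off."""
    encoded = content.encode("utf-8")
    flags = 0
    # Default behaviour described in the post: JSON strings are never compressed.
    if skip_json and is_json_object(content):
        return encoded, flags
    # Compress only at or above the threshold, and keep the result only if smaller.
    if len(encoded) >= threshold:
        compressed = gzip.compress(encoded)
        if len(compressed) < len(encoded):
            encoded = compressed
            flags |= COMPRESSED
    return encoded, flags


big_json = json.dumps({"k": "v" * 20000})
big_text = "x" * 20000

default_json_flags = encode(big_json)[1]                    # JSON skipped by default
custom_json_flags = encode(big_json, skip_json=False)[1]    # JSON check removed
low_threshold_flags = encode(big_text, threshold=1024)[1]   # lowered threshold
small_flags = encode("short", threshold=1024)[1]            # below the threshold
print(default_json_flags, custom_json_flags, low_threshold_flags, small_flags)
```

Passing skip_json=False corresponds to the rewritten doEncode without the JSON check, and the threshold argument corresponds to supplying a smaller compressionThreshold.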

For details on storing non-JSON Documents, see the Couchbase documentation: Non-JSON Documents.

2018-03-11

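Spring Boot: reading internal property values

To check in real time whether the service running in production is the right version, the running service needs to expose its version information in some way.
Spring Boot makes it easy to expose all kinds of endpoint information through spring-boot-actuator, which suits service monitoring well.

Sometimes, however, compatibility with existing monitoring or deployment tools calls for other ways of exposing this information, such as a properties file inside the jar or an HTTP endpoint. The monitoring or deployment service then inspects the properties file or calls the endpoint to confirm that the deployed version is the intended one.

Internal properties file approach
- Generate a git.properties file with git-commit-id-plugin, for example. See this post.
- Or use maven-resources-plugin to fill in property values dynamically. The configuration below assigns git.commit.id.abbrev and git.build.time, at build time, to the git-version and app-build-time properties under the info node.
- This writes some of the project's property values into the file.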

#application.properties
info.git-version=@git.commit.id.abbrev@
info.app-build-time=@git.build.time@ 

#pom.xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-resources-plugin</artifactId>
    <version>2.6</version>
    <configuration>
        <delimiters>
            <delimiter>@</delimiter>
        </delimiters>
        <useDefaultDelimiters>false</useDefaultDelimiters>
    </configuration>
</plugin>

After packaging, the properties in application.properties end up as follows:
info.git-version=b0145c5
info.app-build-time=20180309205538

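HTTP endpoint approach
- To expose the values through an endpoint, the service first has to be able to read them at runtime. There are three ways:
  1. Load the properties file with Java Properties.
  2. Implement Spring's EnvironmentAware interface and read the values in the setEnvironment method.
  3. Use Spring's @Value annotation.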

// Option 1: load the properties file directly
private static final Properties PROJECT_PROPERTIES = new Properties();
static {
    try (InputStream in = BOOT.class.getClassLoader().getResourceAsStream("application.properties")) {
        PROJECT_PROPERTIES.load(in);
    } catch (IOException e) {
        throw new ExceptionInInitializerError(e);
    }
}
// The property values are then available
String version = PROJECT_PROPERTIES.getProperty("info.git-version");
String buildTime = PROJECT_PROPERTIES.getProperty("info.app-build-time");

// Option 2: implement EnvironmentAware
public class TestAbc implements EnvironmentAware {
    private String gitVersion;
    private String buildTime;
    private long startTime = 0;

    @Override
    public void setEnvironment(Environment environment) {
        gitVersion = environment.getProperty("info.git-version");
        buildTime = environment.getProperty("info.app-build-time");
        startTime = ManagementFactory.getRuntimeMXBean().getStartTime();
    }
}

// Option 3: inject with @Value
@Value("${info.git-version}")
private String gitVersion;
@Value("${info.app-build-time}")
private String buildTime;
// Note: info.git-version here is the property defined in application.properties above;
// info.app-build-time works the same way.

2018-03-11

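Hiring

Backend Engineer, Beijing

[Responsibilities]

Design and develop high-concurrency, large-data-volume system services for video, and keep them running reliably
Continuously optimize production systems so they can stably handle ever-growing concurrent request volume
Analyze request log data, applying machine learning and data mining to evaluate production services and give early warning of risks

[Requirements]

Bachelor's degree or above in computer science or a related field; experience at a large internet company preferred
Solid Java fundamentals; familiar with common design patterns, the Java collections framework, multithreading, and JVM performance tuning; experience designing and developing large distributed, highly available, high-concurrency systems preferred
Familiar with Spring-related technologies and common Linux shell commands; experience with Python or Lua preferred
Familiar with relational databases; NoSQL experience preferred
Experience with message middleware, distributed service frameworks, Thrift/Protobuf, Zookeeper, and similar preferred
Experience with compute frameworks such as Spark and Flink preferred
Conscientious, organized, capable, and responsible, with a strong interest in technology, good communication skills, and team spirit

[Location]

iQIYI Innovation Building, Zhongguancun, Haidian District, Beijing

[How to apply]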

2018-03-11

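Using the code revision and a timestamp in Maven packaging

To tell deployed code versions apart explicitly, the SVN/Git revision is usually included in the artifact name at packaging time; for multi-datacenter deployments a datacenter tag is needed as well.

If the code lives in a Git repository, the Maven plugin git-commit-id-plugin can be used.
The plugin generates a git.properties file, which is included in the final jar.
A simple configuration: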

<plugin>
       <groupId>pl.project13.maven</groupId>
       <artifactId>git-commit-id-plugin</artifactId>
       <version>2.2.4</version>
       <executions>
           <execution>
               <goals>
                   <goal>revision</goal>
               </goals>
           </execution>
       </executions>
       <configuration>
           <!-- Make the properties available throughout the whole Maven build
           Ref: https://github.com/ktoso/maven-git-commit-id-plugin/issues/280 -->
           <injectAllReactorProjects>true</injectAllReactorProjects>
           <!-- Date format; default: dd.MM.yyyy '@' HH:mm:ss z -->
           <dateFormat>yyyyMMddHHmmss</dateFormat>
           <!-- Whether to print detailed information during the build; default: false -->
           <verbose>true</verbose>
           <!-- Whether to generate the "git.properties" file; default: false -->
           <generateGitPropertiesFile>true</generateGitPropertiesFile>
           <!-- Path to the ".git" directory; default: ${project.basedir}/.git; ".." means the parent directory -->
           <dotGitDirectory>${project.basedir}/../.git</dotGitDirectory>
           <gitDescribe>
               <!-- Number of characters of the commit ID to show; maximum: 40; default: 7; 0 has a special meaning -->
               <abbrev>7</abbrev>
               <!-- Suffix appended when the build runs with uncommitted changes (the "dirty state"); default: "" -->
               <dirty>-dirty</dirty>
           </gitDescribe>
       </configuration>
   </plugin>

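A project module can then configure the packaged finalName like this:

<finalName>
	project-module-${dc}-${git.commit.id.abbrev}-${git.build.time}
</finalName>

Here git.commit.id.abbrev is the abbreviated ID of the Git commit being built, and git.build.time is, as the name suggests, the build time.

The corresponding Maven command:

mvn clean package -Dmaven.test.skip=true -Ddc=$dc -P $dc

With multiple datacenters, the profile is usually named after the datacenter code for convenience, so the dc parameter and the profile parameter above can share the same value.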

The build time can also be obtained through another Maven plugin, build-helper-maven-plugin.

<plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>build-helper-maven-plugin</artifactId>
      <version>3.0.0</version>
      <executions>
          <execution>
              <id>timestamp-property</id>
              <goals>
                  <goal>timestamp-property</goal>
              </goals>
          </execution>
      </executions>
      <configuration>
          <name>current.time</name>
          <pattern>yyyyMMddHHmmss</pattern>
          <timeZone>GMT+8</timeZone>
      </configuration>
  </plugin>

The current.time property above holds the build time and can be referenced from the pom file.
