My works

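0. Talks & published articles

DataFunTalk talk: Demystifying the Technical Architecture of a Commercial Ad Delivery Platform https://zhuanlan.zhihu.com/p/93006837

Zhihu article: A Brief Look at the Core Technical Points of OLAP Systems https://zhuanlan.zhihu.com/p/163236128

InfoQ: On the Microservice Transformation of Back-end Business Systems http://www.infoq.com/cn/articles/the-back-end-business-systems-service-transformation

InfoQ: A Systematic Understanding of RPC http://www.infoq.com/cn/articles/get-to-know-rpc

Zhihu column: A Deep Dive into RPC in Spark https://zhuanlan.zhihu.com/p/28893155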

 

1. A MySQL InnoDB Engine tool – innodb-java-reader

alibaba/innodb-java-reader (https://github.com/alibaba/innodb-java-reader) is a Java implementation that reads MySQL InnoDB storage engine files directly. Available as a library or a command-line tool, it provides basic read-only features such as examining pages, looking up records by primary key, and generating page heat maps by LSN or fill rate. The project is useful for prototyping and for learning MySQL. It can also dump table data without going through the MySQL process, offloading that work from the server.
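
To give a feel for what "examining pages" involves, here is a minimal sketch that reads three fields from the 38-byte FIL header at the start of an InnoDB page. It is not the innodb-java-reader API: the class and method names are hypothetical, and it assumes the default 16 KiB page size and big-endian field encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; NOT the innodb-java-reader API.
class FilHeaderDemo {
    // Assumption: default InnoDB page size of 16 KiB.
    static final int PAGE_SIZE = 16 * 1024;

    // Reads three FIL header fields from the start of a page.
    // Returns {pageNumber, lsn, pageType}.
    static long[] readFilHeader(ByteBuffer page) {
        long pageNumber = page.getInt(4) & 0xFFFFFFFFL; // FIL_PAGE_OFFSET, 4 bytes
        long lsn = page.getLong(16);                     // FIL_PAGE_LSN, 8 bytes
        long pageType = page.getShort(24) & 0xFFFF;      // FIL_PAGE_TYPE, 2 bytes
        return new long[] {pageNumber, lsn, pageType};
    }

    public static void main(String[] args) {
        // Build a synthetic page instead of reading a real .ibd file.
        ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE); // big-endian by default
        page.putInt(4, 3);
        page.putLong(16, 123456L);
        page.putShort(24, (short) 0x45BF); // 17855, the B-tree index page type
        long[] header = readFilHeader(page);
        System.out.println("page=" + header[0] + " lsn=" + header[1] + " type=" + header[2]);
    }
}
```

With a real tablespace file, page n would start at byte offset n * PAGE_SIZE, and the LSN read this way is the kind of value a heat map by LSN could be built from.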

 

2. Meta’s Llama2 inference in Java

Meta’s Llama2 is a state-of-the-art open-source family of pre-trained models ranging from 7B to 70B parameters. Andrej Karpathy, a founding member of OpenAI, published a project that runs inference on Llama2 models in a single pure C file; it is simple, minimal, and educational. That led me to develop Llama2.java (https://github.com/neoremind/llama2.java) as a fun weekend project. The results show that Java reaches 95% of the tok/s of the highly optimized C build (-Ofast -march=native) and matches the tok/s of the normal C build (-O3). You can run inference on a baby Llama story-series model on your laptop (Mac or Windows), or run the Llama2 7B fp32 model on a CPU-based cloud host.

 

3. Apache Calcite InnoDB Adapter

Calcite’s InnoDB adapter (https://calcite.apache.org/docs/innodb_adapter.html) allows you to query data directly from InnoDB data files, also known as .ibd files. Unlike the JDBC adapter, which maps a schema in a JDBC data source and requires a MySQL server to serve responses, this adapter works on the .ibd files alone.

 

4. Fluent-validator

Fluent-validator (https://github.com/neoremind/fluent-validator) is a validation framework with a fluent API style. It is annotation-based, supports JSR 303, makes it easy to validate input parameters without intruding on your business logic, and integrates with the Spring IoC container.
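
As a rough illustration of what a fluent validation style looks like, here is a minimal self-contained sketch. The `Checks` class and its `must` method are hypothetical names invented for this example; they are not the Fluent-validator API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical mini validator for illustration; NOT the Fluent-validator API.
class Checks<T> {
    private final T target;
    private final List<String> errors = new ArrayList<>();

    private Checks(T target) {
        this.target = target;
    }

    static <T> Checks<T> of(T target) {
        return new Checks<>(target);
    }

    // Applies one rule and returns this, so rules can be chained.
    Checks<T> must(Predicate<? super T> rule, String message) {
        if (!rule.test(target)) {
            errors.add(message);
        }
        return this;
    }

    List<String> errors() {
        return List.copyOf(errors);
    }

    boolean isValid() {
        return errors.isEmpty();
    }
}

class ChecksDemo {
    public static void main(String[] args) {
        List<String> errors = Checks.of("ab")
                .must(s -> !s.isEmpty(), "must not be empty")
                .must(s -> s.length() >= 3, "must have at least 3 characters")
                .errors();
        System.out.println(errors);
    }
}
```

The validation rules stay in one chained expression, separate from the business code that consumes the value, which is the non-intrusive quality described above.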

 

5. Distributed RPC framework – Navi

Navi (https://github.com/neoremind/navi) is a distributed service framework that provides cluster management and high-performance RPC. With Navi, you can build distributed applications with minimal effort and get a highly scalable architecture that handles remote procedure calls, service registration, and service discovery.

Implemented in Java on the Spring framework, Navi wraps ZooKeeper and uses Protostuff/Protobuf for transport, which makes it easy to build a cluster-aware application. Its simple XML or annotation configuration lets you focus on your application logic.

Started in 2013, this framework was widely adopted within Baidu Programmatic Ads tech teams.

 

6. Protobuf-RPC

Pbrpc (https://github.com/neoremind/navi-pbrpc) provides an RPC solution based on Protocol Buffers. The library lets client and server communicate in a peer-to-peer, full-duplex way.

The server side is built on Netty, which supports asynchronous, non-blocking I/O. The client side offers a variety of ways to communicate with the server, including short-lived connections, keep-alive TCP connections, and high-availability and failover strategies.

 

7. MySQL binlog Change Data Capture tool

Fountain (https://github.com/neoremind/fountain) is a Java-based toolkit for capturing and syncing MySQL binlog events (INSERT/DELETE/UPDATE), with an easy API to process and publish them.

 

8. Dynamic-proxy

Dynamic proxy (https://github.com/neoremind/dynamic-proxy) is a library for Java developers to generate proxy objects. It supports a range of byte-code generation methods, including:

  • ASM
  • CGLIB
  • Javassist
  • JDK Dynamic Proxy
  • ByteBuddy
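
Of the methods listed, the JDK dynamic proxy needs no third-party dependency, so it makes a convenient minimal sketch of what a generated proxy object does. The example uses only `java.lang.reflect.Proxy` from the JDK, not the dynamic-proxy library's own API; `Greeter` and `JdkProxyDemo` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface, used only for this illustration.
interface Greeter {
    String greet(String name);
}

class JdkProxyDemo {
    // Method names recorded by the handler, so the interception is observable.
    static final List<String> CALLS = new ArrayList<>();

    // Wraps a target in a JDK dynamic proxy that records each method name
    // before delegating to the real object.
    static Greeter wrap(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            CALLS.add(method.getName());
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = wrap(name -> "Hello, " + name);
        System.out.println(greeter.greet("world"));
        System.out.println(CALLS);
    }
}
```

JDK proxies can only implement interfaces; proxying a concrete class is one reason to reach for a byte-code generation approach such as CGLIB, Javassist, or ByteBuddy instead.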

 

9. Bean mapping tool – Easy-mapper

Easy-mapper (https://github.com/neoremind/easy-mapper) is a simple, lightweight, high-performance Java bean mapping framework. By leveraging Javassist, Easy-mapper generates mapping byte-code at runtime and loads it into the JVM, so the generated classes can be reused for later mapping invocations.

 

10. Spark RPC – Kraps-RPC

Kraps-rpc (https://github.com/neoremind/kraps-rpc) is an RPC framework split out from Spark. You can read the name as spark-rpc with the word spark reversed.

This module is mainly for studying how RPC works in Spark. Spark consists of many distributed components, such as the driver, master, executor, and block manager, which communicate with each other through RPC. In the Spark project this functionality is sealed inside the spark-core module. Kraps-rpc separates the core RPC part from it, excluding the security and streaming download features.

Documentation can be found in this Zhihu column: https://zhuanlan.zhihu.com/p/28893155

PR: [SPARK-21701] [CORE] Enable RPC client to use `SO_RCVBUF` and `SO_SNDBUF` in SparkConf
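
`SO_RCVBUF` and `SO_SNDBUF` are the standard socket receive and send buffer size options. The sketch below shows them on a plain `java.net.Socket`. It is not the code from the PR, and Spark's own transport is built on Netty rather than on raw sockets; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Plain-JDK illustration of the two socket options; NOT the Spark PR's code.
class SocketBufferDemo {
    // Requests the given buffer sizes on an unconnected socket and returns
    // the sizes the OS actually granted as {receive, send}.
    static int[] requestBuffers(int rcvBytes, int sndBytes) {
        try (Socket socket = new Socket()) {
            socket.setReceiveBufferSize(rcvBytes); // SO_RCVBUF
            socket.setSendBufferSize(sndBytes);    // SO_SNDBUF
            return new int[] {socket.getReceiveBufferSize(), socket.getSendBufferSize()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] granted = requestBuffers(256 * 1024, 256 * 1024);
        System.out.println("SO_RCVBUF=" + granted[0] + " SO_SNDBUF=" + granted[1]);
    }
}
```

The requested sizes are only hints: the operating system may round or cap them, which is why the sketch reads the values back instead of assuming they were applied.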

 

11. Long-running services in Hadoop YARN

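Apache Hadoop YARN is the general-purpose resource management and scheduling platform in the big data field. Many compute frameworks can run on YARN, such as MapReduce, Spark, Flink, and Storm; these frameworks can focus on computation itself and integrate through the highly abstracted interfaces that YARN provides. Beyond big data, long-running services can also run on YARN. This is an exploratory project: a demo built directly on the low-level YARN API.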

 

12. SSHXCUTE

SSHXCUTE is a framework that lets engineers use Java calls to execute commands or scripts on remote Linux/UNIX systems over an SSH connection, making it easier to automate software testing and system environment deployment.

 

13. IBM DeveloperWorks published articles

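A framework for Java developers and testers to remotely execute tasks on Linux/UNIX systems

Building a simple web application with Jython and Ajax

A tool for automatically selecting and executing Rational Functional Tester test cases based on user input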