tonglin0325 - cnblogs

May 12, 2024

Summary: 1. Querying PrestoDB (the Facebook version) 1. Setting up a PrestoDB environment: use Docker to create a Presto test environment (image tags: https://hub.docker.com/r/prestodb/presto/tags). Pull the image with docker pull prestodb/presto:0.284, then start it: d…

posted @ 2024-05-12 14:55 tonglin0325

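Go study notes: wire dependency injection

Summary: wire is an open-source code generation tool from Google that connects components automatically through dependency injection. Install it with go install github.com/google/wire/cmd/wire@latest; official user guide: https://github.com/google/wire/blob/main/docs/guide.md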
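A minimal sketch of what wire produces, loosely modeled on the Message/Greeter/Event example from wire's tutorial (the type and function names here are illustrative). With wire you would only declare the injector, by calling wire.Build(NewMessage, NewGreeter, NewEvent) in a file guarded by the wireinject build tag, and running wire would generate a function like InitializeEvent into wire_gen.go. Here that function is written by hand so the sketch compiles without wire installed.

```go
package main

import "fmt"

// Message, Greeter and Event are illustrative types for this sketch.
type Message string

// NewMessage is a provider with no dependencies.
func NewMessage() Message { return Message("Hi there!") }

type Greeter struct{ Message Message }

// NewGreeter is a provider that depends on a Message.
func NewGreeter(m Message) Greeter { return Greeter{Message: m} }

func (g Greeter) Greet() Message { return g.Message }

type Event struct{ Greeter Greeter }

// NewEvent is a provider that depends on a Greeter.
func NewEvent(g Greeter) Event { return Event{Greeter: g} }

func (e Event) Start() string { return string(e.Greeter.Greet()) }

// InitializeEvent is the kind of injector wire generates: it calls each
// provider in dependency order and passes every result to the next one.
// Written by hand here; with wire it would be generated into wire_gen.go.
func InitializeEvent() Event {
	message := NewMessage()
	greeter := NewGreeter(message)
	event := NewEvent(greeter)
	return event
}

func main() {
	e := InitializeEvent()
	fmt.Println(e.Start())
}
```

The value of wire is that this wiring code is derived from the providers' signatures, so adding a parameter to a constructor means re-running wire rather than editing the injector by hand.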

posted @ 2024-05-12 14:52 tonglin0325

May 9, 2024

Go study notes: the gin framework

Summary: gin is a lightweight Go web framework; official documentation: https://gin-gonic.com/docs/examples/ 1. Reference project structure for a gin web project: https://github.com/voyagegroup/gin-boilerplate; gin+protobuf wire reference: htt…

posted @ 2024-05-09 22:21 tonglin0325

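Go study notes: the Kratos framework

Summary: Official documentation: https://go-kratos.dev/en/docs/getting-started/start/ 1. Install Go (see: Installing go1.20 on a Mac) 2. Install the Kratos framework. Kratos depends on frameworks such as protobuf and gRPC, which must be installed first: brew install grpc brew i…

posted @ 2024-05-09 22:08 tonglin0325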

Go study notes: common commands

Summary: 1. Finding Go dependencies: Go packages can be looked up at https://pkg.go.dev/, for example https://pkg.go.dev/github.com/confluentinc/confluent-kafka-go#section-readme 2. Switching the Go module mirror: # Enable Go Modu…

posted @ 2024-05-09 10:03 tonglin0325

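screen tutorial

Summary: When logging in to other machines from a terminal through a jump host, the session is often lost when the connection to the jump host drops, as shown below. The screen command solves this by letting you create and restore sessions. 1. Create a session: screen, or screen -S session_name. We are now inside a screen session; for example, we change into the /tmp directory 2.…

posted @ 2024-05-09 10:01 tonglin0325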

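July 22, 2022

Kafka study notes: topic configuration

Summary: Many configuration options can be set when creating a Kafka topic, as listed in the table below (reference: Kafka Topic configuration). Parameter / Meaning / Value: cleanup.policy is the log cleanup policy and defaults to delete. To use log compaction, the policy must include compact. Note that if the compact policy is enabled, the messages submitted by the client…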
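cleanup.policy=compact is easiest to picture as "keep the latest value for each key". The Go sketch below models only that end state on an in-memory slice. Record and compact are hypothetical names, not part of any Kafka client; the real log cleaner runs in the background on closed segments and also handles tombstones (null values), which this sketch ignores.

```go
package main

import "fmt"

// Record is a simplified log entry (hypothetical; real Kafka records also
// carry timestamps, headers and possibly a null value as a tombstone).
type Record struct {
	Offset int64
	Key    string
	Value  string
}

// compact keeps only the newest record for each key and preserves the
// offset order of the survivors. It assumes offsets are unique.
func compact(log []Record) []Record {
	latest := make(map[string]int64) // key -> offset of its newest record
	for _, r := range log {
		latest[r.Key] = r.Offset
	}
	var out []Record
	for _, r := range log {
		if latest[r.Key] == r.Offset {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	log := []Record{
		{0, "k1", "a"}, {1, "k2", "b"}, {2, "k1", "c"}, {3, "k3", "d"}, {4, "k2", "e"},
	}
	for _, r := range compact(log) {
		fmt.Printf("offset=%d %s=%s\n", r.Offset, r.Key, r.Value)
	}
}
```

Note that the surviving records keep their original offsets; compaction removes superseded records but never renumbers the ones it keeps.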

posted @ 2022-07-22 16:37 tonglin0325

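January 13, 2022

HBase study notes: client API

Summary: An introduction to the HBase Java API; references: "Several ways to read and write HBase (1): Java" and "HBase: put, BufferedMutator, get". 1. Writing to HBase 1. Single-row put: HTable is not thread-safe and is also relatively inefficient. 2. The client-side write buffer and List<Put>: every put operation is a separate RPC, which is only suitable for…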

posted @ 2022-01-13 13:02 tonglin0325

January 12, 2022

Flink study notes: reading and writing HBase

Summary: 1. For a recent HBase version in CSA (Cloudera Streaming Analytics), you can follow Cloudera's official example and use the officially provided flink-hbase by adding: <dependency> <groupId>org.apache.flink</groupId> <artifactI…

posted @ 2022-01-12 22:16 tonglin0325

December 26, 2021

Hive study notes: metastore listener

Summary: Besides Hive hooks, a Hive metastore listener can also be used to record user operations on Hive. Reference: https://towardsdatascience.com/apache-hive-hooks-and-metastore-listeners-a-tale-of- …

posted @ 2021-12-26 22:03 tonglin0325

December 2, 2021

Installing Gradle 7.3 on a Mac

Summary: Gradle is a build tool similar to Maven. Installing and configuring Gradle 1. Install Gradle on a Mac with brew install gradle, or download the Gradle binary distribution from https://gradle.org/releases/ and then configure it in ~/.bash_profile: # gradle exp…

posted @ 2021-12-02 15:57 tonglin0325

November 9, 2021

Spark study notes: reading and writing ScyllaDB

Summary: Scylla is compatible with the Cassandra API, so it can be read and written with the same approach Spark uses for Cassandra. 1. Check which Cassandra version ScyllaDB corresponds to: cqlsh:my_db> SHOW VERSION [cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.…

posted @ 2021-11-09 22:01 tonglin0325

September 15, 2021

Connecting DataGrip 2017.1 to Hive

Summary: Older versions of DataGrip do not yet have a Hive data source, so one has to be added manually. 1. Download the Hive driver. If you use an EMR big data cluster, download it from https://docs.aws.amazon.com/emr/latest/ReleaseGuide/HiveJDBCDr …

posted @ 2021-09-15 14:28 tonglin0325

July 7, 2021

Kafka study notes: Consumer API

Summary: Based on the official Kafka documentation for version 1.0.x: http://kafka.apache.org/10/documentation.html#consumerapi For the dependency, choose 1.0.1-kafka-3.1.0 from Cloudera Rel: <dependency> <groupId>org.apache…

posted @ 2021-07-07 16:03 tonglin0325

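May 21, 2021

How zigzag encoding works

Summary: The Thrift, Protobuf and Avro serialization frameworks all use zigzag encoding for integers in order to reduce the amount of data transferred. The core idea is to get rid of the leading zeros in the binary representation, because in the vast majority of cases the integers we use are small. (Zigzag maps signed integers to unsigned ones so that small negative numbers, which are full of leading 1 bits in two's complement, also become small values with leading zeros; a variable-length encoding then drops those zeros.) Reference: "A small, clever number compression algorithm: zigzag". In avro…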

posted @ 2021-05-21 13:57 tonglin0325

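March 4, 2021

A comparison of underlying storage data structures

Summary: This article compares the data structures underlying some common storage systems. 1. B+ tree: MySQL and MongoDB indexes use B+ trees. B+ trees have the advantage in read-heavy, write-light (relatively speaking) workloads. Main advantages of B+ trees: 1. The structure is flat and shallow (usually no more than 4 levels), so few random seeks are needed; 2. Data is stored densely and only in leaf nodes, so query cost is stable and traversal is easy…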
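The "usually no more than 4 levels" claim follows from the fanout: with fanout f, a tree of height h can address up to f^h keys, so the height grows roughly as ⌈log_f N⌉. The sketch counts levels under a simplified model in which every node, leaves included, holds up to f entries. The fanout values are assumptions for illustration, not MySQL or MongoDB settings; real fanout depends on page size and key size.

```go
package main

import "fmt"

// levelsNeeded returns the smallest height h with fanout^h >= n, i.e. how
// many levels a tree whose nodes hold up to `fanout` entries needs for n keys.
// It assumes n >= 1, fanout >= 2 and that n*fanout does not overflow int64.
func levelsNeeded(n, fanout int64) int {
	levels, capacity := 1, fanout
	for capacity < n {
		capacity *= fanout
		levels++
	}
	return levels
}

func main() {
	// Illustrative fanouts and key counts only.
	for _, fanout := range []int64{100, 500, 1000} {
		for _, n := range []int64{1_000_000, 100_000_000, 1_000_000_000} {
			fmt.Printf("fanout=%d keys=%d levels=%d\n", fanout, n, levelsNeeded(n, fanout))
		}
	}
}
```

With a fanout in the hundreds, even a billion keys fit in 4 levels, and because the upper levels are few and hot, they usually stay cached, leaving roughly one random read per lookup.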

posted @ 2021-03-04 16:02 tonglin0325

January 22, 2021

ElasticSearch study notes: plugin development

Summary: References: https://dzone.com/articles/elasticsearch5-how-to-build-a-plugin-and-add-a-lis https://github.com/chrisshayan/es-changes-feed-plugin https://blog.cs …

posted @ 2021-01-22 15:50 tonglin0325

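January 21, 2021

HBase study notes: rowkey

Summary: 1. The Airbnb rowkey design case: Airbnb's rowkey design uses hashing to avoid write hotspots. Event_key uniquely identifies a log record and is used to deduplicate log data coming from Kafka; Shard_id is obtained by hashing Event_key (compare Elasticsearch's routing hash algorithm, Hashing.…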

posted @ 2021-01-21 10:48 tonglin0325

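Hive study notes: fetch

Summary: An article by Meituan-Dianping describes how HiveSQL is converted into MapReduce: 1. Antlr defines the SQL grammar rules and performs lexical and syntactic analysis, turning the SQL into an abstract syntax tree (AST Tree); 2. The AST Tree is traversed to abstract out QueryBlocks, the basic units of a query; 3. The QueryBlocks are traversed and translated into the execution operator tree Ope…

posted @ 2021-01-21 00:19 tonglin0325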

January 7, 2021

ElasticSearch study notes: adding a dictionary to the ik analyzer

Summary: The ik analyzer must be installed first; see "Elasticsearch study notes: tokenization". 1. Add a dictionary file under the ik analyzer's config directory: ~/software/apache/elasticsearch-6.2.4/config/analysis-ik$ ls | grep mydic.dic mydic.dic Contents…

posted @ 2021-01-07 15:52 tonglin0325

December 16, 2020

Flink study notes: user-defined Functions

Summary: Flink supports user-defined Functions, and there are two ways to write them. Ref https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/user_defined_functions.html 1. Implement the MapFunction interface c…

posted @ 2020-12-16 17:28 tonglin0325

December 14, 2020

Flink study notes: Execution Mode

Summary: Flink has three execution modes: STREAMING, BATCH and AUTOMATIC. Ref https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/datastream_execution_mode.html 1. STR…

posted @ 2020-12-14 16:27 tonglin0325

December 11, 2020

Flink study notes: DataSet API

Summary: DataSet jobs in Flink implement transformations on data sets. A data set usually comes from a fixed (bounded) source, such as a readable file or a local collection. Ref https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/batch/ Using Da…

posted @ 2020-12-11 17:43 tonglin0325

Flink study notes: DataStream API

Summary: DataStream jobs in Flink implement transformations on data streams. A data stream can come from various sources, such as message queues, sockets and files. Ref https://ci.apache.org/projects/flink/flink-docs-stable/zh/dev/data …

posted @ 2020-12-11 17:35 tonglin0325

December 10, 2020

Flink study notes: Environment

Summary: Flink has the following kinds of Environment: 1. The batch processing Environment, ExecutionEnvironment: ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 2. The stream processing Environme…

posted @ 2020-12-10 20:06 tonglin0325

Flink study notes: configuration

Summary: Flink jobs often need to load external configuration parameters. As the Flink development documentation explains, Flink provides a utility called ParameterTool for this. Flink development documentation: https://github.com/apache/flink/blob/master/docs/dev/applica …

posted @ 2020-12-10 14:57 tonglin0325

December 8, 2020

Paper reading: Twitter's logging system

Summary: 1. Scale of data platforms at industry companies 1. Twitter: Twitter has two papers on its logging system, "The Unified Logging Infrastructure for Data Analytics at Twitter" and "Scaling Big Data Mining Infrast…

posted @ 2020-12-08 19:39 tonglin0325

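December 1, 2020

SpringBoot study notes: Redis Template

Summary: Spring Boot can interact with Redis through the Redis template, used as shown below. See also this article series: [Quick-learn Spring Boot] 11. Integrating Redis to share sessions; [Quick-learn Spring Boot] 13. Working with the Redis String data structure; [Quick-learn Spring Boot] 14. Working…

posted @ 2020-12-01 13:49 tonglin0325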

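November 20, 2020

Hadoop study notes: configuration files

Summary: Download the vanilla Apache release of Hadoop, version 2.6.0, from https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz After extracting it, the configuration files are in the etc/hadoop directory of the extracted package; after extraction, the default configuration files are all…

posted @ 2020-11-20 17:30 tonglin0325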

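November 17, 2020

Hive study notes: SerDe

Summary: SerDe is short for Serializer and Deserializer; it is how Hive reads and writes various data formats. Amazon Athena can be thought of as Amazon's counterpart to Hive, and its documentation introduces SerDe here: https://docs.aws.amazon.com/zh_cn/athena/ …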
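To make the Serializer/Deserializer split concrete, here is a hypothetical Go sketch of a SerDe for a delimited text format. Hive's real SerDe interfaces are Java, and LazySimpleSerDe, the default for text tables, also handles column types, escaping and nested collections, none of which is modeled here. The sketch uses \x01 (Ctrl-A), Hive's default field delimiter for text tables; the sample row is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldDelim is \x01 (Ctrl-A), Hive's default field delimiter for text tables.
const fieldDelim = "\x01"

// Row is a deserialized record: one string per column (a real SerDe would
// also convert values to the column types declared in the table schema).
type Row []string

// SerDe pairs the two directions. This interface is hypothetical.
type SerDe interface {
	Deserialize(line string) Row
	Serialize(row Row) string
}

// delimitedSerDe handles a simple delimited text format.
type delimitedSerDe struct{ delim string }

// Deserialize splits one stored line into columns.
func (s delimitedSerDe) Deserialize(line string) Row {
	return Row(strings.Split(line, s.delim))
}

// Serialize joins columns back into the stored line format.
func (s delimitedSerDe) Serialize(row Row) string {
	return strings.Join(row, s.delim)
}

func main() {
	var sd SerDe = delimitedSerDe{delim: fieldDelim}
	row := sd.Deserialize("1\x01alice\x01beijing")
	fmt.Println(len(row), row[1])                             // 3 alice
	fmt.Println(sd.Serialize(row) == "1\x01alice\x01beijing") // true
}
```

Keeping the format logic behind one interface is what lets the same table definition switch between text, JSON or columnar formats by naming a different SerDe class.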

posted @ 2020-11-17 11:04 tonglin0325

November 16, 2020

OutputFormat in MapReduce

Summary: In the Hadoop source code, OutputFormat is an abstract class, public abstract class OutputFormat<K, V>, which defines the output format of reduce tasks: https://github.com/apache/hadoop/blob/master/hadoop-mapreduce- …

posted @ 2020-11-16 14:54 tonglin0325

October 10, 2020

Filebeat's http endpoint input

Summary: Filebeat inputs finally support HTTP: data can be sent to a Filebeat input with POST requests, although the feature is still in beta. Reference: https://www.elastic.co/guide/en/beats/filebeat/7.x/filebeat-input-http_endpoin …

posted @ 2020-10-10 18:42 tonglin0325

Packaging a Scala + Java project with Maven

Summary: When mixing Scala and Java, some extra configuration must be added to the pom so that the classes compiled from the Scala files end up in the final jar: <build> <pluginManagement> <plugins> <plugin> <groupId>org.scala-tools</groupId> <artif…

posted @ 2020-10-10 10:25 tonglin0325

October 9, 2020

Calling a Python server from a Thrift Java client

Summary: Reference: "Connecting Java and Python with Thrift, with a generic Java factory method". The example in that article uses a Java client to call the helloString method of a Python server, which prints the string sent by the client. The thrift file, hello.thrift: service Hello {…

posted @ 2020-10-09 13:47 tonglin0325

September 30, 2020

InputFormat in MapReduce

Summary: In the Hadoop source code, InputFormat is an abstract class, public abstract class InputFormat<K, V>: https://github.com/apache/hadoop/blob/master/hadoop-mapreduce-project/hadoop-mapre …

posted @ 2020-09-30 11:31 tonglin0325

September 17, 2020

Installing protobuf on Ubuntu 16.04

Summary: 1. proto2 1. The protobuf GitHub repository is https://github.com/protocolbuffers/protobuf; download the version you need from its releases page, https://github.com/protocolbuffers/protobuf/releases, choosing 2.5.0…

posted @ 2020-09-17 15:39 tonglin0325

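AWS S3: how it works and common commands

Summary: 1. Concepts: Amazon S3, short for Amazon Simple Storage Service, is an object store rather than a file system, which is why listing a directory can be slow when we use S3. It is a KV store (see "Writing a KV database from scratch: hash-based indexing"). For example, the following S3 path: s3://BucketName/Project/ …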
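Because S3 is a flat key-value store, a "directory" is just a key prefix, and a listing with a delimiter has to walk the keys that share that prefix and roll the deeper ones up into common prefixes. The toy model below is plain Go, not the AWS SDK: it imitates the Prefix/Delimiter/CommonPrefixes behavior of an S3 listing on an in-memory sorted key list. It says nothing about how S3 is implemented internally; it only shows why, in this naive model, listing one "folder" costs work proportional to every key beneath it. The keys are made up, with "Project/" echoing the example path above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listWithDelimiter returns the keys directly under prefix (objects), the
// deeper "folders" rolled up at the first delimiter (commonPrefixes), and
// how many keys it examined to produce them.
func listWithDelimiter(keys []string, prefix, delim string) (objects, commonPrefixes []string, scanned int) {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	seen := make(map[string]bool)
	// Jump to the first key >= prefix, as a sorted KV store would.
	start := sort.SearchStrings(sorted, prefix)
	for _, k := range sorted[start:] {
		if !strings.HasPrefix(k, prefix) {
			break // sorted order: no later key can match
		}
		scanned++
		rest := k[len(prefix):]
		if i := strings.Index(rest, delim); i >= 0 {
			cp := prefix + rest[:i+len(delim)]
			if !seen[cp] {
				seen[cp] = true
				commonPrefixes = append(commonPrefixes, cp)
			}
		} else {
			objects = append(objects, k)
		}
	}
	return objects, commonPrefixes, scanned
}

func main() {
	keys := []string{
		"Project/a.txt",
		"Project/2020/x.json",
		"Project/2020/y.json",
		"Project/2021/z.json",
		"Other/b.txt",
	}
	objects, prefixes, scanned := listWithDelimiter(keys, "Project/", "/")
	fmt.Println("objects:", objects)
	fmt.Println("common prefixes:", prefixes)
	fmt.Println("keys scanned:", scanned)
}
```

Renaming a "directory" is expensive for the same reason: there is no directory entry to update, so every key under the prefix has to be copied to a new key and the old one deleted.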

posted @ 2020-09-17 10:51 tonglin0325

September 13, 2020

Installing openldap and phpldapadmin on Ubuntu 16.04

Summary: To install openldap, see: https://www.alibabacloud.com/blog/how-to-install-openldap-and-phpldapadmin-on-ubuntu-16-04_594318 /hzw97/p/11592 …

posted @ 2020-09-13 14:26 tonglin0325

August 18, 2020

Common commands for JVM tuning

Summary: 1. View Java processes: the jps command lists the running JVM processes: jps -l 1005373 sun.tools.jps.Jps 1000153 org.apache.flume.node.Application 2. View JVM statistics for the Flume process: jstat -gcutil 102847…

posted @ 2020-08-18 16:54 tonglin0325

August 14, 2020

Uploading jar packages to Nexus

Summary: Add Maven proxy repositories, for example the central repository https://repo1.maven.org/maven2/ or Cloudera's repository https://repository.cloudera.com/artifactory/cloudera-repos maven-central maven-clou…

posted @ 2020-08-14 15:06 tonglin0325

tonglin0325.github.io
