
Big Data Development: A Knowledge Overview

Posted on 2020-10-20 11:42:43

                    

                    

                    
                    
<p><strong><span style="color: rgb(0, 209, 0);">Big data development draws on many technologies. The core ones currently include Linux, Zebra, Hadoop, Flume, Hive, HBase, Phoenix, Storm, Kafka, Scala, and Spark.</span></strong><br  /></p>
<p><img data-s="300,640" data-type="jpeg" src="http://mmbiz.qpic.cn/mmbiz_jpg/LKgI6UN8ElYFyKjMouXLEicn9Int2CLCUEvZibzialMJBrP0VoUaicIqia0BXR033VqIRVibld2CCM33sRxwq48DiclxA/0?wx_fmt=jpeg" data-copyright="0" class="" data-ratio="0.672463768115942" data-w="690" style="white-space: normal;"  /></p>
<p><strong><span style="color: rgb(255, 79, 121);">Linux:</span></strong></p>
<p><span style="color: rgb(0, 82, 255);">Linux is a free-to-use, freely distributed Unix-like operating system. It is a POSIX-based, multi-user, multitasking system that supports multithreading and multiple CPUs. It runs the major Unix tools, applications, and network protocols, and it supports both 32-bit and 64-bit hardware. Linux inherits Unix's network-centric design philosophy and is a stable multi-user network <strong>operating system</strong>.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">Zebra:</span></p>
<p><span style="color: rgb(0, 82, 255);">Zebra is <strong>an early project</strong> for processing large volumes of data. It has several first-tier engines, each responsible for one portion of the data. Each first-tier engine processes its own portion and sends the results to a second-tier engine, which aggregates them and finally stores the data in a relational database. It is the foundation for the topics below.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">Hadoop:</span></p>
<p><span style="color: rgb(0, 82, 255);">Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).</span></p>
<p><span style="color: rgb(0, 82, 255);">HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with very large data sets.</span></p>
<p><span style="color: rgb(0, 82, 255);">HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).</span></p>
<p><span style="color: rgb(0, 82, 255);">The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides computation over them.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">Flume:</span></p>
<p><span style="color: rgb(0, 82, 255);">Flume, originally developed by Cloudera and now an Apache project, is a highly available, highly reliable, distributed <strong>system for collecting, aggregating, and transporting massive volumes of log data</strong>. Flume lets you plug custom data senders into a logging system to collect data. It can also apply simple processing to the data and write it to a variety of customizable receivers.</span></p>
<p><span style="color: rgb(0, 82, 255);">Flume exists in two generations: the 0.9.x releases are collectively called Flume-og, and the 1.x releases are called Flume-ng. Flume-ng was heavily refactored and differs substantially from Flume-og, so take care to distinguish them.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">Hive:</span></p>
<p><span style="color: rgb(0, 82, 255);">Hive is a </span><strong><span style="color: rgb(0, 82, 255);">data warehouse tool</span></strong><span style="color: rgb(0, 82, 255);"> built on Hadoop. It maps structured data files to database tables and offers SQL-style querying, translating SQL statements into MapReduce jobs. Its main advantage is a low learning curve: simple MapReduce statistics can be produced quickly with SQL-like statements, without writing a dedicated MapReduce application, which makes Hive well suited to statistical analysis in a data warehouse.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">HBase:</span></p>
<p><span style="color: rgb(0, 82, 255);">HBase is a distributed, column-oriented, open-source <strong>database</strong>. The technology derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase began as a subproject of Apache Hadoop and is now a top-level Apache project. Unlike a typical relational database, HBase is suited to storing unstructured data, and its data model is organized by column rather than by row.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">Phoenix:</span></p>
<p><span style="color: rgb(0, 82, 255);">Apache Phoenix enables OLTP (online transaction processing) and operational analytics in Hadoop for low-latency applications by combining the best of both worlds:</span></p>
<ul style="margin-left: 25px;" class=" list-paddingleft-2"><li><span style="color: rgb(0, 82, 255);">the power of standard SQL and JDBC APIs with full ACID (atomicity, consistency, isolation, durability) transaction capabilities, and</span></li><li><span style="color: rgb(0, 82, 255);">the flexibility of late-bound, schema-on-read capabilities from the NoSQL world, using HBase as its backing store.</span></li></ul>
<p><span style="color: rgb(0, 82, 255);">Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">Storm:</span></p>
<p><span style="color: rgb(0, 82, 255);">Storm is a distributed real-time <strong>big data processing framework</strong> open-sourced by Twitter. It was first released on GitHub and moved to the Apache community after version 0.9.1. It is often described as the real-time counterpart of Hadoop. More and more scenarios cannot tolerate the high latency of Hadoop MapReduce, for example website statistics, recommendation systems, alerting systems, and financial systems (high-frequency trading, stocks). As a result, real-time big data processing (stream computing) is increasingly widely used, and Storm is one of the leading mainstream stream-computing technologies.</span></p>
<p><span style="color: rgb(0, 82, 255);"><br  /></span></p>
<p><span style="color: rgb(255, 79, 121);">Kafka:</span></p>
<p><span style="color: rgb(0, 82, 255);">Kafka is a high-throughput, distributed publish-<strong>subscribe messaging system</strong> that can handle all the activity-stream data of a consumer-scale website. Such activity (page views, searches, and other user actions) is a key ingredient of many social features on the modern web. Because of the throughput required, this data is usually handled through log processing and log aggregation. For systems that, like Hadoop, work with log data and offline analysis but also need real-time processing, Kafka is a viable solution. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time consumption across a cluster.</span></p>
<p><br  /></p>
<p><span style="color: rgb(255, 79, 121);">Scala:</span></p>
<p><span style="color: rgb(0, 82, 255);">Scala is a multi-paradigm <strong>programming language</strong>, similar to Java, designed to be a scalable language that integrates features of object-oriented and functional programming.</span> Scala's syntax is concise, so programs usually need less code.</p>
<p><br  /></p>
<p><span style="color: rgb(255, 79, 121);">Spark:</span></p>
<p><span style="color: rgb(0, 82, 255);">Spark does not execute code immediately. It first builds a DAG (directed acyclic graph) of the operations, then splits the work into tasks according to the DAG and distributes them to the workers for execution. Spark divides operations into two kinds: transformations and actions. Only when an action is encountered are the preceding transformations executed, all together. Executing them as a batch gives high throughput.</span></p>
<p><br  /></p>
<p><span style="color: rgb(0, 82, 255);"><strong>Spark can be viewed as an upgrade to Hadoop</strong>: the two complement each other, and Spark replaces part of Hadoop, mainly the MapReduce compute layer. Hadoop is used mainly for offline processing, Spark offers near-real-time processing (latency on the order of seconds), and Storm is truly real-time (sub-second latency).</span></p>
<p><br  /></p>
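<p>The transformation/action split described in the Spark section can be illustrated with a small toy model. The sketch below is plain Python and does not use the Spark API: the class <code>LazyDataset</code> and the helper <code>double</code> are hypothetical names invented for this example, and the recorded "plan" is a simple linear chain rather than a full DAG. It only aims to show that transformations record work while an action triggers it.</p>

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyDataset:
    """Toy stand-in for a Spark dataset (hypothetical, not the Spark API).

    Transformations only record a step in the plan; an action runs the plan.
    """

    def __init__(self, source: Iterable[Any],
                 plan: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = list(source)
        self._plan = plan  # recorded steps in order: a linear "DAG"

    # --- Transformations: extend the plan, touch no data ---
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    # --- Actions: execute every recorded step in one pass ---
    def collect(self) -> List[Any]:
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._plan:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a "filter" step rejected the item
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

    def count(self) -> int:
        return len(self.collect())


calls = []  # records when the mapping function actually runs


def double(x: int) -> int:
    calls.append(x)
    return x * 2


# Building the pipeline runs nothing: `calls` is still empty here.
ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 2)
assert calls == []

# The action triggers the whole recorded plan at once.
result = ds.collect()
print(result)
```

<p>In real Spark the plan is a graph that is split into tasks and shipped to workers; this sketch keeps everything in one process so the deferred execution is easy to see.</p>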
               