Running Spark on z/OS and Jupyter: Fast, Flexible Analysis of Mainframe Data
Many enterprises face the need to broaden users' access to data processing without compromising their mission-critical transactional application environments. A common response is to move data off these systems of record into a data warehouse. But moving data at rest into a mirrored repository for analytics can carry costly side effects, such as expensive migration workloads, data-concurrency issues, and data-security exposure. Alternatively, distilling the data at rest in place, to isolate just the information of interest for downstream analytics, can deliver better results.
IBM Announces
Today, IBM announced the release of the IBM z/OS Platform for Apache Spark. This enterprise-grade, native z/OS distribution of the open source in-memory analytics engine enables the analysis of business-critical z/OS data sources (such as DB2, IMS, VSAM, and Adabas, among others) in place, with no data movement.
Designed to simplify the analysis of big data, this no-charge product allows enterprises to ingest data from diverse sources, including systems of record on the z/OS platform, and apply fast, federated, large-scale data-processing analytics using consistent Apache Spark APIs. By co-locating the data with the distillation and analysis jobs, companies can improve the time to value for actionable insights in a non-intrusive and cost-effective manner.
GitHub Tools
In conjunction with this product announcement, IBM has created a new GitHub organization to promote the development of an ecosystem of tools around Spark on z/OS. IBM has seeded the new organization with two GitHub repositories derived from Project Jupyter, namely the Scala Workbench and Interactive Insights Workbench. These tools are centered around a reference architecture that is aimed at bootstrapping data scientists who lack access to mainframe data stores.
The data wrangler uses the z/OS Platform for Apache Spark to distill and analyze systems of record and then place the results into a NoSQL data repository, conforming to tidy data principles. With the complex task of munging multiple datasets already handled, data scientists can focus their energies on exploratory and predictive analytics using R or Python within a Jupyter notebook. This reference architecture offers a flexible, extensible, and interoperable solution for a variety of use cases. The open source Apache Spark distribution for z/OS, without data abstraction services, can be downloaded directly from DeveloperWorks and deployed using this installation guide.
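To make the downstream half of this workflow concrete, here is a minimal sketch of the kind of exploratory step a data scientist might run in a Jupyter notebook once the Spark distillation job has landed tidy records (one observation per row, one variable per column) in the NoSQL store. The record layout and field names below are hypothetical, and plain Python stands in for whatever notebook tooling is actually used.

```python
from statistics import mean

# Hypothetical tidy records as distilled by the upstream Spark job:
# each row is one observation, each key is one variable.
records = [
    {"account": "A-100", "month": "2016-01", "balance": 5200.0},
    {"account": "A-100", "month": "2016-02", "balance": 4900.0},
    {"account": "B-200", "month": "2016-01", "balance": 7100.0},
    {"account": "B-200", "month": "2016-02", "balance": 7450.0},
]

def mean_balance_by_account(rows):
    """Group tidy rows by account and average the balance variable."""
    groups = {}
    for row in rows:
        groups.setdefault(row["account"], []).append(row["balance"])
    return {acct: mean(vals) for acct, vals in groups.items()}

print(mean_balance_by_account(records))
# {'A-100': 5050.0, 'B-200': 7275.0}
```

Because the munging has already been done upstream, the notebook code stays focused on the analytic question itself; the same tidy rows could just as easily be loaded into pandas or an R data frame for richer exploratory or predictive work.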
Next Steps
The full package, including data abstraction, can be ordered at no charge from Shopz. If you’re really interested in using Spark on the mainframe but feel that you need help getting…
Client-facing, strategy and development engineer responsible for architecting, implementing and running next-generation cloud applications on Bluemix, SoftLayer and private clouds.