
Big Data Open Source Software - PowerPoint Presentation

Uploaded On 2017-03-31






Presentation Transcript

Slide1

Big Data Open Source Software and Projects
ABDS in Summary IV: Layer 5 Part 2

Data Science Curriculum
March 5 2015

Geoffrey Fox

gcf@indiana.edu
http://www.infomall.org
School of Informatics and Computing
Digital Science Center
Indiana University Bloomington

Slide2

Functionality of 21 HPC-ABDS Layers

1) Message Protocols
2) Distributed Coordination
3) Security & Privacy
4) Monitoring
5) IaaS Management from HPC to hypervisors: Part 2
6) DevOps
7) Interoperability
8) File systems
9) Cluster Resource Management
10) Data Transport
11) A) File management B) NoSQL C) SQL
12) In-memory databases & caches / Object-relational mapping / Extraction Tools
13) Inter process communication: Collectives, point-to-point, publish-subscribe, MPI
14) A) Basic Programming model and runtime, SPMD, MapReduce B) Streaming
15) A) High level Programming B) Application Hosting Frameworks
16) Application and Analytics
17) Workflow-Orchestration

Here are 21 functionalities (counting the subparts of 11, 14 and 15 separately):
4 cross-cutting at top
17 in order of layered diagram starting at bottom

Slide3

OpenStack SubProjects March 2015

16 OpenStack Capabilities http://www.openstack.org/software/roadmap/

OpenStack Compute (code-name Nova) - integrated project since Austin release
OpenStack Networking (code-name Neutron) - integrated project since Folsom release
OpenStack Object Storage (code-name Swift) - integrated project since Austin release
OpenStack Block Storage (code-name Cinder) - integrated project since Folsom release
OpenStack Identity (code-name Keystone) - integrated project since Essex release
OpenStack Image Service (code-name Glance) - integrated project since Bexar release
OpenStack Dashboard (code-name Horizon) - integrated project since Essex release
OpenStack Telemetry (code-name Ceilometer) - integrated project since the Havana release
OpenStack Orchestration (code-name Heat) - integrated project since the Havana release
OpenStack Database (code-name Trove) - integrated project since the Icehouse release
OpenStack Data Processing (code-name Sahara) - integrated project since the Juno release

New capabilities under development for Juno release and beyond:
Bare Metal (Ironic)
Queue Service (Zaqar)
Shared file system (Manila)
DNS Service (Designate)
Key Management (Barbican)

Slide4

FutureGrid IaaS request popularity by year

Slide5

OpenNebula

http://en.wikipedia.org/wiki/OpenNebula
http://opennebula.org/
Apache License.

OpenNebula orchestrates storage, network, virtualization, monitoring, and security technologies to deploy multi-tier services (e.g. compute clusters) as virtual machines on distributed infrastructures, combining both data center resources and remote cloud resources, according to allocation policies.

The toolkit includes features for integration, management, scalability, security and accounting. It also claims standardization, interoperability and portability, providing cloud users and administrators with a choice of several cloud interfaces (Amazon EC2 Query, OGF Open Cloud Computing Interface and vCloud) and hypervisors (Xen, KVM and VMware), and can accommodate multiple hardware and software combinations in a data center.

A good system that is strongly promoted in Europe but little used in the USA, where it is eclipsed by OpenStack.

Slide6

CoreOS

http://en.wikipedia.org/wiki/CoreOS
https://coreos.com/

Open Source Linux distribution aimed at Docker.

CoreOS is a fork of Chrome OS: it uses the software development kit (SDK) freely available through Chromium OS as a base, adding new functionality and customizing it to support hardware used in servers.

CoreOS is an open source lightweight operating system based on the Linux kernel and designed for providing infrastructure to clustered deployments, while focusing on automation, ease of application deployment, security, reliability and scalability. As an operating system, CoreOS provides only the minimal functionality required for deploying applications inside software containers, together with built-in mechanisms for service discovery and configuration sharing.

Slide7

VMware vCloud, ESX, ESXi

VMware ESX http://en.wikipedia.org/wiki/VMware_ESX is an enterprise-level computer virtualization product offered by VMware. ESX is a component of VMware's larger offering, VMware Infrastructure, which adds management and reliability services to the core server product. VMware recommends that deployments running the earlier ESX architecture migrate to the newer ESXi hypervisor architecture.

VMware ESX and ESXi are VMware's enterprise software Type 1 hypervisors for guest virtual servers; they run on host server hardware without an underlying operating system.

vSphere http://en.wikipedia.org/wiki/VMware_vSphere uses VMware's ESXi hypervisor, adding management (as in OpenStack).

Note that the desktop product VMware Workstation is a Type 2 hypervisor.

VMware has historically been a software vendor focused on virtualization technologies. It entered the cloud IaaS market when it launched the VMware vCloud Hybrid Service (vCHS) into general availability in September 2013. http://en.wikipedia.org/wiki/VCloud

This allows customers to migrate work on demand from their "internal cloud" of cooperating VMware hypervisors to a remote cloud of VMware hypervisors. This is called cloud bursting.

Slide8
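The cloud bursting described on the VMware slide above can be illustrated with a small placement policy. This is only a minimal sketch under stated assumptions, not VMware's mechanism: the `Cloud` class, the `place_vm` function, and the slot capacities are hypothetical names and numbers chosen for illustration. Work goes to the internal cloud while it has room and overflows to the remote cloud only when the internal one is full.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """A pool of hypervisor capacity counted in VM slots (hypothetical model)."""
    name: str
    capacity: int
    vms: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.vms) < self.capacity


def place_vm(vm: str, internal: Cloud, remote: Cloud) -> str:
    """Prefer the internal cloud; burst to the remote cloud only when it is full."""
    for cloud in (internal, remote):
        if cloud.has_room():
            cloud.vms.append(vm)
            return cloud.name
    raise RuntimeError("no capacity left in either cloud")


# Assumed capacities: 2 slots internally, 3 slots in the remote cloud.
internal = Cloud("internal", capacity=2)
remote = Cloud("remote", capacity=3)
placements = [place_vm(f"vm{i}", internal, remote) for i in range(4)]
print(placements)  # ['internal', 'internal', 'remote', 'remote']
```

A real policy would also weigh cost, data locality, and migration time; the sketch only shows the overflow decision itself.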

Amazon, Azure, Google Clouds

Gartner has a "magic quadrant" summarizing public clouds, dated 28 May 2014: http://www.gartner.com/technology/reprints.do?id=1-1UKQQA6&ct=140528

Note Amazon is way ahead!
Google with GCE (Google Compute Engine) is just starting IaaS. Previously it offered PaaS with Google App Engine.
Microsoft has recently expanded Azure but is still catching up.

Slide9

Amazon Web Services (AWS)

Compute: Elastic Compute Cloud (EC2) offers multitenant, fixed-size and nonresizable, Xen-virtualized VMs without autorestart. Single-tenant VMs are available via Dedicated Instances. There are special options for HPC, including graphics processing units (GPUs). AWS does not have any formal private cloud offerings, though it is willing to negotiate such deals (such as its deal for the U.S. intelligence community cloud).

Storage: VM storage is ephemeral. Persistence requires VM-independent block storage (Elastic Block Store). There is an option for SSDs, as well as storage performance guarantees (Provisioned IOPS). Object-based storage (Simple Storage Service [S3]) is integrated with a CDN (CloudFront); there is an option for long-term archive storage (Glacier), and AWS offers its own cloud storage gateway appliance.

Network: AWS offers a full range of networking options. Complex networking and IPsec VPN are done via Amazon Virtual Private Cloud (VPC). Third-party connectivity is via partner exchanges (AWS Direct Connect).

Security: RBAC (Role-Based Access Control) is per-element, with customer-defined roles and exceptional control over permissions. AWS has obtained many security and compliance-related certifications and audits.

Slide10

Google Compute Engine

Google has been operating App Engine since 2008, but did not enter the IaaS market until the general-availability launch of GCE in December 2013.

Compute: GCE offers multitenant, fixed-size and nonresizable, KVM-virtualized VMs, metered by the minute. Provisioning is exceptionally fast (typically under 1 minute).

Storage: VM storage is persistent, and there is also VM-independent block storage. All block storage is encrypted.

Network: Third-party private connectivity is not supported. Customers cannot bring their own private IP addresses (although this need may possibly be addressed by GCE's Advanced Routing features). There is no back-end load balancing.

Security: RBAC permissions apply to the whole account.

Google's strategy for Google Cloud Platform centers on the concept of allowing other organizations to "run like Google" by taking Google's highly innovative internal technology capabilities and exposing them as services that other companies can purchase. Consequently, although Google is a late entrant to the IaaS market, it is primarily productizing existing capabilities, rather than having to engineer those capabilities from scratch. It will therefore be able to advance its offering more rapidly than most competitors.

Slide11

Microsoft Azure

The Azure business was previously strictly PaaS with a Windows and .NET focus, but Microsoft launched Azure Infrastructure Services (which include Azure Virtual Machines and Azure Virtual Network) into general availability in April 2013, thus entering the cloud IaaS market.

Compute: Azure VMs (Linux or Windows) are fixed-size, paid-by-the-VM, and Hyper-V-virtualized; they are metered by the minute.

Storage: Block storage ("virtual hard disk") is persistent and VM-independent. Object-based cloud storage is integrated with a CDN.

Network: There is no support for complex network topologies. Third-party connectivity is via partner exchange (Azure ExpressRoute).

Security: Virtual network topology limitations prevent useful deployment of most security-related virtual appliances, such as a perimeter intrusion detection/prevention system (IDS/IPS). RBAC uses Azure Active Directory, but permissions are whole-account.

Slide12
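The AWS, GCE and Azure slides above contrast per-element RBAC (AWS) with whole-account permissions (GCE, Azure). The difference can be sketched as two ways of checking a grant. This is an illustrative model only, not any provider's API: the function names, the user `alice`, and the resources `vm-1` and `vm-2` are hypothetical.

```python
def allowed_per_element(grants, user, action, resource):
    """Per-element model: each grant names one specific resource."""
    return (user, action, resource) in grants


def allowed_whole_account(grants, user, action, resource):
    """Whole-account model: a grant covers every resource in the account,
    so the resource argument does not affect the decision."""
    return (user, action) in grants


# Hypothetical grants, for illustration only.
per_element_grants = {("alice", "stop", "vm-1")}
whole_account_grants = {("alice", "stop")}

print(allowed_per_element(per_element_grants, "alice", "stop", "vm-1"))      # True
print(allowed_per_element(per_element_grants, "alice", "stop", "vm-2"))      # False
print(allowed_whole_account(whole_account_grants, "alice", "stop", "vm-2"))  # True
```

In the whole-account model there is no way to let a user stop one VM without letting them stop all of them, which is the limitation the slides point out.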

Google Cloud DNS & Amazon Route 53

Google Cloud DNS
Authoritative DNS server available as a service in Google Cloud
The service is efficient, fault-tolerant and available globally
It can be used by user-hosted services in Google Cloud or from third-party applications
https://developers.google.com/cloud-dns/what-is-cloud-dns

Amazon Route 53
Authoritative DNS server available as a service in Amazon AWS
Provides a fault-tolerant, very fast DNS service
Like Google Cloud DNS, it can be used by services hosted in the Amazon cloud or from third-party applications
The service is available on all continents except Africa
http://aws.amazon.com/route53/
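Both services above are authoritative DNS servers, meaning they answer from zone data they hold themselves rather than by querying other servers. A minimal sketch of that behaviour follows. It is not the Google Cloud DNS or Route 53 API: the zone `example.com.`, the records, and the `resolve` function are hypothetical, and the response handling is simplified.

```python
# Hypothetical zone data; the name and address use reserved documentation values.
ZONE = "example.com."
RECORDS = {
    ("www.example.com.", "A"): ["192.0.2.10"],
    ("example.com.", "MX"): ["10 mail.example.com."],
}


def resolve(name: str, rtype: str):
    """Answer from local zone data only, as an authoritative server does."""
    if name != ZONE and not name.endswith("." + ZONE):
        # Simplification: refuse names outside the zone instead of recursing.
        return ("REFUSED", [])
    known_names = {n for (n, _) in RECORDS}
    if name not in known_names:
        return ("NXDOMAIN", [])  # the name does not exist in this zone
    # The name exists; an empty answer means no record of the requested type.
    return ("NOERROR", RECORDS.get((name, rtype), []))


print(resolve("www.example.com.", "A"))
print(resolve("nope.example.com.", "A"))
print(resolve("www.example.org.", "A"))
```

A hosted service adds what the slide lists on top of this lookup: replication for fault tolerance and globally distributed servers for speed.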