Azure recently announced the Data Lake Storage Gen 2 preview. As far as I know, the main functional difference between Gen 1 and Gen 2 is that Gen 2 offers both object store and file system access over the same data at the same time. Other differences include price, available regions, etc. Can anyone explain what the other key differences between Gen 1 and Gen 2 are?
Azure Data Lake Gen 1 vs Gen 2
azure azure-data-lake
Related Solutions
Web Roles give you several features beyond Web Apps (formerly Web Sites):
- Ability to run elevated startup scripts to install apps, modify registry settings, install performance counters, fine-tune IIS, etc.
- Ability to split an app up into tiers (maybe Web Role for front end, Worker Role for backend processing) and scale independently
- Ability to RDP into your VM for debugging purposes
- Network isolation
- Dedicated virtual IP address, which allows web role instances in a cloud service to access IP-restricted Virtual Machines
- ACL-restricted endpoints (added in Azure SDK 2.3, April 2014)
- Support for any TCP/UDP ports (Web Sites are restricted to TCP 80/443)
Web Apps have advantages over Web Roles though:
- Near-instant deployment with deployment history / rollbacks
- Visual Studio Online, GitHub, local Git, FTP, CodePlex, Dropbox, and Bitbucket deployment support
- Ability to roll out one of numerous CMSs and frameworks (like WordPress, Joomla, Django, MediaWiki, etc.)
- Use of SQL Database or MySQL
- Simple and fast to scale from free tier to shared tier to dedicated tier
- Web Jobs
- Backups of Web Site content
- Built-in web-based debugging tools (simple cmd/PowerShell debug console, process explorer, diagnostic tools like log streaming, etc.)
With the April 2014 and September 2014 rollouts, there are now some features common to both Web Apps and Web Roles (and Worker Roles), including:
- Staging+production slots
- Wildcard DNS, SSL certificates
- Visual Studio integration
- Traffic Manager support
- Virtual Network support
I think Web Apps are a great way to get up and running quickly, where you can move from shared to reserved resources. Once you outgrow this, you can then move up to Web Roles and expand as you need.
The easiest way to think of Data Lake is as a large container, like a real lake with rivers flowing into it: you never know where the rivers are coming from (or what "type" of river they are). Azure Data Lake was introduced to make it easy for developers, data scientists, and analysts to store data of any size. It removes the complexity of ingesting and storing all your data while making it faster to get up and running with big data. Data Lake can store massive amounts of many different types of data (structured data, unstructured data, log files, real-time data, images, etc.) and blend them together so that different data types can be correlated. The key point is that we are moving from traditional approaches to modern tools (like Hadoop, Cassandra, NoSQL databases, etc.). Azure Data Lake includes three services:
- Azure Data Lake Store, a no limits data lake that powers big data analytics
- Azure Data Lake Analytics, a massively parallel on-demand job service
- Azure HDInsight, a fully managed cloud Hadoop and Spark offering
Azure Data Lake Store is like a cloud-based file service or file system that is practically unlimited in size. We can run services on top of the data in that store: you could use Hadoop or Spark in an HDInsight cluster, or you could use the Azure Data Lake Analytics service, which complements Azure Data Lake Store. That service lets you run jobs that query the data stored in Azure Data Lake Store and generate output results.
Best Answer
Basically, think of Gen 2 as a superset of Gen 1 plus all of the best parts of Blob storage: tiers, both HDFS and object store APIs, and presumably the ability to efficiently manage more than 35K files and to efficiently handle many small files and more trickle-write-type operations. Plus, it's cheaper.
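As a rough illustration of the "HDFS and object store APIs" point, the sketch below builds the addresses under which one file is reachable in each generation. It assumes the documented endpoint formats: Gen 1 uses the Hadoop `adl://<account>.azuredatalakestore.net` scheme, while Gen 2 exposes the Blob endpoint `<account>.blob.core.windows.net`, the Data Lake endpoint `<account>.dfs.core.windows.net`, and the ABFS driver's `abfss://` URI. The account, container, and path names are hypothetical, and whether Blob APIs can operate on a particular Gen 2 account depends on the features enabled on it.

```python
# Sketch only: builds the URIs for one file in ADLS Gen 1 vs Gen 2.
# Account, container (file system) and path names are hypothetical placeholders.


def gen1_uri(account: str, path: str) -> str:
    """Hadoop adl:// URI for a file in an ADLS Gen 1 account."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def gen2_uris(account: str, container: str, path: str) -> dict:
    """The same Gen 2 file seen through its object-store and file-system addresses."""
    p = path.lstrip("/")
    return {
        # Blob (object store) REST endpoint
        "blob": f"https://{account}.blob.core.windows.net/{container}/{p}",
        # Data Lake Storage (file system) REST endpoint
        "dfs": f"https://{account}.dfs.core.windows.net/{container}/{p}",
        # ABFS driver URI used by Hadoop/Spark (TLS variant)
        "abfss": f"abfss://{container}@{account}.dfs.core.windows.net/{p}",
    }


if __name__ == "__main__":
    print(gen1_uri("mygen1lake", "/raw/logs/2018-06-28.log"))
    for kind, uri in gen2_uris("mygen2lake", "raw", "/logs/2018-06-28.log").items():
        print(f"{kind:6} {uri}")
```

In Gen 2 the three addresses refer to the same stored data, which is what lets object-store tooling and Hadoop/Spark jobs share files without copying them. Gen 1 exposes only its own file-system interface.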
I'm trying to get some clarity on a few specifics but haven't found much yet. In the meantime, try these links:
https://azure.microsoft.com/en-us/blog/a-closer-look-at-azure-data-lake-storage-gen2/
https://docs.microsoft.com/en-us/azure/storage/data-lake-storage/introduction