Big Data - NoSQL & Cloud Platforms


Numerous tools and approaches are being developed to meet the changing needs of businesses, and technical teams need to stay informed about them to keep their companies competitive. Because big data technologies and tools evolve constantly, it is especially important for a data scientist to be familiar with them. Two areas I found interesting are NoSQL methods and cloud-based virtualization.
Structured data has typically been stored in relational database management systems. With the growth of unstructured data, companies have had to change how they maintain their data. NoSQL options for big data storage and information retrieval enable a new way of developing big data applications. Because many NoSQL stores do not require a fixed schema up front, users can query a broader variety of data with less effort.
Companies like LinkedIn, Facebook, or Amazon can use NoSQL approaches because they deploy many web pages continuously and have thousands or millions of users accessing their applications at the same time. Load imbalances and data bottlenecks are bound to arise given the immense amount of data generated by today's devices. As noted by the web statistics report in 2014, about 3 billion people are connected to the World Wide Web, and the time users spend on the web is close to 35 billion hours per month and gradually increasing. With NoSQL, companies can have a distributed data flow that allows unstructured data to be processed and analyzed across different types of databases: graph, key-value, columnar, and document.
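To make the four database types named above more concrete, here is a minimal sketch that uses plain Python structures as stand-ins for each model. It does not use any real NoSQL product, and every name and sample value in it is hypothetical. It only illustrates how the shape of the data differs from one model to the next.

```python
# Illustrative only: plain Python structures standing in for the four NoSQL models.
# All names and sample values below are hypothetical.

# Key-value: an opaque value looked up by a single key.
key_value_store = {"session:42": "user=alice;cart=3"}

# Document: self-describing records whose fields may differ between records.
document_store = [
    {"_id": 1, "name": "Alice", "skills": ["python", "spark"]},
    {"_id": 2, "name": "Bob", "employer": "Acme"},  # no "skills" field
]

# Columnar: values grouped by column, so one column can be scanned on its own.
column_store = {
    "user_id": [1, 2, 3],
    "country": ["US", "DE", "US"],
    "minutes_online": [120, 45, 300],
}

# Graph: nodes plus edges describing relationships between them.
graph_store = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": set()}


def users_with_skill(docs, skill):
    """Return names of documents listing the skill; missing fields are tolerated."""
    return [d["name"] for d in docs if skill in d.get("skills", [])]


def total_minutes(columns, country):
    """Sum one column, filtered by another, without reading unrelated columns."""
    return sum(
        m for c, m in zip(columns["country"], columns["minutes_online"]) if c == country
    )


def mutual_connections(graph, person):
    """Return neighbours of the person who also link back to the person."""
    return sorted(n for n in graph[person] if person in graph.get(n, set()))


print(key_value_store["session:42"])
print(users_with_skill(document_store, "spark"))
print(total_minutes(column_store, "US"))
print(mutual_connections(graph_store, "alice"))
```

The document example shows why a missing field is not an error in a schema-less store, while the columnar example reads only the two columns it needs.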
The main benefits of NoSQL are speed, scalability, and data availability. One challenge is that the tools (HBase, Couchbase, Cassandra, MongoDB) take time to learn well. Combining them with other tools such as Python and Apache Spark can make NoSQL approaches easier to work with from a user's perspective.
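One way to picture where scalability and availability come from is to spread keys across several nodes and keep more than one copy of each. The sketch below is a toy in-memory model under stated assumptions: the node names, the replication factor of 2, and the `TinyCluster` class are all hypothetical. It uses simple hash-modulo placement, which is not how any particular product works, since production stores typically use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical node names; a real cluster would discover its members dynamically.
NODES = ["node-a", "node-b", "node-c"]
REPLICAS = 2  # assumed replication factor


def owners(key, nodes=NODES, replicas=REPLICAS):
    """Pick `replicas` consecutive nodes for a key, starting from a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


class TinyCluster:
    """In-memory stand-in for a partitioned, replicated key-value store."""

    def __init__(self, nodes=NODES):
        self.storage = {n: {} for n in nodes}
        self.down = set()  # nodes currently treated as unavailable

    def put(self, key, value):
        # Write to every live node that owns the key.
        for node in owners(key, list(self.storage)):
            if node not in self.down:
                self.storage[node][key] = value

    def get(self, key):
        # Read from the first live owner that holds the key.
        for node in owners(key, list(self.storage)):
            if node not in self.down and key in self.storage[node]:
                return self.storage[node][key]
        raise KeyError(key)


cluster = TinyCluster()
cluster.put("user:1", {"name": "Alice"})
primary = owners("user:1")[0]
cluster.down.add(primary)      # simulate losing the first copy
print(cluster.get("user:1"))   # the second replica still answers
```

Adding nodes spreads keys over more machines (scalability), and the second copy keeps a key readable when one node is lost (availability).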
Another area of interest is cloud-based virtualization for data applications. Virtualization has enabled a new cloud-based approach to applications and tools that used to be tied to the user's own hardware. Monitoring sources such as sensors and event logs can now store information and replicate it across multiple networks within milliseconds. Virtual server environments such as IBM's Data Science Experience let users run big data tools without having to worry about physical hardware and its expense in the way we once did.
Virtualizing big data tools greatly benefits the software development life cycle because developers can now store their programs and test them in different cloud environments. Challenges for cloud-based approaches include security, dependence on network connectivity, integration, and speed. Cloud applications can also be difficult to configure or customize because doing so may require high-level administrative rights on their local servers, which are typically not granted to most users for security reasons.
Cloud-based approaches combined with NoSQL methods are gaining popularity in the big data world. These new ways of building and deploying applications are increasing the speed at which services are delivered to the ultimate focus point: the user.
References:
DeZyre. (2015, 03 19). NoSQL vs SQL- 4 Reasons Why NoSQL is better for Big Data applications. Retrieved from DeZyre.com: https://www.dezyre.com/article/nosql-vs-sql-4-reasons-why-nosql-is-better-for-big-data-applications/86
Elmonema, M., Nasrb, E., & Geitha, M. (2017, 03 10). Benefits and challenges of cloud ERP systems – A systematic literature review. Retrieved from Science Direct: http://www.sciencedirect.com/science/article/pii/S2314728816300599
