Online Tutoring on Managing DRM Issues with Machine Learning

I. Introduction

Companies today are constantly inundated by Big Data problems. The sheer volume of data poses a huge challenge to running a business effectively, and the only choice is to manage and leverage the potential of these large data pools through efficient data mining. Data mining is a process that companies adopt to discover patterns in large data sets, turning raw data into meaningful information that can grow the business (Twin, 2020). Discovering patterns provides valuable insights into consumer behavior, helping the company to re-strategize. For data mining to be effective, data collection must be effective too, along with data warehousing and subsequent data processing (Twin, 2020). Data mining also helps develop machine learning models, which are increasingly found to improve data resource management (DRM) across organizations. Many DRM issues can be addressed with machine learning.

AirBnB, the vacation rental broker, is one such company that has boosted its business growth by leaps and bounds through effective data mining and machine learning, taking the start-up space by storm. This document explores whether machine learning can help address DRM issues in organizations, drawing on the AirBnB business case as a practical reference. The research is based on an extensive literature review, followed by its results and a discussion.

i. Overview of the Business Issue

Data is at the heart of AirBnB’s business; without it, the company would be crippled, all the more so because AirBnB is an entirely online business that thrives on Big Data. Each day, the company creates 20 TB of data and archives about 1.4 petabytes of data (DeZyre, 2020). Naturally, managing this huge data pool has always been an issue for the company. Every day it is bombarded with large volumes of data, not just from customers but also from its hosts, locations and rental demand.

Traditional data warehousing generally produces end-of-day totals but cannot show interim data. This amounts to a serious loss of data, which an online company like AirBnB cannot afford. As a solution, the company began mining its data effectively and subsequently developed machine learning models. Taking this a step further, AirBnB launched Zipline, its data management platform for machine learning, which addressed its enterprise-level DRM issues (Koidan, 2019). [Refer to Appendix for details of the case.]
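The difference between end-of-day totals and interim data can be illustrated with a minimal sketch. The booking events, field layout and function names below are hypothetical and are not taken from AirBnB's systems; the sketch only shows how collapsing events into one figure per day hides the intra-day pattern that a finer aggregation keeps.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical booking events: (timestamp, booking value in USD).
events = [
    (datetime(2020, 1, 1, 9, 15), 120.0),
    (datetime(2020, 1, 1, 9, 45), 80.0),
    (datetime(2020, 1, 1, 18, 30), 200.0),
    (datetime(2020, 1, 2, 11, 0), 150.0),
]


def daily_totals(rows):
    """Collapse events into one total per calendar day (end-of-day view)."""
    totals = defaultdict(float)
    for ts, value in rows:
        totals[ts.date()] += value
    return dict(totals)


def hourly_totals(rows):
    """Keep one total per (day, hour), preserving the intra-day pattern."""
    totals = defaultdict(float)
    for ts, value in rows:
        totals[(ts.date(), ts.hour)] += value
    return dict(totals)


daily = daily_totals(events)    # one figure per day
hourly = hourly_totals(events)  # one figure per day and hour
```

In this toy data set the daily view reports a single figure for 1 January, while the hourly view still shows that bookings arrived in a morning cluster and an evening cluster.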

Before Zipline, AirBnB’s ML team spent about 60% of their time collecting data and scripting transformations for ML functions; with Zipline, this effort is reduced substantially, from months to only a few days (Simha & Hoh, 2018). Zipline ensures online-offline data consistency, data quality, effective data monitoring, improved data search and integration with the end-to-end workflow (Koidan, 2019).

ii. Research Approach

Using the AirBnB case as a practical reference, this report aims to answer the following research question:

Can machine learning manage data resource management issues within organizations?

The approach to answering this research question is based entirely on a literature review, which has two aims in this document:

To understand the extent to which organizations suffer from DRM issues, and
To analyze and discuss how companies are using effective data mining and machine learning to solve DRM issues and drive business growth.

The literature review will therefore be undertaken in two parts. The first examines how extensive DRM issues are in today’s organizations, which organizations suffer most from inefficient DRM, and how these issues affect their profitability and business opportunities. The second explores further business cases in which organizations apply machine learning to solve DRM issues.

The scope of the review does not include investigative techniques such as focus group discussions, interviews with company management, surveys or questionnaires. Instead, the results and observations rest entirely on the literature reviewed, covering both primary and secondary research sources explored online.

II. Literature Review

A deluge of data is only natural in a digital age. According to one estimate, 2.5 quintillion bytes of data are produced every day from various sources (Mund, 2016). Indeed, data is generated every second, from everywhere and through all kinds of devices. This deluge often poses significant challenges for organizations in collecting, sorting, managing and interpreting the data to get real value out of it (Mund, 2016). Managing these challenges is referred to as data resource management (IGI Global, n.d.). With effective data resource management, or DRM, organizations can “describe, interpret, and forecast and provision economic and business activities” and can also “decide for the next direction” (Saleh et al., 2018, p. 1383).

To help manage DRM issues, today’s companies deploy a variety of smart techniques for mining and analyzing the huge volumes of data flowing in from various sources. While some organizations rely on business intelligence software (PAT Research, n.d.), others depend on artificial intelligence and machine learning to unlock the value embedded in large data sets (Koidan, 2019). Machine learning algorithms help identify market trends, outliers and business boosters promptly and seamlessly, so that organizations save vital time while still deriving valuable information from the data deluge (The European Business Review, 2020). According to Saleh et al. (2018), the challenge is all the greater because 90% of the data produced every day is unstructured (generated in the form of images, audio, video, email messages, etc.). This mix of structured and unstructured data leads to DRM issues in data storage, data mining and data analysis. Established data management technologies are struggling to keep pace with the immense volume of data generated daily. The better the data mining and analysis in an organization, the more informed its decision-making (Chen et al., 2013).

These DRM challenges are not industry-specific, although DRM is critical to sectors such as healthcare, energy, catastrophe forecasting, insurance, economic improvements, manufacturing and banking (Ramageri & Desai, 2013; Yi et al., 2014). There are also significant DRM opportunities and challenges for retailers, who use data mining tools to understand and predict trends, analyze customer behavior and apply targeted marketing (Ramageri & Desai, 2013). Most retailers find it hard to identify the appropriate customers for product campaigns, and data mining is of major use to them here. The case of AirBnB, the online vacation rental company, also reflects how companies overcome DRM challenges with efficient data mining and machine learning models (Koidan, 2019).

Interestingly, DRM issues existed as far back as the later part of the 20th century. This is evidenced in the work of Rabinovitch (1999), who discussed how ZCMI, a Utah-based department store chain, undertook data mining initiatives to integrate customer data into several merchandising systems and derive business value. Xu and Cheung (1997) discussed the case of a fund-management firm, LBS Capital Management Inc., that used genetic algorithms, neural networks and efficient data systems to handle portfolios to the tune of USD 600 million. Data mining technologies have also been used by numerous other companies, such as American Greetings, Procter and Gamble, Walmart, Coke, Macys West, Pepsi and Penske Logistics (Betts, 2002; Ramageri & Desai, 2013).

Data mining has also benefitted healthcare by improving infection control, hospital ranking, identification of high-risk patients, etc. (Biranbaum, 2004). In manufacturing companies, data mining helps predict machine failures, thus saving maintenance costs (Bergmann, 2012). In the financial sector, data mining has been found to prevent credit card fraud, predict market trends, manage customer relations successfully, etc. (Preethi & Vijayalakshmi, 2017). The potential of data mining has therefore long been explored across sectors, which also indicates that DRM issues have existed for decades and that the technologies to manage them are still evolving. The future of DRM seems to rest with machine learning and artificial intelligence, built on a robust data mining infrastructure.

DRM issues in organizations are varied. Based on the data lifecycle, Akerkar (2014) and Zicari (2014) categorize these issues into (a) data challenges, (b) process challenges and (c) management challenges.

Sivarajah et al. (2017) studied 227 articles on data management issues in companies and observed that data mining and data cleansing (part of process challenges) appear to be the most significant DRM issues in organizations today, as about 43% of these articles mentioned the importance of data mining in a world of predominantly unstructured data. Unless patterns are discovered in the heavy load of structured and unstructured data, valuable insights will not emerge for a company to make data-driven decisions. Machine learning, however, goes a step beyond data mining: it takes value to another level by learning from the patterns and predicting future trends. As Davies (2018) observed, data mining acts as the information source on which machine learning relies. A model learns from the training datasets (established by the data mining process) and predicts outcomes. The algorithms are fed data repeatedly, and the resulting computational intelligence offers increasingly accurate predictions that support important decision-making for the company.

Today, the importance and application of machine learning technology have increased manifold. Data mining alone is no longer enough to manage data in organizations. Not only big companies like Microsoft, Google, Netflix, Twitter or Amazon, but even small businesses consider machine learning an essential way to make sense of all their data (7wData, 2016). Forbes Insights and Lattice, a predictive marketing solutions provider, found that 86% of businesses that had used machine learning algorithms for two or more years witnessed an almost 50% increase in their marketing ROI (7wData, 2016). Machine learning, according to Agresta (2019), is the future of data management. The author noted that most organizations of the 21st century would aim to apply machine learning and artificial intelligence technologies in their processes to improve data quality and DRM. They will attempt to scale up their DRM infrastructure on machine learning technologies without spending substantially on hiring data personnel (scientists, analysts, etc.).

AirBnB has already grown into a USD 25.5 billion company by unlocking machine learning opportunities (DeZyre, 2020). The company wanted to improve the experience of guests who come to the site to book accommodation by maximizing the probability that a stay request would be readily accepted by a host. This could only be done by improving its matching algorithms. The primary challenge was to factor in a user’s unique booking preferences along with accommodation rankings and geographical proximity (LuluQ, 2017). Using logistic regression, AirBnB mapped stay requests to host preferences based on historical data and derived a preference coefficient. As a result, the company saw an immediate 4% increase in request acceptance, adding value to both host and guest experiences on the AirBnB platform. Machine learning also helped the company launch AirBnB Experiences in 2016 with 500 Experiences across 12 cities globally (Grbovic et al., 2019). By the end of 2018, AirBnB Experiences had over 20,000 active Experiences in 1,000+ cities worldwide.
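To make the logistic regression step more concrete, the following is a minimal sketch of how acceptance probabilities could be learned from historical requests. It is not AirBnB's actual model: the two features (normalized stay length and normalized booking lead time), the six training rows and all function names are assumptions made purely for illustration. The learned weights play the role of preference coefficients, in the sense that a larger positive weight means the host is more likely to accept requests with a high value of that feature.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights with full-batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss w.r.t. the score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_acceptance(w, b, x):
    """Estimated probability that the host accepts request x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical history for one host: [stay_length, lead_time], scaled to 0..1.
# This invented host tends to accept longer stays (label 1 = accepted).
X = [[0.1, 0.2], [0.2, 0.8], [0.3, 0.5], [0.7, 0.3], [0.8, 0.9], [0.9, 0.4]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
long_stay = predict_acceptance(w, b, [0.9, 0.5])
short_stay = predict_acceptance(w, b, [0.1, 0.5])
```

With these invented data, the weight on stay length comes out positive, so a long-stay request receives a higher estimated acceptance probability than a short-stay one. A ranking system could then surface the listings whose hosts are most likely to accept a given request.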

While data mining tools and techniques have been in use for decades, machine learning as a step forward from data mining is a relatively new concept, and not all companies have yet unlocked its potential to drive business growth. As of 2019, companies were observed to be in the ‘Growth’ phase of machine learning, as reported by Disbudak (2019). Best practices are still evolving, and many companies have not yet adopted machine learning to simplify processes. The LinkedIn survey conducted by Disbudak (2019) found that 54% of the companies surveyed had already implemented machine learning technologies, while 28% were still considering them to scale up business. The survey also observed that 35% of early-phase and mature-phase adopters of machine learning acknowledged improved customer support and consequent business growth. Although machine learning technologies can be applied in a variety of organizational projects, such as natural language processing (NLP), planning and exploring, machine vision, and handling and control, they are mostly implemented in information processing, that is, in making sense of Big Data (Disbudak, 2019). One interesting finding of the survey was that start-up businesses were drawing the greatest benefit from machine learning. It is no longer a benefit accessible only to giant companies; new-generation machine learning platforms ensure access for companies of all sizes and sectors. In fact, smaller businesses seemed to be ahead of their bigger counterparts in machine learning development because they leveraged its potential from the very start of their operations (Disbudak, 2019).
