Processes of Data Mining

APUS

Data Mining

Data mining, in the simplest terms, is the process of reducing a large body of data to valuable information, or of decluttering a large amount of data so that only the useful information is extracted. Although the term “data mining” was not used until the 1980s, when it was trademarked by HNC to protect its product “DataBase Mining Workstation”, the foundation of data mining was laid much earlier: in 1763, Thomas Bayes’ theorem relating current and prior probabilities was published. With the invention of computers, the theorem became a stepping stone to data mining because it made it possible to understand reality on the basis of estimated probabilities. The 1980s were also the era in which algorithms began to be used to learn relationships, and what they mean, by studying the data (Li, n.d.).
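
Bayes’ theorem can be written compactly. The statement below is the standard textbook form; the event names A and B are notation introduced here for illustration and do not appear in the sources cited above.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

Here P(A) is the prior probability of A, and P(A | B) is the updated (current) probability of A once B has been observed.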

Data mining is a process that involves three elements: studying the relationships in the data, using machines to mimic human-like intelligence, and programming machines to make predictions after analyzing the data. Processing large data a decade ago would have taken months and enormous human effort, because it involves separating and filtering the data, finding relationships among common data, and analyzing it to support an informed decision. With statistics, artificial intelligence, and machine learning combined, data mining has gone beyond providing fast, easy, and automated analysis of large data. Now, the more complex the data is, the more value can be extracted from it, which is why data is collected from various sources (SAS Institute Inc., n.d.).

Data Sources

Data can come from many sources, but collecting reliable data is important to avoid wasting time and storage on unnecessary and unreliable data. Today it is easy to find a good source of data: many e-businesses use data collected through their own e-commerce websites, while others rely on open sources to collect quality data. Valuable information cannot exist without quality data, which is why it helps to know the five main sources of Big Data: media, cloud, web, IoT (Internet of Things), and databases. These sources provide data that is not only continuous but also reliable, which can help a business learn the current market and its trends and create effective business strategies (Joshi, 2017).

Media is the most popular source for collecting useful data, as it can show what consumers are interested in and whether trends in the current market are changing. Media also provides live data, which can help a business make effective decisions about the current market. Some popular media sources are Facebook, Google, YouTube, Twitter, and Instagram, which provide measurable insights into user interactions. Because media has no social or geographical limits, it is the quickest way for e-commerce businesses to learn about their consumer base, its interests, and new and emerging trends, and to come up with business strategies (Joshi, 2017).

Cloud computing has set new standards for collecting and storing data. Because it offers greater advantages than the traditional way of storing data, everyone from large corporations to individuals has become accustomed to using the cloud. Storing data in the cloud can be cost-effective and provides reliability and availability. When you have important data, you want it to be available around the clock, every day of the year, with the convenience of access from anywhere through any device. Today’s cloud storage services handle structured as well as unstructured data, providing the flexibility and scalability that business operations need to work efficiently. Some of the most popular cloud storage services are offered by big names like Amazon, Google, Microsoft, Rackspace, and Qubole (Qubole, n.d.).

The web is probably the most cost-efficient way to collect data, as the data is constantly fed by internet users. Something as simple as connecting to the internet and searching for a product or news in a web browser provides insights to anyone looking for trending data. The web is a great source for collecting data if you are a start-up company with limited capital to conduct market research. Without requiring you to build your own data infrastructure, the open web provides quality data. A website like Wikipedia is a good example of how the web provides quick, live, and free data (Joshi, 2017).

Another way to collect a large amount of live data is through the Internet of Things (IoT). When you drive to a supermarket, movie theater, or restaurant, your mobile device provides great insight into where you spend your time, what your interests are, and how much time you spend at a particular location. Data generated by devices connected to sensors does not require much effort to collect while still being live and accurate. For example, IoT provides precise real-time data that can help a business restock its shelves when they are low on products. The types of devices that can provide real-time data include medical devices, mobile devices, home appliances, meters, video games, smart watches, and the point-of-sale (POS) system in a supermarket (Joshi, 2017).

The databases used today are hybrid: they combine modern and traditional database structures to acquire a large amount of data. A hybrid database offers the best of both structures, delivering performance cost-efficiently. Databases are the best fit for business intelligence purposes, which is why they have been and continue to be popular in serving government operations. Some well-known database technologies that draw on a range of data sources are Microsoft Access, Oracle, DB2, and other relational database management systems (RDBMS) that use SQL (Joshi, 2017).

The data that is collected takes two forms: quantitative and qualitative. Quantitative data is numerical, like statistics or percentages, and is easy to represent in a graph or chart. An example of quantitative data is a survey that asks you to rate a movie on a scale of 1 to 5: the responses are numerical and give a statistical answer to how good the movie is. Qualitative data is descriptive, like the color of someone’s eyes or how healthy your child is. An example of qualitative data is the feedback a professor provides on your assignment based on whether you followed instructions, used proper grammar, and submitted the assignment in APA format (University of Minnesota, n.d.).
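
The difference between the two forms shows up in how each can be summarized. The following is a minimal Python sketch, assuming made-up survey responses; the variable and function names are hypothetical and chosen only for illustration. Numeric ratings support arithmetic such as an average, while descriptive labels can only be counted.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses (illustrative values, not from the cited sources).
movie_ratings = [5, 4, 4, 3, 5, 2, 4]  # quantitative: ratings on a 1-5 scale
eye_colors = ["brown", "blue", "brown", "green", "brown"]  # qualitative: labels


def summarize_quantitative(values):
    """Numeric data supports arithmetic summaries such as the mean."""
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


def summarize_qualitative(labels):
    """Descriptive data is summarized by counting how often each label occurs."""
    return Counter(labels)


print(summarize_quantitative(movie_ratings))
print(summarize_qualitative(eye_colors))
```

The quantitative summary could feed directly into a chart, whereas the qualitative summary is a tally of categories.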

Data Mining Tools

Collecting large amounts of data is not worth much unless the data is analyzed properly. Several tools can help organize data into categories and display visuals that make the data understandable at a high level, as well as break it down to show low-level details. There are three categories of data analysis tools: traditional business intelligence, self-service analytics, and embedded analytics. Traditional business intelligence tools provide only recurring reports. Self-service analytics, on the other hand, gives users control over what data they want to view by allowing them to run queries and pull reports. Embedded analytics provides data relevant to a task or department within a system such as a Human Resources (HR) system or a Customer Relationship Management (CRM) system. Some of the top analytics tools used today include Google Analytics, Tableau, Looker, Solver, Dataiku, and KNIME, to name just a few (Bell, 2018).

Importance of Data Mining

Data mining is used by businesses and organizations to make actionable business decisions and develop strategies by analyzing data. Collecting Big Data adds value to business operations, helps find solutions and shape business strategies, and reduces cost and effort. If stored and utilized properly, Big Data holds promise for keeping business operations running and for the longevity of the company. With the right analysis tool and an experienced data analyst, Big Data can reveal trends, cut costs, improve business efficiency, and help increase revenue (Talend Team, 2018).

Data Mining Techniques

Because of the amount and variety of data, many techniques have been introduced to mine it. It is important to know the differences between these techniques so that the best one can be selected for the needs of your business and for the problem your business is facing. Because the purpose of data mining is to find solutions and help the business make informed decisions, numerous techniques are in use, depending on the nature of the business. In this paper I will discuss only the five most common techniques: classification analysis, association rule learning, anomaly detection, clustering analysis, and regression analysis (Sharma, 2015).

Classification analysis helps gather relevant and important information from both data and metadata. This technique assigns the data to categories called classes; in other words, the data is segmented into different classes. Take Outlook, for example: each email that Outlook receives goes through a set of algorithms that categorize it and identify which folder to place it in. For instance, based on the email’s characteristics, the algorithm decides whether the email is legitimate or needs to be placed in the spam folder. Classification is somewhat similar to clustering, but in clustering the groups are not defined in advance; clustering analysis collects data items that are similar to one another and groups them. The main purpose of clustering analysis is to discover groups in the data. For example, an insurance company can use this technique to identify customers whose claim costs are higher than a certain percentage (Sharma, 2015).
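
To make the spam example concrete, here is a minimal Python sketch of one possible classification technique, a naive Bayes classifier with add-one smoothing. It is not a description of how Outlook actually works; the training emails, the class names, and the function names are all hypothetical. Only classification is sketched, not clustering.

```python
import math
from collections import Counter

# Hypothetical training emails (illustrative only, not from the cited sources).
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("project report attached", "legitimate"),
]


def train(examples):
    """Count words per class and emails per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the class with the highest log-probability for the given text."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_emails = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior probability of the class.
        score = math.log(class_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, class_counts = train(TRAINING)
print(classify("claim your free prize", word_counts, class_counts))
print(classify("monday project meeting", word_counts, class_counts))
```

The classes ("spam" and "legitimate") are fixed before any email is seen, which is what distinguishes classification from clustering.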

The purpose of association rule learning is to identify dependencies that exist between the variables in a database. This technique is popular in the retail industry because, with the help of machine learning, it helps predict customer behavior by identifying hidden trends in the database. Retailers use the information gathered by this technique to design their product catalogs, analyze shopping carts, group products on their websites, and configure the layouts of their stores. Similarly, regression analysis identifies and analyzes relationships between variables, with the difference that it helps explain how the value of a dependent variable changes when an independent variable changes. Both techniques are used to predict or forecast trends based on current data (Sharma, 2015).
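
Both ideas can be illustrated with a short Python sketch. It assumes a handful of made-up shopping carts and data points, and the function names are hypothetical. Support and confidence are two standard measures used to score an association rule such as "customers who buy bread also buy butter", and the line fit is ordinary least squares, the simplest form of regression.

```python
# Hypothetical shopping carts (illustrative only, not from the cited sources).
CARTS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, carts):
    """Fraction of carts that contain every item in itemset."""
    return sum(1 for cart in carts if itemset <= cart) / len(carts)


def confidence(antecedent, consequent, carts):
    """Of the carts containing antecedent, the fraction that also hold consequent."""
    matching = [cart for cart in carts if antecedent <= cart]
    if not matching:
        return 0.0  # the rule never applies, so report no confidence
    return sum(1 for cart in matching if consequent <= cart) / len(matching)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, mean_y - slope * mean_x


print(support({"bread", "butter"}, CARTS))
print(confidence({"bread"}, {"butter"}, CARTS))
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

A retailer might act on rules whose support and confidence are both high, while the fitted slope describes how much the dependent variable changes for each unit change in the independent variable.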

Looking for anomalies in data can be a great advantage, especially in identifying a problem. Finding anomalies in the data is critical for any business or organization. An anomaly indicates that something out of the ordinary or out of pattern has happened, and it raises a red flag that draws attention to a situation that needs it. Take the money-counting machines at a financial institution, for example: when a bundle of hundred-dollar bills is scanned through the machine and there is a ten-dollar bill in the stack, the machine flags it as possible fraud. Anomaly detection is a very useful technique that can help detect fraud, intrusions, faults, and disturbances, as well as help monitor health systems or detect events in a security system (Sharma, 2015).
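
The bill-counting example can be sketched in a few lines of Python. This is one simple approach, a z-score test, and not a description of how real counting machines work; the function name and the threshold of 3.0 are assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold` standard
    deviations away from the mean of `values`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical bundle: twenty hundred-dollar bills and one ten-dollar bill.
bundle = [100] * 20 + [10]
print(find_anomalies(bundle))
```

The ten-dollar bill sits far from the typical value of the bundle, so it is the only item reported. A z-score test like this works best when anomalies are rare; with many outliers the mean and standard deviation themselves become distorted.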

Data Mining and Society

There is no doubt about how useful data mining is; it has proven greatly beneficial in virtually every field, whether healthcare, e-commerce, government, education, or financial institutions. First and foremost, it can help prevent fraud on a very large scale. Take government-run programs like Social Security and Medicare. There is no debate that government agencies are primary targets of fraud. For years, people have been using false information to claim money from the government. With the help of data mining and analysis techniques, the government is able to prevent fraudulent activities like the Medicare fraud of 2012, which revealed $452 million in false claims (Jones, 2013).

Other agencies, such as first responders, can also take great advantage of data mining to provide timely services to the community. With the help of GIS mapping, faster and more efficient disaster recovery is possible. For example, by mining climate-related data we can predict weather patterns as well as the impact of a natural disaster. With important information such as the severity, time, location, and size of the natural disaster, first responders can design a strategy to evacuate the area and prepare for the aftermath. There are countless ways in which data mining does and can better our society: by providing safety, improving the efficiency of organizations, predicting future trends, and helping utilize resources effectively (Jones, 2013).

Data mining also provides great benefits at the individual level with the help of “open data”, which makes important and broadly useful information public. Take the healthcare system, for example: because information on the cost of medical care is publicly available, individuals can make an informed decision when choosing a medical provider. Students can plan their futures because educational data is publicly available. An employer can make an informed decision when offering healthcare plans to its employees because insurance information is freely available. Because of data mining, we have a society in which democratic culture has flourished, helping to build a just and fair society (Jones, 2013).

At the same time, while data mining is helping to create a fair society, it has also promoted discrimination, unethical data-gathering practices, and fraud. When we think of the negative impact of data mining, we instantly think of data breaches, but data mining has far worse side effects than someone stealing data. Profiling, for example, is not only unethical but can have serious effects on society. A large amount of data is collected without the knowledge or consent of users. When users communicate through an online channel, search for or purchase a product, or participate in any online activity, their personal and online behavioral data is collected with little or no knowledge on their part, and a profile is created from this information. Data brokers then sell this data to other agencies for profit, making individuals vulnerable to fraud (Redden, 2017).

Every business wants to make a profit, and some companies fail to draw the line and do not realize when they have invaded their consumers’ privacy to increase profits. Some agencies use consumers’ personal data to identify individual shopping trends and suggest products or services in ways that can cause embarrassment. Credit card companies, for example, can look at their customers’ credit card transactions to see whether they have paid a marriage counselor and change the limits on the card accordingly. Insurance companies do the same thing: they set insurance policies based on age and gender, regardless of whether you are a good driver. We already know that insurance companies set rates based on the area you live in (Redden, 2017).

The use of data mining tools has raised concern in the judicial system as well. An investigation conducted in 2017 showed that data mining tools used to set bail amounts discriminate on the basis of an individual’s color and gender. For many agencies gender makes a big difference, and one study found that Google Ads was showing job advertisements based on the user’s gender. A data scientist reported that when companies sort job applicants using data mining tools, the tools also filter applicants based on their health. And there is always the concern that a large number of individuals will be negatively impacted if their personal data is compromised. With the wrong intentions, data mining can do as much harm as good (Redden, 2017).

Conclusion

From being just a theorem, data mining has come a long way. There is no doubt that data mining is one of the best advancements among leading technologies. Society has experienced countless advantages that are still helping to improve our day-to-day lives. What would have been impossible just a decade ago is now not only possible but achievable with greater efficiency. It is also important to notice the side effects caused by data mining, which make us realize that even with such advanced technologies we have a long way to go in creating a fair and just society. With new, better, and faster technologies emerging, we can only hope to see even greater benefits coming from data mining.

References:

Bell, S. (2018, April 30). Top 15 Data Analytics Tools. Retrieved February 7, 2019, from import.io: https://www.import.io/post/top-15-data-analytics-tools/
Jones, R. (2013, October 02). 3 positive side effects of data mining in Washington. Retrieved from edq.com: https://www.edq.com/blog/3-positive-side-effects-of-data-mining-in-washington/
Joshi, N. (2017, November 26). Top 5 sources of big data. Retrieved January 6, 2019, from Allerin: https://www.allerin.com/blog/top-5-sources-of-big-data
Li, R. (n.d.). History of data mining. Retrieved from hackerbits.com: https://hackerbits.com/data/history-of-data-mining/
Qubole. (n.d.). Big Data Cloud Computing & Databases. Retrieved February 6, 2019, from Qubole: https://www.qubole.com/resources/big-data-cloud-database-and-computing/
SAS Institute Inc. (n.d.). Data Mining. Retrieved from sas.com: https://www.sas.com/en_us/insights/analytics/data-mining.html
Sharma, P. (2015, September 08). Top 5 Data Mining Techniques. Retrieved from infogix.com: https://www.infogix.com/top-5-data-mining-techniques/
Talend Team. (2018, November 29). The Future of Big Data. Retrieved February 5, 2019, from Talend: https://www.talend.com/resources/future-big-data/
University of Minnesota. (n.d.). Qualitative or Quantitative Data? Retrieved February 6, 2019, from Cyfar: https://cyfar.org/qualitative-or-quantitative-data
