AUTHOR
LINK: https://www.idosr.org/wp-content/uploads/2022/04/IDOSR-JAS-71-29-40-2022..pdf
ABSTRACT
This review paper comprehensively detailed the methodologies involved in data
analysis and the evaluation steps. It showed that steps and phases are the two main
methodological parameters to be considered during data assessment if data of high
quality are to be obtained. The review found that poor data quality is most often
caused by the incompleteness, inconsistency, integrity and time-related dimensions,
and that the four major factors that cause errors in a dataset are duplication,
commutative entries, incorrect values and blank entries, which can lead to
catastrophic results. The paper also reviewed the types of datasets, the techniques
adopted to ensure good data quality, and the types of data measurement and their
classifications. Furthermore, the Kaggle site was used as a case study to show the
trend of data growth and its consequences for the world and for data bankers. It was
then deduced that low data quality, caused by errors during primary data mining and
entry, leads to wrong results and, in turn, wrong conclusions. It was advised that
data bankers such as Kaggle adopt critical data quality measures before uploading
data to their sites, to avoid catastrophe and harm to humans. Finally, the solutions
outlined in this paper will serve as a guide to data bankers and miners in obtaining
data of high quality, fit for use and devoid of defects.
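As an illustration of the kind of pre-upload quality checks recommended above, the sketch below (not taken from the paper) uses Python and the pandas library to flag duplicated rows, blank entries and incorrect values in a dataset before it is submitted to a data bank. The file name "dataset.csv" and the age-range validity rule are hypothetical examples, not part of the reviewed work.

# Minimal sketch (assumption, not from the paper): pre-upload data quality checks with pandas.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Count the error factors highlighted in the review:
    duplication, blank entries and incorrect values."""
    report = {
        # Duplication: identical rows entered more than once.
        "duplicate_rows": int(df.duplicated().sum()),
        # Blank entries: missing values per column.
        "blank_entries": df.isna().sum().to_dict(),
    }
    # Incorrect values: a hypothetical rule, e.g. an 'age' column must lie in 0-120.
    if "age" in df.columns:
        report["incorrect_age_values"] = int((~df["age"].between(0, 120)).sum())
    return report

if __name__ == "__main__":
    df = pd.read_csv("dataset.csv")   # hypothetical file awaiting upload
    print(quality_report(df))
    # A data banker could refuse the upload if any of these counts is non-zero.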
Keywords: Accuracy, Data Bank, Data Quality, Dataset, Defect, Fit for Use, Kaggle
PUBLISHED
2022-11-21
HOW TO CITE
Val Hyginus U. Eze, Martin C. Eze, Chibuzo C. Ogbonna, Valentine S. Enyi, Samuel A. Ugwu, Chidinma E. Eze (2022). Review of the Implications of Uploading Unverified Dataset in a Data Banking Site (Case Study of Kaggle). IDOSR Journal of Applied Sciences, 7(1), 29-40.
ISSUE
SECTION
Article