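Exploring the Gemstone Dataset
In this step, we will get familiar with the gemstone dataset and its structure, laying the groundwork for the analysis that follows.
First, make sure you are logged in as the hadoop user by running the following command in the terminal: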
su - hadoop
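If you want to confirm that the switch took effect, a quick check along these lines can help. This is only a minimal sketch: the variable name current_user and the messages are illustrative, and id -un is used here as one common way to print the effective user name.

```shell
# Print the effective user name; fall back to "unknown" if it cannot be resolved.
current_user="$(id -un 2>/dev/null || echo unknown)"

if [ "$current_user" = "hadoop" ]; then
  echo "OK: running as hadoop"
else
  echo "Currently running as '$current_user'; run 'su - hadoop' first"
fi
```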
Now let's create some sample data. Copy the following commands into the terminal to create the two sample files.
mkdir -p hadoop/gemstone_data
cd hadoop/gemstone_data
echo "gem_id,gem_name,color,hardness,density,refractive_index" > gem_properties.csv
echo "1,Ruby ,Red ,9.0 ,4.0,1.77" >> gem_properties.csv
echo "2,Emerald ,Green ,8.0 ,3.1,1.58" >> gem_properties.csv
echo "3,Sapphire,Blue ,9.0 ,4.0,1.76" >> gem_properties.csv
echo "4,Diamond ,Colorless,10.0,3.5,2.42" >> gem_properties.csv
echo "5,Amethyst,Purple ,7.0 ,2.6,1.54" >> gem_properties.csv
echo "6,Topaz ,Yellow ,8.0 ,3.5,1.63" >> gem_properties.csv
echo "7,Pearl ,White ,2.5 ,2.7,1.53" >> gem_properties.csv
echo "8,Agate ,Multi ,7.0 ,2.6,1.53" >> gem_properties.csv
echo "9,Rose ,Pink ,7.0 ,2.7,1.54" >> gem_properties.csv
echo "10,CatsEye,Green ,6.5 ,3.2,1.54" >> gem_properties.csv
echo "gem_id,application" > gem_applications.csv
echo "1,Fire Magic " >> gem_applications.csv
echo "2,Earth Magic " >> gem_applications.csv
echo "3,Water Magic " >> gem_applications.csv
echo "4,Enhancement Magic" >> gem_applications.csv
echo "5,Psychic Magic " >> gem_applications.csv
echo "6,Lightning Magic " >> gem_applications.csv
echo "7,Illusion Magic " >> gem_applications.csv
echo "8,Strength Magic " >> gem_applications.csv
echo "9,Love Magic " >> gem_applications.csv
echo "10,Stealth Magic " >> gem_applications.csv
Now that we are in the gemstone_data directory, let's take a moment to look at its contents:
ls
The listing shows two files, each covering a different aspect of the gemstone data. gem_properties.csv records each gem's physical properties, while gem_applications.csv describes its magical use.
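As a quick sanity check, you can also count the data rows in each file. The sketch below assumes you are still in the gemstone_data directory. count_rows is a hypothetical helper name introduced for illustration, not part of any tool. Given the commands above, each file should report 10 data rows.

```shell
# count_rows FILE: print the number of data rows, skipping the header line.
count_rows() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}

for f in gem_properties.csv gem_applications.csv; do
  if [ -f "$f" ]; then
    echo "$f: $(count_rows "$f") data rows"
  else
    echo "$f: not found (are you in the gemstone_data directory?)"
  fi
done
```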
To get a closer look at the dataset, let's view the first few lines of one of the files:
head -n 5 gem_properties.csv
The output should look like this:
gem_id,gem_name,color,hardness,density,refractive_index
1,Ruby ,Red ,9.0 ,4.0,1.77
2,Emerald ,Green ,8.0 ,3.1,1.58
3,Sapphire,Blue ,9.0 ,4.0,1.76
4,Diamond ,Colorless,10.0,3.5,2.42
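This command prints the first five lines of gem_properties.csv (the header row plus four data rows), giving you a first look at the file's structure and contents.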
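Note that several fields in the sample rows carry trailing spaces (for example "Ruby " rather than "Ruby"), which can matter if you later compare values exactly. One way to preview the data with that padding removed is sketched below. trim_fields is a hypothetical helper name, and the original file is left unchanged.

```shell
# trim_fields: read CSV on stdin and strip leading/trailing blanks from every field.
trim_fields() {
  awk -F',' 'BEGIN { OFS = "," }
  {
    for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    print
  }'
}

# Preview the first five lines with padding removed, if the file is present.
if [ -f gem_properties.csv ]; then
  head -n 5 gem_properties.csv | trim_fields
fi
```

This simple approach assumes no field contains an embedded comma or quoted text, which holds for the sample data created above.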