Pages

Friday, January 27, 2023

FDS/Food_Display_Table


Link of Excel Dataset : https://1drv.ms/x/s!ArXyLhCCu5vFgzVKbXpfzMrOY_je?e=qUXywq

Link to Colab Notebook : https://colab.research.google.com/drive/1OZawdTfZDJeWsRS5RhoXFG4T95MEd7zt?usp=sharing

 

The Data Set

In this tutorial, we will use a Excel file we created from Kaggle data. You can download the file by using above Link .

!pip install xlrd==1.2.0

We will use this data set to find the number of calories for the Food Display Table visualize movies with highest Fats and Calories and calculate statistical information about the Food Table. We will be analyzing and exploring this data using Python and pandas, thus demonstrating pandas capabilities for working with Excel data in Python.

Read data from the Excel file

We need to first import the data from the Excel file into pandas. To do that, we start by importing the pandas module.

import pandas as pd

We then use the pandas’ read_excel method to read in data from the Excel file. The easiest way to call this method is to pass the file name. If no sheet name is specified then it will read the first sheet in the index (as shown below).

 

excel_file = 'Food_Display_Table.xlsx'

Food_Display_Table = pd.read_excel(excel_file)

Here, the read_excel method read the data from the Excel file into a pandas Data Frame object. Pandas defaults to storing data in DataFrames. We then stored this Data Frame into a variable called Food_Display_Table.

Pandas has a built-in DataFrame.head() method that we can use to easily display the first few rows of our Data Frame. If no argument is passed, it will display first five rows. If a number is passed, it will display the equal number of rows from the top.

Food_Display_Table. Head()

 

Food_Display_Table_subset_columns = pd.read_excel(excel_file, usecols=[1,2,3,4,5,6])

Food_Display_Table_subset_columns.head()

 

Food_Display_Data_sheet1 = pd.read_excel(excel_file, sheet_name=1, index_col=0)

Food_Display_Data_sheet1.head()

We can check if this concatenation by checking the number of rows in the combined DataFrame by calling the method shape on it that will give us the number of rows and columns.

 

Food_Display_Table.shape

 

(2014, 26)

We can also use the Excel File class to work with multiple sheets from the same Excel file. We first wrap the Excel file using ExcelFile and then pass it to read_excel method.

If you have more than one Sheet in your excel file then you can concat the sheet by using following codes.

xlsx = pd.ExcelFile(excel_file)

Food_Display_Table_sheets = []

for sheet in xlsx.sheet_names:

   Food_Display_Table_sheets.append(xlsx.parse(sheet))

Food_Display_Table = pd.concat(Food_Display_Table_sheets

 

sorted_by_gross = Food_Display_Table.sort_values(['Calories'], ascending=False)

Since we have the data sorted by values in a column, we can do a few interesting things with it. For example, we can display the Food Table by Calories

In Excel, you’re able to sort a sheet based on the values in one or more columns. In pandas, you can do the same thing with the sort_values method. For example, let’s sort our Food Display Table DataFrame based on the Gross

sorted_by_gross = Food_Display_Table.sort_values(['Calories'], ascending=False)

sorted_by_gross["Calories"].head(10)

723  1667.40                                                                    1283 1283.52                                                                 60   1069.20                                                                    1501 1024.92                                                                1458 976.10                                                                  1103 818.37                                                                    59   801.90                                                                    1062 798.64                                                              1285 792.54                                                                  1051 785.40 Name = Calories dtype: float 64

We can also create a plot for the top 10 Food Items by calories. Pandas makes it easy to visualize your data with plots and charts through matplotlib, a popular data visualization library.

And the Result will look like this.

import matplotlib.pyplot as plt

We will draw a bar plot where each bar will represent one of the top 10 Food. We can do this by calling the plot method and setting the argument kind to barh. This tells matplotlib to draw a horizontal bar plot.

 

sorted_by_gross['Calories'].head(10).plot(kind="barh")

plt.show()

 


Reading a subset of columns

Although read_excel defaults to reading and importing all columns, you can choose to import only certain columns. By passing parse_cols=[1,2,3,4,5,6] we are telling the read_excel method to read only the first columns till index six or first seven columns (the first column being indexed zero).

 

Food_Display_Table_subset_columns = pd.read_excel(excel_file, usecols=[1,2,3,4,5,6])

Food_Display_Table_subset_columns.head()

We need to first identify the column or columns that will serve as the index, and the column(s) on which the summarizing formula will be applied. Let’s start small, by choosing Food_Code as the index column and Calories as the summarization column and creating a separate DataFrame from this data.

Food_Display_Table["Solid_Fats"] = Food_Display_Table["Calories"] - Food_Display_Table["Saturated_Fats"]

This excel sheet consist a total of 5 Rows and 26 Columns.

Food_Display_Table.head()

Display_Name

Portion_Default

Portion_Amount

Portion_Display_Name

Factor

Increment

0

Sour cream dip

1

0.25

cup

0.25

0.25

1

Ice cream, regular

1

1.00

cup

1.00

0.25

2

Ice cream, rich

1

1.00

cup

1.00

0.25

3

Soft serve ice cream

1

1.00

cup

1.00

0.25

4

Ice cream bar, chocolate covered

1

1.00

bar (bar = 3 fl oz)

1.00

0.50

 

5 rows × 26 columns

Let’s use the sort_values method to sort the data by the new column we created and visualize the top 10 Food by Added_Sugars.

sorted_Food_Display_Table = Food_Display_Table[['Added_Sugars']].sort_values(['Added_Sugars'], ascending=[False])                      sorted_Food_Display_Table. Head(10)['Added_Sugars'].plot.barh()                plt.show()


We need to first identify the column or columns that will serve as the index, and the column(s) on which the summarizing formula will be applied. Let’s start small, by choosing Food_Column as the index column and Calories as the summarization column and creating a separate DataFrame from this data.

Food_Display_Table_subset=Food_Display_Table[['Food_Code', 'Calories']]  Food_Display_Table_subset.head()

 

calories_by_Food_Code = Food_Display_Table_subset.pivot_table(index=['Food_Code'])                                              calories_by_Food_Code.head()

 

Calories

Food_Code

7258

28.560

7259

23.520

16126

50.400

16129

101.625

23559

109.060

We can use this pivot table to create some data visualizations. We can call the plot method on the DataFrame to create a line plot and call the show method to display the plot in the notebook.

calories_by_Food_Code.plot()                                                plt.show()

 


Food_Display_Table_subset = Food_Display_Table[['Food_Code','Calories']]  Food_Display_Table_subset.head()

 

Food_Code

Calories

0

12350000

133.65

1

13110100

267.33

2

13110120

368.52

3

13110200

347.73

4

13120100

169.12

We saw how to pivot with a single column as the index. Things will get more interesting if we can use multiple columns. Let’s create another DataFrame subset but this time we will choose the columns, Grains, Vegetables and Calories.

Food_Display_Table_subset=Food_Display_Table[['Grains''Vegetables''Calories']]                                             Food_Display_Table_subset.head()

 

Grains

Vegetables

Calories

 

 

 

0

0.04799

0.0407

133.65

1

0.00000

0.0000

267.33

2

0.00000

0.0000

368.52

3

0.00000

0.0000

347.73

4

0.00000

0.0000

169.12

 

Exporting the results to Excel

If you’re going to be working with colleagues who use Excel, saving Excel files out of pandas is important. You can export or write a pandas DataFrame to an Excel file using pandas to excel method. Pandas uses the xlwt Python module internally for writing to Excel files. The to excel method is called on the DataFrame we want to export. We also need to pass a filename to which this DataFrame will be written.

Food_Display_Table.to_excel('output.xlsx')

!pip install xlsxwriter

We need to be able to make our output files look nice before we can send it out to our co-workers. We can use pandas ExcelWriter class along with the XlsxWriter Python module to apply the formatting.

We can do use these advanced output options by creating a ExcelWriter object and use this object to write to the Excel file.

writer = pd.ExcelWriter('output2.xlsx', engine='xlsxwriter')Food_Display_Table.to_excel(writer, index=False, sheet_name='report')            workbook = writer.bookworksheet = writer. Sheets['report']

 Finally, we save the output file by calling the method save on the writer object.

 writer.save()

 


No comments:

Post a Comment