Pandas for reading an excel dataset. In this article, you are going to learn python about how to read the data source files if the downloaded or retrieved file is an excel sheet of a Microsoft product. We can read an excel file using the properties of pandas. It is necessary to import the pandas packages into your python script file. The below examples will help you in understanding how to read an excel file.
Consider the below simple excel sheet having the name “Data.xlsx” and “Data” as its sheet name and run the example codes to see different ways for reading an excel file. Note that you must provide the exact location path of a file located in your system drive or directory in the program code as shown in the below examples. Create a similar excel file with the name “Data.xlsx” and specify the sheet name as “Data” for your execution as shown in the below picture.
Import the pandas package for reading the excel files. Use “import pandas as pd” statement in your python script. We can use the method “pd.read_excel()” for reading an excel file by accessing the properties of the pandas library. Pass the file name and its path location with “.xlsx” file type as parameter for “pd.read_excel()” method.
Example 1
import pandas as pd df=pd.read_excel("C:\\Users\\admin\\Desktop\\Data.xlsx") print(df) Gender Height Weight lbs 0 M 5 97 95-117 1 F 6 132 144-176 2 F 5 112 90-110 3 M 6 185 160-196
This example produces the above result after reading the data present in the excel file by assigning the index values for each row.
Example 2
import pandas as pd df=pd.read_excel("C:\\Users\\admin\\Desktop\\Data.xlsx",sheet_name="Data") print("Column headings:") print(df.columns) Column headings: Index(['Gender', 'Height', 'Weight', 'lbs'], dtype='object')
In this example, we have used the parameter called “sheet_name”. The sheet name is the name of the sheet which exists after opening an excel file. This “read_excel()” method will read only the data which is present in that specific sheet name. In this case, the sheet name is “Data”. In this example, we are printing only the column names which are present in the excel sheet.
Now let us consider reading the specific column data available in an excel file. Run the below example code to see the result.
Example 3
import pandas as pd df=pd.read_excel("C:\\Users\\admin\\Desktop\\Data.xlsx",sheet_name="Data") gender = df['Gender'] height = df['Height'] weight = df['Weight'] lbs = df['lbs'] print("Column: Gender\n",gender) print("Column: Height\n",height) print("Column: Weight\n",weight) print("Column: lbs\n",lbs) Column: Gender 0 M 1 F 2 F 3 M Name: Gender, dtype: object Column: Height 0 5 1 6 2 5 3 6 Name: Height, dtype: int64 Column: Weight 0 97 1 132 2 112 3 185 Name: Weight, dtype: int64 Column: lbs 0 95-117 1 144-176 2 90-110 3 160-196 Name: lbs, dtype: object
In this example, we are reading the entire data of specific column names. We have implemented “df[‘Gender’]” to read all the rows in “Gender” column, “df[‘Height’]” to read all the rows in “Height” column, “df[‘Weight’]” to read all the rows in “Weight” column and “df[‘lbs’]” to read all the rows in “lbs” column name.
Consider the below example for reading the data in excel file by using the array index with iterations.
Example 4
import pandas as pd df=pd.read_excel("C:\\Users\\admin\\Desktop\\Data.xlsx",sheet_name="Data") print("Gender column:") for i in df.index: print(df['Gender'][I]) print("Weight column:") for i in df.index: print(df['Weight'][I]) print("Height column:") for i in df.index: print(df['Height'][I]) print("lbs column:") for i in df.index: print(df['lbs'][I]) Gender column: M F F M Weight column: 97 132 112 185 Height column: 5 6 5 6 lbs column: 95-117 144-176 90-110 160-196
In the above example, we have implemented for loop for reading specific columns from an excel file. Each row is read by iterating the for loop followed by its column index locations.
The below example 5 explains about reading only a specific row of a particular column in an excel file. The row is read by specifying only the particular index location of the column name.
Example 5
import pandas as pd df=pd.read_excel("C:\\Users\\admin\\Desktop\\Data.xlsx",sheet_name="Data") gender = df['Gender'] print("Gender at index 3:",gender[3]) weight = df['Weight'] print("Weight at index 1:",weight[1]) height = df['Height'] print("Height at index 0:",height[0]) lbs = df['lbs'] print("lbs at index 2:",lbs[2]) Gender at index 3: M Weight at index 1: 132 Height at index 0: 5 lbs at index 2: 90-110
From the above example, we can observe that “gender[3]” will display the string “M” for having the index location as 3. The “weight[1]” will display the number “132” for having the index location as 1, similarly, “height[0]” will display number “5” for having the index location as 0 and “lbs[2]” will display “90-110” for having index location as “2”.