Our planet's ecosystems are deteriorating day by day. Both environmental change and the extinction of many species are harming them. Many people are unaware of the importance of wildlife because they think animals and plants have no influence on their lives. In fact, wildlife plays a key role in improving the quality of human life. For example, a wide variety of wildlife species sustains biodiversity, which benefits human health and maintains ecosystem function. Furthermore, many medicines that treat common health problems such as disorders and heart disease are derived from chemicals found in plants and animals.
Since many species are at risk of extinction, it is extremely important to have appropriate conservation plans to protect our planet's ecosystem.
This project applies data mining to United States wildlife data to understand which regions should be prioritized for conservation. It also discusses the species families with the most endangered species.
# import some libraries
from pandas import DataFrame
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from functools import reduce
import plotly.graph_objects as go
import urllib.parse
from urllib.parse import urljoin
import re
from wordcloud import WordCloud
from PIL import Image
import squarify
import warnings
warnings.filterwarnings("ignore")
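pd.set_option('display.precision', 0) # full option name; the short form 'precision' is ambiguous in recent pandas versions
df = pd.read_csv('CleanWildlife.csv', sep='\t', index_col=0) # read the tab-separated file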
df.head(10) # checking the first 10 rows in our dataset to ensure everything is fine
All columns are fine. Species without a listing status or listing date are common in the USA and may not face a risk of extinction at the moment.
In this section, we look at the number of species in each taxonomic group in the United States.
group = df.groupby("Taxonomic Group").size() # count the number of species by taxonomic group
group.sort_values(ascending = True, inplace = True) # ordering
count_group = pd.DataFrame({"Group":group.index, "Count":group.values}) # grouping the counts and group names in a df
plt.figure(figsize=(10,7)) # set figure size
# using bar plot to see the number of species in each taxonomic group
plt.barh(y=count_group.Group, width=count_group.Count, color ="dodgerblue")
plt.title('Number of species by group in the United States\n', fontsize ='x-large')
plt.ylabel('Taxonomic Group', fontsize ='large')
plt.xlabel('Number of species', fontsize='large')
plt.show()
As the graph shows, the largest taxonomic groups in the United States are familiar ones: plants, insects, snails, fishes, mammals and birds.
The groups with the fewest species are Hydroids and Annelid Worms. Hydroids are very small animals, and most of them live in salt water. Annelid Worms adapt to a variety of habitats, but most of them also live in marine environments. Another similarity is that both groups are invertebrates.
Other groups with a small number of species, such as corals, sponges, conifers and cycads, also prefer marine environments or seaside surfaces.
def count(data):
    """
    Count the number of species in each state in the United States.
    data: dataframe with a "Region" column of comma-separated state codes
    Returns a dataframe with two columns: state codes and number of species
    """
    split_char = ","
    state = []  # empty list to store one entry per (species, state) pair
    for words in data["Region"]:  # iterate over the values directly; .ix was removed in pandas 1.0
        try:
            temp = words.split(split_char)  # each element in temp is a state code
            for item in temp:
                item = item.replace(" ", "")  # remove whitespace
                state.append(item)  # save the state code into the list created above
        except AttributeError:  # the region is NaN (a float), so there is nothing to split
            pass
    codelist = pd.Series(state).drop_duplicates().tolist()  # list of unique state codes
    counts = []
    for item in codelist:
        counts.append(state.count(item))  # count the number of species in each state
    # combine state codes and species counts into a dataframe
    state_count = pd.DataFrame({"Code": codelist, "No. of Species": counts})
    return state_count
count_species = count(df) # count the number of species by states
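The same per-state tally can also be written with vectorized pandas string methods. The sketch below is an alternative, not the code used to produce the maps in this notebook; the helper name `count_by_state` and the toy data are invented for illustration, and it assumes pandas 0.25 or later for `Series.explode`.

```python
import pandas as pd

def count_by_state(data, column="Region"):
    """Count species per state code from a comma-separated column (hypothetical helper)."""
    codes = (
        data[column]
        .dropna()                            # rows without a region contribute nothing
        .str.split(",")                      # "CA, HI" -> ["CA", " HI"]
        .explode()                           # one row per (species, state) pair
        .str.replace(" ", "", regex=False)   # remove whitespace, as count() does
    )
    counts = codes.value_counts()            # number of species per state code
    return pd.DataFrame({"Code": counts.index, "No. of Species": counts.values})

# Toy data, invented for illustration only
toy = pd.DataFrame({"Region": ["CA, HI", "HI", None, "CA,OR"]})
toy_counts = count_by_state(toy)
print(toy_counts)
```

Unlike `count`, this version returns the states sorted by count, which makes the top-ranked states easy to read off before plotting.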
# use plotly to make Choropleth Map
# I learned from plotly: https://plot.ly/python/choropleth-maps/
fig = go.Figure(data=go.Choropleth(
locations=count_species["Code"],
z = count_species["No. of Species"].astype(float),
locationmode = 'USA-states',
colorscale = "blues",
colorbar_title = "Number of Species",
marker_line_color = "white"
))
fig.update_layout(
title_text = "Species Distribution by U.S States",
geo_scope='usa')
The state with the most diverse ecosystem, measured by number of species, is California; Hawaii ranks second.
df_end = df[df.Status == "Endangered"] # subset a dataframe with endangered species
df_end.index = range(len(df_end.index)) # reset the index
df_end.head(10) # have a look at the data frame
count_species_end = count(df_end) # count the number of endangered species by states
# use plotly to make Choropleth Map
fig = go.Figure(data=go.Choropleth(
locations=count_species_end["Code"],
z = count_species_end["No. of Species"].astype(float),
locationmode = 'USA-states',
colorscale = "portland",
colorbar_title = "Number of Species",
marker_line_color = "white"
))
fig.update_layout(
title_text = "Endangered Species Distribution by U.S States",
geo_scope='usa')
Interestingly, Hawaii has the most endangered species and California ranks second. Taxonomic groups with few species may struggle to survive and grow, so they tend to be rare and may be listed as endangered. In the previous section on taxonomic groups, we saw that these groups prefer marine environments. Hawaii, which is surrounded by sea, would therefore be a natural home for them, which may explain why it has more endangered species than California.
From this section, we can conclude that California and Hawaii should receive wildlife conservation priority.
In this section, we will see which families have the highest numbers of endangered species.
#retrieve all the words in column Family and save into string
string = " ".join(fname for fname in df.Family)
# Create and generate a word cloud image
wordcloud = WordCloud(collocations=False, width=3000, height=2000) # collocations = False to avoid double words
wordcloud.generate(string)
# Display the generated image
fig = plt.figure(figsize = (40, 30), facecolor = 'k', edgecolor = 'k')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
The family with the most species is Asteraceae, the sunflower family. Other families with a high number of species, such as Fabaceae, Brassicaceae, Scrophulariaceae and Polygonaceae, are also flowering plants.
In terms of animals, invertebrate families have the most variety in species. As we can see from the wordcloud picture, some examples of animal families are Hydrobiidae (snails), Unionidae (mussels) and Carabidae (ground beetles). In vertebrates, Cyprinidae (a family of freshwater fishes) is the family with the highest number of species.
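A word cloud conveys frequency only through font size, so exact ranks can be hard to read from it. As a complement, the sketch below tabulates the most frequent family names directly. The helper name `top_families` and the toy data are assumptions made for illustration; only the `Family` column name comes from the dataset used above.

```python
import pandas as pd

def top_families(data, n=10, column="Family"):
    """Return the n most frequent family names with their species counts (hypothetical helper)."""
    return data[column].dropna().value_counts().head(n)

# Toy data, invented for illustration only
toy = pd.DataFrame({"Family": ["Asteraceae", "Unionidae", "Asteraceae", None,
                               "Cyprinidae", "Asteraceae", "Unionidae"]})
top = top_families(toy, n=2)
print(top)
```

Calling `top_families(df)` and `top_families(df_end)` would give the corresponding tables for all species and for endangered species.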
string = " ".join(fname for fname in df_end.Family)
# Create and generate a word cloud image
wordcloud = WordCloud(collocations=False, width=3000, height=2000)
wordcloud.generate(string)
# Display the generated image
fig = plt.figure(figsize = (40, 30), facecolor = 'k', edgecolor = 'k')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
The plant family with the most endangered species is Asteraceae. Asteraceae not only has the highest number of species in the United States but is also the plant family that most needs conservation priority.
Although Unionidae does not have as many species as Asteraceae, this invertebrate family has a comparable number of endangered species. We can also see that Cyprinidae and Hydrobiidae are the animal families with the second and third highest numbers of endangered species, respectively. One possible reason is that animals such as fishes, mussels and snails are often used to feed other animals, and many of them contain ingredients that are useful for human health, which makes them frequent targets of fishers.
The analysis in this section suggests that we need appropriate policies to regulate fishing in order to prevent some underwater animal families from going extinct.
In this final section, I briefly analyze the records of the U.S. Fish and Wildlife Service, the organization I scraped my data from.
df_listed = df[df.Status != "Not Listed"]
status = df.groupby("Status").size()
status.sort_values(ascending = True, inplace = True)
count_status = pd.DataFrame({"Status":status.index, "Count":status.values})
size = status*100/status.values.sum()
labels = count_status["Status"].tolist()
colors = ['yellow','yellowgreen','darkturquoise','violet','forestgreen',
'chocolate','mediumspringgreen','dodgerblue','gold','orangered','lavender']
status.plot.pie(shadow = False, labels = None, radius = 1, colors=colors,
wedgeprops={"edgecolor":"gray",'linewidth': 0.5, 'antialiased': True}, figsize = (10,10))
plt.title('Status Distribution', fontsize = 'xx-large')
plt.xlabel('')
plt.ylabel('')
plt.legend(bbox_to_anchor = (1,1), loc = 'best', labels=['%s, %1.2f %%' % (l, s) for l, s in zip(labels, size)])
plt.show()
According to the U.S. Fish and Wildlife Service data, nearly 15% of the species in the United States are endangered. This is a high proportion, and conservation plans are needed to prevent further damage to the ecosystem.
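The percentages shown in the pie chart legend can also be computed in one step with `value_counts(normalize=True)`. The sketch below is a minimal illustration on invented toy data; the helper name `status_share` is an assumption, and only the `Status` column name comes from the dataset.

```python
import pandas as pd

def status_share(data, column="Status"):
    """Percentage of rows in each status category, largest first (hypothetical helper)."""
    # value_counts ignores missing values by default, so shares are relative to rows with a status
    return data[column].value_counts(normalize=True).mul(100).round(2)

# Toy data, invented for illustration only
toy = pd.DataFrame({"Status": ["Endangered", "Not Listed", "Not Listed", "Threatened",
                               "Not Listed", "Not Listed", "Endangered", "Not Listed"]})
share = status_share(toy)
print(share)
```

Applied to the full dataset, `status_share(df)["Endangered"]` would give the endangered share quoted above.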
date = df.groupby("First Listed").size()
year = df["First Listed"].drop_duplicates().dropna().tolist() # distinct listing years, used as x-axis ticks
plt.figure(figsize=(20,10))
date.plot(linewidth = 2)
plt.xticks(year, rotation = 30, ha = 'right')
plt.title('Number of Species Found per Year\n', fontsize ='xx-large')
plt.xlabel('Year', fontsize ='x-large')
plt.ylabel('Number of Species', fontsize ='x-large')
plt.show()
df_date = df[df["First Listed"].notnull()].copy() # save the species that have listing dates into a separate dataframe (a copy, so the assignment below does not write to a view of df)
df_date["First Listed"] = df_date["First Listed"].astype(int) #make sure year column is shown as int
#plot
plt.figure(figsize=(20,10))
sns.countplot(x='First Listed', hue='Type', data=df_date)
plt.xticks(rotation = 30, ha = 'right')
plt.title('Number of Species Found per Year\n', fontsize ='xx-large')
plt.xlabel('Year', fontsize ='x-large')
plt.ylabel('Number of Species', fontsize ='x-large')
plt.legend(bbox_to_anchor = (0,1))
During the 1970s, many vertebrate species were found in the United States. From the end of the 1970s to 2000, more flowering plant species were found and successfully grown in the USA. In recent years, fewer species have been discovered, and most of them are animals.
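Decade-level statements like these are easier to check against a table than against a dense count plot. The sketch below groups listing years into decades and cross-tabulates them by type. The helper name `listings_by_decade` and the toy values, including the type labels, are invented for illustration; only the `First Listed` and `Type` column names come from the dataset.

```python
import pandas as pd

def listings_by_decade(data, year_col="First Listed", type_col="Type"):
    """Cross-tabulate the number of species by listing decade and type (hypothetical helper)."""
    listed = data.dropna(subset=[year_col])                       # keep rows that have a listing year
    decade = (listed[year_col].astype(int) // 10 * 10).rename("Decade")  # 1978 -> 1970
    return pd.crosstab(decade, listed[type_col])

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "First Listed": [1973.0, 1978.0, 1985.0, 1991.0, None, 2012.0],
    "Type": ["Vertebrate", "Vertebrate", "Flowering Plant",
             "Flowering Plant", "Vertebrate", "Invertebrate"],
})
table = listings_by_decade(toy)
print(table)
```

Each row of the resulting table is a decade and each column a type, so the dominant type in any period can be read off directly.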
colors = ['crimson','darkturquoise','violet','orangered',
'gold','mediumspringgreen','dodgerblue']
date_st = df_date["Status"].value_counts() #count the number of listed species by status
date_st = pd.DataFrame({"Status":date_st.index, "Count":date_st.values}) #convert the series into a dataframe
labels = ["Endangered", "Threatened", "Recovery", "Not Listed","Extinction","Threatened*","Review"]
# create plot using squarify
plt.figure(figsize = (20,10))
squarify.plot(sizes=date_st["Count"], label= labels, alpha=.75, color = colors, text_kwargs={'fontsize':15})
plt.axis('off')
plt.title("Listed Species Status Distribution\n ",fontsize=20)
plt.show()
As the graph shows, the majority of listed species are endangered or threatened, as expected. These listed species require lengthy research to be identified, so they were added to the U.S. Fish and Wildlife Service list later than others. Some species without listing dates may be foreign species that the United States imports from other countries and regions.
In summary, Hawaii and California are the two states with the most endangered species. Conservation in these regions should therefore be prioritized to maintain their biodiversity and to prevent the plants and animals living there from going extinct. Another notable finding is the alarming condition of underwater families such as Unionidae (mussels), Hydrobiidae (snails) and Cyprinidae (freshwater fishes). Because they are frequent targets of fishers, their populations are dwindling and some species in these families are at risk of extinction. Appropriate fishing regulations are needed to protect them.