CPS4921, 5921-01/Spring 2019 Knowledge Discovery and Data Mining Homework 2 Dr. Huang
• This is an individual homework
• Write programs to solve the following problems. Your results should be accessible through the web page
at http://eve.kean.edu/~xxxx/CPS4921/HW2.html or http://eve.kean.edu/~xxxx/CPS5921/HW2.html
• Please set permission mode to 705 for all your files (PHP, HTML, Python, etc).
• You need to submit the homework through the class website: http://imc.kean.edu/students
• You cannot hardcode the output. Your programs should read the input data from the database, perform the
calculations, and display the results in the browser.
• You can use Google Charts, JpGraph, or another graph library.
• Questions are based on the tables in the datamining database. Only use the valid records: ZipcodeType =
'STANDARD', LocationType = 'PRIMARY', and EstimatedPopulation is not null.
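The valid-record rule above can be sketched as a single WHERE clause. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for the class database, the Zipcode and State column names are assumptions, and the sample rows are made up.

```python
import sqlite3

# Hypothetical in-memory stand-in for the datamining database.
# Zipcode and State are assumed column names; the rows are made-up samples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Zipcode_info ("
    "Zipcode TEXT, ZipcodeType TEXT, LocationType TEXT, "
    "State TEXT, EstimatedPopulation INTEGER)"
)
conn.executemany(
    "INSERT INTO Zipcode_info VALUES (?, ?, ?, ?, ?)",
    [
        ("07083", "STANDARD", "PRIMARY", "NJ", 50000),
        ("07083", "STANDARD", "ACCEPTABLE", "NJ", 50000),  # not PRIMARY
        ("07099", "PO BOX", "PRIMARY", "NJ", 1200),        # not STANDARD
        ("10001", "STANDARD", "PRIMARY", "NY", None),      # null population
        ("10002", "STANDARD", "PRIMARY", "NY", 80000),
    ],
)

# The three conditions from the handout, combined with AND.
VALID_RECORDS = (
    "SELECT Zipcode, State, EstimatedPopulation FROM Zipcode_info "
    "WHERE ZipcodeType = 'STANDARD' "
    "AND LocationType = 'PRIMARY' "
    "AND EstimatedPopulation IS NOT NULL"
)

valid_rows = conn.execute(VALID_RECORDS + " ORDER BY Zipcode").fetchall()
for row in valid_rows:
    print(row)
```

Note that a null check needs IS NOT NULL; comparing with <> NULL never matches any row.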
1. Similarity (population + AvgWage) between states, using distance methods, based on your State_info
table. Convert your results into a 2-D plot. The X axis is the population, with a unit of 1 million. The Y axis is the
average income, with a unit of $5,000. Write a program to compute and display the following distances
between the data points in matrix format, with the numbers aligned.
1.1. ________ (10 points) Based on the table Zipcode_info, create a table State_info with state,
average wage, and total population in your own database. Your table should have 51 rows: 50
states + DC. Display your State_info result in a TABLE format on the browser.
1.2. ________ (10 points) the Euclidean distance
1.3. ________ (10 points) the City block distance
1.4. ________ (5 points) the Supremum distance
1.5. ________ (5 points) Which two states are the most similar? Please explain your conclusion.
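The three distances in 1.2 to 1.4 can be sketched as follows, reading 1.4 as the supremum (L-infinity, also called Chebyshev) distance. This is a minimal sketch, not a reference solution: the three states and their values are made-up placeholders rather than real State_info rows, and all helper names are hypothetical. Each state is first scaled to plot coordinates (1 unit = 1 million people on X, 1 unit = $5,000 on Y) so that both axes contribute on a comparable scale.

```python
from itertools import combinations

# Made-up placeholder rows (state, total population, average wage);
# NOT real State_info data.
states = [
    ("AA", 2_000_000, 40_000.0),
    ("BB", 5_000_000, 55_000.0),
    ("CC", 2_500_000, 42_500.0),
]

POP_UNIT = 1_000_000  # one X-axis unit = 1 million people
WAGE_UNIT = 5_000     # one Y-axis unit = $5,000


def to_point(population, wage):
    """Scale a state to plot coordinates."""
    return (population / POP_UNIT, wage / WAGE_UNIT)


def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def supremum(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def distance_matrix(points, dist):
    return [[dist(p, q) for q in points] for p in points]


def format_matrix(labels, matrix, width=8):
    """Right-align every number in a fixed-width column."""
    header = " " * width + "".join(f"{name:>{width}}" for name in labels)
    rows = [
        f"{name:<{width}}" + "".join(f"{v:>{width}.2f}" for v in row)
        for name, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


def most_similar(labels, matrix):
    """Return the pair of distinct labels with the smallest distance."""
    pairs = combinations(range(len(labels)), 2)
    i, j = min(pairs, key=lambda ij: matrix[ij[0]][ij[1]])
    return labels[i], labels[j]


labels = [s[0] for s in states]
points = [to_point(pop, wage) for _, pop, wage in states]
for title, dist in [
    ("Euclidean", euclidean),
    ("City block", city_block),
    ("Supremum", supremum),
]:
    matrix = distance_matrix(points, dist)
    print(title)
    print(format_matrix(labels, matrix))
    print("Most similar:", most_similar(labels, matrix))
```

In a web page, the same alignment would typically come from a TABLE with right-aligned cells or a PRE block; the fixed-width formatting here only illustrates the idea.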
2. Line chart on the browser and correlations: Based on the close price in the Historical_prices table with
symbol='IBM' and 'MMM' and date >= '1990-01-01' and date <='2012-12-31', and the GDP from the
change_chained column of the US_GDP_quarter table in the datamining database
2.1. ________ (10 points) Use a SQL statement to create a view vIBM_MMM (year, month, IBM_price,
MMM_price) that contains the year and month, and the average close prices of IBM and MMM for that
year/month. Your view should directly access the Historical_prices table so that it always has the latest data.
2.2. ________ (10 points) Use Google Charts or another JavaScript package to display the data in a line
chart. There should be 3 lines on a single chart: the IBM and MMM prices, and the GDP. The
1st and 2nd lines are the IBM and MMM stock prices, respectively. The 3rd line is the GDP line.
2.3. ________ (5 points) The chart should have ticks and labels to identify the lines, and its size should be
500 (W) x 300 (H) pixels.
2.4. ________ (10 points) Find the local maximum and local minimum values with window size 5 (2 on
each side) for each line: MMM and IBM stock prices, and GDP.
2.5. ________ (10 points) Detect and list the periods (with start and end date) of the bull market (“up”
trend) and the bear market (“down” trend) for each line. Each line might have several bull and
bear markets.
2.6. ________ (10 points) Your programs should read the data from the US_GDP_quarter table in the
database and automatically calculate and display the 3 correlation values: IBM vs. MMM, IBM vs.
GDP, MMM vs. GDP.
2.7. ________ (5 points) Under the chart, state your conclusion on whether we can use IBM and MMM
stock prices as recession indicators, and explain your reasoning. Please refer to the following
articles to see when the recessions occurred.
• GDP line chart: https://datahub.io/core/gdp-us
• Recessions: https://fred.stlouisfed.org/series/GDP
• Study: https://seekingalpha.com/article/4008116-stock-market-leading-recession-indicator
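Items 2.4 to 2.6 can be sketched as follows. This is one possible reading of the requirements, not the required method. It assumes a local maximum (minimum) is a point strictly greater (smaller) than the 2 neighbors on each side, so the first and last 2 points are never reported. It also assumes a bull period runs from a local minimum to the next local maximum and a bear period from a local maximum to the next local minimum. The correlation is the Pearson coefficient, which requires the two series to be aligned to the same periods first (for example, quarterly averages of the monthly prices). The sample series is made up and all function names are hypothetical.

```python
from math import sqrt


def local_extrema(values, half_window=2):
    """Return (maxima, minima) index lists for a centered window
    of size 2 * half_window + 1; ties are not counted as extrema."""
    maxima, minima = [], []
    for i in range(half_window, len(values) - half_window):
        window = values[i - half_window : i + half_window + 1]
        if window.count(values[i]) != 1:
            continue  # value is tied inside the window
        if values[i] == max(window):
            maxima.append(i)
        elif values[i] == min(window):
            minima.append(i)
    return maxima, minima


def trend_periods(dates, values, half_window=2):
    """List (kind, start_date, end_date) between consecutive turning points."""
    maxima, minima = local_extrema(values, half_window)
    turning = sorted([(i, "max") for i in maxima] + [(i, "min") for i in minima])
    periods = []
    for (start, kind), (end, next_kind) in zip(turning, turning[1:]):
        if kind == "min" and next_kind == "max":
            periods.append(("bull", dates[start], dates[end]))
        elif kind == "max" and next_kind == "min":
            periods.append(("bear", dates[start], dates[end]))
    return periods


def pearson(xs, ys):
    """Pearson correlation of two equally long, already aligned series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up monthly series, NOT real prices or GDP values.
dates = [f"2001-{month:02d}" for month in range(1, 13)]
values = [5, 3, 1, 2, 4, 6, 8, 7, 5, 3, 4, 6]

maxima, minima = local_extrema(values)
print("Local maxima:", [(dates[i], values[i]) for i in maxima])
print("Local minima:", [(dates[i], values[i]) for i in minima])
for kind, start, end in trend_periods(dates, values):
    print(f"{kind}: {start} to {end}")
print("Correlation with itself:", pearson(values, values))
```

The same three functions would be applied to each line in turn (IBM, MMM, GDP), and pearson would be called once per pair of series.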
Q2
Recall Lab 3, Assignment 1, and Question 6, where a Python class for Vehicle was created. Add two attributes to the class: kilometers per liter (the distance travelled on one liter of fuel) and the capacity of the fuel tank (the amount of fuel in the tank). The class should also contain methods that ask the user for the capacity of the fuel tank and the kilometers per liter. Include a method to calculate the distance that the car can travel. Create a separate test module in which instances of the class are created and the methods are tested. Create 5 instances of this class and display the distance each car can travel, in descending order.
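A minimal sketch of such a class follows. The original Vehicle class from the lab is not shown here, so the name attribute, the method names, and the read parameter are all assumptions made for illustration. The read parameter defaults to input, so the methods prompt the user in normal use, while the demo passes scripted answers so it runs without a keyboard. The figures are made up. In the actual assignment, the demo part would live in a separate test module that imports the class.

```python
class Vehicle:
    """Hypothetical stand-in for the lab's Vehicle class."""

    def __init__(self, name, km_per_liter=0.0, tank_capacity=0.0):
        self.name = name
        self.km_per_liter = km_per_liter    # distance travelled on one liter
        self.tank_capacity = tank_capacity  # liters of fuel in the tank

    def ask_km_per_liter(self, read=input):
        """Ask the user for the kilometers per liter."""
        self.km_per_liter = float(read(f"Kilometers per liter for {self.name}: "))

    def ask_tank_capacity(self, read=input):
        """Ask the user for the fuel tank capacity in liters."""
        self.tank_capacity = float(read(f"Tank capacity (liters) for {self.name}: "))

    def distance(self):
        """Distance in kilometers the vehicle can travel on its tank."""
        return self.km_per_liter * self.tank_capacity


def by_distance_descending(vehicles):
    """Return the vehicles sorted from longest to shortest range."""
    return sorted(vehicles, key=lambda v: v.distance(), reverse=True)


# Demo with scripted answers (made-up values) instead of keyboard input:
# each car consumes one km-per-liter answer, then one capacity answer.
answers = iter(["12.5", "40", "15", "50", "10", "60", "18", "35", "8", "70"])


def scripted_input(prompt):
    return next(answers)


fleet = [Vehicle(f"Car {n}") for n in range(1, 6)]
for car in fleet:
    car.ask_km_per_liter(scripted_input)
    car.ask_tank_capacity(scripted_input)

for car in by_distance_descending(fleet):
    print(f"{car.name}: {car.distance():.1f} km")
```

Passing the input function as a parameter is a design choice that keeps the class testable; a test module can supply fixed answers and assert on distance() without any user interaction.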