CPS4921 Knowledge Discovery and Data Mining

CPS4921, 5921-01/Spring 2019 Knowledge Discovery and Data Mining Homework 2 Dr. Huang

• This is an individual homework

• Write programs to solve the following problems. Your results should be accessible through the web page at http://eve.kean.edu/~xxxx/CPS4921/HW2.html or http://eve.kean.edu/~xxxx/CPS5921/HW2.html

• Please set permission mode to 705 for all your files (PHP, HTML, Python, etc.).

• You need to submit the homework through the class website: http://imc.kean.edu/students

• You cannot hardcode the output. Your programs should read the input data from the database, do the calculation, and display the results in the browser.

• You can use Google Charts, JpGraph or other graph library.

• Questions are based on the tables in the datamining database. Only use the valid records: ZipcodeType is 'STANDARD', LocationType is 'PRIMARY', and EstimatedPopulation is not null.
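The validity rule above can be expressed as a small predicate. This is only a minimal sketch: it assumes each database row has already been fetched as a Python dict keyed by column name, and the sample rows (including the `Zipcode` key and its values) are made up for illustration.

```python
def is_valid(row):
    """Return True when a Zipcode_info row passes all three validity rules."""
    return (
        row.get("ZipcodeType") == "STANDARD"
        and row.get("LocationType") == "PRIMARY"
        and row.get("EstimatedPopulation") is not None
    )


# Made-up rows standing in for query results (assumption: rows are dicts).
sample_rows = [
    {"Zipcode": "00001", "ZipcodeType": "STANDARD",
     "LocationType": "PRIMARY", "EstimatedPopulation": 1200},
    {"Zipcode": "00002", "ZipcodeType": "PO BOX",
     "LocationType": "PRIMARY", "EstimatedPopulation": 300},
    {"Zipcode": "00003", "ZipcodeType": "STANDARD",
     "LocationType": "PRIMARY", "EstimatedPopulation": None},
]

valid_rows = [row for row in sample_rows if is_valid(row)]
print([row["Zipcode"] for row in valid_rows])
```

The same three conditions could equally be applied in the SQL WHERE clause so that invalid rows never leave the database.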

1. Similarity (population + AvgWage) between states using distance methods based on your State_info table. Convert your results into a 2-D plot. The X axis is the population, with a unit of 1 million. The Y axis is the average income, with a unit of $5,000. Write a program to compute and display the following distances between the data points in matrix format, with the numbers aligned.

1.1. ________ (10 points) Based on the table Zipcode_info, create a table State_info with state, average wage, and total population in your own database. Your table should have 51 rows: 50 states + DC. Display your State_info result in a TABLE format on the browser.

1.2. ________ (10 points) the Euclidean distance

1.3. ________ (10 points) the City block distance

1.4. ________ (5 points) the Supremum distance

1.5. ________ (5 points) Which two states are the most similar? Please explain your conclusion.
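The three distances in 1.2 to 1.4 and the "most similar" question in 1.5 can be sketched as follows. This is an illustration under stated assumptions, not a reference solution: the state codes and figures are made up, the helper names are hypothetical, and real values would be read from State_info. Each state is first scaled to plot units (1 million people, $5,000 of wage) so both axes contribute on the scale the question describes.

```python
import math
from itertools import combinations

POP_UNIT = 1_000_000  # one x-axis unit = 1 million people
WAGE_UNIT = 5_000     # one y-axis unit = $5,000


def to_point(population, avg_wage):
    """Scale raw values to the plot units used in the question."""
    return (population / POP_UNIT, avg_wage / WAGE_UNIT)


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def supremum(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def distance_matrix(points, dist):
    """Full symmetric matrix, rows and columns in dict order."""
    names = list(points)
    return [[dist(points[a], points[b]) for b in names] for a in names]


def format_matrix(names, matrix, width=8):
    """Right-align every number so the columns line up."""
    lines = [" " * width + "".join(f"{n:>{width}}" for n in names)]
    for name, row in zip(names, matrix):
        lines.append(f"{name:<{width}}" + "".join(f"{v:>{width}.2f}" for v in row))
    return "\n".join(lines)


def most_similar(points, dist):
    """Pair of names with the smallest distance between them."""
    return min(combinations(points, 2),
               key=lambda pair: dist(points[pair[0]], points[pair[1]]))


# Made-up figures for illustration only.
states = {
    "AA": to_point(3_000_000, 40_000),
    "BB": to_point(6_000_000, 60_000),
    "CC": to_point(3_500_000, 42_000),
}

for label, dist in [("Euclidean", euclidean),
                    ("City block", city_block),
                    ("Supremum", supremum)]:
    print(label)
    print(format_matrix(list(states), distance_matrix(states, dist)))
print("Most similar (Euclidean):", most_similar(states, euclidean))
```

On a web page, the aligned text could be wrapped in a preformatted element or emitted as an HTML table with right-aligned cells; which metric to use for the final answer in 1.5 is a judgment the explanation should justify.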

2. Line chart on the browser and correlations: Based on the close price in the Historical_prices table with symbol='IBM' and 'MMM' and date >= '1990-01-01' and date <= '2012-12-31', and the GDP from the change_chained column of the US_GDP_quarter table in the datamining database.

2.1. ________ (10 points) Use a SQL statement to create a view vIBM_MMM (year, month, IBM_price, MMM_price) that contains the year and month, and the average close prices of IBM and MMM for that year/month. Your view should directly access the Historical_prices table so that it always has the latest data.

2.2. ________ (10 points) Use Google Charts or another JavaScript package to display the data in a line chart format. There should be 3 lines on a single chart: the IBM and MMM prices, and the GDP. The 1st and 2nd lines are the IBM and MMM stock prices, respectively. The 3rd line is the GDP line.

2.3. ________ (5 points) The chart should have ticks and labels to identify the charted lines, and its size should be 500 (W) x 300 (H) pixels.

2.4. ________ (10 points) Find the local maximum and local minimum values with window size 5 (2 points on each side) for each line: MMM and IBM stock prices, and GDP.

2.5. ________ (10 points) Detect and list the periods (with start and end date) of the bull market (“up” trend) and the bear market (“down” trend) for each line. Each line might have several bull and bear markets.

2.6. ________ (10 points) Your programs should read the data from the US_GDP_quarter table in the database and automatically calculate and display the 3 correlation values: IBM vs. MMM, IBM vs. GDP, and MMM vs. GDP.

2.7. ________ (5 points) Under the chart, explain your conclusion and reasoning on whether IBM and MMM stock prices can be used as a recession indicator. Please refer to the following articles to see when the recessions occurred.

• GDP line chart: https://datahub.io/core/gdp-us

• Recessions: https://fred.stlouisfed.org/series/GDP

• Study: https://seekingalpha.com/article/4008116-stock-market-leading-recession-indicator
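The calculations in 2.4 to 2.6 can be sketched as below. This is one possible reading, stated as assumptions: a point counts as a local maximum (minimum) when it is strictly greater (smaller) than the 2 points on each side; a bull or bear period is taken to be the run between consecutive turning points; and the correlation is Pearson's. The function names and the sample series are made up. Results are returned as list indexes, which would be mapped back to dates, and the monthly prices would need to be aligned with the quarterly GDP before correlating them.

```python
import math


def local_extrema(values, half_window=2):
    """Return (maxima, minima) as index lists for a centred window."""
    maxima, minima = [], []
    for i in range(half_window, len(values) - half_window):
        neighbours = (values[i - half_window:i]
                      + values[i + 1:i + half_window + 1])
        if all(values[i] > v for v in neighbours):
            maxima.append(i)
        elif all(values[i] < v for v in neighbours):
            minima.append(i)
    return maxima, minima


def trend_periods(values, half_window=2):
    """Split the series into bull/bear runs between turning points."""
    maxima, minima = local_extrema(values, half_window)
    bounds = [0] + sorted(maxima + minima) + [len(values) - 1]
    periods = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start and values[end] != values[start]:
            kind = "bull" if values[end] > values[start] else "bear"
            periods.append((kind, start, end))
    return periods


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up series for illustration only.
prices = [1, 2, 3, 5, 4, 3, 2, 1, 2, 3, 4]
print(local_extrema(prices))
print(trend_periods(prices))
print(pearson([1, 2, 3], [2, 4, 6]))
```

Other definitions of a trend (for example, a minimum percentage move) are equally defensible; whichever is chosen should be stated alongside the listed periods.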

Q2

Refer to Lab 3, Assignment 1, Question 6, where a Python class for Vehicle was created. Add two attributes to the class: kilometers per liter (the distance travelled on one liter of fuel) and the capacity of the fuel tank (the amount of fuel in the tank). The class should also contain methods that ask the user for the capacity of the fuel tank and the kilometers per liter. Include a method to calculate the distance that the car can travel. Create a separate test module where instances of the class are created and the methods are tested. Create 5 instances of this class and display the distances each car can travel in descending order.
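A minimal sketch of such a class follows. The original Vehicle class from Lab 3 is not shown here, so the `name` attribute, the method names, and the sample figures are all assumptions. The prompting methods accept an optional `ask` function (defaulting to the built-in `input`) so that they can be exercised in a test without typing. For brevity the class and the test code share one listing; in the assignment the test code would live in its own module and import the class.

```python
class Vehicle:
    """Vehicle with fuel-economy attributes (attribute names are assumed)."""

    def __init__(self, name, km_per_liter=0.0, tank_capacity=0.0):
        self.name = name
        self.km_per_liter = km_per_liter    # distance travelled per liter
        self.tank_capacity = tank_capacity  # liters of fuel in the tank

    def ask_km_per_liter(self, ask=input):
        """Prompt the user for kilometers per liter."""
        self.km_per_liter = float(ask(f"Kilometers per liter for {self.name}: "))

    def ask_tank_capacity(self, ask=input):
        """Prompt the user for the fuel tank capacity in liters."""
        self.tank_capacity = float(ask(f"Tank capacity (liters) for {self.name}: "))

    def distance(self):
        """Distance the vehicle can travel on the fuel in the tank."""
        return self.km_per_liter * self.tank_capacity


def sort_by_distance(vehicles):
    """Vehicles ordered from the longest range to the shortest."""
    return sorted(vehicles, key=lambda v: v.distance(), reverse=True)


# Five instances with made-up figures, for illustration only.
fleet = [
    Vehicle("car1", 12.0, 40.0),
    Vehicle("car2", 15.0, 50.0),
    Vehicle("car3", 10.0, 60.0),
    Vehicle("car4", 18.0, 35.0),
    Vehicle("car5", 9.0, 45.0),
]

for vehicle in sort_by_distance(fleet):
    print(f"{vehicle.name}: {vehicle.distance():.1f} km")
```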
