CAB432 Cloud Computing Cloud Project Specification
Note:
The due date for this assignment is in week 14 of the semester. Later that week or during the exam period - by arrangement - we require that you meet with us to demonstrate your work. In particular, you must explain the scaling capabilities of your application, and answer questions as appropriate.
Introduction:
In the first assessment for this unit you were required to integrate a range of cloud-based services within a Docker container. These projects were of varying degrees of sophistication, but were tightly limited in that they did not require you to address scaling issues.
The focus of this assessment task is to explore a cloud application that will exercise ideas of scaling, load balancing and resource provisioning. Our approach is to build on the first assignment and the practical sessions. We do not require that you use your earlier experience as a mashup developer, but there is certainly scope for doing so. In earlier lectures, I indicated that we would mandate node.js as the webserver for this assessment. This remains the case unless you can make a very strong argument. Feel free to try, but do not work with another server without talking to me first. There will inevitably be variation in your architectures - and the amount of work actually done by node - depending on your choice of scenario in any case. In most other respects you have complete flexibility.
There are several basic scenarios on offer, with variation in the manner in which the computational demand is created. In each case, we will expect that you will use some graphical library to help display the progress of your tasks - we have one in mind, https://d3js.org/, but others are available and some of the tasks may be self-documenting in this respect, notably in the case of ray tracing or similar. In some cases, such as the people density application, this might take the form of a map, regularly updated. In other cases, it might be more prosaic. And you are free to propose your own application, as long as there is sufficient computational load on offer.
The requirements are outlined in more detail below. This specification is not intended to be the final word on the data, analysis and display APIs. I have done a basic search to ensure some basic level of plausibility, and a number of these alternatives have been executed incredibly well in previous years. I expect that the class will share source materials without any particular fear, as there is plenty of work to go around if you wish to differentiate the assignments. For now, let us consider some of the key aspects of the assignment.
Scaling:
The crucial aspect of the assignment is that you demonstrate the elastic scalability of your application. The approach must scale robustly in response to the variations in demand inherent in your problem. Computational demand might vary based on:
The number of concurrent "queries" being processed across a Twitter feed
[Ideally the variation in the flow of a Twitter feed too, but this is far less reliable]
The number of cameras being considered in a webcam application
The resolution in a rendering application.
The number of requests to a server as generated by a massive number of interested customers (or as simulated by a load testing application)
Some other data feed or computational task as suggested by the students
As you have now understood from the lectures and prac exercises, the game here is to generate enough instances to cover the expected load, and to fire up additional instances or terminate existing ones as needed to maintain responsiveness.
There must be performance-related KPIs for your application, and once the demand is such that the current number of application server instances fails to meet these requirements, you must add further instances to overcome the violations and restore compliance. As you write your code, we expect that you will approach this in stages, regardless of your application. Some of these may blur, but don’t be in any great hurry:
1. Single instance - no scaling
2. Multiple instance - no scaling
3. Multiple instance - manual scaling
4. Multiple instance - automated scaling
You have also seen how to create an instance and configure it for your needs, and then to save it as an image that can be used to create other instances. This is the approach you must take. Note that it will not be possible to obtain full marks if item 4 is not completed successfully. Each of the clouds available has APIs which support automated monitoring, creation and destruction of instances.
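To make the staging concrete, here is a minimal sketch of what steps 3 and 4 might look like on AWS using boto3. The AMI id, region and KPI threshold are placeholders, and the KPI hook is left for you to wire to your own monitoring (CloudWatch, log analysis, etc.); treat it as an illustration of the approach, not a prescribed implementation.

```python
# Minimal sketch: launch additional t2.micro instances from your saved image
# when a response-time KPI is violated. AMI id, region and threshold are
# placeholders; the KPI hook must be wired to your own monitoring.
import time
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")   # placeholder region

AMI_ID = "ami-0123456789abcdef0"   # placeholder: the image you saved earlier
KPI_MS = 500                       # placeholder: acceptable mean response time (ms)

def mean_response_time_ms():
    """Hook: return the current KPI value from CloudWatch, your logs, etc."""
    raise NotImplementedError

def launch_workers(count=1):
    # Small instances only - the exercise is about scaling out, not up.
    resp = ec2.run_instances(ImageId=AMI_ID, InstanceType="t2.micro",
                             MinCount=count, MaxCount=count)
    return [i["InstanceId"] for i in resp["Instances"]]

def terminate_workers(instance_ids):
    ec2.terminate_instances(InstanceIds=instance_ids)

if __name__ == "__main__":
    # Naive control loop for stage 4: add an instance while the KPI is violated.
    while True:
        if mean_response_time_ms() > KPI_MS:
            print("KPI violated - launching", launch_workers())
        time.sleep(60)
```

The same calls, invoked by hand from a small script or the console, cover the manual-scaling stage; wrapping them in a monitor loop, or replacing them with an Auto Scaling group driven by alarms, covers the automated stage.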
Some Scenarios:
We will give some basic ideas on the scenarios that we are supporting. Note that the last of these is open-ended in any case, and you should feel free to propose anything that fits the guidelines, as long as you have it approved by us. Note that the first task has been the default for some years, and has been used with some success - and occasionally with more limited success - since 2013. As discussed elsewhere, we have major concerns at the moment over its viability, and we will only recommend it if we can establish a reliable data source, probably based on the archive at QUT’s Digital Observatory: https://www.qut.edu.au/institute-for-future-environments/facilities/digital-observatory.
However, the description gives something of the flavour and we anticipate that you can use it to inform your own proposals and also to flesh out some of the others.
Scenario 1: Twitter [TO BE CONFIRMED]: The task will involve tracking a selection of search terms and performing a range of analytics on top of the feed. This will include additional term extraction (using an NLP library such as NLTK or natural, https://github.com/NaturalNode/natural) and sentiment analysis (see http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/ for a basic intro to this). In this case, the scaling comes from increasing message volume as you broaden the range of terms in the search. Note also that computational load can be added readily by increasing the comparisons undertaken, using semantic searches such as the interface to WordNet, and of course correlating events as described below.
Social platforms like Twitter attract a large number of users who use these platforms (among other things) to exchange and broadcast messages (text, links, photos, videos, etc.). As a result, these platforms produce a huge number of events, originating from their collective user base. Making use of this data beyond the core social networking purpose has been the subject of a number of studies and commercial projects. For instance, disaster management applications may use social network feeds to contextualise the situation. That is, citizens "report" relevant incidents (like flooded roads, fallen-over trees, power outages, fires, etc.) on social networks, which are then filtered and matched against other entities such as places (i.e., geographical location), incident categories (e.g., fire, flooding, etc.) and incident severity.
One cautionary tale here lies in dependence on location tagging. Forget it - it will never work, as not enough people geotag their tweets. You can, however, do location identification based on the text itself, with mentions of New York or London or wherever, but you should not depend on any sort of geotagged identities.
The other caution lies in the selection of libraries. Students used natural in previous years without any great ill effect, but the project does appear fairly static, so please let us know if there are major issues. It is possible to call NLTK (Python) from node, or to provide it as a service - http://ianhinsdale.com/code/2013/12/08/communicating-between-nodejs-and-python/ provides a brief (and now very old) guide, and I am sure that there are others. But there are complications here that are better avoided if you can.
A slightly more formal description follows:
Scenario 1 is a simple Cloud-based query processor based on Twitter messages. Its core principle is to let its users enter "queries", specifying multiple "hash tags" (i.e., keywords that are manually emphasised by the authors of the tweets). Once submitted to the application, a query may become a "live filter" applied against the inflow of Twitter messages. Any message which passes through the filter defined by the query (i.e., contains the stated hash tags and comes from the specified region) shall then be displayed on the screen. A query shall remain "active" (i.e., continue to filter incoming messages) until it is manually revoked by the end user of the application.
We here require that you use two or more APIs to ‘do some work’ on the results. Ideally we would like you to use NLTK or a similar toolkit to extract fresh entities from the related tweets, and also to perform sentiment analysis on the tweets provided. The links above and below give some idea of how to do this sort of work. The application itself may be a simple single-page web site. You may base this on whatever technology you wish, but Twitter Bootstrap is a useful default. You will most likely use a JS visualisation library such as d3js to show the changes in sentiment, topic and tag clouds.
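If you take the NLTK-from-Python route (for example as a small service called by node, as discussed above), the filtering and sentiment step can be quite compact. A minimal sketch follows; the active query set is a placeholder for whatever tags your users actually enter.

```python
# Minimal sketch: match incoming tweet text against active hash-tag queries
# and score sentiment with NLTK's VADER analyser. Query set is a placeholder.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")            # one-off download of the lexicon
analyser = SentimentIntensityAnalyzer()

active_queries = {"#brisbane", "#flood"}  # placeholder: tags entered by users

def matches(text):
    """True if the tweet contains a hash tag from any active query."""
    tags = {w.lower().strip(".,!?") for w in text.split() if w.startswith("#")}
    return bool(tags & active_queries)

def process(text):
    if not matches(text):
        return None
    scores = analyser.polarity_scores(text)
    # 'compound' is a single score in [-1, 1] - feed it to your d3js display
    return {"text": text, "sentiment": scores["compound"]}

print(process("Roads closed near #Brisbane after the #flood - stay safe"))
```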
Some sources:
The primary data source is (or at least was) of course the Twitter public REST API V1.1 which may be found here: https://dev.twitter.com/rest/public
The NLTK may be found here: http://www.nltk.org/
Natural may be found here: https://github.com/NaturalNode/natural
NLP-Compromise is here: https://github.com/nlp-compromise/nlp_compromise
Twitter bootstrap may be found here: http://getbootstrap.com/
The d3js libraries may be found here: http://d3js.org/ and you may use the tutorials there or at https://www.dashingd3js.com
Note that if you see the wrapper for the Stanford core, it is probably not what you want here - that is for more sophisticated NLP research and processing, but of course if you are interested and can master it in time, please feel free to work with it.
Scenario 2: Webcams: There are a large number of public webcams available on the net. In this scenario, you are to capture the feed of a set of them, and use a computer vision library such as http://opencv.org/ to count the number of faces in a frame. Now this will take some tweaking to be plausible, but it is evidently not too tough in principle and probably 10 or 12 groups have done this one over the past few years. There is a reasonable discussion here: http://answers.opencv.org/question/2702/simply-count-number-of-faces/, and we would expect you to have a map showing the people density at places monitored by the webcams. This will provide some decent processing, and we may scale up based on increasing the number of data sources.
Our requirements are really not much more specific than stated above. This is all about the scaling. But please be aware that libraries such as OpenCV do have their own overheads and you do not want to get hung up on those at the expense of the mark-laden cloud-related work of the assignment. There is also some variation in the quality of the language bindings for some of these libraries and services. That community is *strongly* C++ dominated, and other libraries and adaptations seldom have the same level of maturity and active maintenance associated with the core work. That said, I have capstone students working with this in Python very successfully. Either way, this is something also to bear in mind. In other words, have a decent look at things first before committing, and make sure that your programming expertise matches the available libraries.
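For what it is worth, the face-counting core is small in the Python bindings. Here is a minimal sketch; the frame URL is a placeholder, and the cascade path assumes a standard opencv-python install.

```python
# Minimal sketch: fetch one webcam still and count faces with OpenCV's
# bundled Haar cascade. The frame URL is a placeholder.
import urllib.request
import numpy as np
import cv2

FRAME_URL = "https://example.com/webcam/latest.jpg"   # placeholder webcam still

raw = urllib.request.urlopen(FRAME_URL).read()
frame = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)

# Load the frontal-face cascade shipped with opencv-python and detect faces.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                "haarcascade_frontalface_default.xml")
grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)

print(f"{len(faces)} face(s) in this frame")   # this count drives the density map
```

Tuning scaleFactor and minNeighbors per camera is where the "tweaking" mentioned above tends to come in.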
There are quite a number of open (or at least fairly open) web cam APIs, though you may need to use a lot of them.
Resources:
QLD Traffic APIs: https://data.qld.gov.au/dataset/131940-traffic-and-travel-information-geojson-api
Also search on “public webcam feeds” but please use only those likely to comply with the QUT ITS usage rules…
The OpenCV project: http://opencv.org/ - this offers bindings for Java and Python.
As before, use d3js or similar to display the results - there is a map library available.
Scenarios 3 and 4: Rendering; Open Proposals: The first of these is based on rendering a graphical image at a variety of different scales. Most, we expect, would rely on an open engine such as Blender. However, we do not have a clear example that would be really exciting and so are suggesting that people with experience in this domain should make their own proposal. We may similarly offer some large scale image analysis sets that can fit the bill neatly while doing some useful computation for colleagues. Scenario 4 is entirely open ended - you propose something, and we agree or we do not. We are very happy for people to consider proposals based on the open data sets available as part of Gov Hack and other initiatives. This is very strongly encouraged if it can be made to suit the assignment. The portal is here:
http://portal.govhack.org/datasets.html
However, be aware that these are often static, and your task may be complicated by the need to work out how to scale things. More queries? More analysis? Possibly, but you need to think this through, propose it and have it reality checked. You also need to think very carefully about the architecture: where is the computation pool? Where is the load balancer? What triggers scaling? Do I use a queuing system?
For each of these, please give us a rough outline as soon as possible, and certainly before Tuesday, October 9 or early in Week 11. We can refine this as needed.
We now consider the overall structure of the assignment.
Some Basics:
The Key Learning Objectives of this assignment lie in the elastic scalability of the process, and the tasks are set up to support this objective. While we are interested in the creativity of the applications, and encourage this, don’t get carried away. Look at where the marks lie.
The following guidelines apply:
The application itself will usually appear as a simple single page web site. You may base this on whatever technology you wish, but Twitter Bootstrap (http://getbootstrap.com/) is a useful default. The site must be flexible enough and architected sensibly enough to support variations in scale and processed data.
The application will in general be implemented on top of a cloud IaaS service (e.g., AWS EC2, Azure), where you manually install the application server (usually node.js).
You are free to use other services provided by the infrastructure (e.g., the load balancer, storage services, management services, etc.), but in general we don’t allow PaaS offerings as these do too much of the work for you.
Your application *must* be deployed to a scalable public cloud infrastructure, but you may choose which one to use.
There must be no cost to me or to QUT for use of your service - so if you wish to use an API that attracts a fee, you may do so, but it is totally at your expense.
The focus of the assignment is on scalability. In provisioning, therefore, you must choose the smallest available instances - micro instances in Amazon terminology, with similar configurations for Azure - and launch as many of these as are needed to make the application cope with the demand. DO NOT under any circumstances provision a massive scale individual server instance. This is expensive and completely defeats the purpose of the exercise.
Persistence is a required aspect of the assignment, but it is not necessary to store the entire history of a dynamic application such as one based on Twitter. I expect that you will have sufficient storage to maintain a window of recent events. The storage you choose depends on the latency you can accept, and this is something you must justify. But you really must use a persistence service of some kind and explain your choices; a minimal sketch of one option appears after these guidelines.
YOU are responsible for switching off all instances at the end of each session so as not to attract usage charges.
On the client side, we expect that you will again use javascript, but you may choose others as you wish. You should base your work on a standard web page layout. You may find Twitter Bootstrap (http://getbootstrap.com/) a good starting point, or you may roll your own based on earlier sites that you have done, or through straightforward borrowing from free css sites available on the web. Your work will be marked down if it doesn’t look professional, but won’t attract fantastic extra marks for beauty. Simple and clean is fine. Cluttered pages with blinking text reminiscent of the 1990s are not.
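On the persistence guideline above, here is a minimal sketch of one option: a rolling window of recent events kept in a Redis list. The hostname and window size are placeholders, and DynamoDB, S3 or a small database instance would serve equally well provided you justify the latency trade-off.

```python
# Minimal sketch: keep only the most recent WINDOW events in a Redis list.
# Host and window size are placeholders; any persistence service will do
# provided you can justify its latency.
import json
import redis

r = redis.Redis(host="my-cache.example.internal", port=6379)   # placeholder host
WINDOW = 1000   # placeholder: how many recent events to retain

def store_event(event):
    """Push the newest event and trim the list back to the window size."""
    r.lpush("recent_events", json.dumps(event))
    r.ltrim("recent_events", 0, WINDOW - 1)

def recent_events():
    return [json.loads(e) for e in r.lrange("recent_events", 0, -1)]

store_event({"tag": "#flood", "sentiment": -0.6})
print(len(recent_events()), "events currently in the window")
```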
Please get in touch if you need to clarify any of this.
The Process:
The cloud project is in general a paired assignment and I would expect that you will be forming pairs by early in the first week after the holidays if you have not done so already. I will assist with this process as needed through the #findmeapartner channel on Slack. Some people will inevitably wish to undertake the assignment individually. This will be considered on a case by case basis, but:
No special consideration will be given to people who do the assignment individually.
The specification is intended for two people, and this general scope will be the benchmark.
You must ask me about this ASAP and certainly before October 9.
There is a lot of flexibility in the specification, and we will again ensure that you are producing something that really is worth 50% of a 12 credit point unit. So there will be two checkpoints to make sure that you are on track, at which time you will be given clear and unambiguous feedback on whether your proposal is up to scratch. The pre-submission requirements below should be seen as drafts of the final report to be submitted as part of the assessment. This is hard to manage online, so please bring a hardcopy to the pracs for quick feedback. This will help take the load off the emailed versions, which become very hard to keep track of.
In respect of the application itself, the process must be broken down into defined stages.
While many are possible, we would see the main steps as follows:
1. Accessing and learning to use the relevant APIs
2. Simple deployment of a basic application to a single instance, supporting basic operations and reporting.
3. Adding other API work to the application - especially visualisation.
4. Exploring scaling behaviour manually - including deployment of additional instances.
5. Automating the scaling behaviour using service APIs
6. Managing overload using event queues or similar strategies.
There is some variation possible in this, but do NOT attempt to undertake all of these at once. In particular, the queuing could be considered earlier, but there are clearly defined partitions between single instance deployment and then manual and ultimately automated scaling.
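For step 6, one common pattern is to put a managed queue between the web tier and the worker instances so that bursts are buffered rather than dropped. A minimal sketch using AWS SQS via boto3 follows; the queue URL and region are placeholders, and any comparable queuing service is acceptable.

```python
# Minimal sketch: buffer heavy tasks in SQS so the web tier stays responsive
# and worker instances drain the queue at their own pace. Queue URL is a
# placeholder.
import json
import boto3

sqs = boto3.client("sqs", region_name="ap-southeast-2")   # placeholder region
QUEUE_URL = "https://sqs.ap-southeast-2.amazonaws.com/123456789012/work-queue"  # placeholder

def enqueue(task):
    """Web tier: hand the task off instead of processing it inline."""
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(task))

def worker_loop(handle_task):
    """Worker instance: long-poll for tasks, process, then delete them."""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            handle_task(json.loads(msg["Body"]))
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```

The queue depth itself also makes a natural trigger for the automated scaling in step 5.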