DQ1 Data Mining, Time Complexity, and Algorithms
In Appendix C, you read about different programming control structures used to write pseudocode and actual computer algorithms, such as if statements, while and for loops, and function calls. For this discussion, assume you work for a data mining company and your job is to write a program to find information on various Web sites pertaining to sales of the Lenovo X200. After your algorithm finds this data, more complex analysis will be done to extract more meaningful information from the data.
Your algorithm is going to scan different sites and search for the character string “Lenovo X200.” Assume you decide to use an algorithm similar to Text Search (see algorithm 4.2.1 on page 178 of your text for an explanation of what this is). If the algorithm finds a site that contains the string “Lenovo X200,” assume that it then stores all of the text on that site in a storage area.
To understand this problem fully, answer the following questions:
What is data mining?
What is a character string?
What is the worst-case run time of this algorithm in terms of p, m, t, and n (that is, what is its big-O bound)?
How long do you think it will take this algorithm to run? State the time complexity in big-O notation as a function of n.
Assume that each Web site contains, on average, a character string of length 10,000 and that the length of the character string “Lenovo X200” is 11. How many computations will the algorithm need to make per site?
Why are speed and the analysis of algorithm speed so important?
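As a starting point for the run-time questions above, here is a minimal sketch of a brute-force text search that also counts character comparisons. It assumes that Text Search tries the pattern p (length m) at every position of the text t (length n) and compares character by character; the algorithm in your text may differ in its details. The names text_search and worst_case_comparisons are hypothetical and chosen only for this illustration.

```python
def text_search(p, t):
    """Brute-force search for pattern p in text t.

    Returns (index, comparisons): the 0-based index of the first match
    (or -1 if there is none) and the number of character comparisons made.
    Assumption: this mirrors a naive text search, not necessarily the
    exact algorithm in the course text.
    """
    m, n = len(p), len(t)
    comparisons = 0
    # Try every alignment of p against t: positions 0 .. n - m.
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if t[i + j] != p[j]:
                break  # mismatch: move on to the next alignment
            j += 1
        if j == m:
            return i, comparisons  # all m characters matched
    return -1, comparisons


def worst_case_comparisons(m, n):
    """Upper bound on comparisons: m per alignment, n - m + 1 alignments."""
    return m * (n - m + 1)


if __name__ == "__main__":
    index, count = text_search("Lenovo X200", "Buy the Lenovo X200 today")
    print(index, count)
    # Worst-case bound for an 11-character pattern and 10,000-character text.
    print(worst_case_comparisons(11, 10_000))
```

Under these assumptions, the worst case for m = 11 and n = 10,000 is 11 × 9,990 = 109,890 comparisons per site, which corresponds to a bound of O(m(n − m + 1)). Ordinary Web text usually mismatches on the first character of most alignments, so the count in practice tends to be much closer to n.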
Review the Discussion Participation Scoring Guide prior to posting.
Read your peers’ posts and respond to two. Did you arrive at the same time complexity calculation? Explain why or why not.
DQ2 Practice Problem Set Review
This discussion allows you to work with your peers to complete and understand the assigned problem set for this unit. Remember, two initial posts and two response posts are required. Further posts are optional but recommended:
For the first post, select a problem from this unit’s problem set, write it out fully, solve it fully, and post it.
The second post can be a problem you cannot solve or another fully solved problem from the problem set. If there is a problem in the problem set that you are confused about, write it out fully and show any work that you have started. Note that you are stuck and ask for help. If you are able to solve every problem in the set without difficulty, post at least one other fully solved problem from the set as your second post.
The third and fourth posts are responses. Response guidelines are provided below and in every discussion.
Take advantage of this discussion area to work together as a class on the problem set. Post as many problems as you can and review as many of your peers’ posts as you can. Ask questions. Offer answers. If you are stuck on a problem, post your question. Working as a team will help each person gain a better understanding of the problem set and the concepts covered in this unit. Although the problem sets are not graded, they will help you prepare for the quizzes in this course.
The third and fourth posts, the response posts, are your chance to help your peers. Explore the posts made by your peers and find people who are stuck. Help at least two peers through the problems they are stuck on. If you are unable to locate a peer who is stuck, you may instead choose a peer post, write out your solution to the same problem, and compare solutions and methods. The goal of this area is for the entire class to work together as a group on the entire problem set. You are also expected to complete and understand the problem set on your own in preparation for the quizzes and final exam.