csrins

Student. Teacher. Softsmith.

Archive for August, 2006

PCLinuxOS Magazine Inaugural Issue Released

Posted by csrins on August 31, 2006

The September 2006 introductory issue of PCLinuxOS Magazine is now available!!!

Further information is available at mag.mypclinuxos.com.

How to get it?

  • Download it from mag.mypclinuxos.com
  • Use Synaptic in PCLinuxOS to install the readable version.

Creative Commons License
This post is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

Posted in Distros, GNU/Linux, PCLinuxOS

TQM [Total Quality MIS-Management]

Posted by csrins on August 30, 2006

openSUSE 10.1 is a good release with a showstopper bug. The package manager bug has not been resolved in a satisfactory manner, and woe betide the poor soul who tries to follow the instructions on the openSUSE site. S/he will very likely find her/himself in the position of the hapless washerperson in the story of the donkey, the village washerman, and the villagers. After pulling in almost 100 MB worth of data, the recommended procedure doesn’t yield any results whatsoever. You’re much better off following the instructions here and here too.

To its credit, the openSUSE download page contains a “BEFORE YOU INSTALL READ THIS” set of instructions. But it reads more like a “try this, or that, or maybe that, or … wait … try something else” kind of adventure. What the good doctor ordered was a lucid explanation, not a plethora of experiences.

What would have been exemplary, however, is an add-on download that fixes the package installer problem and accompanies new installs. If it isn’t too little documentation, it’s too much noise. Neither is a good thing.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

Posted in Distros, GNU/Linux, Opinion, Reviews, SUSE

Data Warehousing and Mining – Unit 8 – Review Questions

Posted by csrins on August 30, 2006

Data Warehousing and Mining – Unit 8 – Core Topics

  1. Explain the concepts of classification, estimation, and prediction with suitable examples.
  2. Define a classification problem. Explain the two-phase implementation of the classification problem.
  3. Discuss the three methods used to solve classification problems.
  4. What is overfitting? Why should one watch out for overfitting in classification?
  5. Explain the categorization of classification algorithms.
  6. What is meant by missing data? How does it affect the classification problem during the training phase and the classification process?
  7. Explain the different metrics for measuring the performance of classification tools. How can classification performance be compared to information retrieval?
  8. Write a short note on the operating characteristic curve and confusion matrix.
  9. What is regression? Explain poor fit for linear regression.
  10. Explain in brief the two approaches to the regression classification problem.
  11. Compare and contrast the division and prediction approaches to regression.
  12. Explain nonlinear and logistic regression. What can you say about the probability of observing given values in a class?
  13. Write a short note on Bayesian classification.
  14. Discuss the appropriateness of similarity measures for classification.
  15. Define the classification problem for distance based algorithms.
  16. Present and discuss a simple distance-based algorithm for classification.
  17. Present and explain an algorithm for classification using k-nearest neighbors.
  18. Define a decision tree in the context of the classification problem.
  19. What are the advantages and disadvantages of decision trees for classification?
  20. Present and explain a simple algorithm for the naive approach to building a decision tree.
  21. Discuss in brief the different issues faced by decision tree algorithms.
  22. What factors determine time and space complexity of decision tree algorithms?
  23. Write a short note on the ID3 technique for building a decision tree.
  24. Write a short note on the C4.5 algorithm.
  25. Write a short note on the C5.0 algorithm.
  26. Write a short note on the CART algorithm.
  27. Write a short note on the SPRINT algorithm.
  28. Explain in brief the different steps involved in solving classification problems using neural nets.
  29. What issues should be considered in using neural networks for classification problems?
  30. Discuss the relative merits and demerits of neural networks for classification.
  31. Present and explain an algorithm illustrating propagation of a tuple.
  32. Present and explain an algorithm for NN learning.
  33. Present and explain an algorithm illustrating the principle of backpropagation.
  34. Present and explain the algorithm for incremental gradient descent.
  35. Write a short note on radial basis function.
  36. What are perceptrons? Explain in brief.
  37. Explain the principle behind rule-based algorithms for classification.
  38. Present and explain a simple algorithm to generate classification rules from a decision tree.
  39. Present and explain a simple algorithm to generate classification rules from a neural net.
  40. Present and explain the 1R algorithm.
  41. Present and explain the PRISM algorithm.
  42. What is meant by combination of multiple classifiers? Explain in brief.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

Posted in Data Mining, Data Warehousing and Mining, Education, M.Sc. Computer Science

GNU/Linux Server Distributions

Posted by csrins on August 30, 2006

Learning to set up a server with a GNU/Linux distribution is an educational and fun exercise, even when no particular commercial solution requires it.

So, what are a few distributions?

Heading over to www.distrowatch.com is a good starting point.

Or to give a short list here (in no particular order of preference):

  1. The Community ENTerprise Operating System (CentOS)
  2. Ubuntu 6.06 LTS Server

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

Posted in CentOS, Distros, GNU/Linux, Ubuntu

Data Warehousing and Mining – Unit 7 – Review Questions

Posted by csrins on August 25, 2006

Data Warehousing and Mining – Unit 7 – Introduction to Data Mining

  1. What is data mining? How does data mining differ from traditional database access?
  2. Discuss in brief the characterization of data mining algorithms.
  3. Briefly explain the various tasks in data mining.
  4. What is Knowledge Discovery in Databases? How does it relate to data mining?
  5. Write a short note on the KDD process.
  6. What is visualization? Discuss in brief the different visualization techniques.
  7. Discuss the evolution of data mining as a confluence of disciplines.
  8. What issues should be addressed by data mining algorithms and products? Why are they relevant?
  9. Discuss the need for metrics in data mining.
  10. Data mining can often have far reaching social implications. Discuss this statement.
  11. Discuss in brief important implementation issues in data mining.
  12. Distinguish between the KDD process and data mining.
  13. Discuss how database and OLTP systems are related to data mining.
  14. Write a short note on the ER model. What advantage does it offer over a trivial DBMS?
  15. Define a KDD object. What is its relation to a data mining operator?
  16. Explain the notion of set membership. Distinguish between fuzzy and traditional set membership.
  17. Explain the concept of fuzzy classification.
  18. What is Information Retrieval? How can the effectiveness of an IR system be measured?
  19. Explain the concepts of precision and recall for an information retrieval system.
  20. Write a short note on decision support systems.
  21. Write a short note on dimensional modeling. Explain the concept of aggregation hierarchies.
  22. Write a short note on the star schema. Explain with a suitable example.
  23. What is snowflaking?
  24. Write a short note on indexing. Distinguish between bitmap indices and join indices.
  25. What is a data warehouse? How does it relate to data mining?
  26. Write a short note on OLAP systems.
  27. Explain in brief the OLAP operations: slice, dice, roll up, drill down, visualization.
  28. What are the drawbacks of conventional web search engines?
  29. Distinguish between statistical inference and exploratory data analysis.
  30. Write a short note on machine learning. What are supervised and unsupervised learning?
  31. Explain the concept of pattern matching.
  32. Explain the relationship between a fuzzy set membership function and classification using the problem of assigning grades to students in classes where outliers exist.
  33. Distinguish between parametric and nonparametric models.
  34. Define the following terms:
    1. point estimation
    2. bias of an estimator
    3. unbiased estimator
    4. mean squared error
    5. squared error
    6. confidence interval
    7. root mean square
    8. root mean square error
    9. jackknife estimate
    10. maximum likelihood estimate
  35. Present the Expectation Maximization (EM) algorithm utilizing the Maximum Likelihood Estimate.
  36. Write a short note on models based on summarization.
  37. Define and explain Bayes Theorem.
  38. What is hypothesis testing? Explain the concepts of null hypothesis and alternative hypothesis.
  39. Write a short note on regression and correlation.
  40. Define similarity. List the commonly used similarity measures.
  41. What is meant by distance or dissimilarity measures? Elaborate.
  42. Define a decision tree and a decision tree model.
  43. Present and discuss a prediction technique using decision trees.
  44. Define the following terms:
    1. neural network
    2. neural network model
    3. artificial neural network
    4. activation function
  45. What is an activation function? Explain the different types of activation functions.
  46. Write a short note on genetic algorithms.
  47. Define a genetic algorithm. Present and explain a genetic algorithm.
  48. Discuss the merits and demerits of genetic algorithms.
  49. Given the following set of values {1,3,9,15,20}, determine the jackknife estimate for both the mean and the standard deviation of the mean.
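
For anyone who wants to check their working on question 49, here is a minimal Python sketch. It assumes the usual leave-one-out definition of the jackknife: drop each value in turn, recompute the mean, and then average those means. It also reads "standard deviation of the mean" as the jackknife standard error of the mean, which is my interpretation and may differ from the one your textbook intends. The function name jackknife_mean is made up for this illustration.

```python
import math


def jackknife_mean(values):
    """Leave-one-out (jackknife) estimate of the mean and its standard error.

    Assumption: the standard error uses the common jackknife formula
    sqrt((n - 1) / n * sum((theta_i - theta_bar) ** 2)).
    """
    n = len(values)
    total = sum(values)
    # Mean of the sample with the i-th value left out, for every i.
    loo_means = [(total - v) / (n - 1) for v in values]
    # Jackknife estimate of the mean: average of the leave-one-out means.
    jk_mean = sum(loo_means) / n
    # Jackknife standard error of the mean.
    spread = sum((m - jk_mean) ** 2 for m in loo_means)
    jk_se = math.sqrt((n - 1) / n * spread)
    return loo_means, jk_mean, jk_se


loo_means, jk_mean, jk_se = jackknife_mean([1, 3, 9, 15, 20])
print(loo_means)  # [11.75, 11.25, 9.75, 8.25, 7.0]
print(round(jk_mean, 2), round(jk_se, 2))
```

For the mean, the jackknife estimate comes out the same as the ordinary sample mean (about 9.6 here), and the standard error is roughly 3.57. The technique becomes more interesting for estimators that are biased, which is worth mentioning in a written answer.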

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

Posted in Data Mining, Data Warehousing and Mining, Education, M.Sc. Computer Science