Index - Computer Science & AI

November 30, 2012

2 Game Theory meanderings

1. What would "cheating" look like in Cooperative Game Theory?

2. "Mechanism design" is sometimes called reverse game theory (see Wikipedia, http://en.wikipedia.org/wiki/Mechanism_design). Could this help my earlier policy design problems?

OK, and a 3rd.... when an agent needs to change their strategy (maybe due to a phase transition caused by the 80/20 rule), pattern matching is involved, because detecting the new pattern is what tells the agent when to change. It's all coming together!!!

Posted by Frozone Permalink on November 30, 2012 10:32 AM | Comments (0)
categorized under Computer Science & AI




September 17, 2011

Cookie statistical distributions

Has this ever happened to you: You need to perform a certain function, and you assume a tool exists to do it for you. After all, it is a fairly obvious operation. But then you start looking around, and it seems like nobody knows what you are talking about!

My last entry was about the likeness matrix. I want to be able to say, "Give me a statistical distribution of the people who are like Christy. That way, I can take the people who are in the highest standard deviation of those most like her. Or, I could take the ones at the other extreme and have the people most UNLIKE her. I want to examine the distribution to figure out if Christy is generally an "a-likeable" person or if she is kind of an oddball and there are very few people like her."

Obviously, I don't have very much experience with statistics. Taking the objects associated with certain points on the curve is not easy to do. I can't find any tools that let me do this!
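A rough sketch of what such a tool might do, assuming the likeness scores to Christy are already sitting in a dictionary. The names and scores below are made up for illustration, and "one standard deviation from the mean" is just one possible cut-off.

```python
from statistics import mean, stdev

# Hypothetical likeness scores: how alike each person is to Christy.
likeness_to_christy = {
    "Alice": 46, "Bob": 51, "David": 40, "Elizabeth": 25,
    "Franklin": 41, "Grace": 90, "Henry": 5,
}

mu = mean(likeness_to_christy.values())
sd = stdev(likeness_to_christy.values())  # sample standard deviation

# People more than one standard deviation above / below the mean.
most_like = [p for p, s in likeness_to_christy.items() if s > mu + sd]
least_like = [p for p, s in likeness_to_christy.items() if s < mu - sd]

print("mean:", round(mu, 2), "sd:", round(sd, 2))
print("most like Christy:", most_like)
print("least like Christy:", least_like)
```

A small standard deviation with a high mean would suggest a generally "a-likeable" person; a low mean with only a few high scores would suggest more of an oddball.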

Posted by Frozone Permalink on September 17, 2011 01:46 PM | Comments (0)
categorized under Computer Science & AI




September 15, 2011

Collaborative filtering and frequentist statistics

I am trying to wrap my head around the various methods of making an inference and how appropriate each one is to my own field.

How do frequentist statistics relate to collaborative filtering? In collaborative filtering, you *need* data from others in order to draw out your results. I guess in the case of collaborative filtering, the "evidence" that you are drawing from is usage evidence. In some cases, like movie recommendations, it's people's ratings. That's not really observation at all, is it? It's a set of preferences.

Anyway, I do have to acknowledge that in my work I need a certain critical amount of data before the output can be expected to be useful. Right? Or, can my system work without data?

I think in my model I just have to realize that there are different phases in the system.... at the start, there is not much data, but over time it can accumulate some. And maybe for 1 learning object, the fact that there is no data might not be significant at all, because there is enough stuff out on the internet that 1 measly little resource doesn't make a difference, and we can guarantee that it will pick up data at some point anyway.

Posted by Frozone Permalink on September 15, 2011 11:10 AM | Comments (0)
categorized under Computer Science & AI




August 13, 2011

"2 to the n" vs nCr

$2^n$ vs. $\binom{n}{r}$

Recently (see: Compare each item to Every Other), I figured out that if you have a set, for example, S = {a, b, c, d, e, f}, and you want to compare each item to each other item {(a,b), (a,c), (a,d),...} then you can use $\binom{n}{r}$. A few months ago, I wrote 2 to the n, and I concluded that it must mean the number of ways you can pick a subset of N. Why 2? Because each placeholder can only have 2 values, like 0 or 1. N is the number of placeholders.

Today, I was like, "Um, so what's the difference? Oh, maybe one is just a special case of the other?!?!"

And, THEN, I was all like

Then I assumed that the $\binom{n}{r}$ must be for when you want the number of items in the sublists to be FIXED, and the $2^n$ is for when you just want to know how many possible subsets there are, regardless of how many items are in each one, like {(a,b), (a,b,c), (a), ...}.

But I realized that's wrong because $2^n$ ALWAYS has N spots and it is not variable either. So my earlier phrasing, "the number of ways you can pick a subset of N" is not quite accurate. It should be "the number of ways you can fill N placeholders with 2 possible values." And that is more like Ahh, Permutations because order DOES matter with 2 to the N.
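One way to check how the two counts relate is to enumerate every subset of a small set and group them by size. This is only a sketch, using the six-letter set from above. Each way of filling the N placeholders with 0 or 1 marks which items are "in", so it picks out exactly one subset.

```python
from itertools import combinations
from math import comb

S = ["a", "b", "c", "d", "e", "f"]
n = len(S)

# Count the subsets of each fixed size r by listing them.
counts = {r: len(list(combinations(S, r))) for r in range(n + 1)}

for r, count in counts.items():
    # Each listed count should agree with n-choose-r.
    print(r, count, comb(n, r))

# Adding up the fixed-size counts gives the total number of subsets.
print(sum(counts.values()), 2 ** n)
```

So the fixed-size count answers "how many subsets with exactly r items", and summing it over every r from 0 to n lands on 2 to the n.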

I am not sure if I learned anything. But I have gone so far away from my original reason for tracking down these calculations that it does not really matter anymore. I'm sure I'll be back again when I remember why I was trying to understand this in the first place.

Posted by Frozone Permalink on August 13, 2011 12:30 PM | Comments (0)
categorized under Computer Science & AI




August 06, 2011

Compare each item to Every Other

Hi, I have a really silly question.

Suppose I have a set S = {a, b, c, d, e, f}. I want to compare each item to every other item, and assign a value to each pair. For example:

(a, b) = 46
(a, c) = 51
(a, d) = 40
(a, e) = 25
(a, f) = 41
(b, c) = 47
(b, d) = 38
(b, e) = 34
(b, f) = 2
(c, d) = 22
(c, e) = 26
(c, f) = 14
(d, e) = 8
(d, f) = 54
(e, f) = 13

I just picked the numbers randomly. But what is that called when you explode a set out into pairs?

Oh, I know! Just as I was typing this, I was like, "Permutations and combinations?"

So, yeah, I think this is a combination. And then just now I remembered something I learned in high school. Who would have ever thought this would be useful some fifteen years later?

There were 15 different pairs. There were 6 in the set, and since I wanted pairs, it was Six Choose Two, or,

$\binom{6}{2}$

... which of course equals 15. Because:

$\binom{6}{2} = \frac{6!}{2!\,(6-2)!} = \frac{6 \cdot 5}{2 \cdot 1} = 15$

Weee!
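As a quick sketch, the "explode a set into pairs" step can be done with the standard library. The people are the ones named in this entry; nothing else is assumed.

```python
from itertools import combinations
from math import comb

people = ["Alice", "Bob", "Christina", "David", "Elizabeth", "Franklin"]

# Every unordered pair, each person compared with every other exactly once.
pairs = list(combinations(people, 2))

for first, second in pairs:
    print(first, "-", second)

print(len(pairs), comb(len(people), 2))
```

Because combinations ignores order, (Alice, Bob) appears but (Bob, Alice) does not, which is the behaviour wanted here.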

I will tell you why I wanted to do this. I have a set of people. So {a, b, c, d, e, f} are people. Alice, Bob, Christina, David, Elizabeth and Franklin.

I want to represent how "alike" each person is to the other, using a number.

Next, I am putting this party of people in different contexts and I am comparing them in different situations.

Then I want to compare the situations with each other by saying "Oh, in this case, the people were all very similar to each other, but, in that case the people were all very different!"

I think that statistical variance should do the trick. Taking my numbers from before, I can put them into a new distribution: {46, 51, 40, 25, 41, 47, 38, 34, 2, 22, 26, 14, 8, 54, 13}. You can calculate the variance over any set of numbers. It's just:

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

But I can do that on a calculator. Or go into Excel, type =VAR(46,51,40,25,41,47,38,34,2,22,26,14,8,54,13) into the formula bar and push Enter. The answer is 269.4952381.

Of course, before you start to compare the sets with each other you should probably normalize them. Um, but I won't get into that now.

Anyway, I need to create a pool of numbers using Java and then grab the variance. I don't know how to do that!!! Gonna go web search, cya l8erz.
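Here is a minimal sketch of the "pool of numbers, then variance" step. It is in Python rather than Java purely to keep it short (the loop carries over more or less line for line), and the helper name sample_variance is my own. It divides by n - 1, which is what Excel's VAR does.

```python
from statistics import variance

# The fifteen pair scores from the list above.
scores = [46, 51, 40, 25, 41, 47, 38, 34, 2, 22, 26, 14, 8, 54, 13]

def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

by_hand = sample_variance(scores)
from_library = variance(scores)  # standard-library equivalent

print(round(by_hand, 4), round(from_library, 4))
```

All fifteen pair scores go in. Leaving one out changes the result: without the 47, the same calculation gives about 268.42.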

Posted by Frozone Permalink on August 06, 2011 01:54 PM | Comments (0)
categorized under Computer Science & AI




March 10, 2011

Stable and Unstable equilibria

Recently, I learned the difference between Stable and Unstable equilibria.

First, you have to understand equilibrium: This is something that you are measuring. For example: The percentage of various categories of people in a population over time.

The stability of the equilibrium refers to when something happens that causes the percentages in the various categories to go out of whack. A stable equilibrium will return to what it was before after a certain period of time. An unstable equilibrium will be whacked off course and does not return to the original state.

What are the guiding forces that determine whether a system returns to equilibrium or goes astray? Well, each category of the population has to have certain inflow and outflow rates. The parameters of the rate calculations describe how the percentages of the population change w.r.t. the circumstances of the world.
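To make the inflow/outflow idea concrete, here is a toy sketch for a single category's share of the population. The rate functions and their constants are invented for illustration; both systems below have an equilibrium at a share of 0.6, but only one of them returns to it after a nudge.

```python
def simulate(share, inflow, outflow, steps=200, dt=0.1):
    """Step one category's share forward; the share is kept within [0, 1]."""
    for _ in range(steps):
        share += dt * (inflow(share) - outflow(share))
        share = min(1.0, max(0.0, share))
    return share

# Stable case (hypothetical rates): inflow shrinks as the category fills up.
def stable_inflow(s):
    return 0.3 * (1.0 - s)

def stable_outflow(s):
    return 0.2 * s

# Unstable case (hypothetical rates): inflow grows with the category's size.
def unstable_inflow(s):
    return 0.5 * s

def unstable_outflow(s):
    return 0.3

print(simulate(0.9, stable_inflow, stable_outflow))       # drifts back toward 0.6
print(simulate(0.61, unstable_inflow, unstable_outflow))  # runs away upward
print(simulate(0.59, unstable_inflow, unstable_outflow))  # runs away downward
```

In the stable case the net flow pushes against the disturbance; in the unstable case it pushes in the same direction, so even a small knock is amplified.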

Posted by Frozone Permalink on March 10, 2011 01:32 PM | Comments (0)
categorized under Computer Science & AI




January 30, 2011

Theory is convenient

Another reason I like theoretical work is that it's convenient. If I want to work on something applied - where coding is involved - then I have to have my development environment: usually a Subversion repository, a web server, a means of uploading the files, a way to restart the web server or Tomcat if needed, the compiler or interpreter, configuration files, etc. -- API specifications, jar files, it never ends. All of these things are helpful and are usually quite convenient in themselves, but the complexity of the whole turns into a barrier. I may have my development environment set up on my computer at work, but when an idea strikes me at home I can't get into the code quickly or easily, unless I have set up my home computer to tap into the Subversion repository or web server etc. remotely. But there is no freaking way I have time to toy around on the computer to streamline these things.

But theory -- ahh, theory -- all you need is a piece of paper and a notebook. You could argue that you may also need to access the work of others -- papers, blog entries, etc -- which requires a computer also, and internet connection, and social media, and some database application like Papers to organize your personal library, and so on.

I guess either way it is difficult to work when you don't have access to a computer, or, your life situation prevents you from sitting down with peace and quiet for more than 2-5 minute intervals.

But I still like theory.

Posted by Frozone Permalink on January 30, 2011 04:36 PM | Comments (0)
categorized under Computer Science & AI




January 23, 2011

Causal Loop Diagrams

Causal Loop Diagrams are awesome. (wikipedia link) I think that these can be a really, really powerful tool to express complex systems. However, they are making my brain hurt. But I have to know them for a class I am taking, and I'm likely going to be using them for my term project. I am glad to have this chance to force myself to really work hard on these and try to develop an intuition for them. But right now they seem really difficult to grasp, and I'm kind of scared that I might not be able to grasp them in time.

I might be wrong, but this is how I understand them so far. A directed edge is always labelled with a + or -. The positive means that the 2 variables either "both go up" or "both go down". The negative means an inverse relationship. If the value of the first variable goes up, the second will go lower than it otherwise would have been. Similarly, the negative means that if the first one goes down, then the value of the second one will get higher than it otherwise would have been.
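A small sketch of how the edge labels combine around a loop, assuming the usual convention that a loop with an even number of negative links is reinforcing and one with an odd number is balancing. The births/deaths/population variables are a made-up example, not from my class.

```python
from math import prod

# Hypothetical diagram: (cause, effect) -> polarity of the directed edge.
links = {
    ("births", "population"): +1,
    ("population", "births"): +1,
    ("population", "deaths"): +1,
    ("deaths", "population"): -1,
}

def loop_polarity(loop):
    """Multiply the edge signs around a closed loop of variable names."""
    edges = [(loop[i], loop[(i + 1) % len(loop)]) for i in range(len(loop))]
    return prod(links[edge] for edge in edges)

def loop_type(loop):
    return "reinforcing" if loop_polarity(loop) > 0 else "balancing"

print(loop_type(["births", "population"]))
print(loop_type(["population", "deaths"]))
```

Multiplying the signs is just a compact way of counting the negative links: two negatives cancel out, one negative flips the whole loop.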

Posted by Frozone Permalink on January 23, 2011 04:49 PM | Comments (0)
categorized under Computer Science & AI




November 16, 2010

Equally thrilling in mathematical modelling

It's funny that I posted Wild Thrills the other day about discovering a new passageway through the virtual terrain in World of Warcraft.

You see, I had exactly the same feeling again today - a Wild Thrill - while I continued my readings (first mentioned in this post, indicator functions) of Stochastic Modeling: Analysis & Simulation by Barry L. Nelson.

And how could reading the chapter entitled "Basics" in this book bring me a Wild Thrill, you ask? Well, it's because I had a moment where I realized that this book was trying to teach me THE SAME CONCEPTS that I had learned in a knowledge representation class I took in 2007-08. They looked different, but ah-HA! I say, they were the same concepts. I didn't realize it until I had picked up the book for like the 7th or 8th time. But I eventually realized it. And this is what caused the Wild Thrill.

And now I will explain the concepts that I have now encountered in 2 different presentation formats.

  1. Law of Total Probability
  2. Joint probabilities can be re-written as products of Conditional probabilities and Marginal probabilities

Law of Total Probability

In my own words, (which may be wrong!) a Marginal Probability is one without any conditionals. For example, P(A). A Conditional Probability is one where you have "givens", for example, P(A|B).

Now that I have explained those 2 definitions, the next part will make more sense. The Law of Total Probability is a way of breaking down a Marginal Probability when you have some evidence. You can also think of it as flipping around the numerator in Bayes' Rule. You know:

... and you can re-write the numerator:

What does this mean? Pretend you are trying to solve a puzzle about P(A) so you are trying to work on as much info around P(A) as you can. Now, pretend that you have a whole bunch of evidence (a.k.a. observations) about the event, B. I don't fully understand all the steps, but it is my understanding that your data about P(B) combined with the Law of Total Probability will help you find out about P(A).
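As a sketch with made-up numbers: suppose B can take two values, and we know P(B) and P(A|B) for each. The Law of Total Probability then gives P(A) as a weighted sum, and Bayes' Rule turns the same ingredients around to give P(B|A). The event names and probabilities here are assumptions for illustration only.

```python
# Hypothetical evidence about B, and how likely A is under each value of B.
p_b = {"b1": 0.3, "b2": 0.7}          # marginal P(B); sums to 1
p_a_given_b = {"b1": 0.9, "b2": 0.2}  # conditional P(A | B)

# Law of Total Probability: P(A) = sum over b of P(A | B=b) * P(B=b)
p_a = sum(p_a_given_b[b] * p_b[b] for b in p_b)

# Bayes' Rule: P(B=b | A) = P(A | B=b) * P(B=b) / P(A)
p_b_given_a = {b: p_a_given_b[b] * p_b[b] / p_a for b in p_b}

print(p_a)
print(p_b_given_a)
```

Each term of the sum, P(A|B=b) * P(B=b), is a Bayes' Rule numerator, and P(A), the total of those terms, is the denominator.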

Joint probabilities can be re-written as products of Conditional probabilities and Marginal probabilities

The only thing I really had to say about this one was about the syntax. If I may refer to a line in Nelson's book, equation 3.13:

From what I understood, using the line above, the author was trying to convey the same idea as the one I started in this older entry, CPTs and CPDs. If you look at that graph -- the one with P(Y) and P(X|Y) -- the joint distribution would be where you multiply all the nodes together, i.e. P(Y) * P(X|Y). I think that the line above is trying to do the same thing, namely, multiply all of the probabilities together.

I think that Nelson's book was trying to show that you can decompose joint probabilities, e.g. P(X,Y), into the products of all the nodes in the causal graph, i.e. P(Y)P(X|Y). In turn, you can decompose the marginal, P(Y): assuming you had some data for, say, another event V, it would be

$P(Y) = \sum_{v} P(Y \mid V = v)\, P(V = v)$


Conclusion


I STILL haven't figured out how the hell these concepts can be applied in any practical way in my research. But that is why I am going to keep reading Nelson's book.

Posted by Frozone Permalink on November 16, 2010 07:29 AM | Comments (0)
categorized under Computer Science & AI




October 31, 2010

indicator functions

This entry is about The Indicator Function, which is a new-to-me but pretty basic mathematical/statistical tool. I filed it under "Computer Science & AI" which I thought was a crazy broad term in itself, but maybe I should think about going even broader, like "Math & Stats". Anyway.

Let's say we have a pool of numerical data S = {s1, s2, ..., sn}. The indicator function basically binaries the set. (Where I use "to binary" as a verb.) The indicator function takes a constraint as an argument, and the output is basically the data set again, only with a "0" where the data point fails the constraint and a "1" where it meets it.

In the book I'm reading (Stochastic Modeling: Analysis & Simulation by Barry L. Nelson) the Indicator Function is presented alongside the concept of a histogram. A histogram is basically a bar graph where the Y-axis shows frequency of occurrence and each bar shows the event or interval being measured.

The output of the indicator function may be viewed as the Y-mark for that column in a histogram, where the column is the entire data set S.

Actually, I think I screwed that up. When I said "S" I think that was actually supposed to be a subset... Let me use a different example. Let's take a data set U = {u1, u2, ..., un}. Then you would divide the entirety of U up into like 10 different sets. You'd apply the indicator function separately to each subset. For each subset, count the number of 1s returned by the indicator function. Each subset of U is a bar on a histogram, and the count of the number of 1s is the Y-value.
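Here is one way this could look in code, as a sketch. I am assuming the constraint for each bar is "the value falls inside this interval", and that the indicator is applied to every point in U, with the 1s added up per interval. The data and the intervals are invented.

```python
def indicator(low, high, x):
    """Return 1 if x meets the constraint low <= x < high, else 0."""
    return 1 if low <= x < high else 0

# Hypothetical data set U and the intervals that become histogram bars.
data = [3, 7, 12, 18, 21, 25, 29, 34, 38, 40]
bins = [(0, 10), (10, 20), (20, 30), (30, 41)]

# Height of each bar = number of 1s the indicator returns for that interval.
heights = [sum(indicator(low, high, u) for u in data) for low, high in bins]

for (low, high), height in zip(bins, heights):
    print(f"[{low}, {high}): {'#' * height}")
```

Since every data point lands in exactly one interval, the bar heights add up to the size of the data set.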

Posted by Frozone Permalink on October 31, 2010 09:47 AM | Comments (0)
categorized under Computer Science & AI




August 05, 2010

Blindness to novelty

I continue to read Thomas Kuhn's The Structure of Scientific Revolutions.  Recently, I enjoyed his presentation of a study as an analogy to the circumstances under which scientific discoveries are made.  The study (citation needed) asked participants to identify playing cards: ex. two of diamonds, queen of spades.  The trick was that sometimes a non-standard card was shown, for example a three of diamonds that is coloured black (n.b. most people would know that a three of diamonds would normally be coloured red).  The researchers looked at how often participants failed to identify an abnormal card.

The analogy to science is that a scientist is typically highly educated and trained in certain techniques and methods, and has developed a sense of prediction.  The scientist's notion of what they expect to see as a result of an experiment can cause them to fail to observe an anomaly, like those who failed to notice the black three of diamonds.

I thought this was really cool.  I tried to put myself in the scientist's shoes.  Unfortunately, I don't have a good enough handle on my field to have a notion of what to expect in an experiment.  I wouldn't even have any foundations with which to begin experimenting.  Maybe this will come someday.

Next, Kuhn talks about the scientist's preconditioning as necessary for new discoveries, even if this preconditioning contributed to the failure to notice anomalies in the first place.

Posted by Frozone Permalink on August 05, 2010 06:41 AM | Comments (0)
categorized under Computer Science & AI




August 03, 2010

Example of Game Theory application

On my break I skimmed the latest issue of Data Mining and Knowledge Discovery, and this paper jumped out at me:

A game-theoretic framework to identify overlapping communities in social networks by Wei Chen, Zhenming Liu, Xiaorui Sun & Yajun Wang.

I am learning a lot about the theory and it was cool for me to recognize game theory mapped to an application (i.e. identifying overlapping communities in a social network).

Next time I have an opportunity, I want to study the implementation of strategy in this application, and discover how these researchers used equilibrium.

Posted by Frozone Permalink on August 03, 2010 02:43 PM | Comments (0)
categorized under Computer Science & AI




The Structure of Scientific Revolutions

I am reading The Structure of Scientific Revolutions by Thomas Kuhn, 1962 and am about a quarter through.

The book provides an abstraction of Science and shows the shape of its growth. I don't think I could have read this book 10 years ago because I found that most of my "oh, cool!" moments happened when relating my research life with the material. (And 10 years ago I was an undergraduate student who hadn't really discovered research yet.) It's like my own little story was fitting into a big picture somewhere! Below on number 3 I said to myself, "ohh, that's me!"

So far, I learned about 3 types of fact gathering in science.
1) determination of significant fact
2) relating fact to theory
3) work to articulate theory, resolve ambiguity

I think that Science is a beast of its own, almost with its own life force. This book is helping me understand it a little better.

A quote in this book that popped out for me was Francis Bacon: "Truth emerges more readily from error than from confusion."

Posted by Frozone Permalink on August 03, 2010 06:46 AM | Comments (0)
categorized under Computer Science & AI




July 26, 2010

Why is formal AI interesting?

While standing on a children's play structure at the park this morning, watching my kid spin the ship's wheel, I surveyed the expanse of the park from my vantage point: I could see the soccer field, the baseball diamond, and the fences to people's houses over yonder. And I thought, "So why am I so interested in formal AI, anyway?"

Why formal AI, instead of staying "free" and just trying application after application in a creative and scientifically rigorous manner?

I figured, formal AI is interesting to me because it allows you to share with others.

Granted, if you are doing non-formal work, you can still talk about your results and your approach qualitatively and quantitatively.

But, to re-use another's work, if all you have is a narrative of their own experiment and results, then you have a lot of work to do to re-contextualize their approach to your own area.

However, if they have left you a formalization, then it's already "contextualized" into mathematics, which can be re-applied anywhere. If someone else has designed the mathematical machinery, then you can re-use it without having to re-contextualize it. And THIS is technology.

In other words, I'd say that I'm interested in formal AI because I feel it makes a greater contribution to humankind.

Posted by Frozone Permalink on July 26, 2010 04:29 PM | Comments (0)
categorized under Computer Science & AI




July 17, 2010

Utility comparison

I was just thinking that Cooperative Game Theory would need some way of comparing the utility between two agents and "merging" it. For example, given a set A = {a_1, a_2, ..., a_n} of possible next actions, which a_j would give the highest utility for both of us, and is that higher when we cooperate or when we act independently?
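A very crude sketch of one possible "merge", assuming each agent has a number for every action and that merging simply means adding the two numbers. That additive rule is my own placeholder (other merges, like taking the minimum, are just as imaginable), and the utilities are invented.

```python
actions = ["a_1", "a_2", "a_3"]

# Hypothetical utilities each agent assigns to each possible next action.
utility_agent_1 = {"a_1": 4, "a_2": 6, "a_3": 1}
utility_agent_2 = {"a_1": 5, "a_2": 2, "a_3": 7}

def merged_utility(action):
    """Placeholder merge rule: add the two agents' utilities."""
    return utility_agent_1[action] + utility_agent_2[action]

# Cooperating: both agents commit to the single action with the best merged utility.
best_joint = max(actions, key=merged_utility)

# Acting independently: each agent just picks its own favourite.
best_for_1 = max(actions, key=utility_agent_1.get)
best_for_2 = max(actions, key=utility_agent_2.get)

print("cooperate on:", best_joint, "merged utility:", merged_utility(best_joint))
print("independent picks:", best_for_1, best_for_2)
```

With these numbers, the action the pair would choose together is neither agent's individual favourite, which is the kind of gap a definition of "cooperation" would have to account for.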

And how do you define "cooperation"? Wouldn't each a_j have a "type"? Like, how are the agents acting upon the environment, how does it change? The listeners on the utility function would pick this up, I guess. Now Minsky is influencing me... we'd have a set of listeners for each agent's utility functions, and the two agents could have listeners of common "types", and these same types are the ones attuned to the different kinds of actions. I'm still working through a Koller paper that explains how Event variables are what links a utility function to agent decisions (i.e. actions, I guess).

What could you do with this system given this new dimension? Can you predict certain things with more accuracy? Be more flexible in a certain way? The real winner would be if some successful application could be shown.

Posted by Frozone Permalink on July 17, 2010 02:08 PM | Comments (0)
categorized under Computer Science & AI




July 10, 2010

Nash equilibria, II

Earlier, I asked, "Why is everyone trying to compute Nash Equilibria?"

I still don't understand the big deal. But maybe I'm a little closer. In my own words, I understand that a Nash equilibrium is a pre-defined strategy (set of rules) for every player such that each one is a best response to the strategies played by the other players. So, it's basically a pre-definition of best strategies for all players.

How could you pre-compute such a thing unless you pre-define the set of possible moves available to each player? I worry that what you have left, i.e. what you have precomputed, is no longer interesting. Are there any applications that show Nash equilibria are worthwhile? And, Would this be useful for cooperative games?
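For a game small enough to write down, the pre-computation can be done by brute force: check every combination of moves and keep the ones where neither player gains by switching alone. This sketch only covers pure strategies in a two-player game, and the payoffs are the textbook Prisoner's Dilemma, chosen by me as an example.

```python
from itertools import product

moves = ["cooperate", "defect"]

# payoff[(move_a, move_b)] = (utility to player A, utility to player B)
payoff = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}

def is_nash(move_a, move_b):
    """True if neither player can do better by changing only their own move."""
    a_is_best = all(payoff[(move_a, move_b)][0] >= payoff[(alt, move_b)][0] for alt in moves)
    b_is_best = all(payoff[(move_a, move_b)][1] >= payoff[(move_a, alt)][1] for alt in moves)
    return a_is_best and b_is_best

equilibria = [(a, b) for a, b in product(moves, repeat=2) if is_nash(a, b)]
print(equilibria)
```

This does rely on the set of moves being fixed in advance, which is the worry above; the check itself is just a comparison of each cell against the alternatives in its row and column.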

During the computation of strategy, the machine has to lay out the steps and then figure out which sequence was the most beneficial. Is this related to the mysterious term floating around in my head, "policy refinement"?

Also, apparently a Nash equilibrium is a type of solution concept. I am thinking that a solution concept might just be a more general set of strategies -- not necessarily maximizing utility for anyone. So "strategy" is kind of a misleading word if applied here because it implies intent to increase utility.

Also, maybe I'm wrong in my understanding that Nash equilibria are usually precomputed. Maybe I'll send another email to someone I know who is good at this stuff.

Posted by Frozone Permalink on July 10, 2010 03:10 PM | Comments (0)
categorized under Computer Science & AI




June 27, 2010

Foiled by a footnote

I was picking through a paper and was unpacking the meaning of this:

I understood that the vector x was shorthand for x1, x2, ..., xn in a game with n players and the {0,1} must have been because the paper was about a binary game (i.e. each player has 2 possible moves). The power to n was obvious because of my previous experience. But I was stuck on that 3. What the hell did it mean???

Imagine my laughter when I realized it was a footnote!

Update: as I read further, I suspect the {0,1} may not be an action choice, but a utility payoff.

Posted by Frozone Permalink on June 27, 2010 02:08 PM | Comments (0)
categorized under Computer Science & AI




June 19, 2010

Potentials, moving on

The reason I wanted to compare Probability Distributions with Probability Density Functions is that I wanted to build up a definition of a "potential" or "factor". The reason why I wanted to know what a "potential" was is that it came up in a recent talk I attended. See, I believe that a "potential" is related to a "probability density function". I don't understand what a potential is, and I don't understand what a probability density function is, but I *do* understand probability distributions. So I was trying to build up from what I knew already.

Unfortunately, I have reached the point of diminishing returns, so I will pack up and put this on the shelf for now. Maybe I will stumble upon a resource later that will shed light on my questions.

Before potentials, I was working on an analysis of cooperative game theory, and trying to identify a mathematical model for "process" where the uncertainty is not necessarily tied to the state transitions, but still in the cooperating agent's choice of next-action.

(Would that even be useful? I'm not confident I can justify the usefulness of my research path. But I don't care. It's MY hobby, damn it. heh)

Posted by Frozone Permalink on June 19, 2010 08:38 PM | Comments (0)
categorized under Computer Science & AI




Trouble with Probability Density Functions

Probability Distributions and Probability Density Functions (PDFs) have similar names but they are different things. I started writing this entry with the intention of illustrating the difference between the two. But I'm having a really hard time.

So far, I have observed that regular probability distributions, when shown on a Cartesian graph, usually have a y-axis scaling from 0 to 1, a probability value. I show an example in this entry, Some wobbly steps forward...the math. On the other hand, the PDFs seem to show a percentage value on the y-axis.

I can't tell you anything else right now, so I'll keep reading.

Update: it looks like the PDF might be the derivative of a new thing, a cumulative distribution function (put the other way around, the CDF is the integral of the PDF). Also, I have stumbled on 2 sources now that claim a PDF is actually just a smoothed out version of the probability distribution itself. Here's one of the sources. I could be misunderstanding. To me, a histogram = the usual way of visualizing a probability distribution.
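A numerical sketch of the PDF/CDF relationship, using the normal distribution as the example (my choice, not something from the sources). It checks that the slope of the CDF at a point matches the PDF there, and that a PDF value is a density rather than a probability, so it can be bigger than 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Probability that a normal variable is <= x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Slope of the CDF at x, estimated with a small central difference.
x, h = 0.7, 1e-5
cdf_slope = (normal_cdf(x + h) - normal_cdf(x - h)) / (2.0 * h)
print(cdf_slope, normal_pdf(x))

# A narrow distribution has a tall density: the height is not a probability.
peak_density = normal_pdf(0.0, sigma=0.1)
print(peak_density)
```

Probabilities come from areas under the PDF (differences in the CDF), which is why the y-axis of a PDF is not limited to the 0-to-1 range.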

Posted by Frozone Permalink on June 19, 2010 05:09 PM | Comments (0)
categorized under Computer Science & AI




June 18, 2010

Regression

In the many science blogs I read, folks often talk about doing regressions for their work. I am not familiar with this technique and I want to understand what people are working on. So, a while back, I read an awesome book, Head First Statistics by Dawn Griffiths. Unfortunately, my maternity leave ended before I got to the chapter about regression. I have been meaning to go back, but it's 10 months later and I have not yet done so.

Recently on my Twitter, the following fell into my lap. This is one of the best explanations of Regressions I have ever seen! Although I still want to work through my own example on pencil and paper, reading this was a big step.

Data mining with WEKA, Part 1: Introduction and regression, hosted on IBM.com.

From now on, instead of thinking "that regression thingy", I can think, "some mapping from independent variables (input) to a dependent variable (output)". Wahoo!
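The simplest version of that mapping is a straight line fitted by least squares, with one input and one output. Here is a pencil-and-paper-sized sketch; the data points are invented, and fit_line is my own helper name rather than anything from the WEKA article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: independent variable xs, dependent variable ys.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("prediction for x = 6:", round(slope * 6 + intercept, 3))
```

Once the slope and intercept are known, the "mapping" is just plugging a new input into the line.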

Posted by Frozone Permalink on June 18, 2010 12:50 PM | Comments (0)
categorized under Computer Science & AI




June 09, 2010

Hybrid Bayesian Networks

I went to a talk today. It was thrilling because it was the first time since my class (2008) where I heard an actual face to face human being speak about probabilistic inference.

The talk was about Hybrid Bayesian Networks, named so because the Event variables are a mixture of continuous and discrete. Everything I have worked with before has been discrete only. I am aware that inference with continuous variables involves integration instead of summation, and that the words Gaussian and Kalman are thrown around, but I don't understand them well enough to be able to explain them to you right now.

The speaker was Dr. Prakash Shenoy from the University of Kansas.

Here are the questions I had, but was too chicken to ask. Also, some of them I could very well answer myself, if only given access to my old notes and 20 minutes of quiet time.

1) What are 'potentials' again?
2) why did you have to truncate your functions?
3) why is finding a marginal helpful to answering a probabilistic query? Do I remember correctly that a marginal is an unconditioned variable without givens?
4) in what sense do you mean "deterministic" about your causal networks?
5) how does a solution from a hybrid Bayesian net compare to a query response from a regular discrete one?
6) what is a spline?
7) I understand that Taylor series are used to produce something, could you explain that again?
8) what is a probability density function?

So, yeah. I feel excited and thrilled about being able to think about computation in this way and the possibilities it opens up. But I also feel dumb because I am asking a LOT of very BASIC questions, and it would take many years of study to be able to converse with the hot shots. And I am so busy and have so many responsibilities that it doesn't seem intuitive that I would have capacity to do this.

Posted by Frozone Permalink on June 09, 2010 06:01 PM | Comments (1)
categorized under Computer Science & AI




June 05, 2010

CPTs and CPDs

A CPT is a Conditional Probability Table and a CPD is a Conditional Probability Distribution. They are the same thing, it's just that one term appears in some papers while the other term appears in others. I am more used to CPT, so that is what I will use.

Here is an example of a CPT, image just below this paragraph. The x-axes are labelled, "x=rainbow", "x=tree" and "x=sparkles". The y-axes are referring to the event, "There is a Mystery", and the values are "y=0, nobody solves" and "y=1, somebody solves". It was a fluke that my X and Y fell in agreement with Cartesian convention; I should have used different variable names!


Fig 1. An example of a Conditional Probability Table (CPT). Note, the variable X is the same as in, Some wobbly steps forward... the math. To learn how I made this diagram, check out a similar one in LiveScribe dot paper. I assure you, my actual handwriting is much nicer.

I will explain how the CPT works. A CPT records the probability values of that event under different circumstances. Normally, I encounter CPTs in sets: given an influence diagram, each event node has a CPT. So, the example CPT above might go with an influence diagram like this:


Figure 2: Causal network for events X and Y.

In Figure 2, there are 2 event nodes: Y and X, and they are respectively labelled P(Y) and P(X|Y). The reason X has the more complicated label is that the causal link points towards X (i.e. Y causes X). The CPT for X is Figure 1, and the CPT for Y is not shown.

Notice how the rows add up to 1. This is because it's the CPT for "X" and the distribution over all possible values of X has to add up to 1. The columns don't have to add up to 1 because the Y values are just "givens". The full probability distribution for Y, with all values adding up to 1, would be in the CPT for Y.
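A sketch of the same structure in code. The probabilities are invented stand-ins, not the numbers from my figure; only the variable values (rainbow, tree, sparkles; y=0, y=1) are taken from it. One row per given value of Y, one column per value of X.

```python
# Hypothetical CPT for Y (the parent node): P(Y).
p_y = {0: 0.6, 1: 0.4}

# Hypothetical CPT for X: cpt_x[y][x] = P(X = x | Y = y). One row per value of y.
cpt_x = {
    0: {"rainbow": 0.5, "tree": 0.3, "sparkles": 0.2},
    1: {"rainbow": 0.1, "tree": 0.2, "sparkles": 0.7},
}

# Every row is a full distribution over X, so it sums to 1.
row_sums = {y: sum(row.values()) for y, row in cpt_x.items()}

# Columns are not distributions and need not sum to 1.
column_sums = {x: sum(cpt_x[y][x] for y in cpt_x) for x in cpt_x[0]}

# Joint distribution: multiply the nodes together, P(X, Y) = P(Y) * P(X | Y).
joint = {(x, y): p_y[y] * cpt_x[y][x] for y in p_y for x in cpt_x[y]}

print(row_sums)
print(column_sums)
print(sum(joint.values()))
```

The joint table built this way sums to 1 over all (x, y) combinations, which is a handy sanity check on both CPTs.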

I actually have a whole bunch more to say about CPTs. But I don't have time to write anything else.

See you later.

Posted by Frozone Permalink on June 05, 2010 11:36 AM | Comments (0)
categorized under Computer Science & AI




May 30, 2010

Normal vs Extensive: Two forms of game representation

Today, I learned about 2 ways to represent a game. I don't know if these are the only ways, but they seem to be dominant. If I were to build a system using a game representation, which would I choose??? I am inclined to the tree (extensive form) because I like graph theory, and a tree seems closer to me to a graph than the matrix. But upon second thought, you could represent a matrix using a graph, too. So I don't know. I would have to learn more about the positives and negatives of each representation.

A normal-form game representation takes the form of a matrix. Basically the rows (call them r1, r2, r3...) are all labelled with moves performed by PlayerA, and columns (c1, c2, c3... ) are labelled with moves performed by PlayerB, and the cell shows the consequence or overall "output" (ex. points scored) when those 2 actions happen together. Add more players by adding more dimensions to the matrix.

In the figure below, the mangled subtext in the x-axis labels reads, "Player B does move x" and "Player B does move y". The value of all cells is "delta utility", or "change in utility"; a crude placeholder until I can figure out how to define this properly.

An extensive-form game uses a game tree. At the root of the tree, the first player makes a move. Draw a "branch" for each possible move that the player is permitted to make. At the next player's turn, branch each of those out for all possible moves that the second player is allowed to make. And so on.
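Here is a rough Python sketch of both representations for a tiny 2x2 game. The payoff numbers are hypothetical stand-ins for "change in utility", and the names normal_form, extensive_form and tree_to_matrix are my own. It only shows that the same cells can be stored as a matrix or as a tree; it glosses over the fact that, in a tree, the second player gets to see the first player's move before choosing.

```python
# Normal form: a matrix indexed by (PlayerA move, PlayerB move).
# Payoff values are hypothetical placeholders for "change in utility".
normal_form = {
    ("r1", "c1"): 3, ("r1", "c2"): 0,
    ("r2", "c1"): 1, ("r2", "c2"): 2,
}

# Extensive form: a tree. PlayerA moves at the root, PlayerB at the next level.
# Leaves hold the same hypothetical payoffs as the matrix above.
extensive_form = {
    "player": "A",
    "moves": {
        "r1": {"player": "B", "moves": {"c1": 3, "c2": 0}},
        "r2": {"player": "B", "moves": {"c1": 1, "c2": 2}},
    },
}


def tree_to_matrix(node, path=()):
    """Walk the game tree and collect {sequence of moves: payoff} at the leaves."""
    if not isinstance(node, dict):  # a leaf: just a payoff number
        return {path: node}
    cells = {}
    for move, child in node["moves"].items():
        cells.update(tree_to_matrix(child, path + (move,)))
    return cells


flattened = tree_to_matrix(extensive_form)
```

Flattening the tree gives back exactly the cells of the matrix, which fits my second thought that either representation can be treated as a graph.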

Posted by Frozone Permalink on May 30, 2010 09:26 PM | Comments (0)
categorized under Computer Science & AI




Nash equilibria

Within Game Theory, Nash equilibria are used, somehow, to compute balance between players in a game. (according to my best understanding so far)

The point of this post is to ask the question: WHY would you want to compute this?

I can only guess that if you're building a computer chess game, you want to make sure the computer opponent is somehow an equal opponent, not too hard or not too easy, otherwise the game would not be enjoyable.

But I fail to see how Nash equilibria would be useful for my work. I think this is because I haven't finished mapping the various components of game theory to my domain.
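For my own reference, the textbook definition is that a Nash equilibrium is a combination of moves where no player can do better by changing only their own move. Below is a small Python sketch that checks every cell of a two-player matrix for that property. The payoffs are made up (they happen to have the shape of a prisoner's dilemma), the function name pure_nash_equilibria is my own, and the sketch only looks for pure-strategy equilibria.

```python
# Hypothetical 2x2 game. Each cell holds (payoff to PlayerA, payoff to PlayerB).
payoffs = {
    ("r1", "c1"): (2, 2), ("r1", "c2"): (0, 3),
    ("r2", "c1"): (3, 0), ("r2", "c2"): (1, 1),
}
rows = ["r1", "r2"]
cols = ["c1", "c2"]


def pure_nash_equilibria(payoffs, rows, cols):
    """Return cells where neither player gains by changing only their own move."""
    equilibria = []
    for r in rows:
        for c in cols:
            a_payoff, b_payoff = payoffs[(r, c)]
            a_can_improve = any(payoffs[(r2, c)][0] > a_payoff for r2 in rows)
            b_can_improve = any(payoffs[(r, c2)][1] > b_payoff for c2 in cols)
            if not a_can_improve and not b_can_improve:
                equilibria.append((r, c))
    return equilibria


result = pure_nash_equilibria(payoffs, rows, cols)
```

In this toy game the only such cell is (r2, c2), even though both players would be better off at (r1, c1).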

Posted by Frozone Permalink on May 30, 2010 09:24 PM | Comments (0)
categorized under Computer Science & AI




May 19, 2010

Filtering, Prediction, Smoothing

Filtering, Prediction, Smoothing... these are AI techniques that involve time. Not sure how they fit with my previous entry, yet.

- Collecting feedback over time / optimization vs. Directly wanting to influence direction over time. What is this interplay? I am especially interested in the mathematical notation.

Posted by Frozone Permalink on May 19, 2010 02:18 PM | Comments (0)
categorized under Computer Science & AI




May 01, 2010

Amusing semantics

These are the kinds of things that amuse me.

Consider the phrase, Bayesian Online Learning.

I recently attended a conference about teaching and learning online, with a strong theme of using social media. Reading "Bayesian Online Learning" caused me to think of an online course about things Bayesian. Immediately, the AI person in me recognized that "Online Learning" has nothing to do with education. "Learning" refers to Machine Learning and "Online" means that the algorithm is useful at runtime, rather than using precomputation and then running stale.

The double meanings of these terms often cause me problems as I read paper titles and abstracts. I really do work in 2 fields.

Posted by Frozone Permalink on May 01, 2010 07:25 AM | Comments (2)
categorized under Computer Science & AI




March 06, 2010

Formal constructs for projection

This entry is about how meaning in language comes from more than the words being spoken. Background information like previous experience and context impact the meaning communicated by a given word.

My daughter said, "né-né" when I was feeding her crackers and dry cheerios. I was certain she meant "raisins!" because it was snack time. I believe she was telling me what she felt like eating at that moment.

Later, my daughter said, "né-né" after dumping a bucket of blocks on the floor. I was certain she meant "empty!" because she says "né-né" when taking toys out of a bag or we finish taking all of the dishes out of the dishwasher.

The same word - "né-né" - means different things depending on the situation. My daughter communicates clearly with me, but the meaning communicated is not due only to the syllables that she creates with her voice.

I think this is one of the challenges that machines face when trying to interpret language. We don't know how to make machines aware of this context: our previous experience, the current situation.

I'm challenged because I want to track down some formal constructs for these things. I made good headway with decision theory. I'm still not done with game theory. Another challenge is to identify all the possible "X theories" and perform proper analysis and extensions of them. Not sure if this is going to happen.

Posted by Frozone Permalink on March 06, 2010 09:27 AM | Comments (0)
categorized under Computer Science & AI




January 31, 2010

On Nondeterminism

I figure there are two types of Nondeterminism. Suppose that a state transition from action A results in a set N of possible next-states. If the set N has >=2 elements, we say that this is a Nondeterministic situation.

In the first type of Nondeterminism, you are totally aware of all the possibilities in N. In other words, N is enumerable.

In the second type of Nondeterminism, you know that N has >=2 elements, but you have no idea what they are. I would call this a non-enumerable set.

Does there exist terminology to describe these two types of Nondeterminism?

EDIT: Hmm, maybe bounded and unbounded nondeterminism?? See http://www.win.tue.nl/~mousavi/mnez.pdf

Posted by Frozone Permalink on January 31, 2010 09:36 PM | Comments (0)
categorized under Computer Science & AI




January 04, 2010

Oops, slide

Oops. In my previous entry I said I would link to a set of slides from a class presentation. But I didn't. I will go back and correct the entry. Here are the slides.

I was trying to identify the uncertainty in my problem, then apply Game Theory as an attempt to address it. I pinpointed uncertainty as a guess on "which subset" of X is an appropriate reflection of the best known options to present to the user, where X is the set of all possible actions that the user could take.

This diagram is relevant because it helps to illustrate X, and why focusing on a subset of X is important.

I wish to credit the image of the brain from the Wikimedia commons, available here. The original image was created by Patrick J. Lynch, medical illustrator.

My series of slides shows how learning works. In the first step, you are looking at new ideas for the first time. Different parts of your brain fire up depending on whether you are reading or listening, shown in the first slide.

The second slide shows how as we listen to new ideas, our mind sparks all the most similar ideas it can think of. Basically your mind is trying to "attach" new ideas to existing ideas. So as you are receiving input on new ideas, your mind is doing a retrieval on related ideas that it could potentially make connections with.

The third slide shows that the next step is for the person to "output" their new ideas. They describe what they just learned, or work on an activity, or do an assignment, or draw a picture, or something. This causes sparks in the front of the brain to go off.

The next step (not shown) is for the learner to wait for the input and feedback on their projection. You "test" your new knowledge by watching what happens as you "output" it into the world and you wait for reinforcement. If you get positive feedback, you are on your way towards learning it well. Negative feedback causes confusion or makes you want to re-attempt your trial to see if you can learn it right and adjust your theories.

So, this subset of X that I have been talking about is the set of all opportunities that the learner could use to "output" and test their theories. The AIED system would know what the learning goals are, and be able to "hear" the learner as they are testing their theories, and then present feedback to show the learner that they are "doing it right" or help show them what they ARE doing right vs. if they have a misconception or something.

Posted by Frozone Permalink on January 04, 2010 07:31 PM | Comments (0)
categorized under Computer Science & AI




January 03, 2010

Pen scratches: elementary game theory application

I enjoyed reading this Introduction to Game Theory (from Holy Cross Liberal Arts College in Massachusetts, USA) because it did a good job of explaining a mixed strategy game, and I think that this might map to the process modeling problem I've been working on. In Game Theory, I think that the process model I'm looking for is the Strategy.

In my understanding, a pure strategy means that you always make the same choice in a given situation. A mixed strategy means that you pick among your pure strategies at random, according to a probability distribution.

If you can enumerate your set of strategies, then you can spread them over a probability distribution. That is, the probability that you will select one of the strategies from the set is 1. Each strategy will have its own probability of being selected, and the sum of the probabilities of each strategy is 1.

Let me practice the proper mathematical way of representing this. Let the set S = {s1, s2, ... sn } where each strategy is indexed by an integer. The name of the set S is represented by an upper-case letter S. Each member of the set S is represented by a lower-case letter s with a unique numerical subscript.

Let P(sn) represent the probability that the agent will apply the particular strategy as they make their next decision. So,

LaTeX: \sum_{i=1}^{n} P(s_i) = 1

This is REALLY basic, but, I am new at this so I felt it was necessary to state all of this explicitly.
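Here is a minimal Python sketch of the same idea: a set of strategies, a probability for each one, and a random draw of the next strategy. The three strategies and their probabilities are made up, and pick_strategy is my own placeholder name.

```python
import random

# Hypothetical strategy set S with made-up probabilities P(s_i).
strategies = ["s1", "s2", "s3"]
probabilities = [0.5, 0.3, 0.2]

# The probabilities must form a distribution: they sum to 1.
assert abs(sum(probabilities) - 1.0) < 1e-9


def pick_strategy(rng):
    """Draw one strategy from S according to P."""
    return rng.choices(strategies, weights=probabilities, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [pick_strategy(rng) for _ in range(1000)]
counts = {s: draws.count(s) for s in strategies}
```

Every draw lands on exactly one member of S, which is the "probability of selecting one of the strategies is 1" part.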

As I identified earlier, my main problem with my attempt at applying Decision Theory is that the whole point is to work around the uncertainty about which sn is going to happen next. Normally we assign each P(sn) according to our best guess, updating as we go. This works really well for the right kinds of problems.

But my problem isn't exactly this shape. My uncertainty is not about P(sn). Err, I'm not even sure what the uncertainty is in my problem. Let's brainstorm.

- anticipate user actions
- anticipate user goals
- guess on the user's experience, what happened in their head, which ideas they processed, and defining utility according to what we can sniff about what they experienced and whether it followed our mechanics for significant learning experiences

And here I will present a slide that I made for a class presentation once. Except I didn't use it because it was only a 10 minute presentation and I cut this out because it wasn't directly related to class material.

Right. I remember talking about the third point in a more mathematical way in this other entry.

In English, I would say that my uncertainty is that I want to "guess at the ideas that stuck in the learner's head". That is not technically reasonable. So a better, more precise model would be to say that my uncertainty is that I want to "guess at the set of next-actions the user will take". Because I (I, as the AIED system) don't care what is in the user's head. Well, I do. But what they are thinking is not as relevant to the teaching environment as the set of actions that they might want to take next.

In that older post I talked about a set X. Using that same X, let's talk about it some more. Let X be the set of all possible actions that the user can take within this learning environment. (And that is a pretty huge set considering the possibilities in learning environments these days....! I'm thinking that you're in a 3d environment where you can warp around to different spatial locations, each corresponding to subject areas, an environment where you could be in the process of building something, in the middle of a discussion, working collaboratively with friends, composing an essay, etc...) So that is X. And X could be multi-dimensional, based on the system's perceptors (user model sniffers like keystroke listeners, browsing history trackers, whatever).

And I wanted to look at the problem where my uncertainty is surrounding the state. We don't know what the state is. And we don't know what is going to happen next, because the user - or the other player in the game - is going to influence the environment which in turn influences us.

So how does X relate to the players in our Game? Recall the notation established in this other post that our players are LearnerAGent and the EnvironmentAGent, respectively: .

Also, will be influenced by the set S, i.e. this agent will be following this set of strategies, but does not know which one.

(Digression: Well, maybe the learner *could* know and that might actually be an interesting study... maybe it has been done already... to observe the effect on students according to whether they are told that they're being guided through a particular teaching mode/strategy, or if they're just led blindly and maybe they figure out the teaching strategy and maybe they don't, either way it would be neat to observe effectiveness and memory retention of course content.)

Anyway. I sense that baby is going to wake up soon so I just want to conclude where I'm trying to go, so I can pick up next time.

I identified the reason for my difficulty in applying decision theory to my problem.

I adopted some basic terms from game theory and am in the process of laying down some notation for my problem. My long(ish) term goal is to reformulate the application of probability in a new way (i.e. 3rd bullet above). *Almost* there but alas I am being torn away from my desk right now. But I am grateful for the time I did have and I made the progress that I did!!

Love Frosty

Posted by Frozone Permalink on January 03, 2010 01:19 PM | Comments (0)
categorized under Computer Science & AI




July 23, 2009

Basic Machine Learning (Statistical Learning Theory)

I was waiting in the doctor's office this morning, reading an excellent introductory paper on statistical learning theory. (Did I mention that I love the Papers app on my iPod Touch? Oh, yes, I did.) My problem deals with a lot of data, so I wanted to take some steps to educate myself about machine learning, just in case that field can offer some valuable problem-solving tools.

Some applications of machine learning in my area that I can think of include:

  • detection of students' misconceptions - where clusters are stereotypes of how groups of other students have made similar mistakes in similar situations, and the input is learning object usage data
  • selection of a teaching strategy - where clusters are teaching strategies that have worked for similar learners in the past, and the input is zillions of observations.

Anyway, I made it to page 3 of the paper before I tripped for the first time. (I'm getting better!) I was having trouble picturing the joint probability distribution P on X × Y, where X is your set of observations - the data you want to organize into clusters - and Y is the set of categories into which things can be clustered. I think of X × Y as a big matrix where a cell (x,y) denotes the value "x" being classified into category "y".

I believe that this JPD must assign a probability to each (x,y), i.e. the probability that the value 'x' is classified under 'y'. In other words, a likelihood of classification match for every (x,y).

So, am I dealing with bivariate data here because I have 2 variables? Those are usually represented visually with scatter plots. But I still wish to graph the probability. And I somehow don't think this is bivariate data because I'm not trying to show a correlation between my 2 variables. But then again, I'm working with instances of P((x,y)); something like P(x) would be meaningless.

I have some studying to do.

But anyway, I decided that the best way for me to visualize P would be a 3-dimensional diagram. Think of a square base, like a pyramid. This square is the matrix I talked about above. (I'm sorry it's confusing now because the X,Y labels on my matrix don't correspond to typical Cartesian labels, where X would mean across, Y would mean height and Z would mean depth...). Anyway, on my diagram, the height of the pyramid would be the probability value of 'x' being classified into 'y'. The vertical axis is of course ranged over [0,1] because it shows a probability value.
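As a sanity check on this picture, here is a tiny Python sketch of the square base with a height on every cell. All the numbers are made up, and the names joint, observations, categories and p are my own. One thing worth noting: for a joint distribution it is the whole table that sums to 1, not each row.

```python
# Hypothetical joint distribution P over X x Y; all numbers are made up.
# Rows are observations x, columns are categories y.
observations = ["x1", "x2", "x3"]
categories = ["y1", "y2"]
joint = [
    [0.10, 0.20],  # x1
    [0.30, 0.05],  # x2
    [0.15, 0.20],  # x3
]


def p(x, y):
    """Look up P((x, y)), the height of the surface above cell (x, y)."""
    return joint[observations.index(x)][categories.index(y)]


total = sum(sum(row) for row in joint)
heights_ok = all(0.0 <= value <= 1.0 for row in joint for value in row)
```

A grapher that wants "z as a function of x and y" could be fed exactly this lookup, with the row and column positions standing in for the Cartesian x and y.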

I wish I could make a graph for you. My math skillz suxxorz. Imagine a pizza chef flipping a circle of pizza dough over their head. When the dough lands over their fist it makes kind of a mushroom shape. Now imagine that they use their fingers to make lots of peaks, so that the mushroom is like a mountainous range. That's what I'm imagining. My grapher software wants me to define a function of the z-dimension as being over the cartesian-x-dimension and cartesian-y-dimensions. I have no idea where to even start. Eep!

But anyway, I think I've talked myself through a visualization to run with as I skim over the rest of the paper.

Posted by Frozone Permalink on July 23, 2009 03:00 PM | Comments (0)
categorized under Computer Science & AI




July 19, 2009

Pausing to reflect

As my maternity leave comes to an end (I go back to work in the middle of August) I wanted to take some time to reflect on this past year (well, 11 months) and to get a coarse-grained feeling of where I've gone, research-wise.

I don't think I could describe my work as either a "depth" or a "breadth" of focus. Although I covered a wide variety of topics, I feel like I went surprisingly deep in some strong directions. I learned a lot about decision theory and I got a chance to attempt to apply it and look at it from different directions.

Early in my maternity leave, I listened to the complete lecture set (podcast) of Introduction to Psychology from MIT OpenCourseware. This has helped me a LOT because some of my research overlaps with educational psychology and some cognitive science, and now I feel like I have SOMETHING for those branches to attach to. This lecture series also introduced me to Steven Pinker's books -- I am now one of his fans, and am excited about buying more books. I watched all of Pinker's TED talks.

Sometime in January or February of this year, I "discovered" the blogosphere. I would have never guessed how valuable connections to other people in similar situations can be. Yay, networks! I also started using Facebook and Twitter more. I have a feeling that these things will get ditched again once I go back to work, but at least I have tried them out. Maybe I will keep using them a little. :) Most importantly, I discovered how to use these tools to connect myself to the research world -- following other researchers on Twitter, subscribing to journal feeds in my RSS Reader, and the world of research blogging (a.k.a. open notebook science).

I'm still at a total loss when it comes to conference publications; I have no idea how people stay up-to-date with these. I guess you just have to learn the names of the conferences and how often they are held, and then do a web search to try and find the websites each year. Or subscribe to mailing lists if they have them. I don't know, it seems like such a hassle. But maybe it's not so bad if you're a full-time researcher. (The problem is, I'm not.)

I think I got a chance to re-examine my field through the eyes of "real AI". Since I took those two courses in '07-08, I'm using some better informed "lenses" to see the world.

I picked a focus - instructional planning and the modeling of teaching strategies, using decision theory and planning - and have been toying with that for the last couple months. It almost tastes like a thesis topic, but I've said that before.

I still want to tie together Ontology as spatial relationships with these Minsky-inspired thoughts on actuators and perceptors as selectors on a "mode", perhaps of teaching. And exploring my doubts about whether decision theory is really the right tool for my problem, this post (Probability as a projection on the selection of teaching strategy) might help me get back into that thought space.

Well, the baby's awake. That's all for now. I don't feel like I'm done my reflection yet.

Posted by Frozone Permalink on July 19, 2009 02:02 PM | Comments (0)
categorized under Computer Science & AI




June 22, 2009

Plan-space planning and the optimal policy calculation, Part II

So, the point of an optimal policy is to tell you which way to go when you reach a decision point. If you're thinking of the problem in terms of a graph traversal, then the optimal policy tells you which edge to follow when you reach a decision node. In plan-space planning, there are no decision nodes, because each node is a partial plan. It's only at the leaves in the plan-space graph that you actually have actions ordered in a sequence (I think?). What's the difference between a fully-articulated plan and a policy???

Maybe by trying to associate an optimal policy calculation with a plan-space graph, I'm trying to compare apples and oranges. Hrm. I might e-mail my professor over this one.

So where to next? 'Think I'll pick up on the IMS LD stuff again; here's the last entry where I talked about it.

Posted by Frozone Permalink on June 22, 2009 02:39 PM | Comments (0)
categorized under Computer Science & AI




Action predicates

I was concerned last time about action predicates getting undesirably "flattened" into the same dimension as state-of-the-world predicates. I think the answer would be to split off a new dimension for actions (much like what I'm trying to do with an ontology-of-sorts for teaching strategy actions), in the spirit of good software engineering, where you want components that have very specific purposes.

Then, if you wanted to translate an action into your state-of-the-world space, it could be represented as a set of predicates that represent the RESULT of the action being executed in that world. For example, instead of having a "stomp down" predicate, you might have a "vibrations" or a "ground cracking" or a "pound sound" predicate bubbling up out of the world. Next, you always want to collect feedback, i.e. to measure the impact of your action, so you could gather observations about predicate changes that resulted from your action - the world feeds back the effects and you gather 'em as observations.
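A small Python sketch of what I mean: the action lives in its own structure, and the world only ever sees the predicates that the action adds or removes. The predicate names and the execute/observe helpers are hypothetical; the add/remove split is borrowed from the add and delete lists used in classical (STRIPS-style) planning, which is my choice of illustration rather than anything specific to my problem.

```python
# Hypothetical world state: a set of predicates that currently hold.
world = {"ground_intact", "quiet"}

# Hypothetical action, described only by the predicates it adds and removes.
stomp_down = {
    "adds": {"vibrations", "pound_sound", "ground_cracking"},
    "removes": {"quiet", "ground_intact"},
}


def execute(state, action):
    """Return the new state after the action's effects are applied."""
    return (state - action["removes"]) | action["adds"]


def observe(before, after):
    """Feedback: which predicates appeared and which disappeared."""
    return {"appeared": after - before, "disappeared": before - after}


new_world = execute(world, stomp_down)
feedback = observe(world, new_world)
```

Note that there is never a "stomp down" predicate in the world itself; the world only reports the vibrations, the sound and the cracking.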

I really feel like I've gone way too far down a tangent. Trying to refocus... :)

Posted by Frozone Permalink on June 22, 2009 02:28 PM | Comments (0)
categorized under Computer Science & AI




Plan-space planning and the optimal policy calculation

I've been thinking about plan-space planning again. (see my other entry on PSP).

I want to look at how the optimal policy calculation would apply (or even IF it applies) to this type of planning. (see my first entry on the argmax thinggy)

Once again, here is the lovely creature:

LaTeX: \delta^*(o) = \arg\max_{D} \sum_{S} p(S \mid O=o, D) \, U(S, O=o, D)
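To see the calculation run on something concrete, here is a toy Python sketch. It is only an illustration under made-up assumptions: two states, two decisions, one observation, invented probabilities, and a utility function that rewards an engaged student. The names expected_utility and optimal_decision are mine.

```python
# Hypothetical toy problem; every number and name here is made up.
states = ["bored", "engaged"]
decisions = ["continue_story", "side_track"]

# p[(o, d)][s] = p(S=s | O=o, D=d)
p = {
    ("yawning", "continue_story"): {"bored": 0.8, "engaged": 0.2},
    ("yawning", "side_track"): {"bored": 0.3, "engaged": 0.7},
}


def utility(s, o, d):
    """U(S, O=o, D): a made-up payoff that only rewards an engaged student."""
    return 1.0 if s == "engaged" else 0.0


def expected_utility(o, d):
    """Sum over S of p(S | O=o, D=d) * U(S, O=o, D=d)."""
    return sum(p[(o, d)][s] * utility(s, o, d) for s in states)


def optimal_decision(o):
    """delta*(o): the decision D with the highest expected utility given o."""
    return max(decisions, key=lambda d: expected_utility(o, d))


best = optimal_decision("yawning")
```

With these numbers the expected utilities are 0.2 for continuing and 0.7 for side-tracking, so the policy picks "side_track" when it observes "yawning".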

I am interested in this question because I want to see what happens to the state. If you look at planning as a graph traversal, in plan-space planning I think that the state is encapsulated within each node. Recall that each node is a partial plan.

In the examples in my planning book, the state is a set of predicates, I guess telling you the things that you know. I imagine you could have probabilities associated with these somehow. (But I can't get over the feeling of being "trapped", when I apply a probability, I imagine that it is attached to a distribution somehow, which in turn is tied down to knowing the set of all possible outcomes, where each outcome has a certain probability, and all possible outcome-probabilities sum to 1.)

So how does the state, this set of predicates (maybe I could think of it as my bank of observations?) fit into the partial plan? When I look at my book again, the agent's ACTIONS are also predicates. This boggles my mind, and I think I'll have to think about this for a while. How could your model of the STATE OF THE WORLD be on an equal playing field with your action definitions? This doesn't make any sense at all. It explains how the state manages to get hidden inside a partial plan, though, if the representation of the actions and the observables are the same.

Hmmm. Going to chew on this for a while. I've bumped into this before -- epistemology vs. ontology... and I think I've blogged about it before, too. Got to dig up those entries.

And I can't forget that my bigger goal is to articulate an optimal policy calculation for a plan-space plan. Why did I want to do this? It had something to do with having a model of teaching, and being able to apply it by taking task domain ontology references and knowledge about the student's context to present it in a pedagogically effective way. I sort of have a visualization in my head about the different types of knowledge involved, and how they might fold together ("carrot ninja ball"), but can't figure out how to manifest this idea as an extension of existing AI planning technology.

I also want to buy some shoes. Blue ones. Closed toe, 2-inch heel, with a T-strap. I'd wear 'em with my gray skirts and black or white tops. And a blue ribbon in my hair to complete the picture. Sigh!

heh

Posted by Frozone Permalink on June 22, 2009 12:07 AM | Comments (0)
categorized under Computer Science & AI




June 21, 2009

The argmax thinggy = Optimal policy calculation

Time to grow up a little more.... what I have been calling "the argmax thinggy" shall henceforth be called, "the optimal policy calculation".

If I am going to study the thing, I may as well get used to referring to it by its proper name. ;-P

Here is a list of the entries, in chronological order, in which I talked about optimal policy calculations.

Posted by Frozone Permalink on June 21, 2009 11:32 PM | Comments (0)
categorized under Computer Science & AI




June 05, 2009

Plan-space planning

On Wednesday afternoon, I went to the natural sciences library at my university and got a book:

Automated Planning: Theory and Practice by Malik Ghallab, Dana Nau and Paolo Traverso.

This book gave me new clarity on types of planning: state-space planning and plan-space planning. State-space planning is like a graph traversal where the nodes are "states of the world" and edges are actions that result in the world changing from one state to the other. A more "advanced" type of planning (or at least, that's how I saw it) is Plan-space planning, where each node is a partial plan itself, and edges connect plans that are similar to each other, and progressing deeper into the graph gives plans that are more and more complete; the nodes at the end (leaf nodes? would this be all leaf nodes?) represent complete plans. Or at least, this is my best understanding right now.
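Here is a minimal Python sketch of the state-space version only, since that is the one I can picture clearly. The states and actions are hypothetical, and breadth-first search is just the simplest traversal I could pick; it is not meant to be the method from the book.

```python
from collections import deque

# Hypothetical world: states are nodes, actions are labelled edges.
actions = {
    "at_home": {"walk": "at_bus_stop", "drive": "at_library"},
    "at_bus_stop": {"ride_bus": "at_library"},
    "at_library": {"borrow": "has_book"},
    "has_book": {},
}


def plan(start, goal):
    """Breadth-first search; returns the shortest list of actions, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, steps = frontier.popleft()
        if state == goal:
            return steps
        for action, next_state in actions[state].items():
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, steps + [action]))
    return None


route = plan("at_home", "has_book")
```

The plan that comes back is the sequence of edge labels along the path, here "drive" then "borrow". In plan-space planning the nodes would instead hold partial plans, which this sketch does not attempt.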

I thought that this harmonized with the chord about how my problem is an optimization problem. Since in my case, I know the "general flow" of how things should work ahead of time, but it's just the small, immediate specific steps I need to be flexible about, then maybe using a plan-space plan is a good approach.

I don't really know yet, since this is such a new idea to me still, but I'm happy to have a new resource to draw upon later!

Posted by Frozone Permalink on June 05, 2009 12:10 AM | Comments (0)
categorized under Computer Science & AI




May 12, 2009

Karl Popper & Bayesian inference

I was just cleaning out my starred items in my Google Reader and I stumbled upon this item from Andrew Gelman's blog about "the most important philosophical point of confusion about Bayesian inference". An excerpt:

From a philosophical point of view, I think the most important point of confusion about Bayesian inference is the idea that it's about computing the probability that a model is true. In all the areas I've ever worked on, the model is never true. But what you can do is find out that certain important aspects of the data are highly unlikely to be captured by the fitted model, which can facilitate a "model shift" moment. This sort of falsification is why I believe Popper's philosophy of science to be a good fit to Bayesian data analysis.

I was delighted - it made me feel validated, somehow -- because this reminded me of a question I asked in my AI class once. I couldn't see how a Bayesian network could be used as a problem-solver without relying on "using evidence as proof", which would be contradictory to Popper's philosophy. I couldn't see how it was possible to put forth a query that could be refuted by evidence.

I remember thinking that this was a really huge question, and it actually upset me quite a bit that Bayesian inference may not hold up to Popper's definition of true scientific pursuit. I remember that my instructor offered up the idea of the "clarity test" where basically your theory is falsifiable when your variables take on a value OTHER than the one that the theory proposed. I am probably totally bungling up the logic here, but... I guess the clarity test supposes that if you structure your question such that it could be answered by a magical all-knowing being, and they could verify your theory as true or false, then it is falsifiable. In your Bayesian network you can enter the evidence you DO have, and make appropriate inferences based on the data.... So even though your inference is coming from evidence, it is still OK by Popper because it is possible that the evidence could have shown your theory is false.

Hrrm, I'm not totally satisfied that I remembered that or explained it properly. But anyway, I just wanted to make a special little posting for my starred item and document the story behind it. :-)

As always, I welcome discussion in the comments if anyone would enjoy playing with these ideas some more to clarify!

Posted by Frozone Permalink on May 12, 2009 01:55 PM | Comments (0)
categorized under Computer Science & AI




May 11, 2009

Probability as a projection on the selection of teaching strategy

The more I think about it, the more it makes sense to me that the probability in the argmax thinggy should be the probability that each teaching strategy will be the most effective at this point, given the student model and resources at hand. It fits that the distribution sums to 1, because surely you will choose ONE teaching strategy. Over time, each strategy will increase/decrease in probability of its effectiveness. For example, the longer you follow the same story thread, the more likely it is that the student will get bored, so the probability for this strand will gradually shrink with each time-step, showing our belief that the student is more and more likely to get bored and would like to side-track for a bit with a distraction. I don't know how yet to juggle the student's multi-tasking ability; I'm sure I can fit that in somehow as a balance of weights over the strands.
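Here is a rough Python sketch of that shrinking-with-each-time-step idea. Everything in it is an assumption for illustration: the three strands, the starting beliefs, and especially the multiplicative decay factor of 0.8, which is just the simplest boredom model I could think of.

```python
# Hypothetical belief that each strand is the most effective right now.
beliefs = {"story_thread": 0.6, "quiz": 0.3, "distraction": 0.1}


def step(beliefs, current, decay=0.8):
    """Shrink the belief in the current strand, then renormalize to sum to 1."""
    updated = dict(beliefs)
    updated[current] *= decay  # boredom: made-up decay factor
    total = sum(updated.values())
    return {name: value / total for name, value in updated.items()}


history = [beliefs]
for _ in range(5):
    history.append(step(history[-1], "story_thread"))
final = history[-1]
```

After every step the beliefs are renormalized, so they stay a proper distribution while the weight drifts away from the story thread and towards the other strands.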

The point of this planning system is to help the student navigate through hordes of information and activities in a pedagogically meaningful way, taking advantage of the usability laws of transitivity, and exploiting our knowledge of how learning works - building up from what the student is already familiar with, etc. So my point is clear: selection of the best opportunities, based on our history. Flexible, personalized exploration/learning.

But now I have no idea what to do with the utility function... oh, right! It's supposed to reflect whether the path represents a "meaningful experience", which I understand in my head but have yet to define precisely. Conveniently (but I'm not sure how significantly!) this also plays well with the "feedback" thing you hear about in popular science, such as Jeff Hawkins's Hierarchical Temporal Memory where he emphasizes the importance of feedback in machine learning, and, in Douglas Hofstadter's I am a Strange Loop, where the emphasis is on self-referencing systems.

I am aware that I might be abandoning the greatest strength of my tools, here. I may have to search for a different model other than my new love, these Markov Decision Processes. Namely, the strength of the MDP is that it can model uncertainty when you don't know the result of your state transition, but you still want to be able to act confidently. (At least, I think that's the point and the power of the MDP. Gosh, I still feel like I don't know anything!) By using the probability distributions to measure the belief that various teaching strategies/strands will be effective at a given point in time, I am not modelling any uncertainty about state transitions. i.e. This probability distribution is NOT over a set of possible next-states. Wait a minute. It is, kinda. But the point is not to predict which one is coming next, the point is to pick the one I believe will be the most effective. Gosh, this is subtle. Will it mean the difference between adopting the MDP vs. abandoning it for a different model??? Or maybe I can just introduce new notation to distinguish this new subtlety.

I haven't yet figured out what my states are, so, I can't say for certain that the MDP is or isn't the right tool for me, i.e. whether I'm in the situation of knowing which state I'm in after an action. But at least I'm aware of this question so I can recognize it as I progress. And maybe in the future, I'll stumble upon a better tool or model that fits my problem better.

Posted by Frozone Permalink on May 11, 2009 09:34 AM | Comments (0)
categorized under Computer Science & AI




May 10, 2009

State transitions: the impact of the probability distribution

Mental note, I have several posts now on state transitions. I should organize them all and create a new subsection on my research page.

Warning: This post doesn't make a lot of sense. I have a lot of "cleaning up" to do... really it is just a dump of convoluted ideas, but I hope that by "dumping" I give myself something to build up from for a future post. Sorry dear readers for the mess!

So, you're executing a plan. Your next action is chosen according to the option with the highest utility. For reference (and because I still get a little rush when I can post some sexy greek symbols on my blog, LOL, I hope that wears off soon and I can grow up already) here is the argmax thinggy again.


LaTeX: \delta^*(o) = \arg\max_D \sum_{S} p(S|O=o,D)\, U(S,O=o,D)

As we just said, the utility U(S,O,D) has a huge impact on our choice. But recall the other component of the argmax thing: the probability, p(S|O,D). What are these probabilities? I think they represent the probability of your action causing the referenced state to be the next actual state. In other words, the probabilities usually represent the fact that you don't know exactly how your chosen action will affect the next state in the state transition.

But as I was thinking about this, I felt like I was hooked on something -- I'm not at all satisfied with this structure, so I want to plough through the details a bit; I want to look at it in a different way.

Normally, you would have predefined probabilities of actions leading to different results, right? Like, part of modelling your problem before you let the robot rip is to pre-define the graph of state transitions. You would know, offline, which states are likely to follow which other states, with degrees of probability for each alternative. I'm still a little uncertain about how "actions" relate to the whole transition function thing. Are actions always specified as part of the state transition function?

Looking back... in STRIPS, it looks like the state transition function IS an action definition. i.e. The state transition function is a triple (action,preconditions,effect) where
- the action is defined to be some predicate like action_name(argument1,argument2),
- the preconditions are a set of predicates, and
- the effect is another set of predicates.
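A minimal sketch of that triple in Python, to make it concrete. The predicates and the `move` action are hypothetical, and splitting the effect into an "add" set and a "delete" set follows the classic STRIPS formulation rather than anything specific from my old assignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str                 # e.g. "move(robot, room1, room2)"
    preconditions: frozenset  # predicates that must hold before acting
    add_effects: frozenset    # predicates the action makes true
    del_effects: frozenset    # predicates the action makes false


def applicable(state, action):
    """An action can fire only if all its preconditions are in the state."""
    return action.preconditions <= state


def apply_action(state, action):
    """The state transition: remove the deleted facts, add the new ones."""
    if not applicable(state, action):
        raise ValueError(f"preconditions of {action.name} do not hold")
    return (state - action.del_effects) | action.add_effects


# Hypothetical example: a robot walking through an open door.
state = frozenset({"at(robot, room1)", "door_open(room1, room2)"})
move = Action(
    name="move(robot, room1, room2)",
    preconditions=frozenset({"at(robot, room1)", "door_open(room1, room2)"}),
    add_effects=frozenset({"at(robot, room2)"}),
    del_effects=frozenset({"at(robot, room1)"}),
)
next_state = apply_action(state, move)
```

Notice there are no probabilities anywhere: in this sketch the action fully determines the next state.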

(In situation calculus, it's similar. .... Bah, but I can't explain this without an example. I have to go back to my fairy world and contrive something with the upside-down As. So there is an idea for my next post!)

Anyway, my point for now is that in all my past experiences, the actions HAVE been "tied in" as arguments somehow in the state transition functions.

So if you usually have predefined probabilities of transitions between states - such as what your Markov Decision Process would rely on -- then this means that we usually assume that you define your actions and their consequences ahead of time, with probabilities. Is that right? Am I beating this one to death? I'm trying to establish what "the norm" is here, so that I can propose an alternative. But man, am I ever building on shaky ground here! But for the sake of explorations, moving on...

So what if, instead of having predefined probabilities of transitions between states, you already know exactly how the "order of things" goes. You already have a repertoire of microteach strategies. So we don't have any uncertainty about "what happens next" but rather we are choosing the best action from a SET of KNOWN processes. It just occurred to me that I may be looking at the difference between partially-observable MDPs and regular MDPs, I think, maybe. Because in my situation, since I know "the order of things", doesn't that erase the "partially-observable" part? On top of that I'm trying to braid together multiple MDP chains, representing different strands of interleaving learner goals.

So, suppose instead our job is to pick which strand to follow next. Usually, you want to follow a single, mainstream process so that the learner has a sense of continuity and purpose. However, for fun and variety you need to sidetrack onto other goals or interests and have some "side quests" going on. So what is the machine's role here? To anticipate valuable opportunities and present them to the learner. The machine is a filter on the opportunities out there. A facilitator. A provider of context.

Pulling this back to decision theory: your distribution should be a discretization of story threads, each with a learning goal and quest history. The utility is a function over each strand, I guess, telling you how much value to expect from choosing that option based on maximum relevance, keeping the balance of continuity and progress vs. variety -- the locus of control -- etc.

So the probabilities. What are these? How are these distinct from the utility function? Is it because we don't know what the learner will do next? We don't care, really, there is no sense trying to predict. Maybe it's based on what we assume will happen if they take that strand, because we don't exactly know what will happen when we go ahead and weave those learning objects into our story (where the learning objects are associated with the strand). I really like this thought, but I have to figure out how it fits into the math.

But 1 more problem: why would all our story strands have to sum to 1? I guess because you're bound to pick 1 of them, i.e. we know for certain that we will do SOMETHING. But that doesn't matter. We said already there is no sense trying to predict what the user is doing. Where is our uncertainty? It is on what our assembly will do to the learner. How do we distribute that over 1? Definitely I have some thinking to do...... and there we go, baby is awake, blog time is over!!!!!!!!!!!!

Posted by Frozone Permalink on May 10, 2009 09:42 AM | Comments (0)
categorized under Computer Science & AI




May 08, 2009

This dichotomy

I was reading yet another paper about the application of Markov Decision Processes in AI planning (BI-POMDP: Bounded, Incremental Partially-Observable Markov-Model Planning [Washington, 1997]), and this one in particular had a clear and precise abstract. So clear in fact that it helped me pinpoint another subtlety that's been tugging at me since I began this journey - this dichotomy - between trying to define my problem and learning about the tools (i.e. concepts from AI) so that I can apply them in a solution. (hold that thought..!)

The other aspect to the dichotomy is that the body of research related to my problem has two "sides" with little overlap, and I'm trying to help bring them closer together.

One side is so mechanical that you see things like "good pedagogy" being defined as "minimizing the number of teacher actions required for the student to learn something". You can see how this is easy to measure -- the number of "teaching actions" is easily counted. But measuring whether the students "learn something" is much harder. The other side of the literature is much more general, the topics more wide-reaching. Too big for a computational model. I'm in the middle, working with some computational awesomeness, but with "heart". =D

Anyway, the subtlety I was talking about was related to "uncertainty about action outcomes" vs. "uncertainty about the current state you are in". Yes, your actions affect the state... but... is there a way you can have a model with certain state transitions, leaving uncertainties as fringe observables or something? They can still affect your state transition using weights or whatever to push in another direction, but why design your system so that your state is fuzzy? I shouldn't use that word to avoid confusion with fuzzy logic. Uncertain, fuzzy... unclear? Misty? Blurry? Shaky? Not discrete? Continuous? Elusive? Gah.

Anyway, I'll keep reading the paper. But I am gathering my details. Gaw, haw, haw!

I feel like I must have said this already. Oh well, there's the swirliness of human thought for ya; backtracking and solidifying can be good, too. :)

Posted by Frozone Permalink on May 08, 2009 09:03 AM | Comments (0)
categorized under Computer Science & AI




May 03, 2009

Shape trees

In an earlier post, I was talking about an analogy between comparing 2 visual things vs. comparing 2 abstract ideas for the purposes of compare/contrast in an educational setting. Basically, I'm trying to make some progress in the field of instructional planning, i.e. using robotics research - how machines figure out how to walk around obstacles to reach their destination - and applying it in an analogous way, presenting ideas/activities/support to a learner as they work towards their own goals, where the obstacles are gaps in knowledge or crevasses caused by lack of context.

Anyway, I just wanted to say that at the end of the post, I mentioned that I couldn't find a particular paper that gave a document a "shape", and I wanted to see if those ideas could be used to give task domain ontons their own "shapes". I found the paper: Document identification using shape trees by [Henker & Petersohn, 2009]. I flagged it for reading!

Posted by Frozone Permalink on May 03, 2009 10:27 AM | Comments (0)
categorized under Computer Science & AI




May 01, 2009

Programming languages for AI planning

I asked my Twitter network the other day,

What programming language do folks use for AI planning these days? Is STRIPS still the primary choice? Or is there a "modern" alternative?

Even though I'm following a lot of AI folks, I didn't get any leads. Strange! But then I thought, maybe I should try harder to send out more tweets and answer other people's questions... maybe then I would get some of my own questions answered... isn't that how the universe works?? :)

See, I really want to take a closer look at the nuts and bolts of PLANNING. Right now I'm doing a lot of math, and I thought that if I could figure out how people are building AI systems these days (i.e. which programming languages are they using...) then maybe I could learn about some more details. No luck yet, though!

So in the meantime, I dug up some assignments from my AI class a couple years ago and will take another look at those. We used situation calculus and STRIPS. Maybe those are still state-of-the-art, I have no idea. I will also read some more papers on planning to see if I can figure out how MDPs get rolled into planning, and maybe take another look under my new perspective of conditional probability (see previous post about "conditioning").

Update: I just re-read a couple papers in my field ("instructional planning") and both used STRIPS. Then I went and flipped through some robotics journals and found a 2008 paper that used STRIPS. I don't want to make too strong of an induction, but it looks like STRIPS is still fairly state-of-the-art. Hrm...... okay....

Gonna go take the baby for a walk and think about this for a while.

Posted by Frozone Permalink on May 01, 2009 12:21 PM | Comments (0)
categorized under Computer Science & AI




Learned some stats lingo

I learned that "conditioning" means "to instantiate known variables" or, you could also say, "to apply the evidence to your influence diagram".

It clicked when I was reading this document (The Boxer, The Wrestler & The Coin Flip - PDF) by Andrew Gelman about the difficulties with Bayesian inference, and I read the sentence, "Figure 3(b) displays the posterior distribution after conditioning on the event X = Y."

And all I could think was, "What?" and I remembered from CMPT 417 that your priors are variables without any "givens", i.e. those variables with no observations, no evidence applied. So posteriors must be variables where values ARE known, i.e. where you HAVE introduced evidence into the network. So when he says "conditioning on the event X = Y" I remembered the probability notation P(A|B), pronounced "the probability of A given B" and that B is a "given", i.e. it is evidence, it is an observation, it is an instantiated variable. And that "B" is called "the condition" because P(A), the probability of A, is affected by your knowledge of B. If you didn't know B, then P(A) could be different, unless the events A and B are independent.

So when you say "after conditioning on X = Y" it means that you have learned that X and Y turned out to be the same (even if you don't know which value they share), so you throw away every possibility where they differ. I'm still a little foggy on what exactly a "posterior distribution" is, and just how it is affected by your conditioning.
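To convince myself, here is a tiny Python sketch of conditioning on the event X = Y. The two variables and their probabilities are made up for illustration (they are not the numbers from Gelman's paper), and I'm assuming X and Y are independent beforehand.

```python
from fractions import Fraction
from itertools import product

# Hypothetical priors: X is a fair coin, Y is heavily biased towards 0.
p_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_y = {0: Fraction(9, 10), 1: Fraction(1, 10)}

# Joint distribution over (x, y), assuming independence.
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}


def condition(joint, event):
    """Keep only the outcomes where the event holds, then renormalize."""
    kept = {outcome: p for outcome, p in joint.items() if event(outcome)}
    total = sum(kept.values())
    return {outcome: p / total for outcome, p in kept.items()}


# "Conditioning on the event X = Y".
posterior = condition(joint, lambda outcome: outcome[0] == outcome[1])

# Posterior distribution of X alone, after applying the evidence.
posterior_x = {
    x: sum(p for (xi, _), p in posterior.items() if xi == x) for x in p_x
}
```

Before conditioning, X was 50/50. Afterwards it is 9/10 vs. 1/10, because the only way X can equal Y is by agreeing with the biased Y. So the posterior is just the prior with the impossible cases crossed out and the rest rescaled to sum to 1.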

But at least I'm a step closer!

Posted by Frozone Permalink on May 01, 2009 08:52 AM | Comments (0)
categorized under Computer Science & AI




April 28, 2009

Conditional probabilities, and "the argmax thinggy"

Last time, after joyously discovering that I can now talk in LaTeX, I explored expected utility. At the end of the post, I said that I wanted to do more work on defining "the state". Like, what is it? Sort of related in my mind are conditional probabilities, because they often include state information. I guess I am looking at these tools - conditional probability, and the concept of "state" - and am trying to figure out how to apply them to my problem.

Some time ago, I talked about the "state" in terms of a predicate in logic. This time, I'm interested in looking at the state in the context of a planning problem using Markov decision processes (MDPs). The two main examples that I have gathered from my readings using MDPs for planning are:

Both papers use what I think of as, "that arg max thing". In the first paper, which is more technical, it looks like this: (can I get into trouble for reproducing equations??? I sure as hell hope not. This is all for the purposes of education and the advancement of humanity, so, I assume I will be okay. teehee.)


Latex: \pi^+(s)=\arg\max_a E_\pi\{\sum_{k=0}^{\infty}{\gamma^k} r_{t+k+1}|s_t=s, a_t=a\}

It was actually 2 separate equations, so I collapsed into 1 for the purposes of this blog, and I probably screwed up doing it... apologies to the authors if this is so! But I did do my best and think it will suffice for the purposes of this entry.

Anyway. It looks pretty scary. And it is, when you think about all the detail in there. But! I've seen enough of these guys now that it's not so bad. I'll try to explain what I know to you, focusing on our topic of the day, i.e. the state.

But first, the argmax thinggy from the other paper:


Latex: P(i)=\arg\max_a \sum_{j}{M_{i,j}^a}U(j) 

Much less scary! And it's very much the same pattern as the first one.

First, notice the pattern of the summation (the big sigma) followed by some probability notation.1 I'll bet that this is the same as what we looked at last time, i.e. an expected value calculation: each outcome's probability multiplied by its utility, all summed up. And the argmax just means, "calculate the expected utility of every action, and take the action with the HIGHEST expected utility".2 Because when we make our decision, obviously we want the one that looks like it's going to turn out the best for us!! Of course this all assumes you have a utility function that is a true reflection of your desires.
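Here is how I picture the second equation as a Python sketch. The states, actions, transition probabilities and utilities are all invented for illustration; `M[a][i][j]` plays the role of the transition matrix entry for action a moving the state from i to j.

```python
def expected_utility(M, U, i, a):
    """Sum over next states j of M[a][i][j] * U(j)."""
    return sum(prob * U[j] for j, prob in M[a][i].items())


def best_action(M, U, i):
    """The argmax over actions a of the expected utility from state i."""
    return max(M, key=lambda a: expected_utility(M, U, i, a))


# Hypothetical utilities of ending up in each state.
U = {"bored": 0.0, "engaged": 10.0}

# Hypothetical transition probabilities, starting from the "bored" state.
M = {
    "continue story": {"bored": {"bored": 0.8, "engaged": 0.2}},
    "side quest":     {"bored": {"bored": 0.3, "engaged": 0.7}},
}

choice = best_action(M, U, "bored")
```

With these made-up numbers, continuing the story has an expected utility of about 2 and the side quest about 7, so the argmax picks the side quest.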

A lot of the rest, I think, is just details.

But unfortunately I am at a point in my research where the details are important, and I have to crunch down and work through 'em. So, to keep going, I want to address the conditional probability in the first equation (i.e. what comes after the "|" symbol, which I pronounce as the "given"). The part "s_t=s, a_t=a", I'm assuming, just means "given that the state at time t is 's' and the action at time t is 'a'".

So in this case, the givens are just grounding the whole thing in current space time. This is different than what I was thinking. I was comparing the "state" to "observables", and thought that state=ontology and observables=epistemology. I mentioned this in a couple earlier posts.

I think I will conclude with one more equation, showing both the state and observables in an argmax thinggy. Notes: This equation shows the selection for an optimal policy. I think this is subtly different than the maximum expected utility (MEU). Here, instead of just calculating the MEU, we are selecting the policy (i.e. a set of actions, one for each decision node in our influence diagram) that when executed, will give us the MEU. This equation comes from my class notes from CMPT 417, which I took in 2007-08, instructor Mike Horsch.


LaTeX: \delta^*(o) = \arg\max_D \sum_{S} p(S|O=o,D)\, U(S,O=o,D)

Notice the pattern again of the summation of probabilities followed by a utility function ("U"). Here, states and observables are differentiated. Here, D is the action. The other papers used an "A" to mean the same thing, I think. It is also interesting that the state, observables and action are part of the utility function. In my piecewise function from the other day (see my last post about the rainbow, the tree and the sparkles on the fingertips), I looked only at actions taken ("D"). Interesting, hrrmmm.
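A sketch of this last equation in Python, with the states (S), observables (O) and decisions (D) kept separate. Everything concrete here is hypothetical: the quiz scenario, the probability table and the utility values are made up just to have something to compute.

```python
def optimal_decision(o, decisions, states, p, U):
    """delta*(o): the decision d maximizing sum over s of p(s|o,d) * U(s,o,d)."""
    def expected_utility(d):
        return sum(p(s, o, d) * U(s, o, d) for s in states)
    return max(decisions, key=expected_utility)


def optimal_policy(observations, decisions, states, p, U):
    """One best decision for every possible observation."""
    return {o: optimal_decision(o, decisions, states, p, U) for o in observations}


states = ["understands", "confused"]
observations = ["quiz passed", "quiz failed"]
decisions = ["new topic", "review"]

# Hypothetical p(S = "understands" | O, D); "confused" gets the remainder.
P_UNDERSTANDS = {
    ("quiz passed", "new topic"): 0.8,
    ("quiz passed", "review"): 0.9,
    ("quiz failed", "new topic"): 0.2,
    ("quiz failed", "review"): 0.7,
}


def p(s, o, d):
    q = P_UNDERSTANDS[(o, d)]
    return q if s == "understands" else 1.0 - q


def U(s, o, d):
    # Hypothetical utilities; this toy version happens to ignore o.
    table = {
        ("understands", "new topic"): 10.0,
        ("understands", "review"): 6.0,
        ("confused", "new topic"): -5.0,
        ("confused", "review"): 0.0,
    }
    return table[(s, d)]


policy = optimal_policy(observations, decisions, states, p, U)
```

The result is a lookup table from observations to decisions, which is how I currently picture a policy: here it says to move on after a passed quiz and to review after a failed one.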

So I will end there for now. Next, I think I would like to go back to my fairy waving her wand example. Either that, or go back to my pedagogical ontologies and see how a DETERMINISTIC process (maybe instantiated as a policy??) might be used, thus freeing us up to have our uncertainty lie somewhere else. Hmmm, if the policy is known, then we don't have uncertainty in state transitions... or do we? I guess that depends on how you design the state. I think there is definitely uncertainty in your observations. If you grant this, then can you say that your state is "known"? I'd like to think so. Hmmm. So I guess that is not a very definite direction for next time, but I'll keep thinking about it.

Thank you to this person for their tip on how to write argmax in LaTeX!

Footnotes:
1
Note: In the second equation, the M is a transition matrix showing the probability that action 'a' will cause your state to move from 'i' to 'j'. I think this is an interesting way to apply the probability, i.e. as uncertainty of your state transitions. I think that you can avoid this problem - i.e. being uncertain about the state transitions - by applying utility in a different way, which I talked about in my last post. This would free you up to allow for a different kind of uncertainty without getting out of control, computationally-speaking. I had a dream last night that I was trying to write a proof showing the deterministic vs. non-deterministic elements of my system. But that is enough of this digression!
2
Actually, as I recall the argmax is a little more subtle than that. The 'a' under the "max" means you are searching over every possible "a", and the "arg", which means "argument", means that what you get back is the "a" that produces the biggest value, rather than the biggest value itself. I hope to make this concrete in my next example!
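The difference between max and argmax, as a tiny Python sketch (the action names and numbers are made up):

```python
# Hypothetical expected utilities for three actions.
expected_utility = {"continue story": 2.0, "side quest": 7.0, "quiz": 4.5}

# max: the biggest VALUE.
best_value = max(expected_utility.values())

# argmax: the ARGUMENT (here, the action) that produces the biggest value.
best_action = max(expected_utility, key=expected_utility.get)
```

So max answers "how good is the best option?" and argmax answers "which option is it?".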

Posted by Frozone Permalink on April 28, 2009 05:19 PM | Comments (0)
categorized under Computer Science & AI




April 25, 2009

Some wobbly steps forward... the math

I feel so liberated! I stumbled upon a tool that lets me easily convert LaTeX to something I can paste into my blog. Yes, I'm relying on 3rd-party tools, but, I don't even care. What I like is EASY. Not long ago, I would have bent over backwards, doing things the hard way, just so I could have done it "properly" so that I wouldn't be reliant on external resources, using only server space that I'm paying for (and therefore have some guarantee of its reliability), etc. However nowadays I save such effort only for the most important stuff. These days, I value shortcuts very much. So I'm gonna try this out.

So what else can I tell you about decision theory?

I can tell you that calculations will often include an expected value. I'm still trying to work out the big picture in my head, but, usually, the expected value of an action is equal to the sum, over all possible outcomes, of the probability of each outcome multiplied by the value of that outcome. Does that make sense?

You start with a simple probability distribution. I like to think in graphs and pictures. I've seen a lot of discrete probability distributions, so I will conjure up a discrete example (as opposed to a continuous example) here.

Suppose I am a magical fairy, and I wave my magic wand. Here is a probability distribution showing the possible outcomes of my wand-waving. As you can see, it is most likely that a tree will grow. It is slightly less likely that a rainbow will appear in the sky. Or, the smallest chance is that I will have stars twinkling on my fingertips as a result of my action.


Fig 1: A probability distribution for a planning problem

So the whole graph represents the results of an action taken. The other day, I was talking about the utility function. This is where you attribute some value to each possible outcome. So, suppose I am really hungry: the event that a rainbow appears in the sky won't help me, nor will stars twinkling on my fingertips. However, the tree might be an apple tree or something. So, I will place a high value on the event of the tree growing, and low values on the rest of the events. (If one of the events was "a hungry monster appears", I might assign this a negative value - a "cost" - because the hungry monster could eat some of my food, and that would be very bad!)

I don't know much about designing utility functions. So I am just going to wing it here. I'll just whip up a piecewise function like so, I hope it is not too ugly.


Latex:
r(x)=\begin{cases}
1 & \text{if } x = \text{A rainbow appears}\\
5 & \text{if } x = \text{A tree grows}\\
1 & \text{if } x = \text{Sparkles appear on your fingertips}\end{cases}

I probably didn't do that right. I was thinking that r(x) is my utility function ("r" stands for "reward"). The argument, x, specifies the event or action that was chosen, where x is in X, and of course the big X is defined in the probability distribution above.

I'll have to design a couple more of these and maybe send email to one of my mentors for help on the specifics!!

Anyway. So far, we have


  • the probability distribution showing all possible outcomes and their likelihoods
  • the utility function showing how much I "like" each of the possible outcomes above

Finally, here's the part where I wanted to try out my LaTeX skills. The expected utility of an action, E(x), is calculated simply by multiplying the utility of each possible outcome by the likelihood of that outcome. In other words, you multiply the probability of "a rainbow appears", which is 0.3, by its utility, which is r(x)=1. Then you multiply similarly for the other two events. Finally, you sum all these numbers together.

So, E(x) =

Latex: \sum_{x \in X} P(x)\,r(x)

Which expands to E(x) =


Latex: P(x=\text{A rainbow appears})r(x) +P(x=\text{A tree grows})r(x) + P(x=\text{Sparkles appear on your fingertips})r(x)

Hmm... I'm uncomfortable with my notation for r(x) here... and also for E(x) it doesn't taste right... but I can't put my finger on the problem. But anyway... moving on...

E(x) =

Latex:
P(x=\text{A rainbow appears})(1) +P(x=\text{A tree grows})(5) + P(x=\text{Sparkles appear on your fingertips})(1)

Finally, E(x) =

Latex: (0.3)(1) + (0.5)(5) + (0.2)(1)

E(x) =

Latex: 0.3 + 2.5 + 0.2

E(x) = 3.

Why did we sum all the numbers together? Well, if you want to know the overall expected utility for the entire situation, you note that the probabilities of these events sum to 1, so exactly one of them will occur. When you weight the rewards by those probabilities, you get a number (in this example, we got "3") that can be compared to another situation, say, if you were using a different utility function or if you had a different set of events to deal with.
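The same calculation as a Python sketch, so I can check my arithmetic. The 0.3 for the rainbow comes from this post; the 0.5 and 0.2 for the tree and the sparkles are assumed values, chosen to agree with the ordering described above and with the total of 3. The second, "not hungry" utility function is entirely hypothetical and is only there to show the comparison.

```python
# Probability of each outcome of waving the wand (0.5 and 0.2 are assumed).
probabilities = {
    "A tree grows": 0.5,
    "A rainbow appears": 0.3,
    "Sparkles appear on your fingertips": 0.2,
}

# r(x): the reward for each outcome when I am really hungry.
reward_hungry = {
    "A rainbow appears": 1,
    "A tree grows": 5,
    "Sparkles appear on your fingertips": 1,
}

# A hypothetical alternative: not hungry, so rainbows are worth more.
reward_not_hungry = {
    "A rainbow appears": 5,
    "A tree grows": 1,
    "Sparkles appear on your fingertips": 3,
}


def expected_utility(probabilities, reward):
    """Sum over outcomes of P(outcome) * r(outcome)."""
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * reward[outcome] for outcome, p in probabilities.items())


e_hungry = expected_utility(probabilities, reward_hungry)
e_not_hungry = expected_utility(probabilities, reward_not_hungry)
```

The two results can be compared directly: same wand, same probabilities, but a different utility function gives a different expected utility.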

I think. Gosh, I'm not even sure anymore. I need to think about this! I still have questions about the "state", too: Do actions result in states? Yes, obviously, but where would these fit in for my example?

Well that was fun, but clearly I have more work to do! I hope to come back with another example after I've done a little more reading. Eep! And I haven't gotten as far as employing a Markov Decision Process, or even applying a policy, or playing with observations or states. And the layout of my math looks ugly on this blog, LOL. And the LaTeX isn't rendering in my RSS reader.

But overall this wasn't bad for my first time.

Posted by Frozone Permalink on April 25, 2009 08:34 AM | Comments (2)
categorized under Computer Science & AI




test... LaTeX

Neat!

Posted by Frozone Permalink on April 25, 2009 08:13 AM | Comments (0)
categorized under Computer Science & AI




April 24, 2009

Planning vs Learning

Mmm, and another unpublished blog entry that I discovered from my cloud. (Here's the first one.)

***

In AI, Planning is when a machine considers the input it receives from its sensors, and decides the actions it will take as a result. Learning is when the machine decides its actions based on past experience.

Can the two strategies - planning & learning - work together? I've always assumed so, but I never really explored the question until I read Geffner, Hector. Perspectives on Artificial Intelligence Planning.

Is planning "offline" and learning "online"?? No, I don't think that's it.

Learning is usually associated with reams of data.

I'm sure that planning can be done online (by "online" I mean "during runtime" as opposed to "precomputed"). Learning feeds planning.

What I really need is to work through an example.

Posted by Frozone Permalink on April 24, 2009 07:29 AM | Comments (0)
categorized under Computer Science & AI




March 31, 2009

Ontological comparisons as spatial relationships

Despite my new love for open notebook science, I have not been very loyal to the vision. All of my research notes lately are on paper. I blame it on having close-to-nil computer time. Alas! I will be so happy when mediatronic paper becomes cheap & affordable (heh, or even "in existence"!) so that I don't have to "wait for computer time" in order to share my thinking. I'd be able to carry around a crumpled up piece of paper in my pocket, write on it when I have a spare 15 seconds, and I would be able to categorize it and link it up properly in my blog from wherever I happen to be (walking the baby in the stroller, working in the kitchen, nursing, etc). Ah, yes. It would be grand.

Fubble wubbles; enough fantasizing, Frozone! =)

Lately, I've been thinking about how to squeeze my problem into a decision theoretic framework. I laid out my problem as an influence diagram. This helped me identify where the information required for my problem comes from. Sources include


  • the learner's history
  • assumptions projected onto the learner
  • usage data from other, similar learners (this is the world of recommender systems)

All of these are "chance" nodes, which come in two types: states and observations. I figured that the states are your assumptions or projections about the world (bullet point #2 above), and that your observations come from hard inferences from real data (bullet points #1 and #3 above). (I tweeted about this a while ago; you probably saw it if you are following me on Twitter, heh.)

Decision nodes were where I found myself trying to build patterns of behaviour, such as teaching strategies. I thought that you could implement a teaching strategy as a policy (also discussed in my first decision theory entry).

I've been using example teaching scenarios to try and lay everything out and abstract the "shape" of teaching. For some reason, I have this fix in my head about how I think ontological engineering will be an important tool here. Right now, I see it as a way to "rise above" and be able to give my computation a little more subtlety.

A "Top Ontology" is an ontology that includes concepts that are common across domains such as time, space, etc. This is introduced in Breuker et al., which I talk about in this other entry. What I'm getting at here is that ontologies have layers; if they themselves are hierarchies then categorizing the ontologies themselves gives you a sort of hypergraph. (I'm not going to let my research veer in this direction yet because I'm in too much danger of sprouting fairy wings -- maybe I'll revisit when I have some mathematical foundations to keep me real.)

Getting back to the idea of the Top Ontology, I think that one (fundamental?) component that appears over and over in teaching is the ability to compare, contrast and present an idea as an example using a story or metaphor. It's like: how do you express one idea in terms of another? How do you compare two ontons? (I learned the term "onton" from someone else's blog, which I shared & commented on via my Google Reader. The title is Deciding, Learning.)

I'm trying to pick out my ontons for the process of teaching, and maybe these ontons will themselves form a hyper-ontology, sorta like the Top Ontology, but a level below that.

Being a mostly visual person, I naturally thought that the obvious way to compare 2 ideas is to compare their shapes. Contrasting points are represented with great distances, and close points are represented with short distances.

So this forks into 2 problems. First, how do you represent a subset of a task domain ontology as a shape? I was sure I'd found a paper where I thought I might find a lead, but now I can't find it. Was this it? I thought the paper was more about document profiling. Gah, I'll come back here if I find it. (UPDATE: May 3rd, 2009 - found it! Document identification using shape trees by [Henker & Petersohn, 2009].)

Secondly, how do you compare the shapes and work them into your plan? The problem of spatial navigation is commonly tackled in robotics. This is where you'll find the nuances of different types of teaching -- how you handle the contrasts of the shapes.

Gosh, I'm treading on shaky ground here. But then again, I always do, don't I? And I love it. LOL

Looking forward to the next post!

Posted by Frozone Permalink on March 31, 2009 07:51 AM | Comments (0)
categorized under Computer Science & AI




March 04, 2009

The importance of projection

I was playing with 'Faces' in iPhoto and laughed to myself when the software spotted a "hidden" face: the photo was of my mother holding my daughter, and they were standing in front of a bookshelf. The software guessed correctly my mother's face and my daughter's face (impressive!!). I laughed when there was a little square around a third face: on one of the books on the bookshelf in the background, there was a picture of a child's face illustrated on the front cover.

The conclusion I drew is that projection is much more important than we usually think: as humans we rely so much on prior knowledge to "sift out" the zillions of things we subconsciously pick up on our inputs. In order to disregard the illustration of the child's face, I had to know ahead of time about the concept of illustrations on books, know that Mom & my daughter were standing in front of the bookshelf, etc.

Neat-o.

Posted by Frozone Permalink on March 04, 2009 10:53 AM | Comments (0)
categorized under Computer Science & AI




February 11, 2009

Decision theory for teaching strategies

I'm trying to wrap my head around decision-making under uncertainty using decision theory and Markov decision processes. After a lot of tumbling and turning, I realized that I'm trying to compare and contrast two things:


  1. optimal policy construction, and
  2. Markov decision processes (MDPs).

These are 2 ways (not the only 2) to tackle decision-making. This post is going to be about #1 -- I'll tackle MDPs another day. As for policy construction -- I began my hunt with the notion that you would start with an influence diagram such as the one here.

Your decision problem is modelled as a graph. The square nodes are decision nodes. The diamond nodes are utility nodes. The circle nodes are the same as the nodes in a Bayesian network. Circles can point to diamonds and squares. Squares can point to diamonds and circles. Only one diamond is allowed per diagram.

A policy (I learned to represent policies with the Greek letter delta, δ¹) is like a "rule of thumb" for the action you choose when faced with a decision. The optimal policy is the policy that gives you the greatest expected utility (from the diamond node). You can think of the policy as being connected to the decision nodes (the squares).

One thing that has confused me for a looong time is that your random variables (circle nodes) can be either "states" or "observables/evidence". Recently I had a little epiphany where I thought of the states as your ontology and your observables as your epistemology.

I'm perplexed about the application of a decision network like this. Would you use the same network over and over? I guess you would have to build a network for each type of decision you'll need to face. And, the only time you'd re-use the same network is if you face exactly the same sort of decision again. Although you could modify the values in the CPTs (conditional probability tables) if you had better information the next time around.

Anyway, a "policy" is something that you can apply to your decisions, and the policy tells you which direction to go. It is a function from states to actions. Basically, for all decision nodes in your network, (all squares), your policy is the set of decisions to make -- one decision per decision node. (So, is policy construction always an offline problem?)
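To make "a function from states to actions" concrete for myself, here is a minimal sketch of the smallest decision network I can think of: one circle (the student's state), one square (the tutor's action) and one diamond (the utility). The state names, action names, probabilities and utilities are all hypothetical numbers chosen for illustration, not taken from any paper.

```python
# Hypothetical chance node: prior belief about the student's state.
P_STATE = {"knows_topic": 0.3, "confused": 0.7}

# Hypothetical utility node: U(state, action). All values are invented.
UTILITY = {
    ("knows_topic", "review"): 2.0,
    ("knows_topic", "new_material"): 10.0,
    ("confused", "review"): 8.0,
    ("confused", "new_material"): -5.0,
}

# The decision node's options.
ACTIONS = ["review", "new_material"]


def expected_utility(action, belief):
    """Average the utility of `action` over a belief about the state."""
    return sum(p * UTILITY[(state, action)] for state, p in belief.items())


def best_action(belief):
    """The action with the highest expected utility under `belief`."""
    return max(ACTIONS, key=lambda a: expected_utility(a, belief))


def optimal_policy():
    """If the state is observed before deciding, map each state to its best action."""
    return {state: best_action({state: 1.0}) for state in P_STATE}
```

With only the prior to go on, `best_action(P_STATE)` chooses `"review"` (expected utility 6.2 versus -0.5). If the state can be observed first, `optimal_policy()` returns one action per state, and that table is the policy. It can be computed entirely ahead of time here, which at least hints at why policy construction feels like an offline problem.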

With that background in mind, I want to return to the paper I was reading last time. Remember, my whole quest right now is to figure out, "What is teaching?".

In this paper, I think I was a little misled by their usage of "pedagogical strategy". I was thinking, "oh, are they modeling how to gently guide a student through material that's new to them vs. challenging a student to get them to become even more familiar with material they've already been introduced to?" But after reading the paper a couple of times (and I could still be missing something -- the material was pretty dense and a lot of it very technical) I think what they meant by "pedagogical strategy" was "the order in which concepts are introduced". To me this is only a small dimension of a teaching strategy. It's like, this is the "content planning" without "delivery planning".

I was also a little surprised to learn that these researchers used artificial students. I didn't understand what was being measured with the artificial students -- which part of the system was being "tweaked" by optimizing against different types of students, and where they got the "student types" from. (Thinking, 'hey, could I ever use artificial students in my experiments?').

I missed out on learning about "Reinforcement learning" and how they were using MDPs. I still have so much more to learn before I can really grasp a lot of the research that is going on.

On the bright side, this paper did force me to take a closer look at decision theory.

Anyway, my journey about finding teaching strategies continues. I also feel like I'm getting closer to picking a thesis topic. (HA! I know, I've been saying that for years..... ugh.... lol). But, I'm confident enough this time that I might put this statement on my "About Me" page: I'm interested in how to model teaching strategies such that an abstract task domain ontology can be taken and "filtered through" the teaching strategy. This way, you'd have a universal machine that can teach. Scientists all over the world can continue to make discoveries about physics or math or chemistry or astronomy or geology or medicine or anything, and any Jane Doe could learn about it if she wants because she'd have a(n artificial) tutor to help her explore the material whenever and however she wants. I'd like to figure out how to take a learning object and weave it into an instructional plan that is conscious of overall themes and stories that can stretch from lesson to lesson to create an enjoyable, meaningful experience.

¹ I have also seen pi (π) used to denote a policy. I don't know if there is a difference or if it's just inconsistent notation.

Posted by Frozone Permalink on February 11, 2009 03:22 PM | Comments (0)
categorized under Computer Science & AI




November 02, 2008

Language, Symbols & Thinking

A machine's identity and "machine-telepathy"

An intelligent machine should be able to communicate with another intelligent machine. As software systems grow in sophistication using protocols, web services (or whatever!) to share information with each other, the line between individual systems can fade. A lot of the artificial intelligences out there (if I can use AI as a proper noun) would just be part of the sea of a collectivish mind.

For example, a tutoring system references an ontology about geology during an exercise with a student. A software system for construction equipment references the same ontology during a calculation. Although the 2 systems are completely different, they are accessing the same 1s and 0s for their calculations, so there is no distinct boundary between them. (Okay, okay, this is a pretty flimsy argument because I doubt there are actually any ontologies out there that are solid enough to be used by such a diverse set of systems. Right? Hmm. I don't know, now that I question it. Anyway.)

When do intelligent machines require identity to perform their task?

A task domain ontology's impact on communication: discretizing language?

In a previous entry, I fantasized about how object-oriented programming was like weaving thoughts. This fantasy was tickled again as I was reading the chapter called "Mentalese" in The Language Instinct, by Steven Pinker. In this chapter, the author shows how people's thoughts exist in a form "above" their spoken language, called Mentalese. Only when people need to extract their thoughts from their heads and transfer them to another human's mind do they require spoken/written language. (Or, at least that's the gist I got from the chapter.)

I was surprised that there wasn't a discussion of ontologies as part of the chapter on Mentalese. (Maybe that's a topic for another book!) To me, it's obvious that a machine's "Mentalese" would use references to domain ontologies when "thinking" about an idea. Could machines communicate more effectively with each other by using some form of artificial Mentalese? Would this give them an advantage over humans, who are not telepathic, but rather require the use of a spoken/written language?

How would a Prolog program refer to an ontology?

I thought that the use of a task domain ontology discretized language in that it removed ambiguity. With an ontology reference, there is zero room for misinterpretation of what you're referring to. This loss of ambiguity might take away things that characterize human language: misunderstanding, humour, creativity... hrmm..

Do your symbols limit your capacity to "think"?

Related are discussions of how one's symbols impact what one can think about. Is the power of your thought restricted by the symbols you use to communicate those thoughts? (Likely no.) Pinker cites Orwellian Newspeak here: if your language does not allow you to talk about "freedom" because there is no word for freedom, then can you even fathom the concept? (Yes, in Mentalese.)

I was also reminded of Searle's Chinese Room experiment. Is the person doing the symbol-processing limited by the library of symbols he has? (I guess not, because, as you might argue, he's not even thinking in the first place because he's merely arranging symbols according to some matching algorithm that he did not derive on his own.)

Is Mentalese visual? See West, Thomas G.. Thinking like Einstein: Returning to our Visual Roots within the Emerging Revolution in Computer Information Visualization. Amherst, New York: Prometheus Books, 2004. I referenced this book in another entry.

Posted by Frozone Permalink on November 02, 2008 12:08 PM | Comments (0)
categorized under Computer Science & AI




January 01, 2008

Consciousness, and the other side of the loop

Hi everybody. I think that this is my longest blog entry ever. It's so long that I went in and included headings. I also feel like these thoughts mark another significant "step" in the evolution of my thinking. I hope to refer back to these thoughts often in the future. :-)

In my previous entry, I discussed a dialogue between a human and an artificial tutor. This time, I found that in order to better understand that dynamic, I had to get a better grip on consciousness itself, whether it be the human learner's consciousness or the artificial consciousness. I want a better understanding of both in order to shed better light on how the two could interact with each other.

To that effect, I started reading Consciousness Explained, by philosopher Daniel Dennett. (See previous entry again for bibliographical reference.) I'm about a third of the way through. I'm sad that the holidays are almost over, and it'll be much harder to continue research once everyday life returns to steal away one's time. Oh well; enough whining, Stephanie: seize the moment and get back to work! :)

In his book, Dennett explains the Cartesian Theatre, a popular way to think about consciousness. (Named, of course, for René Descartes because of his "I think, therefore I am.") The Cartesian Theatre model has a "viewer" (i.e. the Consciousness) sitting in a great chair somewhere in the mind, observing as perceptions come through, taking note of the goings-on and generally being "conscious". The author takes us through a series of scientific experiments and different situations to show how the Cartesian Theatre fails to accurately model consciousness according to what science tells us about how the brain actually perceives things. He then presents the Multiple Drafts model, which more accurately represents the goings-on of our mysterious minds.

(Or, such is my take on the book after having only read a third of it!)

Why do humans sometimes think that we know things, but aren't sure if we actually know them? There is a fogginess to human minds. At the same time: How can humans make huge leaps and understand things while computers need to be more explicit in their reasoning? It's like humans can see another dimension higher than computers can. Why? It has something to do with the structure of our minds. For me, Dennett's model begins to address this.

To illustrate the difference between the Cartesian Theatre and Multiple Drafts models, I'll take a moment to summarize a portion of Dennett's discussion as best I can. If you have a copy of this book, this starts on p. 114.

Moving Red & Green Dots

Consider a study where subjects are shown an image of a red dot, then that image is quickly replaced with a second image, where the dot is moved off to the side some distance such that it appears that the dot has moved. In addition, the dot's colour is changed from red to green. Subjects were asked questions about what they perceived, and the study found that, to them, the dot seemed to change colour "in between" the two images.

Following the Cartesian Theatre model, what the subjects reported in the study shouldn't be possible: How could you have a gradual progression to green? Wouldn't this require prior knowledge that the final colour would be green, which you wouldn't know until AFTER the second dot is displayed, by which time, of course, the in-between-dots moment is over?

What could have happened? Dennett presents two possibilities: After the green dot was presented, did the subject's mind go back and *change* the original in-between memory to include a gradual progression of colour? Or, perhaps the perception of both dots was suppressed somehow, until after the subject's mind had the chance to construct the in-between frames and then "play" the final construction of the event after it had been assembled.

Neither option is attractive. Inherent in these options is the assumption that consciousness follows the Cartesian Theatre model. Given the failings of this model, the author presents an alternative approach.

The Multiple Drafts model would recognize that the brain has a location perceptor and a colour perceptor, and as the colours and locations of the dot are perceived, there is no single "ah, ha!" moment where the subject realizes that the red dot gradually turned green as it moved to its final location. Instead, the brain merely had the red-perceptors and green-perceptors and dot-location-perceptors firing as the various inputs came into the senses, and, somewhere up the chain the mind registered the gradual change of colour as the dot moved. The brain would not go to the trouble of *creating* the in-between visual perceptions. Instead, it comes to the conclusion that this must be what happened, and adjusts from there. [Dennett, p.127]

Consciousness isn't a series of perceptions all filtering through the same spot. Instead, it's a collection of senses, each at varying levels of "editing", to use Dennett's word. As conclusions are drawn, input-data that is no longer needed is forgotten, and instead the brain remembers the conclusions it made based on the various inputs.

This was quite a radical thing for me, to think about consciousness in this more dynamic way. I think it changed the way I'm going to think about AI from now on.

Impact of this new model on my interests

Specifically, this model addresses my worry about "starting empty" (paragraph 6 again in previous entry). My research interests are teetering between AI and AIED, where in AIED I desperately want to know how to build machines that will help people with their learning, while I can't do this without going deeper into AI and learning how machines think.

Trying to push out of the philosophical world of artificial intelligence and trying to move "back home" to the computer science angle of AI: Why is the Cartesian Theatre problematic, computationally-speaking? Well, to me, as I stated previously, it's related to the "starting empty" problem. There, I struggled with how to perceive a search or an algorithm and its role in the machine's overall consciousness. It's clearer to me now that, just like how human consciousness is not a "single point of thinking, i.e. the viewer in the chair, watching inputs and being 'aware' of them", similarly, a machine's consciousness can't be limited to the perspective of a single algorithm. There's more going on than that; it's more dynamic.

(elaborating) To me, artificial intelligence isn't about finding a "super algorithm", that given a set of inputs and a program for behaviour, can perform effectively (or, if I had my way, *more* effectively than a human) at whatever task it's doing. It isn't about being ingenious enough to find that one-answer or that one next-action in a mound of data. I'm not saying that developing good algorithms isn't important - no, no, no! What I'm saying is that in order to have a thing that thinks - in order to have something that "there's something that it's like to be," (as my PHIL 292 prof would say) you can't commit the existence of this thing's mind to a single point in a computation.

So what does this mean? Does it mean that because computers are designed to have a single point of thinking - the CPU - that machines are intrinsically trapped, and that this is the limitation that keeps computers less-aware than humans are?

Nowadays the norm for a new computer is to have two or more CPUs. Will this change anything? I recall the Turing Machine, and the case of the machine having multiple heads. By increasing the ways/times that you can pay attention simultaneously, can you overcome the Cartesian Theatre problem?

I am really stretching into areas where I am not an expert here, but my memory is telling me about a proof I may have worked through once that showed that having multiple heads in a Turing machine does not grant any major computational gains; the problem-space of what is solvable is the same with many heads as with having only one head. It has something to do with a space/time trade-off. Anyway, the point is that I think that overcoming the Cartesian Theatre problem requires more than adding additional CPUs. (My point needs strengthening with a copy of that proof - I'll have to go rifle through some old notes.)

(Any theoretical computer scientists out there, if you should happen to stumble across my humble blog, I would love any insight that you could share! If this path proves fruitful, I'll march this question over to a professor or two and will come back with details. :-D )

Adding more CPUs, I think, would be akin to creating multiple Cartesian Theatres. This would not solve the problem; it would amplify it! So, the Multiple Drafts model approach may ask something like: Instead of having n single points, how can you interleave the percepts, such that they are part of the same self-aware machine? (*without* the requirement that the percepts pass through a common point!)

To explore this interleaving-of-perceptions-without-going-through-a-single-point, I want to discuss three ideas:
1) "Self-reference" and Kurt Gödel's incompleteness theorem
2) Trying to apply my baby-AI skills to build a computational model for consciousness. (Yeah, yeah, I'm known for biting off more than I can chew.) (Remaining questions)
3) How do collective/hive minds work? (Remaining questions)

Kurt Gödel's incompleteness theorem states that (clumsy paraphrasing by yours truly to follow) any consistent formal system that is powerful enough to express basic arithmetic contains true sentences that it cannot prove. The proof hinges on self-reference: the system is made to talk about itself. Take your knowledge-base KB and a sentence G that says, in effect, "G cannot be proven from KB". The two have an interesting relationship. If KB proves G, then KB has proven something false, and you have a tainted/inconsistent KB. If KB does not prove G, then G is true but cannot be derived from KB, which means that your KB is incomplete. So, you cannot have both a complete KB and a consistent KB. (I hope I got that right!) The stipulation is that your KB must be of sufficient complexity, i.e. enough to express arithmetic.

(Subsequent to typing this out, I stumbled upon a great link that presents explanations of Gödel's incompleteness theorem from many different sources. Reading these over helped solidify my own understanding.
http://www.miskatonic.org/godel.html)

Going back to the subtlety of how a mind can't be "a mind" if it is restricted to a single point in computation: How does this affect percepts and actuators of an agent? Where does the self-awareness fit in, logically? Is it part of a unification? Is it part of the definition of a percept and an actuator? If you "reference yourself", does that free you from being "in one place"? Maybe a percept must be modelled such that it is dependent on itself, in a way. An agent must always take its own actions into account, after all: if it is modifying the environment with its actuators (even if it is only moving, its point of observation has changed, which affects all other incoming percepts), then maybe this is the self-referencing coming into effect.

Re-thinking Percepts & Actuators

In Dennett's Multiple Drafts model, percepts go through several levels of "editing"; some are recalled vividly, others forgotten quickly. The different kinds of percepts overlap with and affect each other in a confusing vortex. (I took the word "vortex" from the preface of Douglas Hofstadter's Gödel, Escher, Bach; I haven't read nearly enough of this tome to speak fluently on its ideas.)

Given a set of n percepts, each with m layers of strength (i.e. weak ones are forgotten quickly and strong percepts are retained for a longer period of time), could consciousness be the result of each percept trying to make its way through "the mind" to arrive at the output, i.e. actuators?

There are a couple of issues here. First, I just noticed that I've discretized the spectrum over which a layer is forgotten quickly vs. having a high impact / retained for a long period of time. (How would one represent a continuous spectrum? An integral, I suppose?) From now on I will call my 'm layers of strength' with this name: "\intM" layers of strength. (Mathematicians, forgive my horrid notation!) Another interesting point is that my question about memory culling in a previous entry is addressed here: Although you don't explicitly delete memories, this model does take into account that some perceptions are forgotten quickly and even discounted once the mind has made the conclusions it needs to make with that input. Only the conclusion is retained; the original memories are no longer needed and are therefore forgotten.

Secondly: I spoke of each percept "trying". Would each percept have GOALs of its own? How do collective minds work? Where do the overall goals of the system come from, and how do the individuals influence it? Does each individual have a specialized type of percept, or does each individual have access to a set or subsets of all possible perceptors, maybe with different aptitudes for sensing on each?

On the topic of "overall goals" of the system: How can you have an overall purpose without succumbing to the Cartesian Theatre problem? Can an overall system have goals without all of its percepts going through a single point of checking-in? Yes, I think so.

If consciousness is some function of N perceptors by \intM by A actuators, can you have consciousnesses within consciousnesses as these streams from N, \intM and A interleave with each other?

In his book Programming the Universe Seth Lloyd presents how the universe could be a giant consciousness that is constantly computing itself. Given that humans are conscious, and humans are subsets of the stuff in the universe, can this computational model support consciousnesses within consciousnesses?

...

So many questions!! Alas, I have to end here. I feel like I've built up a solid foundation for something awesome, but have to, um, withdraw before the climax! On paper I have sketched out more fully some ideas on how to recognize consciousness in this way of thinking. But they are so rough, I can't even put them into words yet. So, sadly, I have to leave for now.... It's ten-thirty at night, and I have to go to work in the morning. I'll be back with more in the next entry!!

If you are still reading this, thanks for staying with me to the end. (Of this entry.)

Love Steph :)

Posted by Frozone Permalink on January 01, 2008 10:35 PM | Comments (0)
categorized under Computer Science & AI




December 26, 2007

Completing the Cycle

Maybe, in an Intelligent Tutoring System, the idea is to "Complete the Cycle".

I've read several popular science books lately (see references) that talk about the human brain and how it learns. This is one of those things that becomes glaringly obvious once you put the pieces together, but is less obvious when you've only seen the idea in scattered bits under different lights. I realized that as a (wanna-be!) designer of AIED systems, my job is to turn the computer into a "mirror" that lets students see their own understanding next to their learning goals, specifically so the brain can complete the anticipation-verification loop required for learning.

The brain is constantly in anticipation-mode, and sense-data inputs either affirm or contradict what the brain is anticipating. Part of the art of student modelling, I think, is to have the computer know what "anticipation-mode" the student is working under. The computer should know when it presents something, whether the student will say "Yeah, yeah, I was expecting that" vs. "Huh? That's different; I wasn't anticipating that."

People learn by "latching on" to something that they already know. The computer's job is to present information in such a way that it is reflected as an extension of their own individual understanding. This is a link to delivery planning as in [Brecht 1990]. When the student is presented with the "Huh?" and it follows logically from their existing base-of-understanding, then learning occurs.

How does the computer know what the student would be anticipating, and what knowledge framework they are starting with? I don't know, but I'm working hard to find out. :-) I think that the route to an answer requires some serious artificial intelligence and knowledge representation & inference techniques, combined with some well-designed gathering of statistical data. (Ecological mining of one's "own" data as well as similar users' data.)

In addition to giving me hints about how to teach students, I think that my readings into neuroscience are highlighting the differences between human thinking and artificial thinking. Brains are very reflective: there are direct physical mappings of what is perceived and how that perception physically manifests itself as an arrangement of synapses firing off. [Zull 2002 - see chapter 8] In contrast, an artificial mind's perceptions don't have any physical similarity to its sensory inputs. I think it's interesting to look at this, and I wonder if computers really could "draw shapes" around patterns amongst reams of data, that it could achieve a higher level of consciousness of what's going on. Using constraint-satisfaction problems as modelling granularity during the diagnosis of a student's misconceptions as in [McCalla and Greer, 1994] is relevant here.

But I still wonder - in terms of data mining - how patterns might somehow emerge. I think that I'm looking for consciousness. I've always been a little bit lost when studying AI and Search because I have this stipulation in my head that a "search" has to start "empty" and look through data-items one at a time until a specific goal is found. I don't think that's the point, though. What sort of search algorithm would take into account the current situation (not "starting empty") and could pattern-match, even without having a specific goal...? How can you look for something in pieces, where you have infinitely many perceptors, each of a different kind (ex. humans have vision-percepts and hearing-percepts; I imagine a machine would have, I don't know; task-domain-percepts, student-modelling percepts?) and how could the same event, perceived from all these different dimensions, unite in the machine's consciousness somehow to give it real understanding of what's going on? Is unification of percepts the "shape" that I'm looking for? Is this consciousness - a unification of different percepts with knowledge of how this is relative to everything else in the creature's experience/existence?

How could you demonstrate a student's current understanding to them (explicitly firing the right anticipation neurons)? Would this be as simple as an overlay model of a task domain ontology? (If I remember correctly, discussion of the overlay model of student modelling comes from [Wenger, 1987] in a survey of many student modelling techniques). I would like to use the overlay method as a visual tool, but I think that the actual student model would require more detail than just a subset of the task domain ontology, i.e. we need the statistical interaction data as part of the student model beyond the overlay.
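As a minimal sketch of what I mean by "overlay plus statistics", assume the task domain ontology is just a map from each concept to its prerequisites. The concept names, the `StudentModel` class and the mastery thresholds below are all hypothetical, invented for illustration rather than taken from [Wenger, 1987].

```python
# Hypothetical task domain ontology: concept -> set of prerequisite concepts.
ONTOLOGY = {
    "counting": set(),
    "addition": {"counting"},
    "multiplication": {"addition"},
    "fractions": {"multiplication"},
}


class StudentModel:
    def __init__(self):
        self.known = set()   # the overlay: a subset of the ontology's concepts
        self.attempts = {}   # concept -> [correct, total], the interaction data

    def record(self, concept, correct, mastery=0.8, min_attempts=3):
        """Log one attempt; add the concept to the overlay once the stats look good.

        The thresholds are arbitrary assumptions. A concept is never removed
        from the overlay in this sketch, even after later mistakes.
        """
        stats = self.attempts.setdefault(concept, [0, 0])
        stats[0] += int(correct)
        stats[1] += 1
        if stats[1] >= min_attempts and stats[0] / stats[1] >= mastery:
            self.known.add(concept)

    def ready_to_learn(self):
        """Concepts not yet known whose prerequisites are all in the overlay."""
        return {concept for concept, prereqs in ONTOLOGY.items()
                if concept not in self.known and prereqs <= self.known}
```

The `known` set is the part I would draw for the student as the visual overlay, while `attempts` is the extra statistical detail that the overlay alone leaves out. `ready_to_learn()` is one guess at where the student's "anticipation" could be extended next.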

How does the range of pedagogical techniques relate to such an understanding of the human brain? Specifically, can pedagogy inspire any algorithms we invent to output an environment (sub-environment, really) in terms of how to help student brains complete their anticipation-verification cycles? Don't forget the importance of spatial orientation within the greater task domain. Really, this algorithm I'm thinking of is only half-way through the cycle. You have to show students where they are, but more importantly, the knowledge (sub-environment) has to come from the student, not from the computer, for learning to occur. We need a malleable sub-environment for students to present ("create") their understanding. Going back and reviewing their own work would be a more effective reference book as well. (data left behind from malleable environment = more statistical data for next time)

How do you use a sub-environment to "extend" a student's current understanding to their learning goals in the task domain ontology? I need to dig a little further into psychology. Use of shapes & imagery is surely culturally-dependent. More ontologies needed here. Would you show students an empty path from their understanding to the goal, and fire their neurons so that their anticipation-senses are active at the same time as the new raw sense-data of the new input, and then have them perform a (kinesthetic) activity to build their own mappings (to fill in the gap) to physically latch their new learnings to their old framework?

I have lots of work left to do! Thank heavens for the holidays. :) :)

References

Brecht, Barbara. "Determining the Focus of Instruction: Content Planning for Intelligent Tutoring Systems." Ph.D. thesis. University of Saskatchewan, 1990.

Dennett, Daniel C.. Consciousness Explained. New York: Back Bay Books/Little, Brown and Company, 1991.

Hallowell, Edward M., M.D.. Crazy Busy. New York: Ballantine Books, 2006.

Hicks, Esther and Hicks, Jerry. Ask and it is Given. Carlsbad, California: Hay House, Inc., 2004.

Hawkins, Jeff. On Intelligence. New York: Times Books, 2004.

McCalla, G. and Greer, J. (1994) Granularity-Based Reasoning and Belief Revision in Student Models, in J.E. Greer and G.I. McCalla (Eds.), Student Modelling: The Key to Individualized Knowledge-Based Instruction, Springer Verlag, Berlin, pp. 39-62. (Nato Research Workshop)

Wenger, Etienne. Artificial Intelligence and Tutoring Systems: Computational and Cognitive Approaches to the Communication of Knowledge. California: Morgan Kaufmann Publishers, Inc., 1987.

West, Thomas G.. Thinking like Einstein: Returning to our Visual Roots within the Emerging Revolution in Computer Information Visualization. Amherst, New York: Prometheus Books, 2004.

Wolfe, Jeremy. "Lecture 8: Cognition: How Do You Think?" Introduction to Psychology. MIT OpenCourseWare. 2004. Accessed July 7, 2007. [http://ocw.mit.edu/OcwWeb/Brain-and-Cognitive-Sciences/9-00Fall-2004/CourseHome/index.htm]

Zull, James E.. The Art of Changing the Brain. Sterling, Virginia: Stylus Publishing, 2002.

Posted by Frozone Permalink on December 26, 2007 11:15 AM | Comments (0)
categorized under Computer Science & AI




September 30, 2007

Thinking about types of search problems

So, with too much time on my hands between assignments, I've been thinking about a certain type of problem and have been wondering about the shape of it. Like, what would be the "search space" of this problem? What is the shape of it, and, what approach do you take to solve it? Is it a search-problem? A sorting-problem? I don't know. Here it is:

Imagine that you have reams of data, and that you regularly perform a variety of queries on it. The data grows and grows, and, although more data is generally better in terms of having a complete data set, this baggage problem is starting to get messy in terms of the portability of your sub-systems and for the performance of your data mining algorithms.

Would cleaning up the data be worthwhile? Could you remove duplicates / repeats of similar situations, somehow, to reduce the sheer amount of data? Could you just take note of the number of occurrences of some patterns of data, thus keeping summaries instead of needing the entire data set? (Removing data, recording key summaries, creating shortcuts for yourself, generating a set of generalities)
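
One way to picture the "keep summaries instead of the entire data set" idea is a plain occurrence count per pattern. This is only a minimal sketch: the class name PatternSummary, the method summarize, and the sample log entries are all made up for illustration, and it assumes each observation can be reduced to a single string key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse repeated observations into (pattern -> count) summaries.
class PatternSummary {
    // Counts how often each distinct pattern occurs, keeping first-seen order.
    static Map<String, Integer> summarize(List<String> observations) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String pattern : observations) {
            counts.merge(pattern, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<String> log = Arrays.asList("login", "search", "login", "checkout", "search", "login");
        System.out.println(summarize(log)); // {login=3, search=2, checkout=1}
    }
}
```

The trade-off is the one raised above: the counts are far smaller than the raw data, but the order and context of the individual events are gone.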

Could this "memory-culling" be done without destroying too many interesting randomizations from the standpoint of algorithms that use randomization points (simulated annealing, random-restart hill climbing, etc.)? Would this be considered a type of sorting problem, rather than a search?

Hmmmm.....

(Updates: Is it along these lines where you start to invent heuristics? What's a heuristic again? *furrows brow*)

Posted by Frozone Permalink on September 30, 2007 12:28 PM | Comments (0)
categorized under Computer Science & AI




July 14, 2007

Teaching Large Classes, Web 2.0, Agent environments & eProfiling

It's funny how ideas & events that span, like, a YEAR can come together in a coherent salad of inter-relatedness. *chuckle*

Not too long ago, I attended a pre-conference workshop called Becoming a Teacher of Large Classes at the STLHE conference in the middle of June 2007. I rather felt like an imposter, sitting in on this intellectual discussion among seasoned faculty as they shared their experiences, wisdom and inside-stories about what it's like to be that Sage on the Stage, lecturing classes of 300++ students, and how they struggle to make course content meaningful to the individual among the masses, and what techniques they've used to actually accomplish this. It was fascinating! My notes don't do the workshop justice, but here they be. I hope to have more of these precious-rare experiences in my future!

Meanwhile, I had an ulterior motive for participating in the group: I wanted to glean some valuable pedagogy about handling large-classes and then apply it to online-classes, thinking that online classes were similar to large classes in that students may also feel disconnected and "lost in the masses". This naturally leads into the next segment of this blog's subject line: Web 2.0, or, web-applications that become more effective as more people use them. Right now I'm reading Semantic Web and Education by Vladan Devedzic, which is offering a well-poised refresher on technologies from a software engineering perspective for a computer scientist like me who is desperately trying to link the Work of the Wise in pedagogy to these technologies.

In a previous blog entry (gulp, literally a YEAR ago), I explored applications of AI agent environments on the semantic web. This work can be found on my blog's archive pages about agent stuff. I deliberated over two approaches to viewing the environment:
1) the "Environment" in which agents interact is actually the whole Semantic Web, and each learning object or person is its own agent
2) the Semantic Web is not transformed or formalized into a giant AI Environment, but, rather things remain as they are: a mix of hand-coded HTML files, web applications, meta-data in XML files sitting here and there, streaming video servers, RSS feeds, you name it! Where do agent environments like JADE fit in? These environments could take the form of pockets that could be attached to the odd server here and there on the Web as we know it. Perhaps these environments would advertise themselves using a WSDL-like framework. Environments could be clustered together in graphs (in the Computer Science sense of the term) to represent how one community relates to the next. A user's personal agent could travel from community to community by use of a SOAP-like exchange, where the agent (represented in Java objects) could be transported from environment to environment as it hops around from server to server. This way, you get to have the beauty of AI-environments on the semantic web without having to do the impossible and change the chaotic form that the Web is today.

Allow me to copy/paste here a relevant snippet that I composed in a note to a colleague in August 2006. I hope that it's not too far out of context and that he doesn't mind my publicising this part of our conversation. :-D (How selfish, all of the dialogue below is me talking, but I don't want to publish stuff written to me in the confidence of e-mail without permission...)

(snip)

"In the general semantic web, I agree that agents should have a stateless, protocol-independent framework for interoperation; With millions of agents, each with multitudes of different goals and purposes, I think that it's necessary to have a clear and simple communication model that's as beautifully adaptable as RDF-based languages. However, at some point, agents will need to talk to each other about very specific things, such as a mentor tutoring a student, for example. The two agents need to share deep understanding of each other's roles and history, if only temporarily and about a specific topic. The teacher inevitably needs to keep constant track of the student's progress so that he can continue to offer relevant and effective help. Of course, even in a classroom setting, humans are humans: the student, out of pride, may have difficulty revealing that they're having trouble with a topic and may lie about their understanding. Or, the teacher (be it a human or artificial entity) will inevitably be faced with a question from the student that it cannot answer. Each agent's impression of the other agent will necessarily be limited, but it seems to me that agents do need some sort of constant understanding (i.e. a model) of the other agent in order to meaningfully interact. For example: "I hired Tim to build my deck because he did such a good job on the fence last year -- I will ask him to use the same kind of wood." Tim's agent would surely need a way to retrieve the type of wood that he used with the Stephanie client on last year's fence project, unless Stephanie, who knows absolutely nothing about fence- or deck-building in the first place, is expected to recall such a detail. ('Another question to which I have no answer!)"

"For all I know, inter-jade agent communication is already RDF-based. I agree with your thought on abstracting the physical location of the agents away from the problem: Really, given a good framework, it doesn't matter where the agent "lives". At some point it will be necessary for agents to retrieve historical data about themselves or about other agents somehow, and these memories can't very well be saved "in the agent" due to the baggage problem. As long as we don't rely on storing all data in memory on the local server, I think that Jade agents can be considered expandable and could very well be living on totally separate servers. I haven't progressed far enough in my research to actually have run into the need for specific communication protocols that can only run within a local jvm. I'll certainly keep our discussion in mind as I progress!"

"Lately I've been studying teaching strategies so that I can put a bunch of agents together in a tutoring scenario and actually figure out how they will interact. Can all communication be done with appropriately-referenced ontologies using RDF-based messages? My intuition is that the answer is yes -- now I'm eager to start designing an experiment; I'm also eager to learn about Protege. I feel that the time is right to take another crack at AgentOWL, ..."

(end snip)

... ta da! Reading that a year later and it still sort of makes sense to me; I hope it makes sense to you too! (if anybody is actually reading this. LOL)

I'm going to step back a couple of paces to the Pedagogy-Web 2.0 theme. In November 2005, I attended the Alberta Online Consortium Online Learning Symposium and I did glean some good pedagogy from a session there, also. The presenter, Darrell Nunn, shared a teaching method he has developed over his years teaching online. His method requires students to think deeply and collaborate intensively with each other through the subject material. Such co-operation on a deep level requires online bulletin board or forum software.

Here is a snippet of my conference notes from that session:

(snip)

Here's how a typical assignment works:
1) A set of 30-or-more questions is released
2) Students use a "claim" posting to grab dibs on one question. Thus, each question has 1 student assigned to kick-start the discussion. In their answers, students should focus clearly on some key ideas for other students to respond to. Such answers are 1-3 paragraphs.
3) In addition to publishing "answer" postings, students are required to publish replies to a certain number (7 or 8, if I remember correctly) of their peers' answers.

Due to the asynchronous nature of bulletin boards, students can think carefully about how they will reply. This allows discussions to become more thoughtful and analytical. In this way, students develop higher order thinking skills as they deeply explore the classroom material. Even better, they're doing it by bouncing ideas off of each other. The teacher can sit back and act as a "guide on the side" rather than "the sage on the stage".

Criteria for marking and evaluation are provided for students on Darrell's website. http://ilearn.senecac.on.ca/darrell.nunn/coursestruct.html#sample (See Writing Guidelines - 'Posting an Answer' and 'Guidelines for Response Postings')

Mid-term and final exams require students to analyze threads of discussion from the term and draw conclusions about the development of ideas as they were tossed back and forth between students during the term.

This teaching technique is particularly handy when teachers are asked to run 2-3 sections of the same course. In these cases, teachers can send -all- students to the same bulletin board and simply increase the number of assignment questions available. This gives students more variety of choice in their assignments and also allows for better depth when students go to publish their replies.

(end snip)

I also wanted to talk about eProfiling in this blog entry, but admittedly I'm getting tired. :-) Basically, the gist of using eProfiling to benefit large classes is that it gives the instructor some means of discovering more about the general demographics of the students so that they are better able to contextualize the information they are presenting to make it easier for the student to relate to the material and thereby increasing the overall "learning" that occurs in the large class.

Whew! Well, there's my discussion for the day. I wish I had more steam to keep going, because I've got lots of good stuff to put together and interconnect with the rest of the stuff in my blog. Good thing I'm young; I can always come back and keep writing next time!! heh heh ;-)

Posted by Frozone Permalink on July 14, 2007 11:16 AM | Comments (0)
categorized under Computer Science & AI




February 03, 2007

Memory Culling: Necessary part of intelligence? (artificial or human)

Speaking of manually imposed amnesia: I've been picking up trends lately about how a truly intelligent being (artificial or human) has mechanisms for long-term memory retention. Given the huge overload of information available to both humans and machines (via sensory input, information on the 'web, agent-to-agent interactions, etc.) the only way to keep coherent and useful long-term memories is to use a mechanism for culling out only the minimum, key "hooks" to the past events that will enable you (artificial or human) to put back together the information you need from your past experiences.

In Chapter 4 of Ray Kurzweil's The Age of Spiritual Machines, under the sub-heading "The Holographic Nature of Human Memory", he discusses how, as humans, we don't store every memory of our friend's face as seen from different angles and in different lighting conditions; instead, our brains store her face as a series of synaptic strengths. I'm no biochemist, but I understand that the "immediate" or "what I'm experiencing now" recognition occurs in the cortex (outer-layer) of the brain, and, (according to what I read in Kurzweil's book) the longer-term memories are stored deeper in the brain chemically encoded in RNA or in peptides. Somehow, the human brain decides which experiences from the cortex get filtered down and stored more deeply for long-term memory.

In Edward Hallowell's CrazyBusy he explains "Rhythm", a mechanism to put some busy jobs in our lives on auto-pilot in order that we may better focus on the unpredictable, impossible-to-prepare-for demands that come flying into our lives. Like riding a bike or playing the piano: At first, some actions are difficult for humans and require meticulous attention over every movement. This activity happens in the brain's frontal lobes. Eventually, though, as our brains learn the activities, the actions move back into the cerebellum. When this part of the brain is working on the activity, we don't have to pay attention to every little detail -- each finger movement on the piano, for example -- instead, our minds now have room to concentrate on other things, such as putting "expression and shading" into the song (as recounted in CrazyBusy), or, carrying on a conversation with a friend as you ride your bike. This "clearing out" of the attention on meticulous frontal-lobe activity, I wonder -- is this also a function/ability required for intelligence?

Bringing all this back to the core of this blog - AI in education: Grant that "you" are an artificial agent and you are representing, say, a learning object on the semantic web. Your job is to make yourself useful under different contexts as various learners and learning-facilitators come along with their instructional plans and with their lesson designs and query "you" along with millions and billions of other agents like yourself to see if "you" will suit the purpose of the context-at-hand. Supposing that you do meet these criteria, you'll have to negotiate with other agents as you settle yourself into an arrangement with other learning objects for a quick brush-up lesson for a student, say - then, you'll record your interactions with the other learning-object-agents and also with the student's agent as the student breezes by on their learning journey. At the end of just this one learning interaction, you will be left with a snapshot of the student model and snapshots of how you related yourself to other learning objects (as in, if I'm not too far-off-the mark, [McCalla 2004: The Ecological Approach to the Design of E-Learning Environments: Purpose-based Capture and Use of Information About Learners] and [Vassileva, McCalla and Greer, 2003: Multi-Agent Multi-User Modelling in I-Help]). That's a lot of memories. How do you figure out what to keep and what you need, so that you know how "you" are useful relative to other learning objects and to certain types of students using your long-term memories? What do you keep and what do you discard because you can re-assemble it later? What core things do you need in order to be able to re-assemble later?

Memory Culling & intelligence. Hmmmm. Surely this is a topic in AI studies somewhere...? Ah, ha! I smell a trail. 'Time to go dig up some papers. =)

Posted by Frozone Permalink on February 03, 2007 02:35 PM | Comments (1)
categorized under Computer Science & AI




December 23, 2006

Complexity of Task Domain and its effect on the adaptability of tutoring systems

Does the complexity of a task domain ( 'task domain' à la Kurt VanLehn's The Behavior of Tutoring Systems ; same paper also referenced in my other post ) affect a tutoring system's ability to offer the right amount of advice/guidance for the student?

I would say yes, at least at first. I mean, if you have an incredibly simple task domain -- "This is an Orange" -- then how far can you possibly get when the student is faced with the problem of being asked to identify that bumpy, round fruit on the table?

As with anything, you could get as complex as you want: look at the molecular structure of an orange, or even trace the appearance of the fruit throughout literature and in movies. But, our actual "task domain" is really only as good as the data we've got on it. I may only have recorded the basics:

- an orange is a citrus fruit
- most oranges have a peel on the outside, then some white stringy stuff, then the wedged, juicy fruit on the inside
- oranges grow on trees
- oranges are healthy to eat. 'Good vitamin C.

So, if a student is standing there, scratching their head and wondering where the hell to start, I may hint, "It is a good source of vitamin C" or, "it was in that movie with George Clooney." (lol, I made that up.) Maybe the student eventually catches on, and we have to offer a little more background information on citrus fruits so that they know more about oranges next time.

Maybe that was a dumb example, but my point is that I can't build a plan in the AI sense of the term if there is not enough domain material to actually break down and approach from different angles.

I should have re-worded my question:

"Does the amount of data we have on our task domain affect the system's ability to be an effective tutor?"

I think the answer is probably yes. But, here is the real question:

"Is there an upper limit?"

Even supposing that we had an infinite amount of data at our finger tips (cough, cough - the semantic web), are there some tasks that are just too simple for a system to be able to plan a tutoring strategy?

I was thinking about my item-selection service again. All the user has to do is "pick 10 items" out of a big list of 200 or whatever. Imagine that this is a puzzle game, and there are lots of rules and conditions outlining goals and patterns. Wow, just as I was typing this I thought of an excellent analogy: If you are a Warcraft fan, think of the allocation of your talent points. ( It's been almost a whole year since I've played, but I am still an Affliction Warlock at heart. Die, my pretties! Slowly! Mwuahaha!)

*cough* Ahem.

The "job" is very simple -- just pick your selections -- but there is so much background behind it, you have to consider each choice carefully, and, all of your subsequent choices depend on your previous ones.

In the item-selection tool I'm working on ( albeit not as cool or fun as WoW ) I had brainstormed some ideas on how to teach the user some of these background rules and how to plan ahead to see how their options now can set themselves up for juicier selections in the future. I found myself wondering about the limits of how far I could actually do any tutoring -- after all, the user just needs to click 10 times and they're done..... but it's the thought-process behind the action that's significant and requires the tutoring. I guess it's the same kind of thing in a multiple-choice quiz; the action itself is quite easy; anybody can circle an answer -- it's choosing the right one that's the challenge.

So I don't know if I've answered or even articulated my question at all but I think I've clarified in my head that it's the "thoughts behind" the action I should be tutoring, not the process of making the clickies themselves.

Posted by Frozone Permalink on December 23, 2006 09:11 AM | Comments (0)
categorized under Computer Science & AI




November 22, 2006

Memory problems (java.lang.OutOfMemoryError)

I built a system that has a lot of objects and a lot of memory references. It runs on a servlet. During the servlet's initialization, I run a series of object-creations in order to load up a domain model -- basically, I have to load up all the models that will be shared by all of the users.

My problem is that the initialization routine fails on a:

Exception in thread "Cluster-MembershipReceiver" java.lang.OutOfMemoryError: Java heap space

... and the servlet doesn't even run at all -- HTTP 500 error on every request, caused by the NullPointerException which in turn is caused by the failure of my data structure to load up.

I tried increasing the java heap but I haven't had any luck.

I noted that if I comment out the last part of my initialization procedure -- i.e. I prevent a large number of the objects from being created -- then the servlet boots up just fine. The only problem is that pieces of my domain model are missing!

How do people usually deal with such problems? I think I'm faced with the necessity of cleaning out several objects from my system. Let's see, I've got 45 main objects, each with 3 main attributes, who in turn have about 5 attributes each. That's, what? 675 memory references? How much memory could that possibly take, and why is that enough to crumble my system?
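
A rough way to sanity-check that arithmetic is to multiply it out and compare against what the JVM itself reports. This is only a sketch: the class name HeapEstimate and the method leafReferences are hypothetical, and the heap figures it prints will differ from machine to machine.

```java
// Hypothetical sketch: count the leaf references in a 45 x 3 x 5 object tree
// and print the JVM's own heap figures for comparison.
class HeapEstimate {
    // Number of lowest-level attribute references in the tree.
    static int leafReferences(int mainObjects, int attrsPerObject, int subAttrsPerAttr) {
        return mainObjects * attrsPerObject * subAttrsPerAttr;
    }

    public static void main(String[] args) {
        System.out.println("leaf references: " + leafReferences(45, 3, 5)); // 675

        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap used (bytes): " + usedBytes);
        System.out.println("heap max  (bytes): " + rt.maxMemory());
    }
}
```

At 4 or 8 bytes per reference, 675 references on their own come to only a few kilobytes, so the reference count alone is unlikely to be what exhausts the heap.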

Sigh. I was so pleased with the number of things that my system is capable of "thinking about" and "rationalizing about", but alas! Reality has struck. Although computers are capable of holding more "stuff" or "awareness" in their heads than humans are, it appears that artificial intelligence, too, has its limits.

Update: I discovered an interesting tool (read about it here) -- 'am running jdk 1.5.0_09 on Windows (I know, I know -gag!- Choice of OS is outside of my control) ... and, anyway, if you go to the command line and type in jconsole you get a nice admin window showing you how much memory the heap is taking. To get it working - again, this is in Windows where Tomcat is running as a service - go into the Monitor Tomcat application and click on the Java tab. Add the following lines at the bottom of Java Options:


-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=9004
-Dcom.sun.management.jmxremote.authenticate=false


Next, open your command prompt and type 'jconsole'. Next, go to the Remote tab and enter "Host" = localhost and for the port, enter 9004 (or whatever you typed in above).

Jconsole should boot up for you from here, monitoring your Tomcat.


Update: I solved the error!!!!!!!!!!!!!!! I am still trying to wrap my head around it, as it doesn't really make sense, but I'll tell you what I did.

Basically, my data-load routine that was throwing the OutOfMemoryError would connect to an LDAP directory to read in people's names, job titles, group memberships etc.. So, the fact that I commented out the last part of my data-loadup routine didn't actually indicate that I had too many objects, as I had thought. Rather, it was merely co-incidental that the one person in our directory with a null memberOf attribute happened to get loaded up in the last phase. Strange how humans make these connections.... "I got an OutOfMemoryError, so I disabled some of the objects, then my error went away. Therefore, the cause of my problem must have been that there were too many objects in the component that I disabled." How wrong I was!!!

Basically, all I had to do was check to see if the Attribute was null before I started looping through it. This fixed the problem. Weird!!! Why did I get an OutOfMemoryError instead of a NullPointerException? Oh.... wait a minute, I -did- get a NullPointerException in my browser's HTTP 500 display. The OutOfMemoryError was in Apache Tomcat's error logs. How can a NullPointerException also allow the OutOfMemoryError? In a threaded application, maybe, where the pointer in question is needed in a loop in another thread? I don't know.

Below is my code -- the correction is the null check on memberOf.

Here I am just setting up my JNDI contexts and was gathering attributes. Note, code is snipped/edited.


// needs javax.naming.*, javax.naming.directory.*, javax.naming.ldap.*
LdapContext ctx = new InitialLdapContext(ldapEnvironmentHashtableParameters, null);
SearchControls searchCtls = new SearchControls();
searchCtls.setSearchScope(SearchControls.SUBTREE_SCOPE);
NamingEnumeration answer = ctx.search(searchBaseString, searchFilterString, searchCtls);
while (answer.hasMoreElements()) {
 SearchResult searchresult = (SearchResult) answer.next();
 Attributes attrs = searchresult.getAttributes();
 if (attrs != null) {
  Attribute memberOf = attrs.get("memberOf");
  // the fix: attrs.get() returns null when the entry has no memberOf attribute
  if (memberOf != null) {
   // for each memberOf attribute, do something
   for (int i = 0; i < memberOf.size(); i++) {
    // domain object setup, blah blah
   }
  }
 }
}


Update, January 12th 2007 -- I am also developing a suspicion that this error can be caused if you are running your Tomcat on a machine that only has the Java JRE (as opposed to the JDK, which includes a compiler.)

I was also getting this error when I was missing the TOMCAT_HOME/webapps/MyApp/WEB-INF/web.xml file.

Posted by Frozone Permalink on November 22, 2006 04:52 AM | Comments (0)
categorized under Computer Science & AI




November 04, 2006

When in doubt, do some math

A couple of the papers I've read recently (well, re-read in one case, lol) both discuss the mathematical foundations for intelligent behaviour.
- Using planning techniques in intelligent tutoring systems (Peachey & McCalla, 1984-85)
- Foundations for the Situation Calculus (Levesque, Pirri & Reiter, 1998)

Both discuss the use of predicates for notating "states", as I think of them. That is, "Student knows concept A" -- SK(conceptA) or "John is a human" -- human(John). I'm still foggy on existential localities, like, "Where is the object of focus in the predicate's assertion? In the function, or in the argument?" Perhaps I am confused because I am focused on the flesh-and-blood people. In the first one, the person is in the predicate itself - "*Student* knows" - but in the second one, the person *John* is not the predicate, but rather he is the object.

After pondering the problem for a moment, it seems to make sense that a predicate is the state of being that the object can be. So, the conceptA is "in the state of being known by the student" and John is "in the state of being human". I guess this means that the flesh-and-blood person can be either the object or a part of the meaning of the predicate; it's just a matter of context. I trust that this will make more sense as I keep going. In elementary school, I never really did understand what my teacher was talking about when she mantra-ed, "Every sentence has a subject and a predicate!" 'Bout time I dealt with that. Onward, ho!
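
To make the predicate-versus-argument distinction concrete, the two facts can be written down as plain data: a predicate name plus the thing it is asserted of. This is a minimal sketch, not notation taken from either paper; the class name Atom and its fields are hypothetical.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: a fact such as SK(conceptA) or human(John),
// stored as a predicate name plus one argument.
class Atom {
    final String predicate; // the "state of being", e.g. SK or human
    final String argument;  // the thing that is in that state

    Atom(String predicate, String argument) {
        this.predicate = predicate;
        this.argument = argument;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Atom)) return false;
        Atom that = (Atom) other;
        return predicate.equals(that.predicate) && argument.equals(that.argument);
    }

    @Override
    public int hashCode() {
        return Objects.hash(predicate, argument);
    }

    @Override
    public String toString() {
        return predicate + "(" + argument + ")";
    }

    public static void main(String[] args) {
        Set<Atom> facts = new HashSet<>();
        facts.add(new Atom("SK", "conceptA"));
        facts.add(new Atom("human", "John"));

        System.out.println(facts.contains(new Atom("human", "John")));     // true
        System.out.println(facts.contains(new Atom("human", "conceptA"))); // false
    }
}
```

In SK(conceptA) the student lives only inside the predicate's name, while in human(John) the person is the argument; the data structure is the same either way.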

I will now attempt to make a connection between situation calculus and the semantic web.

The first time I heard the word 'Ontology', I was sitting in a second-year philosophy class entitled 'Introduction to Metaphysics'. The second time I heard the word, I was sitting in a weekly meeting with my honours degree supervisor and he told me about the field of study called 'ontological engineering', or the work of organizing knowledge into meaningful structure. (See my other post about ontological engineering.) I wonder how these studies from the Arts, Science and Engineering relate to the field of Education. Curriculum studies, maybe? I wish I knew more about education.

In an earlier post back in July, I discussed a sample Semantic Web ontology about Grade 11 Chemistry. How do the ontological statements in this specification relate to situation calculus?

Well, the components of Grade 11 Chemistry are written in OWL - Web Ontology Language - which is an extension of RDF - Resource Description Framework, which in turn is a language composed of triples of subject-predicate-object s.
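
The connection can be sketched by rewriting the two predicate-style facts from earlier in this post as subject-predicate-object triples. This is an illustration only: the class name Triple, the helper fromUnary, and the plain strings "type" and "knows" are made-up stand-ins, not real RDF or OWL identifiers.

```java
// Hypothetical sketch: predicate-style facts rewritten as subject-predicate-object triples.
// All names are plain illustrative strings, not real RDF or OWL identifiers.
class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // human(John) becomes (John, type, human): the one-place predicate turns into a class.
    static Triple fromUnary(String predicate, String argument) {
        return new Triple(argument, "type", predicate);
    }

    @Override
    public String toString() {
        return "(" + subject + ", " + predicate + ", " + object + ")";
    }

    public static void main(String[] args) {
        System.out.println(fromUnary("human", "John"));                 // (John, type, human)
        System.out.println(new Triple("Student", "knows", "conceptA")); // (Student, knows, conceptA)
    }
}
```

Written this way, the student who was hidden inside the name SK becomes an explicit subject.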

There, I found it!

. . .

Harumph. My Grand Conclusion Of The Day seems rather obvious in retrospect. Sigh.

Back to the math. Given that we do have these philosophical "hooks" and can grab content and relationships out of the 'Web, what can you do with that power? This is the fuel, and we have it in abundance, but what is the engine? What are the applications of situation calculus in artificial intelligence?

I should know the answer to that - I really should. Why didn't I study more AI while I was doing my degree? Sigh.

My AI textbook (well, my brother's actually) has about 8 pages on the subject. Gah, and it's almost noon already. Enter image of Stephanie doing dishes with AI textbook propped up at the corner of the kitchen sink.

Posted by Frozone Permalink on November 04, 2006 11:48 AM | Comments (0)
categorized under Computer Science & AI




October 26, 2006

Thoughts & Brains

I remember the first time I built a system whose intelligence frightened me.

It was two years ago. It sounds dumb now, but it was a booking system.

It is true, though: The system was "asked" to begin handling a particular responsibility - formerly taken care of by two humans and a fax machine - and it now handles the day-to-day interactions with its 757 unique users (yeah, I queried it - 'was curious) with more attention and insight than I would, were I asked to handle the same thing.

I clearly remember the moment of fright I had, when I thought about all of the things that this system was aware of at a particular moment. The knowledge of how many different items were up for grabs, when each item would get distributed, and where; a physical sense of what was currently available and what wasn't, who was currently logged in at the time, and how to match up their username with the staff directory and find out where they are stationed and how to contact them via e-mail (which it does); even its scope and reach, with both its fat client and web-based components, making it adaptable to different environments.

None of these functions are particularly spectacular - any second-year computer science student could build you such a thing. What interested me was the thoughts and brains of the thing. The system's thoughts are a sort of particular-ization of Plato's ideals (universals), where the universals correspond to real-world objects, and the particulars correspond to the computer brains. (Rather backwards, if you think about it - physical things being the ideal, and the intangible being the attempt-at-actualization.)

I (as the generic programmer "I") have achieved what I only read about in fantasy books about magic and conjuration. Metaphysics should be a required area of study in Computer Science. :-)

That's my favourite part of my job - "programming computer brains". I enjoy learning about the real world and how it works, and then conjuring the corresponding images in my head. I like to model these things and then make them interact, first making the "imaginary" objects work like they do in the physical world, and then adding improvements to make the whole thing run even more efficiently than it does in the real world.

Going back to Thursday's talk, I remember a phrase that Dr. Wasson spoke that just glued in my head: "... building a computational model." That was what she achieved in her thesis; she studied the art and techniques of instruction, and was able to form the model - instantiating a platonic object out of the universal, if you will. I struggled to articulate these words - "building a computational model" - earlier, in my other post, last month.

So much for narrowing down a topic -- out from Computer Science, to Quantum mechanics, to Philosophy, and now, to Religion. I read a book by Rabbi David A. Cooper where he outlines creation and levels of awareness from physical being up to wisdom and understanding. At the upper levels, we have an "archetype wisdom out of which the mold of thought can first form".

I've been trying to figure out how to push my systems out of mere rationalization and logical thinking up to the point of having direction and purpose, that being to come to a better understanding of its own purpose and be able to act according to that goal, rather than merely to follow strict operations of procedure as dictated by The Lady Stephanie.

How do I bridge that gap? To move from logical thinking to logical feeling?

I continue to use myself as a lab rat. As I was teaching my students this week, I paid attention to the things I was telling them as I worked with them to teach them the basics of web programming -- HTML, CSS, etc. Sometimes, I would answer their questions by uttering a sentence or two of task domain explanation, as if filling in pieces of the curriculum they hadn't understood from that morning's lesson. For example, "A line of CSS is made up of a property name - like 'color', for example - and a value for that property, like 'red'." Other times, I would explicitly point out an error, for example - "Ah, you wrote down the CSS property correctly, but you need to put a semi-colon at the end of the line."

How did I know what to tell the student? Whether to point out their error, or whether to re-deliver a component of the morning's lecture? Somehow it's built-in. Unlike most of my colleagues, I'm not trained as a teacher, so my skills in instruction techniques certainly aren't as sharp as theirs would be. But at the same time, I'm better at it than a computer would be.

There is still something that my brain has got that my computer doesn't.

One of these days, I'll figure out what it is. Then, I'll build it.

(Cue maniacal laughter.)

Posted by Frozone Permalink on October 26, 2006 08:25 PM | Comments (0)
categorized under Computer Science & AI




October 09, 2006

Woes of algorithm-implementation

You think you know an algorithm.

Then, when you try to implement the thing, you -really- get to know it.

My first attempt used a terribly embarrassing implementation of a non-directional graph. And, my iteration techniques were atrocious. The thing just kept getting more awkward, with hack after hack, so I scrapped the whole project and then dug out my old second-year notes on graph theory.

Re-reading those notes, I realized I'd picked up a bad habit or two in the years I've been employed as a programmer. For instance, in my first attempt at implementing Dijkstra's algorithm, I programmed my Graph G (i.e. G= ( V, E ) where V is the set of vertices and E is the set of edges) by representing the set of vertices and edges with two vectors instead of using an adjacency matrix. Ouch!!!

I coded up my own classes for a Stack and for an Array2D and fixed up my graph. I ended up using a Vector to store all the Node objects, and an Array2D to store all the Edges, where Array2D(x,y) stored the Edge object that pointed from Node x to Node y. As you can see, this architecture assumes that each node has an integer id whose value is somewhere between 1...n where n is the number of nodes and no two nodes have the same id.

Finally, armed with a couple of good data structures, my implementation of the algorithm worked. Given any start node and end node, it would find the shortest path-value between the two nodes. It took me another couple hours to figure out how to get a set of pointers to an actual *path*, i.e. "First you visit Node A, then you visit Node B"... etc. The trick was that each time you set a node's shortest-known-path-so-far (i.e. d[n]), you also maintain a second list p, where p[n] is a pointer to the edge(u,n) along the shortest path leading to n. Then, you go back and grab p[u] to get the edge(t,u), and so on... work backwards until you have the actual path. I stored the nodes in a stack for later use -- you can just pop off each node to visit it.

I won't log my code here just yet because I know that the third-year AI students are working on a very similar assignment right now. (Not that my code is that good, lol!)

Next I'm going to go find some more papers on applied graph theory for AIEd systems. Maybe I can find something about the application of a specific algorithm for student diagnosis. That would be cool, hmmm......

Posted by Frozone Permalink on October 09, 2006 01:00 PM | Comments (0)
categorized under Computer Science & AI




September 30, 2006

A multi-level learning journey with Dijkstra's Algorithm

****

What follows is a scientific experiment upon my own brain. I begin with a problem formed from a nightmare I had last night and end with an optimistic step in the direction of implementation of an algorithm. Enjoy.

****


My skills are SO rusty -- I'm having trouble solving my dream about Dijkstra's shortest-path algorithm. Seriously: at the end of the algorithm, each node has only an integer-value of the best weight of the path from the start node. The actual distance-travelled is not as useful to me as the path itself. How do you keep track of the path? Do you need a data structure, like a stack, perhaps - and do you have to maintain it along the way as different paths are traversed during the algorithm's execution?

What did I want to do with the path anyway? Maybe I can just print to command line at each step, knowing that it could be added to a memory stack or something...? But, no, I may need to store the path for purposeful traversal as part of a more complex procedure somewhere else. I definitely need to be able to retrieve the path again and again without actually stepping through the search algorithm multiple times. A quest, then: To solve the challenge presented to me by my dream!

Armed with metaphysical motivation, I thought it would be amusing to use this learning-quest as a pedagogical experiment. So, I stepped through each of my thoughts and tagged them with learner-attributes such as "student asks question" or "student expresses confusion" or "student requires additional learning resource". These pedagogical notes are denoted in italics.

'Here goes.

*** BEGIN EXPERIMENT ***

Student gathers:
Dijkstra's Algorithm

Student opens introductory article but is soon daunted by technical notations. Student seeks easier/preferred visual representation of the object and interacts with it a bit, but soon returns to the Wikipedia article to seek an explanation for the observed behaviour.

Student understands that algorithms have input and output. Student attempts to apply abstract-knowledge to current instance.
input = Graph and source node (Knowledge obtained from reading introductory article.)
output = ??? Student doesn't understand how to apply abstract-knowledge of algorithms to specific-instance of Dijkstra's algorithm.

Student understands that algorithms require initializations of variables: primitive data types and pointers to primitive data types. Student attempts to gather "data that gets saved" during execution of the algorithm.
Data that gets saved
- each vertex saves the shortest path so-far from the designated source node (Student gained information, again, from introductory article.)

Student takes note of a tool that is used to understand the material: Notation
- the notation for the current shortest path from the source node s to any destination vertex v is d[v].

Student understands obvious notations about the start-state of the input.
- For example, if the distance from the source node to itself is zero, then d[s]=0.
- When the distance from the source to a node v is unknown, then d[v]=infinity. Super-meta note: "Ugh, how do you Insert--> symbol in a blog entry?"

Student looks beyond physical appearance of the new object and into its functionality. Learns new terminology along with a side-story about springs as they relax and stretch.
- The values for d[v] will continually change as the algorithm executes. When temporary values are replaced with certain values, then the node is "marked", also called "relaxed".

Student asks her first question.
- "But, how do you get the whole path? How do you know which steps to take? An ordered set?"

Student returns to take closer look at introductory article; applies new term, remembers notation learned in a previous learning experience of notation and set theory. Student amends initial understanding of "Data that gets saved"
- The algorithm also, in addition to storing d[v] at each node, maintains two sets:
S = set of all vertices where d[v] has been marked as "relaxed"
Q = all the rest of the vertices

Student objects.
"I don't just want the COST of the shortest path; I want the PATH itself!!"

Student attempts to construct solution on her own.
"If I want the path to a specific vertex, maybe I'm responsible for keeping the traversed nodes in a data structure myself."

Student considers proposed solution, but is daunted by the hassle of implementation issues.
"You mean I have to build a graph AND a stack data structure? There's got to be an easier way!"

Student returns to introductory article, then poses question and forms a misconception.
"Is the order of the items in the set S relevant? Is that set maybe also an ordering of nodes in a shortest path?"

Student attempts to verify the question-misconception and discovers something new.
"Woah, what's this about a sub-graph? I kinda remember this from the lecture. Do I just do another depth-first search on the sub-graph? But that still doesn't solve my problem of keeping track of the path along the way."

Student seeks re-wording of the knowledge-component-to-be-learned (i.e. Dijkstra's algorithm) in a secondary instructional source. In this case, the student's primary source was the Wikipedia article and the second source was the class textbook. Student could not find any explicit descriptions of Dijkstra's algorithm in the textbook. Notes something foggy but relevant:
- After a node is relaxed, add it to path-so-far.

Student seeks yet another description of the learning object and recognizes the term "spanning tree".
- "Hmmm. My linear algebra textbook says that a spanning set is somehow related to the sums of elements in the set. It's been several years since I took that class, though, so I don't really remember the details of spanning sets. This is more computer science than mathematics anyway so I'll keep looking back in my original instructional source."

Student notices an explanation in the second description that was not noticed the first time around. Questions validity of learning sources and is reassured.
- "Ah, ha! Each node stores its best predecessor in the path from s! So you can re-trace the path backwards from the destination simply by following the reference to the predecessor. Awesome! Why didn't Wikipedia tell me to store the predecessor? ..... Oh, it does. I just didn't see it."

*** END EXPERIMENT ***
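
The predecessor idea from the last step can be sketched in Java roughly as follows. This is only a minimal illustration under some loud assumptions: the graph is an adjacency matrix of positive integer weights where 0 means "no edge", node ids run from 0 to n-1, and the class and method names (ShortestPathSketch, shortestPath) are made up for this example. It is not the program referred to in these posts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class ShortestPathSketch {
    // Stands in for d[v] = infinity (distance not known yet).
    static final int UNKNOWN = Integer.MAX_VALUE;

    // weights[u][v] > 0 is the cost of the edge u -> v; 0 means "no edge" (assumption of this sketch).
    // Returns the node ids along a shortest path from source to target, or an empty list if unreachable.
    static List<Integer> shortestPath(int[][] weights, int source, int target) {
        int n = weights.length;
        int[] d = new int[n];             // d[v]: best known distance from the source
        int[] p = new int[n];             // p[v]: predecessor of v on that best path
        boolean[] inS = new boolean[n];   // membership in the set S of finished vertices
        Arrays.fill(d, UNKNOWN);
        Arrays.fill(p, -1);
        d[source] = 0;

        for (int i = 0; i < n; i++) {
            // Pick the unfinished vertex with the smallest known distance (simple linear scan).
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inS[v] && d[v] != UNKNOWN && (u == -1 || d[v] < d[u])) {
                    u = v;
                }
            }
            if (u == -1) {
                break; // everything still in Q is unreachable
            }
            inS[u] = true;
            // Try to improve each neighbour of u, remembering u as the predecessor when it helps.
            for (int v = 0; v < n; v++) {
                if (weights[u][v] > 0 && !inS[v] && d[u] + weights[u][v] < d[v]) {
                    d[v] = d[u] + weights[u][v];
                    p[v] = u;
                }
            }
        }

        List<Integer> path = new ArrayList<>();
        if (d[target] == UNKNOWN) {
            return path;
        }
        // Walk the predecessors backwards from the target, then pop them off in forward order.
        Deque<Integer> stack = new ArrayDeque<>();
        for (int v = target; v != -1; v = p[v]) {
            stack.push(v);
        }
        while (!stack.isEmpty()) {
            path.add(stack.pop());
        }
        return path;
    }

    public static void main(String[] args) {
        int[][] w = new int[4][4];
        w[0][1] = 1;
        w[1][2] = 2;
        w[0][2] = 5;
        w[2][3] = 1;
        System.out.println(shortestPath(w, 0, 3)); // prints [0, 1, 2, 3]
    }
}
```

Because the predecessor array is filled in once, the path can be rebuilt again and again without re-running the search, which was the original worry.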

Well, that was an amusing bout of self-reflection. Once I got by the embarrassment of how lost I was with a "simple" undergraduate-level algorithm, I enjoyed myself. Now I feel ready to whip up a program in Java to return a shortest-path.

'After I do the laundry.

Posted by Frozone Permalink on September 30, 2006 11:51 AM | Comments (0)
categorized under Computer Science & AI




August 15, 2006

Applications for the Genetic/Evolutionary algorithm

At the beginning of June, I felt ready to create a pool of agents, endowing them with goals and behaviours but I was (and still am) lacking understanding of educational methodologies to do so in a way that would be meaningful to AIEd systems research. I had the biology, but not the life.

A couple of nights ago, James and I went to the bookstore and I picked out a book by Ray Kurzweil: "The Age of Spiritual Machines". The predictive nature of the work in combination with technical accuracy was reminiscent of "On Intelligence", which I enjoyed immensely. Anyway, at the back of this book was an article entitled, "Pseudo code" for the Evolutionary Algorithm. The article explained an algorithm (i.e. "Genetic algorithm") which I thought immediately applicable to one of my own questions. I had been thinking about this problem recently as a result of some recent discussion with a colleague.

Namely: As learning-object-agents and human agents interact with each other, collecting all kinds of purpose-based interactional data, the required storage space grows and grows. Eventually, the portability of these agents will become a problem. Further, search times increase as data mining algorithms need to hunt through more and more data. It occurred to me that maybe this Genetic/Evolutionary algorithm could be directly useful for addressing the need for intelligent garbage collection. (As in the Ecological Approach paper by Gord McCalla).

Within a vast world of learning-object agents and human agents, and under this ecological theme, certainly, some agents will need to die. Some pieces of agents' memory will also need to die. If data is stale and useless, it must be purged. This is the start of intelligent garbage collection and also allows for the "survival of the fittest" -- necessary for an evolving system.

A second aspect -- moving beyond death -- that I had not considered before was the need to create new creatures by copying surviving creatures while introducing small, random variations. Genes. How would learning objects reproduce? Surely, human teachers will continue to create their own learning objects, but, there needs to be an artificial method of doing so as well. We can't very well rely on human teachers to create ALL learning resources -- it's a waste of their expertise. Teachers should be allowed to spend time interacting with students, not punching data into course management systems.

Thirdly: How do you measure the success of a learning object to determine if it should die, or if its "genes" will be useful in the next "generation" of learning objects? I remember this was a topic of discussion in an 862 class (now CMPT 872, I believe); at this point we discussed Granularity-Based Reasoning and Belief Revision in Student Models in order to see exactly how a student's understanding had changed and thereby give some mechanism for measuring the effectiveness of a particular learning object.
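
The three pieces above (death, reproduction with small random variation, and a success measure) fit together in a loop that can be sketched in Java. Everything here is hypothetical: the LearningObject class, its numeric "genes", and the single fitness score are stand-ins invented for illustration, and the post does not say how a real learning object's effectiveness would be encoded or measured.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class EvolutionSketch {
    // Hypothetical learning object: genes are numeric parameters,
    // fitness is some already-measured effectiveness score.
    static final class LearningObject {
        final double[] genes;
        final double fitness;

        LearningObject(double[] genes, double fitness) {
            this.genes = genes;
            this.fitness = fitness;
        }
    }

    // One generation: the less fit half dies, and each survivor leaves one mutated copy.
    // Returns gene sets only, because fitness has to be measured again for the new generation.
    static List<double[]> nextGeneration(List<LearningObject> population, Random rng, double mutationSize) {
        List<LearningObject> ranked = new ArrayList<>(population);
        ranked.sort(Comparator.comparingDouble((LearningObject lo) -> lo.fitness).reversed());
        int survivors = (ranked.size() + 1) / 2; // the fitter half, rounded up

        List<double[]> next = new ArrayList<>();
        for (int i = 0; i < survivors; i++) {
            double[] parent = ranked.get(i).genes;
            next.add(parent.clone());        // the survivor itself
            double[] child = parent.clone(); // a copy of the survivor...
            for (int g = 0; g < child.length; g++) {
                child[g] += rng.nextGaussian() * mutationSize; // ...with a small random variation
            }
            next.add(child);
        }
        return next;
    }

    public static void main(String[] args) {
        List<LearningObject> population = new ArrayList<>();
        population.add(new LearningObject(new double[] {1.0}, 0.1));
        population.add(new LearningObject(new double[] {2.0}, 0.9));
        population.add(new LearningObject(new double[] {3.0}, 0.5));
        population.add(new LearningObject(new double[] {4.0}, 0.3));
        List<double[]> next = nextGeneration(population, new Random(), 0.05);
        System.out.println(next.size() + " gene sets in the next generation");
    }
}
```

The hard part the sketch skips is exactly the third question: where the fitness number comes from.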

I believe my thought experiment is beginning to take form. I have to figure out where other people have already experimented in this area to help me focus. I repeat my outline from last time:

Given a pedagogical ontology and an engine for building a content plan, how would a set of (JADE) agents -- representing learners and learning objects -- interact given a set of starting instructional goals and a student model?

I have a bit of an answer, now. How would they interact? Well, by evolving. Some of them die. Some of them reproduce. I don't yet know how the concept of a "generation" would live in the system. I can't expect agents to grow "old" in the sense that humans and animals do; they are not bound by the fallibility of biology that prevents creatures from reproducing simply due to their extended temporal existence. Ha!

Well, I'm gonna go do some reading. I'm spending far too much time gathering papers and reading abstracts and not enough time studying the material that I've got.

Posted by Frozone Permalink on August 15, 2006 10:06 AM | Comments (3)
categorized under Computer Science & AI




June 15, 2006

Sketching out a design

This is really hard. I'm stuck between the theory and the application.

The whole point of my project is to build something - a toy. The last time I did anything constructive was 2 years ago in 2004, when I built Eruces. The only way I could have built Eruces was to make some very serious assumptions, the most notable being that I had a 1960s-era hard-coded instructional plan. As I learned in an algorithms class, if you want to bring a problem out of non-deterministic space into a programmable, deterministic space, you've got to add in some assumptions/restrictions.

So. As I sketch out this design, I often stop myself with worries like,
"But, someone out in the great wide world is surely already doing work like that, and is probably doing a better job. Why re-invent the wheel? Shouldn't you instead focus on expanding on existing work rather than starting from scratch on your own?"

At the same time, I want to build something complete enough in itself that it can be tinkered with and developed on its own. When this toy gets good enough, I intend to share it with real teachers and maybe even offer it as a toy in real online classrooms so students can fiddle with it as bonus course work. The thing I have to remember is that I can't possibly expect to build a perfectly scalable and complete system on the first go. And now.... "Quit the worrying and git back to work!"

The primary computational environment shall be in JADE - the controller is the agent environment and all of the interacting objects will be agents. I will focus on the delivery of the system from a student agent's point of view, knowing that interfaces can always be built for the learning object agents (ex. exercise development) or teacher agents (ex. helping students with their learning).

My toy system shall contain hard-coded learning object agents from an existing Cyber School course. I've chosen part 12 of the unit on "The Atom" which is a part of grade 11 Chemistry, taught and designed by Norm Lipinski. This section is entitled "The Quantum Model" and contains 5 sub-components:


  1. Erwin Schrodinger
  2. The Heisenberg Uncertainty Principle
  3. Quantum Numbers
  4. Electron Configuration
  5. Orbital Diagrams

There is a quiz at the end of 'Electron Configuration'.

(How come I didn't get to learn this juicy stuff in grade 11? Maybe quantum theory hadn't yet reached Saskatchewan's high school chemistry curriculum by 1997. Or, maybe I've simply forgotten that I learned it, not being able to appreciate this material at the age of 16.)

In addition to hard-coding in the learning objects (more on this in next paragraph), I will also hard-code in a suggested instructional plan - i.e. First, students proceed through Erwin Schrodinger, next Heisenberg Uncertainty Principle, next Quantum Numbers, etc. Later on I will expand into a dynamic content planner as inspired by Dr. Brecht's thesis.

I'll also expand on my hard-coded learning objects by building in a discovery learning object element by using the continually-expanding knowledge repository as maintained by our Cyber School's digital library. The data shall be pulled in simply by discovering the items in an RSS feed - Grade 11 science. Some new objects will be relevant, some won't (ex. biology). This may open interesting garbage-collecting work. (note to self. read some of Ms. Tiffany Tang's work.)

Maybe I'll be able to do something like:

//determine what to show next (e.g. Schrodinger, Heisenberg Uncertainty Principle, etc.)
ContentPlan cplan = teacherAgent.getSuggestedInstructionalPlan(); //this would eventually be automated for the individual student's needs rather than getting a manually-built one by the teacher

//determine how to show it
deliveryPlanner.getDisplay(studentModel, cplan); //???? How does Dr. Brecht's system organize this??
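
The "what to show next" half is simple enough to sketch as runnable Java. This is a guess at the hard-coded plan described above, not an existing system: ContentPlanSketch and nextItem are hypothetical names, and the student model is reduced to nothing more than the set of items already completed.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

class ContentPlanSketch {
    // The five sub-components of "The Quantum Model", in the suggested order.
    static final List<String> PLAN = List.of(
            "Erwin Schrodinger",
            "The Heisenberg Uncertainty Principle",
            "Quantum Numbers",
            "Electron Configuration",
            "Orbital Diagrams");

    // Returns the first item of the plan that the student has not completed yet,
    // or an empty Optional when the whole plan is done.
    static Optional<String> nextItem(Set<String> completed) {
        return PLAN.stream()
                .filter(item -> !completed.contains(item))
                .findFirst();
    }

    public static void main(String[] args) {
        Set<String> completed = Set.of("Erwin Schrodinger");
        System.out.println(nextItem(completed).orElse("plan finished"));
        // prints The Heisenberg Uncertainty Principle
    }
}
```

A dynamic content planner would replace the fixed list with something computed per student; the delivery question (how to show each item) is left open here just as it is above.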

I'll have to figure out how to build a Navigation scheme for the student according to the content plan - perhaps using Apache Turbine. Next, I'll have to explore different delivery possibilities for each learning object. I've used Apple's VoiceOver utility to record audio versions of a couple of these articles for an auditory alternative to what Norm has typed out for students to read.

I wonder if I can use the Apache Commons' Logging libraries to generate RDF traces of interactions between agents.

Okay that's enough for now... time to go grocery shopping.

Posted by Frozone Permalink on June 15, 2006 11:02 AM | Comments (0)
categorized under Computer Science & AI




June 10, 2006

A milestone?

I have enough of a framework in my head to feel comfortable that I can proceed with a scalable and easily-adoptable design. I'm comfortable moving agents back and forth from server to server using serializable objects over data streams, and will only be using standardized (W3C) methods for secondary storage (ex. using RDF). I find it acceptable to assume that all participating clients and servers have Java-aware engines running so that I can make use of serializable objects and JADE libraries and so on. ('Realizing, of course, that this is not a great assumption because even in the school division under which I'm employed, most electronic systems are Microsoft-based, meaning that implementing a clean and logical architecture is inherently more difficult in some situations.)
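
As a rough illustration of "serializable objects over data streams", here is a minimal round trip using only the standard java.io classes. It deliberately uses no JADE API; the AgentState class and its two fields are invented for the example, and an in-memory byte array stands in for the network stream between two servers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class AgentTransferSketch {
    // Hypothetical snapshot of an agent's state that should travel between servers.
    static final class AgentState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int interactionCount;

        AgentState(String name, int interactionCount) {
            this.name = name;
            this.interactionCount = interactionCount;
        }
    }

    // Sending side: turn the object into bytes that could be written to any output stream.
    static byte[] pack(AgentState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiving side: rebuild the object from the bytes that arrived.
    static AgentState unpack(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (AgentState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("receiver does not know the AgentState class", e);
        }
    }

    public static void main(String[] args) {
        AgentState copy = unpack(pack(new AgentState("Hermione", 3)));
        System.out.println(copy.name + " " + copy.interactionCount); // prints Hermione 3
    }
}
```

The ClassNotFoundException branch is where the "Java-aware engines everywhere" assumption shows up: the receiving side needs the same class definitions on hand.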

To summarize the last couple of months, even, I've now got a rough idea of how to:
- represent domain knowledge
- represent pedagogical relationships between learning objects and even interactions between learners and those objects
- provide an agent-based environment for all of these interactions to occur

I haven't explored anything about student modelling, diagnosis, collaborative learning, content or delivery planning, and loads more. Where do I go next? I know I need to elaborate on the "question mark" part of my agent in my other post, but I'm not sure I know enough about the field to be able to fill that part in.

I think the next turn of events in my research should involve something directly pedagogical. To keep myself grounded in a teacher-student-classroom mentality, I often check out this very excellent blog called Teaching and Developing Online. I particularly enjoy reading Darren's PMIs - Positive, Negative, Interesting - where teachers and staff write about their experiences in online education. This helps me think about how my software can really augment and support the positives, while aiming to reduce and smooth out the negatives. The more I read about how actual online education works today, the bigger the gap seems between my own research and the real world. It's kind of like there's 2 camps of people: Educators/Teachers and Computer Scientists. What is considered "E-Learning" is really shockingly different when you talk to a member of one camp or the other.

So, in my continuous attempts to bridge this gap, I think I've decided on the next step in my research. I'd like to pick some kind of teaching technique (drilling, tutoring, instructing, etc. - this is where I need to do some more reading) and then concoct a learning context and place these different object-agents and human-agents inside these contexts, and maybe introduce some goals, and then see what happens. Whoozah.

I've got a framework. Now I need to jolt in some electricity, so to speak, so that I can observe the interaction of the moving parts and begin to bless in some guides to make the system operate according to my grand design.

Posted by Frozone Permalink on June 10, 2006 11:43 AM | Comments (0)
categorized under Computer Science & AI




Index to Steph's Notes

Feb. 24th 2007 - Weee! This new part of my website is not an entry, but rather a permanent fixture whose purpose is to "Look Down on All Those Notes With Some Grand Vision of Organization". Wish me luck. LOL
  1. Representing meta-data (fuel) & the different kinds of "hooks" that intelligent systems can use (how fuel is injected into the motor of the engine)
    1. Motivation: Semantic net / Rationalizable to a machine
      1. Semantic network
      2. Genetic graph
      3. Prerequisite AND/OR graph
      4. Constraint Satisfaction Problems
      5. Bayesian networks / causal graphs
    2. Technology & Philosophy: RDF, modus ponens,
      1. Predicates, Logic & situation calculus
        1. When in doubt, do some math
    3. What kinds of data? - What kinds of meta-data would an AIEd system possibly need, and how is it represented?
      1. task domain knowledge
      2. "is-prerequisite-to"-type knowledge
        1. Jackpot! A pedagogical ontology
      3. interactions with learning objects & other learners - (location, composition is-a/part-of, sequencing by restricting navigation, personalization, ontologies for LO context)
        1. Types of 'Ecological' data
      4. lesson plans, curriculum plans, practicing sessions (What is stored, what is generated on the fly? What is remembered?)
        1. Agent memory
    4. How to organize it - When is it stored in a database? Meta-data? Agent memory banks? Protocols? Repositories? XML files? Home-servers? WSDL services? Frameworks? Portable banks? P2P access?
      1. Database of object-agent interactions
      2. Concept of "Home" on a P2P network -- maybe the bulk of a learning object's usage data is on its home server and can be queried using WSDL or something ? Similar homes for each student's usage history, etc. Baggage problem.
    5. Links to the ontologies
      1. referring to a concept/relationship - ex. AgentOwl?
        1. Using Vocabularies in JENA
        2. Referring to a concept/relationship in an ontology
        3. Improved: Referring to a concept/relationship in an ontology
        4. Using OWL to reference constraints in tutoring systems
    6. Generation of this data
      1. Rationalization: For use by other AIEd systems
      2. What is generated - discuss items under part I.C.
      3. When it's generated - describe procedural model, which parts of the engine generate what (isa-part-of data, XML feeds, web services, meta data about groups and collaboration, protocols, examples Friend of A Friend FOAF project)
        1. Thinking about the system's RDF output
      4. Technical notes of HOW it's generated: JENA, issues of implementation demo, my Hermione & Ron agent examples, lol
      5. Usage of this generated data - see part IV. A.
  2. Given the engine, who uses it?
    1. Students / Learners / "Me"
      1. instructional planning, student model, pre-requisites, tutoring, coaching, collaboration,constructivism
    2. Teachers / Educators / "Me"
      1. putting together lessons
      2. be able to browse through task domain knowledge in an objective / encyclopaedia format, then be able to pick-and-choose what you need for your students
      3. compose examples, design explanations, pull together diagrams, learning objects, etc. Haystack Relo?
    3. Administration / Government / Structure / Crowd Control
      1. as restrictions/obstacles/sand pit to the robot in agent environment
      2. can't just have a swarm of students and teachers out there -- need structure of courses, curriculum, objectives, requirements (at least, we do in this day and age!) - Report cards, evaluation, feedback
      3. government, marks, certificates, requirements, funding, curriculum, attendance, delinquent, non-attending, motivation
      4. school's images, goals, strengths, payroll, HR, security, accounts, permissions, privacy
      5. registration, failed courses
  3. User Environment -- How does this engine work? What does the user see on the screen?
    1. Introduction - Given a background in educational psychology, how does the system present itself -- what does the user see, and where does this data come from? (Links to thoughts from part I.)
    2. Task Domain Browsing - Suppose you're just idly browsing through the "raw" content. How would it look when it's not wrapped around a learning-context or lesson or tutorial or anything? 'Cross between browsing a raw task domain ontology and browsing a learning object repository.
      1. Cleaning up the data -- Visualizing the data for humans to pick through the task domain and work on it. Suppose the "Subject Expert" discovers an advancement in science and needs to update the "world's" domain knowledge. (I used the "Subject Expert" terminology from Ontologies to Support Learning Design Context - Thanks Chris) How would they make corrections to ontologies and learning objects, or at least point the users of "old" objects towards adopting the newer ones?
      2. "Modes" - Learning & Lessons / Checklist - Homework, Assignments, Courses being taken / Collaborative mode / Teaching mode / Calendar- email -administrative mode -- See also the different kinds of scenarios in the ActiveMath system
        1. Educating myself about Education
  4. Evolution of this engine
    1. target some key implementation hooks discussed in part I - design an experiment/demo
      1. scrape a page - (Note, scraping can only give objective data, not in-context data)
      2. LO repository - related to browsing the task domain?
      3. a learner's "To Do" list - where does it come from? Assignments, courses.
      4. sample group scenario
      5. sample teacher lesson planning
      6. sample data "left behind"
      7. sample use of that data
    2. Data mining (for what? lol )
      1. discovery / generation of ontologies - when do you need to hunt for them, and when do you have to have a solidly-known & predictable ontology?
        1. Ontological Engineering: taking a first bite
    3. I/O - where it happens, which languages, protocols, which agents perform i/o and when, precepts, actuators
      1. Role Assignments
        1. Levels of authorization in web applications
      2. My Environment Adapts to me
        1. Displaying feedback from the server on JSP pages (Software engineering considerations)
        2. Sketching out a design (Content planning vs. Delivery planning)
      3. agent negotiations / social structures / ummm... Web 2.0 ?
        1. Towards student modelling
        2. Anatomy of an agent
    4. garbage collection of meta data
      1. Artificial Intelligence & Evolution
        1. Memory Culling: Necessary part of intelligence? (artificial or human)
        2. Applications for the Genetic/Evolutionary algorithm
      2. open learning environments
  5. Agents, pets, grouping, Community modelling
    1. Protocols - finding groups, cyber dollars, state diagrams (?)
    2. "Community Studies" - graphs & communication hubs, types of communities (free-for-all, hierarchy of authority, etc.)
    3. implications of joining a community - what do you share, which parts of your student model are relevant
    4. Walls & sand traps -- deliberate restrictions as problem-solving for learning
    5. Communication channels - individual-to-individual, individual-to-community, chat channels, agent-only "administrative" communications, ex. requests for related learning objects in a particular community, etc.
  6. Educational/Pedagogical focus (this part probably shouldn't be its own section but rather incorporated into the whole picture, but it's separate for me right now because I'm still only just starting to learn about it.)
    1. Semantics - what there is to talk about in Education
      1. ex. Merrill's First Principles of Instruction, linking educational terms to AI terms
        1. Educating myself about education
    2. Pedagogical skills for tutors -- supporting human *and* artificial tutors
      1. Modelling teaching strategies
      2. What is teaching?
      3. Decision theory for teaching strategies
      4. My pedagogical issues
      5. Ontological comparisons as spatial relationships
    3. Student modelling - what the machine needs to know about the student, pedagogically-speaking, about learning history/preferences
    4. Roles - Simulated students, Coaches, Tutors, Teachers,