Index - Domain Knowledge Representation (DKR)

July 16, 2011

Vector Space Model for Information Retrieval

A few months ago, I asked my supervisor about using the analogy of space for knowledge representation (like, Cartesian space, not outer space). My supervisor said that I should check out a technical report from Cornell University that was published in 1978. He said it had something to do with social processes underlying mathematics, and it brought geometry back into favour.

So, today I went a web searchin', and I believe I found it. Thank you eCommons!!!

Mathematics and Information Retrieval
Author: Salton, Gerard
http://hdl.handle.net/1813/7452

There is something tugging at this thought in my head, a recent memory. Ah, here it is. http://philsci-archive.pitt.edu/8712/1/AlgorithmicDefinitionOfInformation.pdf - An Algorithmic Approach to Information And Meaning - A Formal Framework for a Philosophical Discussion, by Hector Zenil. I have only skimmed the paper, but there is something about how Descartes came in with a way to transform space into planes of numbers. One point of view was that Descartes' system "tarnished" space and knowledge somehow, while the other perspective argues that Descartes' innovation only added value & perspective, and it did not diminish that thing being studied (epistemology?)

If I find the reference, I will update this entry.

Anyway, back to the Vector Space Model. I didn't study the paper thoroughly, but I understood that it is possible to formulate a query on a database in terms of a mapping of one set of numbers onto a matrix.
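A minimal sketch of that idea, assuming the "matrix" is a term-by-document count matrix and the query is a vector over the same terms, with documents ranked by cosine similarity. The vocabulary, the counts and the function names are made up for illustration; they are not taken from Salton's report.

```python
import math

# Hypothetical toy vocabulary; each matrix row is one document's term counts.
TERMS = ["space", "vector", "retrieval", "ontology"]
DOC_TERM_MATRIX = [
    [2, 1, 0, 0],  # document 0
    [0, 1, 3, 0],  # document 1
    [0, 0, 1, 2],  # document 2
]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, matrix):
    """Return (score, document index) pairs, best match first."""
    scores = [(cosine(query, row), index) for index, row in enumerate(matrix)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


# The query "vector retrieval" expressed over the same four terms.
query = [0, 1, 1, 0]
ranking = rank(query, DOC_TERM_MATRIX)
```

Here `ranking[0]` is the best-matching document. Cosine similarity is only one possible choice of matching function; it is used here because it ignores document length.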

Other topics in the paper include Set Theoretic Models for Information Retrieval, a correlation calculation (statistical correlation, I think?), Indexing Theory, Retrieval with Fuzzy Sets, Probabilistic Retrieval Models, Interactive Searching, File Organization and Record Clustering.

I'll not go into more depth (why do I always rush my research? I know, because I have a time limit on my degree program, grrr) but I'm happy now to have this little "hook" from my blog world out there toward the bigger research world.

Posted by Frozone Permalink on July 16, 2011 02:03 PM | Comments (0)
categorized under Domain Knowledge Representation (DKR)




July 09, 2011

Cognitive Realism & Cognitive Relativism

As I was eating a hot dog with cheese sauce, I skimmed through Bertrand Russell's The Correspondence Theory of Truth and William James' The Pragmatic Theory of Truth.

I thought about that old saying, "If a tree falls in the forest and nobody hears it, does it make a sound?"

I tried to apply my new awareness of these two theories to the adage. Let's see if I got this right. The Cognitive Realist believes that facts can exist outside of humans thinking about them. So, the Cognitive Realist would say "Yes, of course there was a sound."

On the other hand, the Cognitive Relativist would hold that there is no mind-independent fact, so, since nobody was there to perceive the tree falling, it is not a part of the vast network or web of truths, so, of course the sound wasn't there. Nobody was there to know it. How could we know it even happened? It is not fact because nobody perceived it.

Now I will tell you why I was reading these things over lunch. You see, I am building a computer system that will be able to help people learn & study and grow. There is great technology available to help people find relevant content. However, I hold that an important part of the learning process is to let yourself be CHALLENGED. You have to fight and struggle and to have the things that you understand and hold near and dear to be pulled out from under you like a rug. This requires time, depth, process.

Why does my model need to be informed by epistemology? So that I can go "deep", as described above. Deep? What do you mean by "Deep"? Relative to what? If there is a "Deep" kind of knowledge, then this implies there would also be a "Shallow" kind of knowledge. In a recent entry (Actually, Research topic: Teaching techniques have Shapes.), I discussed Threshold concepts. I went to a great session at the STLHE 2011 conference by Dr. Maureen Connolly, Brock University entitled, "Threshold concepts and expressive writing: intersections and tensions". Connolly cites (Entwistle, 2009) in a discussion about "Deep" learning as distinguished from "Surface" or "Additive" learning.

After attending this session, I formulated a hypothesis. Can deep learning be thought of as a SEQUENCE, or a path? Maybe yes: it's a sequence of threshold concepts. With threshold concepts, you see that deep learning is necessarily sequential, to an extent.

So, in the context of my thesis theme (Global Coherence vs Local optimization/dynamic adaptability), I can fight the question of "Isn't deep learning more than a sequence of learning object recommendations?" by clarifying that the sequence is important for depth at a macro scale; you sequence threshold concepts. But the dynamic, adaptable part is at the immediate / micro scale where the learner is picking their own directions for the short term, and the system is pushing them and challenging them on a broader level. But it definitely changes; the control has to go back and forth. Maybe at a specific activity the system takes very tight control but it's within the learner's own choice because they want some practice at a particular skill. It can work the other way around, too, with the system all tight at the macro scale but the learner having wide freedom within the system's boundaries.

So. Threshold concepts, going deep. This is why I wanted to understand epistemology. The Russell & James papers I mentioned at the start are two well known approaches to ways of knowing. They are two ways to structure human knowledge so that a computer system can have something to go "deep" and "shallow" against.

Russell & James are only two philosophers. I wish I knew other perspectives from around the world, especially aboriginal cultures.

I would like to thank my friends Holly & Erin for recent lunchtime conversations about epistemology.

Posted by Frozone Permalink on July 09, 2011 01:50 PM | Comments (0)




July 02, 2011

Where A.I. is stingy, Metaphysics is fruitful

I just read this great paper. 'Want to read again and discuss.

I thoroughly and finally understand that my research area cannot only look into the literature from the Computer Science perspective of A.I. My work NEEDS philosophy.

http://pantheon.yale.edu/~jp677/Papers/RppPaul.pdf

Posted by Frozone Permalink on July 02, 2011 09:22 AM | Comments (0)




July 07, 2010

ISO standard, Ontologies

I am posting this link (below) because I am interested in ontologies, and the following appears to be a related ISO standard.

via @sebpaquet @notthisbody @technoshaman


http://www.ontopia.net/topicmaps/materials/tao.html

Posted by Frozone Permalink on July 07, 2010 11:14 AM | Comments (0)




May 13, 2010

Ontology Alignment Evaluation Initiative

'Stumbled upon this, thought it was cool. 'Filing it away here on my blog for future reference.

Ontology Alignment Evaluation Initiative
http://oaei.ontologymatching.org/

Posted by Frozone Permalink on May 13, 2010 03:28 PM | Comments (0)




February 03, 2009

Bayesian networks / Causal graphs

This is a juicy topic! Lots of work has gone into using Bayesian networks in AIED, I hope to cover a few examples here as well as throw in a couple thoughts of my own.

I'm not happy with the quality of my previous entry, but at least I'm moving forward. What I *did* like about that posting was how I started it, and finished my thoughts over the next few days. It screws up my RSS feeds because I go back and add to old entries and those edits don't get sent out as notifications. I really should be using a wiki. Alas.

The nodes in a Bayesian network are random variables (the "Events" from probability theory). The edges represent conditional probability relationships. For example, if you have the probability of A given B (denoted P(A|B)) then you would have 2 nodes: one for A and one for B. There would be an edge pointing from B to A. I always get confused about the direction of the edges. I think of B (the "given") as being the parent because A is kind of inheriting the given knowledge (B) from before. So, the node that we had marked as A is actually more correctly labelled P(A|B) and the node for B is more correctly labelled P(B).
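A minimal sketch of that two-node network, assuming both A and B are simply true/false. The probabilities are invented for illustration. The parent B carries P(B), the child A carries P(A|B), and from those two tables you can get P(A) and reason "backwards" to P(B|A) with Bayes' rule.

```python
# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
P_B = 0.3                                  # table stored on the parent node B
P_A_GIVEN_B = {True: 0.9, False: 0.2}      # table stored on the child node A


def marginal_a():
    """P(A) = P(A|B) P(B) + P(A|not B) P(not B)."""
    return P_A_GIVEN_B[True] * P_B + P_A_GIVEN_B[False] * (1 - P_B)


def posterior_b_given_a():
    """P(B|A) by Bayes' rule: evidence at the child updates belief in the parent."""
    return P_A_GIVEN_B[True] * P_B / marginal_a()
```

With these numbers, observing A raises the belief in B from 0.3 to roughly 0.66, even though the edge points from B to A.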

Ugh, I hope I didn't mess that up! Moving on... I intend to flesh out the following headings over the next while, whenever I can escape from my beautiful baby.

February 04, 2009

Bayesian networks for student modeling

In Assessing effective exploration in open learning environments using bayesian networks (Bunt & Conati, 2002), the system applies a Bayesian network to support student exploration in an open learning environment. The network is used to let the system use its observations of student behaviour to guide hints and feedback to support their exploration.

In this system, nodes are organized into 2 types: knowledge nodes & exploration nodes. As best I understood it: Knowledge nodes represent the probability that the student has the knowledge of that topic. Exploration nodes represent the probability that the student has effectively explored an exercise, unit or category.

Edges represent causal links between the nodes (as with all Bayesian networks!). As best I understand it, the edges were manually constructed by the authors who examined what they thought the dependencies should be between knowledge & exploration nodes. Part of their work was to optimize this and make sure they had set it up in the most effective way possible.

As usual, I'm interested in how a platonic/abstract ontology could have been woven into this system -- probably using an adapter through the exercises/units/categories.

The most interesting part of this system, to me, was that the node "types" allowed the researchers to code in the relationships between activities (of various grain sizes) and belief about students' knowledge of the concept related to that activity.

February 06, 2009

A second application of Bayesian networks can be found in Student Modelling based on Belief Networks (J. Reye, 2004). In this system, there are also several types of nodes: global nodes for modelling the student's overall characteristics, nodes to represent concepts-to-be-learned and the probability that the student knows that concept, and thirdly, nodes that act as indicators for the concept-nodes, allowing the system to "watch" for students to demonstrate their knowledge of the particular topic.

Prior probabilities are the knowledge the student has coming into the activity. Edges, like in the previous system, are causal links. Prerequisite nodes would point towards child nodes because knowing that piece of information would "cause" your knowledge of subsequent concepts. (As best I understood!)

I think that the teaching strategies are still kinda embedded / flattened within the network in this case. Still gotta keep reading to figure out how to abstract out that teaching strategy so that it can be applied to a platonic ontology!

Posted by Frozone Permalink on February 03, 2009 12:56 PM | Comments (0)




January 31, 2009

Constraint Satisfaction

Constraint Satisfaction Problems

This topic is hard for me because I think that constraint satisfaction is a more powerful tool than the others I've talked about so far, and therefore it's more difficult to figure out how to use.

Basically a CSP (constraint satisfaction problem) is when you want to find values for all your variables such that none of the constraints (rules about how variables relate to each other) are violated.
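A minimal sketch of that definition, assuming a tiny made-up problem with three variables and three constraints (none of this comes from an AIED system). It simply tries every combination of values and keeps the assignments that violate nothing, which is only workable for toy problems.

```python
from itertools import product

# Hypothetical toy problem: three variables, each with the domain {1, 2, 3}.
VARIABLES = ["A", "B", "C"]
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 3], "C": [1, 2, 3]}

# Each constraint is (variables involved, rule over their values).
CONSTRAINTS = [
    (("A", "B"), lambda a, b: a != b),
    (("B", "C"), lambda b, c: b < c),
    (("A", "C"), lambda a, c: a + c == 4),
]


def satisfies(assignment):
    """True if no constraint is violated by a complete assignment."""
    return all(rule(*(assignment[v] for v in scope)) for scope, rule in CONSTRAINTS)


def solve_all():
    """Brute force: enumerate every complete assignment and keep the valid ones."""
    solutions = []
    for values in product(*(DOMAINS[v] for v in VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if satisfies(assignment):
            solutions.append(assignment)
    return solutions
```

For this toy problem `solve_all()` finds two solutions out of the 27 possible assignments.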

The variables and their constraints are usually defined by a human programmer. (Although I am fascinated by research into how systems can even figure out these parameters!) Applying the CSP model to my domain is a big challenge for me. Later on in this posting, I will explain how some researchers have done it.

Before that, I just wanted to take a sec to brainstorm: How *could* you apply the CSP model?

  • Could the variables be possible next-topics to teach the student? Constraints might be related to instructional planning rules. (CSPs plus Planning is another fascinating intersection. I have some papers on that; I wish it were easier to cite research within a blog entry!) And, how would broken-down teaching strategies manifest themselves? As plans? But how do you take an abstraction of a teaching strategy and an ontology and weave em together?
  • How about (the variables represent) what to display on the screen? (Current table of contents of the unit you're working on, perhaps some form of your user model, current lesson, some hints, choices for alternative paths to follow / open exploration, a log of things you've accomplished so far, contact people - other students or teachers, obviously your scratchwork for the current problem or exercise you're working on... etc!!) Constraints might be rules of good HCI... when and where to provide the tools to the learner.
  • What if the variables represent learning objects, and the constraints be some pedagogical rules on how to present the objects to the learner?

Hrrm.... Baby's calling. The trend continues for me to require shorter blog postings. I have to fragment my ideas into smaller posts. It hurts!

But I will be back later with more CSP stuff.

February 02, 2009

Kk, I'm back.

Since I've been presenting all the techniques so far as graph problems, I will explain how you can view CSPs as graphs as well.

Given a problem definition, you can create nodes for each variable, and edges for each constraint that involves those variables. This is called a constraint graph.
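As a rough sketch, a constraint graph can be built from nothing more than the list of variables and the "scope" of each constraint (the variables it mentions). The variable names and scopes below are hypothetical. A constraint over three variables is treated here as linking every pair of them, which is one common simplification.

```python
from itertools import combinations

# Hypothetical variables and constraint scopes, for illustration only.
VARIABLES = ["A", "B", "C", "D", "E"]
SCOPES = [("A", "B"), ("B", "C"), ("C", "D", "E")]


def constraint_graph(variables, scopes):
    """Map each variable to the set of variables it shares a constraint with."""
    graph = {variable: set() for variable in variables}
    for scope in scopes:
        for u, v in combinations(scope, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


graph = constraint_graph(VARIABLES, SCOPES)
```

Looking up `graph["C"]` then tells you which other variables are affected when C's value changes.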

You could also look at a CSP as a graph search problem, where each node represents a complete assignment of values to all your variables. Edges link together the nodes where there is exactly 1 assignment-difference between them. That is, the 2 connected nodes are identical save for 1 variable having a different value assignment. Thus, the solution to your CSP can be regarded as a graph traversal. I think this might be a kind of "local search" because you start with a complete solution, and are just tweaking at each step to reduce the number of constraint violations.
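Here is a minimal sketch of that traversal, assuming a simple greedy hill-climbing rule (move to the neighbour with the fewest violations, stop when nothing improves). The toy problem is made up, and real local search methods usually add randomness or restarts to escape dead ends, which this sketch deliberately leaves out.

```python
# Hypothetical toy problem: three variables, each with the domain {1, 2, 3}.
VARIABLES = ["A", "B", "C"]
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 3], "C": [1, 2, 3]}
CONSTRAINTS = [
    (("A", "B"), lambda a, b: a != b),
    (("B", "C"), lambda b, c: b < c),
    (("A", "C"), lambda a, c: a + c == 4),
]


def violations(assignment):
    """Number of constraints broken by a complete assignment."""
    return sum(
        1 for scope, rule in CONSTRAINTS
        if not rule(*(assignment[v] for v in scope))
    )


def neighbours(assignment):
    """All complete assignments that differ in exactly one variable."""
    for variable in VARIABLES:
        for value in DOMAINS[variable]:
            if value != assignment[variable]:
                yield {**assignment, variable: value}


def hill_climb(start, max_steps=100):
    """Greedy walk over the graph; returns (assignment, remaining violations)."""
    current = dict(start)
    for _ in range(max_steps):
        cost = violations(current)
        if cost == 0:
            break
        best = min(neighbours(current), key=violations)
        if violations(best) >= cost:
            break  # local minimum: no single tweak helps
        current = best
    return current, violations(current)
```

Starting from `{"A": 1, "B": 1, "C": 3}` the walk reaches a solution in one step, while starting from `{"A": 1, "B": 1, "C": 1}` it gets stuck with one violation left. That second case is the usual weakness of purely greedy local search.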

You could also make each node a *partial* solution to the CSP. For instance, the root would be "no variables assigned values", then each child node at level 1 would have 1 variable assigned a value. Then, the grandchildren each have 2 values assigned, where one of them is the value inherited from the parent.

I'm not very familiar with algorithms to solve CSPs, such as traversals for the graph problems. I do know about a couple of heuristics: the order in which you "commit" values to variables is important. If I remember correctly, it's best to choose the variables with the smallest domains first. (Minimum Remaining Values Heuristic) Then, you have to choose the values that leave the biggest world of possibilities open. (Least Constraining Value Heuristic)
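The partial-solution tree and the two heuristics can be sketched together as a small backtracking search. This is an illustrative sketch on a made-up problem, not a tuned solver: "remaining values" are recomputed by filtering each domain against the variables assigned so far, and ties are broken by the order in which variables and values are listed.

```python
# Hypothetical toy problem: three variables, each with the domain {1, 2, 3}.
VARIABLES = ["A", "B", "C"]
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 3], "C": [1, 2, 3]}
CONSTRAINTS = [
    (("A", "B"), lambda a, b: a != b),
    (("B", "C"), lambda b, c: b < c),
    (("A", "C"), lambda a, c: a + c == 4),
]


def consistent(variable, value, assignment):
    """True if variable=value breaks no constraint whose variables are all assigned."""
    trial = {**assignment, variable: value}
    return all(
        rule(*(trial[v] for v in scope))
        for scope, rule in CONSTRAINTS
        if all(v in trial for v in scope)
    )


def remaining_values(variable, assignment):
    return [v for v in DOMAINS[variable] if consistent(variable, v, assignment)]


def select_variable(assignment):
    """Minimum Remaining Values: pick the variable with the fewest legal values."""
    unassigned = [v for v in VARIABLES if v not in assignment]
    return min(unassigned, key=lambda v: len(remaining_values(v, assignment)))


def order_values(variable, assignment):
    """Least Constraining Value: try values that leave the others the most options."""
    def options_left(value):
        trial = {**assignment, variable: value}
        return sum(
            len(remaining_values(other, trial))
            for other in VARIABLES if other not in trial
        )
    return sorted(remaining_values(variable, assignment), key=options_left, reverse=True)


def backtrack(assignment=None):
    """Depth-first search over partial assignments; returns a solution or None."""
    assignment = assignment or {}
    if len(assignment) == len(VARIABLES):
        return assignment
    variable = select_variable(assignment)
    for value in order_values(variable, assignment):
        result = backtrack({**assignment, variable: value})
        if result is not None:
            return result
    return None
```

Each recursive call corresponds to stepping from a parent node (a partial solution) to a child with one more variable assigned.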

I'm interested in finding out what to do if your constraints, variables & domains are unknown.

Whew! That almost hurt! I don't know why it was so hard to "vocalize" all that. Maybe because I was frightened of explaining it incorrectly. I'm such a perfectionist; it hurts to publish something that isn't polished and perfect! But that's not the point of these notes.

Next, I wanted to note how CSPs have been used in AIED.

Hrm, baby's fussing. Will have to do this later!

February 03, 2009

In this paper, the authors describe the constraints in their system as tuples: one member of the tuple is like an evaluator that tells you in which cases this constraint is relevant. The other member of the tuple is the actual constraint that has to be satisfied. In this system, I saw the task domain ontology as being "blended in" with the representation of the problems the student is working on; the domain is represented as correct solutions to problems. The system can be very effective in its responses to students because it can detect where the "holes" are in student knowledge with its observations of the competencies demonstrated by students.
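A rough sketch of that tuple idea, assuming each constraint is a pair of functions (a relevance test and a satisfaction test) applied to one step of a student's work. The fraction-arithmetic constraints, the field names and the added description string are all hypothetical; they are not taken from the paper.

```python
from typing import Callable, NamedTuple


class Constraint(NamedTuple):
    description: str                    # hypothetical label, used for feedback
    relevant: Callable[[dict], bool]    # when does this constraint apply?
    satisfied: Callable[[dict], bool]   # what must hold when it applies?


# Hypothetical constraints for a made-up fraction-arithmetic tutor.
CONSTRAINTS = [
    Constraint(
        "fractions need a common denominator before their numerators are added",
        relevant=lambda step: step["operation"] == "add_fractions",
        satisfied=lambda step: step["denominator_a"] == step["denominator_b"],
    ),
    Constraint(
        "the answer's denominator must not be zero",
        relevant=lambda step: "answer_denominator" in step,
        satisfied=lambda step: step["answer_denominator"] != 0,
    ),
]


def violated(student_step):
    """Descriptions of constraints that are relevant to this step but not satisfied."""
    return [
        c.description
        for c in CONSTRAINTS
        if c.relevant(student_step) and not c.satisfied(student_step)
    ]
```

The list returned by `violated` is one way to picture the "holes": each entry points at a piece of domain knowledge the student has not yet demonstrated.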

Waa, baby's awake. Laters!

Posted by Frozone Permalink on January 31, 2009 08:34 AM | Comments (0)




January 22, 2009

Sweep of representation techniques, Part 0

Index

- Note to self: Update my "black box" from Feb 2007 below...

  1. Semantic network
  2. Genetic graph
  3. Prerequisite AND/OR graph
  4. Constraint Satisfaction Problems

Posted by Frozone Permalink on January 22, 2009 05:57 PM | Comments (0)




January 21, 2009

Prerequisite AND/OR graph

... continuing from Part 1...

Prerequisite AND/OR graph

Similar to the semantic network, nodes in the Prerequisite AND/OR graph are also concepts from the task domain. Edges are labeled with either "AND" or "OR". Parent nodes tend to be very general with descendant nodes becoming very specific in terms of the content they represent. This type of graph is ordering the concepts in the order-to-be-learned: a student must "master" (or be familiar with...) the parent-concepts before they would be able to understand the child-concepts.

This representation is flexible because the content can be presented to the student in many different orderings; the ordering picked depends on the student model. For example, if you have the concept "the nitrogen cycle in your fish tank", you might have 2 parent prerequisite nodes "nitrogen" and "fish tank". You must understand what "nitrogen" is and what a "fish tank" is before you can understand "the nitrogen cycle in your fish tank". (Although I can't think of why you would want to learn about the nitrogen cycle in a fish tank if you don't know what a fish tank is....!) Anyway, the point is that the student can be taught about "fish tank" or "nitrogen" in any order; you can learn about fish tanks first or about nitrogen first, whatever floats your boat.
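The fish tank example can be sketched as a small data structure. This is only an illustration under my own assumptions: each concept stores whether ALL or ANY of its prerequisites are needed, and the "pond" and "aquarium maintenance" concepts are invented to show an OR case. It is not McCalla's actual formulation.

```python
# Each concept maps to ("AND" | "OR", list of prerequisite concepts).
# "pond" and "aquarium maintenance" are hypothetical additions for the OR case.
PREREQUISITES = {
    "nitrogen": ("AND", []),
    "fish tank": ("AND", []),
    "pond": ("AND", []),
    "nitrogen cycle in your fish tank": ("AND", ["nitrogen", "fish tank"]),
    "aquarium maintenance": ("OR", ["fish tank", "pond"]),
}


def can_learn(concept, mastered):
    """True if the concept's prerequisites are met by the mastered set."""
    kind, parents = PREREQUISITES[concept]
    if kind == "AND":
        return all(parent in mastered for parent in parents)
    return any(parent in mastered for parent in parents)


def next_concepts(mastered):
    """Concepts not yet mastered that the student is ready for."""
    return {
        concept
        for concept in PREREQUISITES
        if concept not in mastered and can_learn(concept, mastered)
    }
```

A student model would supply the `mastered` set; the flexibility described above shows up as `next_concepts` usually returning more than one choice.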

Each node (concept) can be sub-divided into its own mini-graph. This allows you to define orderings on many levels. For instance: Divide a course into Chapters, and each chapter is divided into Sections, each section into Sub-sections, etc.. The orderings of the chapters and sections can change for each person.

I learned about the Prerequisite AND/OR graph from McCalla, 1992.

I sort of worry about ending up with a big tangled graph. Is it really possible to pick out all of the prerequisite knowledge required for a given concept? Human beings are so contextual. What might be prerequisite knowledge for one person might not be for another: maybe I need to understand what fish tanks are before I can understand "the nitrogen cycle in a fish tank", but maybe my friend is so skilled with abstract thinking he doesn't need to understand what a fish tank is in order to proceed with the lesson. I don't know. It seems to me that the decisions required to build the AND/OR graph need to come from a good, human teacher. This is probably how the data structure was intended, anyway. :-) Or, what if somewhere down the path you have a leaf node that you actually could get away with learning without having all of its ancestors mastered under your belt. It's not possible to connect all of human learning in a single Prerequisite AND/OR graph... so, you must need multiple graphs. How would you hop from one to another? I need to catch up on 20 years of research to figure out if you can abstract the teaching style out just a little bit more.

More to come!

Posted by Frozone Permalink on January 21, 2009 01:11 PM | Comments (0)




January 16, 2009

Semantic Network & Genetic Graph

Over the years, I've read about many ways to represent knowledge (any knowledge -- the task domain ontology, teaching strategies, student models, etc.) and have struggled to balance out the strengths of each technique and how they should be applied to build an effective AIED system. What's fascinating to me is the number of ways the "student model", the "teaching strategy" and "task domain ontology" all swirl together in different forms using these representations. There's a gazillion ways to represent the different parts of the system, and a gazillion more ways to make the components work together.

I feel comfortable enough to list and summarize some approaches (although I know there's still a LONG way to go!). So, I decided to start this "Sweep of representation techniques" on my blog. With my baby daughter, I'm finding that I no longer have 2 hours to work on a posting, so, in order to give myself the thrill of actually publishing something, anything, I've decided to break this posting down into parts.

Ways to represent knowledge

Semantic network

A semantic network in graph form uses nodes as concepts and edges to represent the relationships between the concepts. Semantic networks are advantageous in that they're easy for humans to understand and a machine can work with these also.
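A minimal sketch of a semantic network, assuming it is stored as a list of (concept, relationship, concept) edges. The concepts and relationship names are made up. Following "is_a" edges upward is one simple way a machine can work with the same picture a human reads.

```python
# Hypothetical edges: (source concept, relationship, target concept).
EDGES = [
    ("goldfish", "is_a", "fish"),
    ("fish", "is_a", "animal"),
    ("fish", "lives_in", "water"),
    ("fish tank", "contains", "water"),
]


def related(node, relation):
    """Targets reachable from node over one edge with the given label."""
    return [target for source, rel, target in EDGES if source == node and rel == relation]


def ancestors(node):
    """Everything reachable by repeatedly following is_a edges, nearest first."""
    found = []
    frontier = related(node, "is_a")
    while frontier:
        current = frontier.pop(0)
        if current not in found:
            found.append(current)
            frontier.extend(related(current, "is_a"))
    return found


def inherited(node, relation):
    """Targets of the relation on the node itself or on any of its ancestors."""
    results = list(related(node, relation))
    for ancestor in ancestors(node):
        for target in related(ancestor, relation):
            if target not in results:
                results.append(target)
    return results
```

So even though no edge says where a goldfish lives, `inherited("goldfish", "lives_in")` picks the answer up from "fish".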

Genetic graph

In a genetic graph, nodes represent stages in student knowledge and edges represent cognitive operations.

So, "early" in the graph you might have a very rough representation of the student knowledge, and "further down" in the graph by using cognitive operations (analogies, simplifications, generalizations, deviations, corrections -- from Brecht & Jones 1998), you'd have a very specific representation of student knowledge.

Stay tuned for the next part!

Posted by Frozone Permalink on January 16, 2009 02:29 PM | Comments (0)




June 17, 2006

Working backwards

I'm working backwards today.

Given a completed lesson that is being delivered to high school students in a Grade 11 chemistry course, I'm attempting to find an ontology on Chemistry and to make references to the learning objects in this course.

I've found a pretty good ontology for high school chemistry, however, it does not cover all of the material and examples that exist in the actual online course that I've chosen to model. I suppose this is to be expected. It's almost like I need 1 primary ontology for curriculum (I wonder if Sask. Learning has one in their official curriculum for high school chemistry....prolly not - did they have OWL in 1992? lol ) and then I can use ontology discovery to find references to supplement and hopefully fill in the skeleton provided by this primary ontology. This is where we'll start to see individual variation of the lesson-generation according to each student's needs and interests. As a dumb example, perhaps girls will be given examples of using chemistry in the world of rainbows and unicorns, whereas boys would be presented similar examples but with using race cars and hockey.

For my purposes, I think I'll simply grab an arbitrary domain ontology that's already written in OWL, then assume that it is the curriculum. Next, I'll have to figure out how to handle all of the content in the lesson that doesn't actually come from the curriculum ontology. These are all coming from the teacher's head, obviously, but I'm interested in how an AI system could build this automatically.

Posted by Frozone Permalink on June 17, 2006 11:07 AM | Comments (1)




June 03, 2006

Towards student modelling

I'm still trying to get straight in my head how the data will look and how student agents will swim through it. I'm seeing each learning object (perhaps with many dimensions of granularity) as its own agent and each student, obviously, has his or her own agent.

(I've learned this agent approach from various I-Help papers including A Multi-agent Approach to the Design of Peer-help Environments and Lessons Learned in Deploying a Multi-Agent Learning Support System: The I-Help Experience.)

Perhaps, in my own simple system, each learning object will have its own RDF file, with multiple triples, representing things like:
- where the object itself fits in a domain ontology (represent facts, relationships, tendencies, etc.)
- how it has been used in learning situations in the past: records of object-agent and learner-agent interactions and their results, and where this interaction had fit in that student's larger-scale learning
- (see Types of 'Ecological' data)
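To picture what one such file might hold, here is a small sketch that writes a few triples in N-Triples syntax (one "subject predicate object ." statement per line). The namespace, the property names and the learner identifier are all hypothetical placeholders, not an existing vocabulary.

```python
# Hypothetical namespace and vocabulary, for illustration only.
EX = "http://example.org/lo/"


def uri(name):
    """Wrap a local name as an N-Triples IRI reference."""
    return f"<{EX}{name}>"


def literal(text):
    """Wrap plain text as an N-Triples string literal, escaping \\ and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


TRIPLES = [
    # where the object fits in a domain ontology
    (uri("lesson-12"), uri("coversConcept"), uri("mole-concept")),
    # how it has been used in a past learning situation
    (uri("lesson-12"), uri("usedBy"), uri("learner-42")),
    (uri("lesson-12"), uri("outcome"), literal("quiz passed")),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


document = to_ntriples(TRIPLES)
```

Printing `document` gives three lines that could be saved as the learning object's own file and appended to after each new interaction.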

In the past I've talked about starting to build my own data, using scrapers and so on. Perhaps now that I know a little more about domain ontologies and pedagogical ontologies, I can begin an attempt to create a sea of metadata myself and begin to explore how agent navigation may occur in this sea.

Posted by Frozone Permalink on June 03, 2006 02:40 PM | Comments (2)




May 27, 2006

Auto-discovering ontologies vs. explicit ontology choice

I was thinking the auto-discovery ontologies should be reserved for domain knowledge (i.e. things people are learning about: geology, math, etc.).

We'd use hand-picked ontologies for the other levels of knowledge, especially for showing pedagogical relationships between students and these domains. One such hand-picked ontology would be this one, as described in my other entry.

So, all of the RDF files that you "leave behind" as a result of learner interactions with objects will use hand-picked ontologies for actually telling you something pedagogical about the interaction (i.e. the "relationship" part of the RDF triple), whereas auto-discovered ontologies could be useful for pointing to the object that the student interacted with (i.e. the "object" part of the RDF triple). Hmm, so what would the "subject" be? Maybe a pointer to the snapshot of the student model at the time of the interaction? Or would you want to show how this learning object was used in conjunction with other learning objects by the same student in the same learning scenario?

It seems to me that any one interaction between a student and a learning object would produce a whole pile of RDF statements.

(Wow - I think I've graduated out of the "domain knowledge" category into "student modelling".)

Posted by Frozone Permalink on May 27, 2006 11:50 AM | Comments (2)




May 22, 2006

Victoria Day Calculus

I said I wanted more background on ontologies. Boy, do I have it now!

In Formal Ontology and Information Systems by Nicola Guarino, Calculus, Philosophy and Computer Science become one.

I can tell that this paper is really going to help to clarify the gaps in my knowledge about these "ontologies", but it'll take a while to pick through some of the math.

Posted by Frozone Permalink on May 22, 2006 09:52 AM | Comments (0)




May 13, 2006

Procedurally building a scraper

Yum yum! Bacon & Sour Cream potato chips with a vanilla cola. Mmmmmmm! Oh, and the Red Hot Chili Peppers' new album is awesome.

I keep forgetting how I used my Firefox plugins (Solvent & Piggy Bank) to create my RDF files. Soon I will be practicing using some data mining algorithms using Weka, so I'm gonna need a fair bit of data... so that I can actually mine it, lol. So I figured I'd better write down how I'm doin' this, and maybe I can automate it somehow, eventually. Or maybe there are sample swarms of data out there that I can eventually practice on.... but my world is still small, here, so I'll stick with self-constructed stuff until I know what I'm doing.

  1. First, in Firefox, go to the page that you want to scrape. (So far this is easy enough to automate... in java you can grab the input stream with the java.net package.)
  2. Next, click the baby-bottle icon in the bottom-right. (I apologize to the SIMILE guys if that's not actually a bottle... but that's what it looks like to me!)


  3. I still find that I have to parse each atomic "thing" on the page individually if I'm using Solvent to do the scraping. (See my other entry for more detail about parsing things individually.) For example, if I'm pulling out data about a bunch of course offerings, I have to do each course individually. Or, if I'm pulling out data about each teacher, I have to do one teacher at a time. Or, supposing I were in an existing online course and I clicked on "Unit 1, General Concepts" and it listed 5 or six core ideas (very coarse grained, if you like thinking in terms of granularity) that make up the whole course, I would have to process each core idea separately.
    So, I find it useful just to begin by choosing "Insert--> Code to scrape 1 page", like so:



    Next, I copy/paste this text into a separate file.

    Then, click the "Capture" button with the butterfly-net icon and click the component of the website that you want to parse. (This part could theoretically be automated if there were a command-line version of Solvent, or maybe there are Java libraries for Solvent, or maybe there are different java tools out there that would accomplish the same thing.)

    This will cause some JavaScript code to appear inside your Code window where your cursor happens to be sitting. For each individual object in #3, assign as many subject-relationship-value trios as you can about the object, accumulating the pieces of code in the same JavaScript file as you go. I already describe this process in my other posting.

    When you're done, the goal is to produce some JavaScript output like this which pulls as many subject-object-relationship trios as possible out of the web page under scrutiny.

    (Eww, yuck. I just took a sip of this morning's left-over coffee when I thought it was my cola. Bleugh!)

  4. This is the point where I must make a brutal simplification for the purposes of turning abstract theory into an implemented system. I'm going to have to find a page (or set of pages) to scrape that have a sufficient amount of educational detail to be useful, and I will scrape them all manually. That will form my whole world of RDF-based domain knowledge, upon which I will build the rest of my project.

    So, in my project, I'm building the RDF files myself. For expansion into the whole, practical Web, where would the RDF triples come from? Ideally, they would have been left there by other users/viewers of that data, describing their experiences with it and how they used it. (See the ecological approach, McCalla 2004) The "Scraping" I'm doing is merely a way to extract the data myself because other systems are not yet accustomed to leaving such data behind. Or, perhaps, such auto-scrapers will be the future versions of today's keyword web spiders. But, then again, an automated spider can only produce objective data as opposed to practical, in-context data.

    For the purposes of my project, I will build the RDF datafiles myself with the assumption that such data -could- be left there for me by other systems. In my own project, my system would leave RDF traces of interaction, anyway. The RDF data acts as both input and output.

    The Clerkin et al. paper discusses how ontologies can be discovered/generated using hierarchical conceptual clustering. Could this hierarchy represent an instructional plan? Can you generate instructional plans on-the-fly (partly) based on ontologies out there?

    I've got a real gap here between my actual ontologies and my theoretical instructional planner. I wonder if I should take another angle and leave the domain knowledge gathering aside and maybe focus on a closed domain that is in a format that could be scaled out. Like, I need to find a whole bunch of RDF files that are in the right format. I think I'll end up finding an actual online course, scrape it manually, and then use that as a "closed" domain knowledge source, knowing that its nature is expandable.
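The steps above rely on Solvent generating JavaScript inside Firefox. As a rough stand-in for what an automated version might look like, here is a sketch that uses Python's standard `html.parser` module instead (a swapped-in technique, not Solvent's output). The sample page, the class names and the "taughtBy" relationship are all made up, and the page is an inline string so that nothing is fetched from the network.

```python
from html.parser import HTMLParser

# Hypothetical course listing; a real page would have its own markup.
SAMPLE_PAGE = """
<ul>
  <li class="course"><span class="title">Chemistry 20</span> <span class="teacher">Ms. Example</span></li>
  <li class="course"><span class="title">Chemistry 30</span> <span class="teacher">Mr. Sample</span></li>
</ul>
"""


class CourseScraper(HTMLParser):
    """Collects (subject, relationship, value) trios, one course at a time."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._field = None      # which span we are currently inside, if any
        self._subject = None    # the course the following values belong to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("title", "teacher"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not self._field or not text:
            return
        if self._field == "title":
            self._subject = text
            self.triples.append((text, "rdf:type", "Course"))
        elif self._subject is not None:
            self.triples.append((self._subject, "taughtBy", text))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


scraper = CourseScraper()
scraper.feed(SAMPLE_PAGE)
```

After `feed`, `scraper.triples` holds two trios per course, which mirrors the "do each course individually" approach of step 3.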

So........ next on the agenda: Let's see if I can manually scrape an actual online course, and then build a system that'll use the RDF files as i/o as they are used in actual (okay, fine: contrived, for the practical purposes of my project) educational scenarios.

Posted by Frozone Permalink on May 13, 2006 11:28 AM | Comments (0)
categorized under Domain Knowledge Representation (DKR)




April 17, 2006

Data mining and Ontology Discovery

Ah, ha! Perhaps I don't have to find a sample RDF-based system after all. There's a great paper here from Trinity College Dublin called Ontology Discovery for the Semantic Web Using Hierarchical Clustering by Clerkin, Cunningham & Hayes. (Thank you, Google Scholar.) The same department has published a lot of their other technical papers. I'm gonna have a field day reading all of this! Woo hoo!

Rather than observing an existing system and trying to figure out what the heck it's doing, I can take an academic angle and start with a bunch of ideas about "how" and then see if any research lab out there has actually built it. Perhaps, just perhaps, I'll be able to glean out something simple enough that I can implement (or preferably, that I can import) for my purposes.

* * *

Ohhhhh, 'Just reading through the list of published papers & theses at Trinity College Dublin, and I found a VERY cool one: Story Games and the OPIATE System / Using Case-Based Planning for Structuring Plots with an Expert Story Director Agent and Enacting them in a Socially Simulated Game World, by Chris Fairclough. His ideas may be fun to explore when I get into studying different delivery planning mechanisms in my own AIEd system.

* * *

Mmm! After reading Ontology Discovery for the Semantic Web Using Hierarchical Clustering a couple of times, I realized that the gap between vast masses of RDF data and my little wee AIEd system is closing!! The authors describe how they applied the COBWEB conceptual clustering algorithm (originally presented by Douglas Fisher in 1987 in Knowledge Acquisition Via Incremental Conceptual Clustering) to an RDF-based music recommender system. Excellent.

My next question was, "Are there any java libraries out there that will let me use COBWEB just like Clerkin, Cunningham and Hayes did?"

I was delighted to discover that at the University of Waikato in New Zealand, they have developed a collection of machine learning algorithms for data mining tasks in Java. Check out the Weka Wiki, especially the Weka documentation project within. And then you have to say "Weka Wiki" out loud. It is really fun. Trust me!

I'll explore their libraries a little bit to see how I can take RDF data and create some ontologies out of it. "What will these ontologies look like?", I wonder.
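To get a feel for what COBWEB is doing under the hood, here is a minimal sketch of category utility, the score Fisher's algorithm uses to decide which grouping of instances is better. This is only the scoring function, not the full incremental algorithm (no merge or split operators), and it is not Weka's implementation. The function name and the toy music data are my own assumptions, made up for illustration.

```python
from collections import Counter


def category_utility(clusters):
    """Score a partition of nominal-attribute instances.

    clusters: list of clusters; each cluster is a list of dicts
    mapping attribute name -> nominal value.
    """
    instances = [inst for cluster in clusters for inst in cluster]
    n = len(instances)
    if n == 0:
        return 0.0

    def expected_correct(items):
        # Sum over every (attribute, value) pair of P(attribute = value)^2.
        counts = Counter((a, v) for inst in items for a, v in inst.items())
        return sum((c / len(items)) ** 2 for c in counts.values())

    baseline = expected_correct(instances)
    gain = sum(
        (len(c) / n) * (expected_correct(c) - baseline)
        for c in clusters
        if c
    )
    # Dividing by the number of clusters penalises over-splitting.
    return gain / len(clusters)


# Hypothetical toy data, loosely in the spirit of a music recommender.
TRACKS = [
    {"genre": "jazz", "era": "1960s"},
    {"genre": "jazz", "era": "1960s"},
    {"genre": "punk", "era": "1970s"},
    {"genre": "punk", "era": "1970s"},
]

by_genre = [TRACKS[:2], TRACKS[2:]]
mixed = [[TRACKS[0], TRACKS[2]], [TRACKS[1], TRACKS[3]]]
```

Grouping the tracks by genre scores higher than the mixed grouping, because knowing the cluster makes the attribute values more predictable. As I understand it, the hierarchy COBWEB builds comes from applying this kind of comparison again and again as each new instance arrives.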

Posted by Frozone Permalink on April 17, 2006 07:52 AM | Comments (0)
categorized under Domain Knowledge Representation (DKR)




April 15, 2006

Grabbing core knowledge & domain knowledge

Well, after an exhausting day, and armed with a wee tidbit of theory about core and domain knowledge, I've finally been able to scrape a few pages of my own. Here is what I wrote down before I began:

"To scrape a page, I'm thinking, is basically to separate the core knowledge from the domain knowledge. So by scraping a whole bunch of different pages about Chemistry, maybe I can build a good domain library about Chemistry, while keeping separate core knowledge libraries about the applications of Chemistry. "

In accordance with the whole theme of this quest, I've discovered that it's really very hard to take an actual web page out there and classify the different pieces of it into different types of knowledge (domain or core) because everything overlaps with itself. It's like I need a 3-dimensional scraper.

Anyway, I decided to scrape the web page about the course offerings at our cyber school. I managed to grab the first 3 course offerings on the list. Each course has a name and a URI:

Course name          Course URI
Christian Ethics 30  http://www.scs.sk.ca/cyber/course/ce30.htm
Calculus 30          http://www.scs.sk.ca/cyber/course/calc30.htm
Chemistry 30         http://www.scs.sk.ca/cyber/course/chem30.htm

First, I had to figure out how to capture these 3 different elements from the web page itself. Because of some of the subtle HTML markup in my sample page, the process wasn't as easy as the example. I saved a snapshot of the original html from my own sample page.

In the example, everything on the page was arranged nicely in its own Item; here's a snapshot from the example:

See how there is a different "+" for each item? They have "Item 1 (TD)" and another top-level item for "Item 2 (TD)" Now, here's mine:

I only had 1 top-level item to break down. (Maybe there is a way to explicitly re-organize the tree, but I didn't see one.) So essentially I had to run the scraper 3 different times, each time only grabbing the 1 course I wanted. Then I copied and pasted the JavaScript like this:

(Yuck. I copied & pasted the script here but it looks gross. Instead I'll let you look at it in a separate file.)

Upon examining that code, all I can say is, "Cool, I didn't know you could do a try/catch block in JavaScript!" Lol. It is also amusing that for error-handling the script turns the background colour red - I do that too! hehe. Anyway, I'm glad I didn't have to write all that scripting by hand.

After I had this JavaScript in Solvent, I clicked "Run" and then "Show in Piggy Bank". (I sure love that pig icon. hehe.) Finally! Some clean and understandable RDF code!!!!!!


<http://www.scs.sk.ca/cyber/course/ce30.htm>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <unknown>
; <http://purl.org/dc/elements/1.1/title> "Christian Ethics 30"
.

<http://www.scs.sk.ca/cyber/course/calc30.htm>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <unknown>
; <http://purl.org/dc/elements/1.1/title> "Calculus 30"
.

<http://www.scs.sk.ca/cyber/course/chem30.htm>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <unknown>
; <http://purl.org/dc/elements/1.1/title> "Chemistry 30"
.


Not perfect, but at least it's auto-generated. Mostly. heh.
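Since I had to run the scraper three times for three courses, here is a minimal sketch of how the same statements could be generated from a plain list of (title, URI) pairs instead. The helper names are hypothetical, and this is not what Solvent or Piggy Bank actually do internally; it just writes N3-style text in the same shape as the output above, keeping the "unknown" type placeholder as-is.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# The three courses I scraped by hand.
COURSES = [
    ("Christian Ethics 30", "http://www.scs.sk.ca/cyber/course/ce30.htm"),
    ("Calculus 30", "http://www.scs.sk.ca/cyber/course/calc30.htm"),
    ("Chemistry 30", "http://www.scs.sk.ca/cyber/course/chem30.htm"),
]


def course_to_n3(title, uri, rdf_type="unknown"):
    """Return one N3-style statement block for a course (hypothetical helper)."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"<{uri}>\n"
        f"    <{RDF_TYPE}> <{rdf_type}> ;\n"
        f'    <{DC_TITLE}> "{escaped}" .\n'
    )


def courses_to_n3(courses):
    """Join the statement blocks for every (title, uri) pair."""
    return "\n".join(course_to_n3(title, uri) for title, uri in courses)


if __name__ == "__main__":
    print(courses_to_n3(COURSES))
```

The semicolon sits at the end of the line here rather than at the start, which (as far as I can tell) means the same thing in N3: "more statements about the same subject follow".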

Okay, so what was I supposed to do with my RDF data again?

Oh, right. I was supposed to examine how Haystack makes sense of RDF files. Then maybe I can use similar techniques to bridge the gap between the domain knowledge representation and the AIEd systems I'm trying to study. (Geez, I wish I had a screenshot of the PEPE system architecture available! PEPE is from Determining the focus of instruction : content planning for intelligent tutoring systems / by Barbara Jane Brecht. I'll try and track one down. I want to illustrate on a high level how this domain knowledge work fits into the much bigger picture!)

Posted by Frozone Permalink on April 15, 2006 10:49 AM | Comments (0)
categorized under Domain Knowledge Representation (DKR)




April 01, 2006

Ontological Engineering: taking a first bite

I began my quest by reading a paper called Ontological Modelling for Designing Educational Systems by Joost Breuker & Antoinette Muntjewerff from the University of Amsterdam, 1999.

The paper gave me a lot of information about the different types of domain knowledge and how they can be organized into different levels: top, core, and domain ontologies. From my limited understanding, I gathered that the top ontology covers universally-applicable concepts like time, space, cause, etc. A core ontology describes how knowledge is used in a particular field, and the domain ontology describes the raw knowledge itself. So, I'm thinking that maybe for each subject (like Geology, for example) there is 1 domain ontology, but there can be many core ontologies for the different applications of that raw knowledge.

The framework is further explored with an example and then applied in an educational system. I gather that they performed a study by watching budding young law students and how they experienced difficulty in solving legal cases. They seemed to notice that one reason for student failure was that they had difficulty matching up the case at hand with the huge world of law itself. The students weren't familiar enough with the laws themselves to be able to make all of the appropriate connections with the case at hand. I'm seeing this as a mapping between the core ontology, i.e. the laws, and the domain ontology, i.e. how the laws should be applied in this kind of legal case. I think. Maybe. I don't know.

Anyway, the authors described an AIEd system that showed three different levels of knowledge. On the left, the core ontology stuff, i.e. the laws, was shown. In the middle, the different "hooks" or facts in the current case were shown, to help the student map the current case onto the abstract laws themselves. Then, in the third window, the student could use the matching-up of the case to the laws to draw a conclusion.

This was all really neat, and the authors said that they planned to write another related paper later on. This was in 1999 so I wonder what they are working on these days.

This framework makes sense, and so far, it's the only thing I know about domain knowledge representation. So I'll keep it in my head until I can read more on the subject. (I know, it's a bad idea to proceed from reading only 1 paper to be thinking about implementation issues in a working project, but, meh. I gotta proceed somehow.)

So! Assuming that I'm still using RDF statements, I'm seeing that all these statements might be classified into the three different categories: top, core & domain. Is it necessary to explicitly classify my RDF statements into these three categories? I would think not, but this will remain one of the many unanswered questions in my head.
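Just to make that question concrete, here is a minimal sketch of what "explicitly classifying" statements might look like: each (subject, predicate, object) statement is sorted into a level by looking up its predicate. The predicate-to-level table and every `ex:` name are hypothetical, invented purely for illustration; nothing in the Breuker & Muntjewerff paper prescribes this.

```python
from collections import defaultdict

# Hypothetical lookup table: which ontology level each predicate belongs to.
LEVEL_BY_PREDICATE = {
    "ex:before": "top",       # time-like, universally applicable
    "ex:appliesTo": "core",   # how knowledge is used in a field
    "dc:title": "domain",     # raw facts about the subject matter
}


def group_by_level(triples, level_by_predicate=LEVEL_BY_PREDICATE):
    """Group (subject, predicate, object) tuples by ontology level."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        # Anything the table does not cover is left unclassified.
        level = level_by_predicate.get(predicate, "unclassified")
        groups[level].append((subject, predicate, obj))
    return dict(groups)


# Hypothetical statements to classify.
SAMPLE = [
    ("ex:chem30", "dc:title", "Chemistry 30"),
    ("ex:unit1", "ex:before", "ex:unit2"),
    ("ex:thing", "ex:somethingElse", "ex:other"),
]
```

Even in this toy version, the third statement lands in "unclassified", which hints at why I suspect an explicit three-way classification may not be necessary (or even possible) for every statement.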

Next: I think I need a specific example of what a collection of RDF files looks like, and how they are used in a working system. I think this is where I shall have to start up Haystack and examine the files it uses and observe how the system interprets them. Perhaps I will also get some answers on the auto-generation of RDF files.

Beyond that, I would like to study some working instructional planner (like PEPE, maybe, from my favourite thesis I described in my quest) and to see if it's plausible to hook up that engine with the domain knowledge library I come up with before hand.

'Should be fun! But now the real world is calling. I have to go do the laundry and then take my Mitsubishi to the car wash. Sigh.

Posted by Frozone Permalink on April 01, 2006 12:03 PM | Comments (0)
categorized under Domain Knowledge Representation (DKR)




Index to Steph's Notes

Feb. 24th 2007 - Weee! This new part of my website is not an entry, but rather a permanent fixture whose purpose is to "Look Down on All Those Notes With Some Grand Vision of Organization". Wish me luck. LOL
  1. Representing meta-data (fuel) & the different kinds of "hooks" that intelligent systems can use (how fuel is injected into the motor of the engine)
    1. Motivation: Semantic net / Rationalizable to a machine
      1. Semantic network
      2. Genetic graph
      3. Prerequisite AND/OR graph
      4. Constraint Satisfaction Problems
      5. Bayesian networks / causal graphs
    2. Technology & Philosophy: RDF, modus ponens,
      1. Predicates, Logic & situation calculus
        1. When in doubt, do some math
    3. What kinds of data? - What kinds of meta-data would an AIEd system possibly need, and how is it represented?
      1. task domain knowledge
      2. "is-prerequisite-to"-type knowledge
        1. Jackpot! A pedagogical ontology
      3. interactions with learning objects & other learners - (location, composition is-a/part-of, sequencing by restricting navigation, personalization, ontologies for LO context)
        1. Types of 'Ecological' data
      4. lesson plans, curriculum plans, practicing sessions (What is stored, what is generated on the fly? What is remembered?)
        1. Agent memory
    4. How to organize it - When is it stored in a database? Meta-data? Agent memory banks? Protocols? Repositories? XML files? Home-servers? WSDL services? Frameworks? Portable banks? P2P access?
      1. Database of object-agent interactions
      2. Concept of "Home" on a P2P network -- maybe the bulk of a learning object's usage data is on its home server and can be queried using WSDL or something ? Similar homes for each student's usage history, etc. Baggage problem.
    5. Links to the ontologies
      1. referring to a concept/relationship - ex. AgentOwl?
        1. Using Vocabularies in JENA
        2. Referring to a concept/relationship in an ontology
        3. Improved: Referring to a concept/relationship in an ontology
        4. Using OWL to reference constraints in tutoring systems
    6. Generation of this data
      1. Rationalization: For use by other AIEd systems
      2. What is generated - discuss items under part I.C.
      3. When it's generated - describe procedural model, which parts of the engine generate what (isa-part-of data, XML feeds, web services, meta data about groups and collaboration, protocols, examples Friend of A Friend FOAF project)
        1. Thinking about the system's RDF output
      4. Technical notes of HOW it's generated: JENA, issues of implementation demo, my Hermione & Ron agent examples, lol
      5. Usage of this generated data - see part IV. A.
  2. Given the engine, who uses it?
    1. Students / Learners / "Me"
      1. instructional planning, student model, pre-requisites, tutoring, coaching, collaboration, constructivism
    2. Teachers / Educators / "Me"
      1. putting together lessons
      2. be able to browse through task domain knowledge in an objective / encyclopaedia format, then be able to pick-and-choose what you need for your students
      3. compose examples, design explanations, pull together diagrams, learning objects, etc. Haystack Relo?
    3. Administration / Government / Structure / Crowd Control
      1. as restrictions/obstacles/sand pit to the robot in agent environment
      2. can't just have a swarm of students and teachers out there -- need structure of courses, curriculum, objectives, requirements (at least, we do in this day and age!) - Report cards, evaluation, feedback
      3. government, marks, certificates, requirements, funding, curriculum, attendance, delinquent, non-attending, motivation
      4. school's images, goals, strengths, payroll, HR, security, accounts, permissions, privacy
      5. registration, failed courses
  3. User Environment -- How does this engine work? What does the user see on the screen?
    1. Introduction - Given a background in educational psychology, how does the system present itself -- what does the user see, and where does this data come from? (Links to thoughts from part I.)
    2. Task Domain Browsing - Suppose you're just idly browsing through the "raw" content. How would it look when it's not wrapped around a learning-context or lesson or tutorial or anything? 'Cross between browsing a raw task domain ontology and browsing a learning object repository.
      1. Cleaning up the data -- Visualizing the data for humans to pick through the task domain and work on it. Suppose the "Subject Expert" discovers an advancement in science and needs to update the "world's" domain knowledge. (I used the "Subject Expert" terminology from Ontologies to Support Learning Design Context - Thanks Chris) How would they make corrections to ontologies and learning objects, or at least point the users of "old" objects towards adopting the newer ones?
      2. "Modes" - Learning & Lessons / Checklist - Homework, Assignments, Courses being taken / Collaborative mode / Teaching mode / Calendar - email - administrative mode -- See also the different kinds of scenarios in the ActiveMath system
        1. Educating myself about Education
  4. Evolution of this engine
    1. target some key implementation hooks discussed in part I - design an experiment/demo
      1. scrape a page - (Note, scraping can only give objective data, not in-context data)
      2. LO repository - related to browsing the task domain?
      3. a learner's "To Do" list - where does it come from? Assignments, courses.
      4. sample group scenario
      5. sample teacher lesson planning
      6. sample data "left behind"
      7. sample use of that data
    2. Data mining (for what? lol )
      1. discovery / generation of ontologies - when do you need to hunt for them, and when do you have to have a solidly-known & predictable ontology?
        1. Ontological Engineering: taking a first bite
    3. I/O - where it happens, which languages, protocols, which agents perform i/o and when, precepts, actuators
      1. Role Assignments
        1. Levels of authorization in web applications
      2. My Environment Adapts to me
        1. Displaying feedback from the server on JSP pages (Software engineering considerations)
        2. Sketching out a design (Content planning vs. Delivery planning)
      3. agent negotiations / social structures / ummm... Web 2.0 ?
        1. Towards student modelling
        2. Anatomy of an agent
    4. garbage collection of meta data
      1. Artificial Intelligence & Evolution
        1. Memory Culling: Necessary part of intelligence? (artificial or human)
        2. Applications for the Genetic/Evolutionary algorithm
      2. open learning environments
  5. Agents, pets, grouping, Community modelling
    1. Protocols - finding groups, cyber dollars, state diagrams (?)
    2. "Community Studies" - graphs & communication hubs, types of communities (free-for-all, hierarchy of authority, etc.)
    3. implications of joining a community - what do you share, which parts of your student model are relevant
    4. Walls & sand traps -- deliberate restrictions as problem-solving for learning
    5. Communication channels - individual-to-individual, individual-to-community, chat channels, agent-only "administrative" communications, ex. requests for related learning objects in a particular community, etc.
  6. Educational/Pedagogical focus (this part probably shouldn't be its own section but rather incorporated into the whole picture, but it's separate for me right now because I'm still only just starting to learn about it.)
    1. Semantics - what there is to talk about in Education
      1. ex. Merrill's First Principles of Instruction, linking educational terms to AI terms
        1. Educating myself about education
    2. Pedagogical skills for tutors -- supporting human *and* artificial tutors
      1. Modelling teaching strategies
      2. What is teaching?
      3. Decision theory for teaching strategies
      4. My pedagogical issues
      5. Ontological comparisons as spatial relationships
    3. Student modelling - what the machine needs to know about the student, pedagogically-speaking, about learning history/preferences
    4. Roles - Simulated students, Coaches, Tutors, Teachers,