
October 01, 2011

Ruttiness

I am building a simulation model to study the "paths" that people take as they are learning, because I ultimately want to design a computer system that can best support a learner by suggesting highly effective paths for people to take (and even influencing the path by interacting with the learner).

I am so early in this process. Currently, I have a system that creates random (empty) learning objects, and I can create any number of students who all hop randomly from object to object. Each learning object accumulates learner model stamps as time goes on.
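A minimal sketch of what this stage of the simulation might look like. The function name `simulate`, the integer ids, and the shape of a stamp (here just a timestamp and a student id) are assumptions for illustration, not the actual model:

```python
import random
from collections import defaultdict

def simulate(num_objects, num_students, num_steps, seed=0):
    """Let every student hop randomly between empty learning objects.

    Returns a dict mapping object id -> list of (timestamp, student id) stamps.
    The stamp contents are an assumption; a real learner model stamp
    would carry more information about the learner.
    """
    rng = random.Random(seed)
    stamps = defaultdict(list)  # object id -> stamps left by visiting students
    for t in range(num_steps):
        for student in range(num_students):
            obj = rng.randrange(num_objects)  # random hop, no preference yet
            stamps[obj].append((t, student))
    return stamps

stamps = simulate(num_objects=5, num_students=3, num_steps=4)
total = sum(len(s) for s in stamps.values())
print(total)  # 12: one stamp per student per time step
```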

Next, I have to implement some kind of system that compares learners' paths to each other. Why do I want to compare learners' paths? So I can cluster them, because some paths will be more alike than others. Why do I need to cluster them? Because I need to be able to take any learner, call her S, look at her path so far, and find other "paths" like hers. Then I can look ahead to see what those other students did later in their paths, and use this information to suggest something useful for S to work from.
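Here is a rough sketch of the "find similar paths, then look ahead" idea. It is only a stand-in: the position-by-position match in `prefix_similarity` and the simple vote in `suggest_next` are assumptions chosen for brevity, not a real clustering method, and both function names are made up for this example.

```python
from collections import Counter

def prefix_similarity(path_so_far, other_path):
    """Fraction of positions where other_path agrees with path_so_far."""
    if not path_so_far:
        return 0.0
    matches = sum(1 for a, b in zip(path_so_far, other_path) if a == b)
    return matches / len(path_so_far)

def suggest_next(path_so_far, other_paths, k=2):
    """Vote on the next object using the k most similar longer paths."""
    n = len(path_so_far)
    # Only paths that continue past S's current position can predict ahead.
    candidates = [p for p in other_paths if len(p) > n]
    candidates.sort(key=lambda p: prefix_similarity(path_so_far, p), reverse=True)
    votes = Counter(p[n] for p in candidates[:k])
    return votes.most_common(1)[0][0] if votes else None

others = [["A", "B", "C", "D"], ["A", "B", "C", "E"],
          ["X", "Y", "Z", "Q"], ["A", "B", "C", "D"]]
print(suggest_next(["A", "B", "C"], others, k=3))  # D
```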

I am faintly bothered by something like the chicken-and-egg problem. What will happen to my system once my learners start following recommendations and leaving learner model stamps on all the learning objects, and these stamps are then used to power future recommendations? If people follow what is recommended to them, and the recommendations are based on people's behaviour, then the system is actually feeding into itself. Won't everyone eventually fall into the same rut?

In the future, I need to be able to replace my randomly generated system with a static (but real) dataset. Can we even start to predict what people will do? Are there typical kinds of behaviour that emerge from the nature of the starting set of learning objects?

The ability to replace the random data with the real data is what will allow us to make REAL recommendations.

I want to compare the "ruttiness" of paths from the random data with that of paths from the real set.

So that is why I am writing this post. I want to take a crack at defining "Ruttiness".

See, in the system, each individual human can be said to have exactly 1 path. Take the earliest timestamp that they have, and sequentially line up the objects they visited until you reach the very last timestamp. This is their path.
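As a small sketch, assuming each visit is recorded as a (timestamp, object id) pair (a hypothetical format), building one learner's path is just a sort:

```python
def build_path(visits):
    """Order one learner's visits by timestamp and return the object sequence.

    visits: iterable of (timestamp, object id) pairs in any order.
    """
    return [obj for _, obj in sorted(visits, key=lambda v: v[0])]

visits = [(30, "C"), (10, "A"), (20, "B"), (40, "A")]
print(build_path(visits))  # ['A', 'B', 'C', 'A']
```

The timestamps are dropped from the result on purpose, since two paths count as identical when the object sequence matches, whatever the timing.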

If every agent has an identical path (meaning the sequence of objects that they visit is identical, but the timestamps do not have to be identical), we might say that our system has a Ruttiness of 0.

If every agent has an absolutely different path, we might say that our system has a Ruttiness of 100.

If some people have identical paths but most are still different from each other, we might have a Ruttiness of like 85.
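One candidate formula that matches these three cases is to count how many distinct paths there are and scale that to 0..100. This is only an assumption for illustration, not a settled definition, and the function name `ruttiness` is made up here:

```python
def ruttiness(paths):
    """Score from 0 (all paths identical) to 100 (all paths distinct)."""
    n = len(paths)
    if n < 2:
        return 0.0  # assumption: a single learner cannot show any spread
    # Paths are lists, so convert to tuples to make them hashable.
    distinct = len({tuple(p) for p in paths})
    return 100.0 * (distinct - 1) / (n - 1)

print(ruttiness([["A", "B"], ["A", "B"], ["A", "B"]]))  # 0.0
print(ruttiness([["A"], ["B"], ["C"]]))                 # 100.0
print(ruttiness([["A", "B"], ["A", "B"], ["C"], ["D"], ["E"]]))  # 75.0
```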

Now, this isn't quite good enough. We might have 2 people with completely different paths (i.e. no common sequences), but it could be that 2 of the learning objects are very much alike. So we might want to treat them as equivalent. This is where we start having a homologous path.

How do I define a homologous path? Well, I have to declare a likeness matrix between learning objects. Then, while we are comparing 2 paths, we'd take the analogous learning objects and declare them as the same if either:
- they are indeed the same object, or
- they pass a certain threshold in the learning object likeness matrix.
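The two conditions can be sketched as follows. Several things here are assumptions: `likeness_of` is a hypothetical lookup returning a score between 0 and 1, the 0.8 threshold is arbitrary, and paths are compared position by position, so paths of different lengths are never treated as homologous.

```python
def homologous(obj_a, obj_b, likeness_of, threshold=0.8):
    """True if the objects are identical or their likeness meets the threshold."""
    if obj_a == obj_b:
        return True
    return likeness_of(obj_a, obj_b) >= threshold

def paths_homologous(path_a, path_b, likeness_of, threshold=0.8):
    """True if the paths have equal length and every aligned pair is homologous."""
    if len(path_a) != len(path_b):
        return False
    return all(homologous(a, b, likeness_of, threshold)
               for a, b in zip(path_a, path_b))

# Hypothetical likeness scores, keyed by unordered pairs of object ids.
scores = {frozenset(("B", "B2")): 0.9, frozenset(("C", "X")): 0.1}

def likeness_of(a, b):
    return scores.get(frozenset((a, b)), 0.0)

print(paths_homologous(["A", "B", "C"], ["A", "B2", "C"], likeness_of))  # True
print(paths_homologous(["A", "B", "C"], ["A", "B2", "X"], likeness_of))  # False
```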

This is good stuff. But I have to be careful when I am explaining this. Because it is deep enough already that I think I risk losing my audience.

So, I need to declare a likeness matrix for my learning objects. I will thank Chris for his suggestion to use a hashtable of hashtables.
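A minimal sketch of the hashtable-of-hashtables idea, written as a dict of dicts. The class name `LikenessMatrix`, the symmetric storage, and the default score of 0.0 for unknown pairs are all assumptions made for this example:

```python
from collections import defaultdict

class LikenessMatrix:
    """Sparse, symmetric likeness scores stored as a dict of dicts."""

    def __init__(self):
        self._rows = defaultdict(dict)  # object id -> {other object id -> score}

    def set(self, a, b, score):
        # Store both directions so lookups work regardless of argument order.
        self._rows[a][b] = score
        self._rows[b][a] = score

    def get(self, a, b, default=0.0):
        if a == b:
            return 1.0  # assumption: an object is fully like itself
        return self._rows.get(a, {}).get(b, default)

matrix = LikenessMatrix()
matrix.set("B", "B2", 0.9)
print(matrix.get("B2", "B"))  # 0.9
print(matrix.get("B", "Z"))   # 0.0
```

Only pairs that have actually been scored take up space, which matters if there are many learning objects and most pairs are unrelated.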

Posted by Frozone Permalink on October 01, 2011 11:54 AM | Comments (2)
categorized under Environment & Distributed Planning




Comments

Hi Steph,

The problem you mentioned above is sometimes called the exploration-exploitation tradeoff. The agent should explore enough that it discovers unknown but possibly good choices. But it should also exploit the choices it already knows to be good in order to perform well. The agent needs to balance these two things, as relying on either one alone leads to suboptimal performance.

Take a look at UCT (Upper Confidence bounds applied to Trees). The original paper is Levente Kocsis and Csaba Szepesvári, "Bandit Based Monte-Carlo Planning," ECML 2006. I think it might be relevant to your work.

Posted by: SoloGen at October 4, 2011 11:05 AM

Hello SoloGen, thank you so much for this comment and for the paper reference.

My supervisor also told me to study the ELMER system, which is for path comparison: Gordon I. McCalla, Larry Reid, and Peter F. Schneider, "Plan creation, plan execution and knowledge acquisition in a dynamic microworld," International Journal of Man-Machine Studies, Volume 16, Issue 1, January 1982, Pages 89-112. http://www.sciencedirect.com/science/article/pii/S0020737382800734

I was also looking at algorithms from bioinformatics because they do a lot of sequence matching for DNA. But I am not certain if the notion of "homologous path" appears in bioinformatics.

Thanks again!!!
Steph

Posted by: Steph (a.k.a. Frozone) at October 8, 2011 05:04 PM


Index to Steph's Notes

Feb. 24th 2007 - Weee! This new part of my website is not an entry, but rather a permanent fixture whose purpose is to "Look Down on All Those Notes With Some Grand Vision of Organization". Wish me luck. LOL
  1. Representing meta-data (fuel) & the different kinds of "hooks" that intelligent systems can use (how fuel is injected into the motor of the engine)
    1. Motivation: Semantic net / Rationalizable to a machine
      1. Semantic network
      2. Genetic graph
      3. Prerequisite AND/OR graph
      4. Constraint Satisfaction Problems
      5. Bayesian networks / causal graphs
    2. Technology & Philosophy: RDF, modus ponens,
      1. Predicates, Logic & situation calculus
        1. When in doubt, do some math
    3. What kinds of data? - What kinds of meta-data would an AIEd system possibly need, and how is it represented?
      1. task domain knowledge
      2. "is-prerequisite-to"-type knowledge
        1. Jackpot! A pedagogical ontology
      3. interactions with learning objects & other learners - (location, composition is-a/part-of, sequencing by restricting navigation, personalization, ontologies for LO context)
        1. Types of 'Ecological' data
      4. lesson plans, curriculum plans, practicing sessions (What is stored, what is generated on the fly? What is remembered?)
        1. Agent memory
    4. How to organize it - When is it stored in a database? Meta-data? Agent memory banks? Protocols? Repositories? XML files? Home-servers? WSDL services? Frameworks? Portable banks? P2P access?
      1. Database of object-agent interactions
      2. Concept of "Home" on a P2P network -- maybe the bulk of a learning object's usage data is on its home server and can be queried using WSDL or something ? Similar homes for each student's usage history, etc. Baggage problem.
    5. Links to the ontologies
      1. referring to a concept/relationship - ex. AgentOwl?
        1. Using Vocabularies in JENA
        2. Referring to a concept/relationship in an ontology
        3. Improved: Referring to a concept/relationship in an ontology
        4. Using OWL to reference constraints in tutoring systems
    6. Generation of this data
      1. Rationalization: For use by other AIEd systems
      2. What is generated - discuss items under part I.C.
      3. When it's generated - describe procedural model, which parts of the engine generate what (isa-part-of data, XML feeds, web services, meta-data about groups and collaboration, protocols, e.g. the Friend of a Friend (FOAF) project)
        1. Thinking about the system's RDF output
      4. Technical notes of HOW it's generated: JENA, issues of implementation demo, my Hermione & Ron agent examples, lol
      5. Usage of this generated data - see part IV. A.
  2. Given the engine, who uses it?
    1. Students / Learners / "Me"
      1. instructional planning, student model, pre-requisites, tutoring, coaching, collaboration, constructivism
    2. Teachers / Educators / "Me"
      1. putting together lessons
      2. be able to browse through task domain knowledge in an objective / encyclopaedia format, then be able to pick-and-choose what you need for your students
      3. compose examples, design explanations, pull together diagrams, learning objects, etc. Haystack Relo?
    3. Administration / Government / Structure / Crowd Control
      1. as restrictions/obstacles/sand pit to the robot in agent environment
      2. can't just have a swarm of students and teachers out there -- need structure of courses, curriculum, objectives, requirements (at least, we do in this day and age!) - Report cards, evaluation, feedback
      3. government, marks, certificates, requirements, funding, curriculum, attendance, delinquent, non-attending, motivation
      4. school's images, goals, strengths, payroll, HR, security, accounts, permissions, privacy
      5. registration, failed courses
  3. User Environment -- How does this engine work? What does the user see on the screen?
    1. Introduction - Given a background in educational psychology, how does the system present itself -- what does the user see, and where does this data come from? (Links to thoughts from part I.)
    2. Task Domain Browsing - Suppose you're just idly browsing through the "raw" content. How would it look when it's not wrapped around a learning-context or lesson or tutorial or anything? 'Cross between browsing a raw task domain ontology and browsing a learning object repository.
      1. Cleaning up the data -- Visualizing the data for humans to pick through the task domain and work on it. Suppose the "Subject Expert" discovers an advancement in science and needs to update the "world's" domain knowledge. (I used the "Subject Expert" terminology from Ontologies to Support Learning Design Context - Thanks Chris) How would they make corrections to ontologies and learning objects, or at least point the users of "old" objects towards adopting the newer ones?
      2. "Modes" - Learning & Lessons / Checklist - Homework, Assignments, Courses being taken / Collaborative mode / Teaching mode / Calendar - email - administrative mode -- See also the different kinds of scenarios in the ActiveMath system
        1. Educating myself about Education
  4. Evolution of this engine
    1. target some key implementation hooks discussed in part I - design an experiment/demo
      1. scrape a page - (Note, scraping can only give objective data, not in-context data)
      2. LO repository - related to browsing the task domain?
      3. a learner's "To Do" list - where does it come from? Assignments, courses.
      4. sample group scenario
      5. sample teacher lesson planning
      6. sample data "left behind"
      7. sample use of that data
    2. Data mining (for what? lol )
      1. discovery / generation of ontologies - when do you need to hunt for them, and when do you have to have a solidly-known & predictable ontology?
        1. Ontological Engineering: taking a first bite
    3. I/O - where it happens, which languages, protocols, which agents perform i/o and when, precepts, actuators
      1. Role Assignments
        1. Levels of authorization in web applications
      2. My Environment Adapts to me
        1. Displaying feedback from the server on JSP pages (Software engineering considerations)
        2. Sketching out a design (Content planning vs. Delivery planning)
      3. agent negotiations / social structures / ummm... Web 2.0 ?
        1. Towards student modelling
        2. Anatomy of an agent
    4. garbage collection of meta data
      1. Artificial Intelligence & Evolution
        1. Memory Culling: Necessary part of intelligence? (artificial or human)
        2. Applications for the Genetic/Evolutionary algorithm
      2. open learning environments
  5. Agents, pets, grouping, Community modelling
    1. Protocols - finding groups, cyber dollars, state diagrams (?)
    2. "Community Studies" - graphs & communication hubs, types of communities (free-for-all, hierarchy of authority, etc.)
    3. implications of joining a community - what do you share, which parts of your student model are relevant
    4. Walls & sand traps -- deliberate restrictions as problem-solving for learning
    5. Communication channels - individual-to-individual, individual-to-community, chat channels, agent-only "administrative" communications, ex. requests for related learning objects in a particular community, etc.
  6. Educational/Pedagogical focus (this part probably shouldn't be its own section but rather incorporated into the whole picture, but it's separate for me right now because I'm still only just starting to learn about it.)
    1. Semantics - what there is to talk about in Education
      1. ex. Merrill's First Principles of Instruction, linking educational terms to AI terms
        1. Educating myself about education
    2. Pedagogical skills for tutors -- supporting human *and* artificial tutors
      1. Modelling teaching strategies
      2. What is teaching?
      3. Decision theory for teaching strategies
      4. My pedagogical issues
      5. Ontological comparisons as spatial relationships
    3. Student modelling - what the machine needs to know about the student, pedagogically-speaking, about learning history/preferences
    4. Roles - Simulated students, Coaches, Tutors, Teachers,