February 11, 2009
Decision theory for teaching strategies
I'm trying to wrap my head around decision-making under uncertainty using decision theory and Markov decision processes. After a lot of tumbling and turning, I realized that I'm trying to compare and contrast two things:
- optimal policy construction, and
- Markov decision processes (MDPs).
These are two ways (not the only two) to tackle decision-making. This post is going to be about the first -- I'll tackle MDPs another day. As for policy construction -- I began my hunt with the notion that you would start with an influence diagram such as the one here.
Your decision problem is modelled as a graph. The square nodes are decision nodes. The diamond nodes are utility nodes. The circle nodes are the same as the nodes in a Bayesian network. Circles can point to diamonds and squares. Squares can point to diamonds and circles. Only one diamond is allowed per diagram.
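Those shape-and-arc rules can be written down as a quick sanity check. Here's a minimal sketch in Python -- the node names are made up (a textbook-style umbrella example, not the diagram from the post), and I'm assuming circle-to-circle arcs are also allowed, since those are just ordinary Bayesian-network arcs:

```python
# The arc rules from the paragraph above, as a quick validity check.
# Node shapes: circle = chance, square = decision, diamond = utility.
ALLOWED = {
    ("circle", "circle"),   # ordinary Bayesian-network arcs
    ("circle", "square"), ("circle", "diamond"),
    ("square", "circle"), ("square", "diamond"),
}

def check_diagram(nodes, arcs):
    """nodes: name -> shape; arcs: list of (parent, child) pairs."""
    if sum(1 for s in nodes.values() if s == "diamond") != 1:
        raise ValueError("exactly one utility (diamond) node allowed")
    for parent, child in arcs:
        if (nodes[parent], nodes[child]) not in ALLOWED:
            raise ValueError(f"illegal arc {parent} -> {child}")
    return True

# Toy example (made-up names, not the diagram from the post):
nodes = {"Weather": "circle", "Forecast": "circle",
         "TakeUmbrella": "square", "U": "diamond"}
arcs = [("Weather", "Forecast"), ("Weather", "U"),
        ("Forecast", "TakeUmbrella"), ("TakeUmbrella", "U")]
print(check_diagram(nodes, arcs))  # True
```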
A policy (I learned to represent policies with the Greek letter delta, δ¹) is like a "rule of thumb" for the action you choose when faced with a decision. The optimal policy is the policy that gives you the greatest expected utility (from the diamond node). You can think of the policy as being connected to the decision nodes (the squares).
One thing that has confused me for a looong time is that your random variables (circle nodes) can be either "states" or "observables/evidence". Recently I had a little epiphany where I thought of the states as your ontology and your observables as your epistemology.
I'm perplexed about the application of a decision network like this. Would you use the same network over and over? I guess you would have to build a network for each type of decision you'll need to face. And, the only time you'd re-use the same network is if you face exactly the same sort of decision again. Although you could modify the values in the CPTs (conditional probability tables) if you had better information the next time around.
Anyway, a "policy" is something that you can apply to your decisions, and the policy tells you which direction to go. It is a function from states to actions. Basically, for all decision nodes in your network, (all squares), your policy is the set of decisions to make -- one decision per decision node. (So, is policy construction always an offline problem?)
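To make "a function from states to actions" concrete, here's a toy sketch: enumerate every possible policy for a single decision node and keep the one with the highest expected utility. The umbrella scenario and all the numbers below are made up for illustration -- they're not from the post or the paper:

```python
from itertools import product

# Hidden state (circle node), its prior, and P(observation | state)
P_weather = {"rain": 0.3, "sun": 0.7}
P_forecast = {"rain": {"rainy": 0.8, "sunny": 0.2},
              "sun":  {"rainy": 0.1, "sunny": 0.9}}

# Utility node (diamond): depends on the true state and the action taken
utility = {("rain", "take"): 70, ("rain", "leave"): 0,
           ("sun", "take"): 20, ("sun", "leave"): 100}

observations = ["rainy", "sunny"]
actions = ["take", "leave"]

def expected_utility(policy):
    """policy: dict mapping each observation to an action."""
    eu = 0.0
    for w, pw in P_weather.items():
        for o, po in P_forecast[w].items():
            eu += pw * po * utility[(w, policy[o])]
    return eu

# A policy picks one action per possible observation; enumerate them
# all and keep the best one -- the optimal policy δ*.
best = max(
    (dict(zip(observations, choice))
     for choice in product(actions, repeat=len(observations))),
    key=expected_utility,
)
print(best)  # {'rainy': 'take', 'sunny': 'leave'}
```

With these numbers the optimal rule is "take the umbrella iff the forecast says rainy" (expected utility about 81.2). Brute-force enumeration like this blows up exponentially in the number of observations, which is part of why the fancier machinery (variable elimination, MDP methods) exists.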
In this paper, I think I was a little misled by their usage of "pedagogical strategy". I was thinking, "oh, are they modeling how to gently guide a student through material that's new to them vs. challenging a student to get them to become even more familiar with material they've already been introduced to?" But after reading the paper a couple of times (and I could still be missing something -- the material was pretty dense and a lot of it very technical) I think what they meant by "pedagogical strategy" was "the order in which concepts are introduced". To me this is only a small dimension of a teaching strategy. It's like, this is the "content planning" without "delivery planning".
I was also a little surprised to learn that these researchers used artificial students. I didn't understand what was being measured with the artificial students -- which part of the system was being "tweaked" by optimizing against different types of students, and where they got the "student types" from. (Thinking, 'hey, could I ever use artificial students in my experiments?').
I missed out on learning about "Reinforcement learning" and how they were using MDPs. I still have so much more to learn before I can really grasp a lot of the research that is going on.
On the bright side, this paper did force me to take a closer look at decision theory.
Anyway, my journey about finding teaching strategies continues. I also feel like I'm getting closer to picking a thesis topic. (HA! I know, I've been saying that for years..... ugh.... lol). But, I'm confident enough this time that I might put this statement on my "About Me" page: I'm interested in how to model teaching strategies such that an abstract task domain ontology can be taken and "filtered through" the teaching strategy. This way, you'd have a universal machine that can teach. Scientists all over the world can continue to make discoveries about physics or math or chemistry or astronomy or geology or medicine or anything, and any Jane Doe could learn about it if she wants because she'd have a(n artificial) tutor to help her explore the material whenever and however she wants. I'd like to figure out how to take a learning object and weave it into an instructional plan that is conscious of overall themes and stories that can stretch from lesson to lesson to create an enjoyable, meaningful experience.
¹ I have also seen pi (π) used to denote a policy. I don't know if there is a difference or if it's just inconsistent notation.
Index to Steph's Notes
Feb. 24th 2007 - Weee! This new part of my website is not an entry, but rather a permanent fixture whose purpose is to "Look Down on All Those Notes With Some Grand Vision of Organization". Wish me luck. LOL
- Representing meta-data (fuel) & the different kinds of "hooks" that intelligent systems can use (how fuel is injected into the motor of the engine)
- Motivation: Semantic net / Rationalizable to a machine
- Semantic network
- Genetic graph
- Prerequisite AND/OR graph
- Constraint Satisfaction Problems
- Bayesian networks / causal graphs
- Technology & Philosophy: RDF, modus ponens,
- Predicates, Logic & situation calculus
- What kinds of data? - What kinds of meta-data would an AIEd system possibly need, and how is it represented?
- task domain knowledge
- "is-prerequisite-to"-type knowledge
- interactions with learning objects & other learners - (location, composition is-a/part-of, sequencing by restricting navigation, personalization, ontologies for LO context)
- lesson plans, curriculum plans, practicing sessions (What is stored, what is generated on the fly? What is remembered?)
- How to organize it - When is it stored in a database? Meta-data? Agent memory banks? Protocols? Repositories? XML files? Home-servers? WSDL services? Frameworks? Portable banks? P2P access?
- Database of object-agent interactions
- Concept of "Home" on a P2P network -- maybe the bulk of a learning object's usage data is on its home server and can be queried using WSDL or something? Similar homes for each student's usage history, etc. Baggage problem.
- Links to the ontologies
- referring to a concept/relationship - ex. AgentOwl?
- Generation of this data
- Rationalization: For use by other AIEd systems
- What is generated - discuss items under part I.C.
- When it's generated - describe procedural model, which parts of the engine generate what (is-a/part-of data, XML feeds, web services, metadata about groups and collaboration, protocols; example: the Friend of a Friend (FOAF) project)
- Technical notes of HOW it's generated: JENA, issues of implementation demo, my Hermione & Ron agent examples, lol
- Usage of this generated data - see part IV. A.
- Given the engine, who uses it?
- Students / Learners / "Me"
- instructional planning, student model, pre-requisites, tutoring, coaching, collaboration, constructivism
- Teachers / Educators / "Me"
- putting together lessons
- be able to browse through task domain knowledge in an objective / encyclopaedia format, then be able to pick-and-choose what you need for your students
- compose examples, design explanations, pull together diagrams, learning objects, etc. Haystack Relo?
- Administration / Government / Structure / Crowd Control
- as restrictions/obstacles/sand pit to the robot in agent environment
- can't just have a swarm of students and teachers out there -- need structure of courses, curriculum, objectives, requirements (at least, we do in this day and age!) - Report cards, evaluation, feedback
- government, marks, certificates, requirements, funding, curriculum, attendance, delinquent, non-attending, motivation
- school's images, goals, strengths, payroll, HR, security, accounts, permissions, privacy
- registration, failed courses
- User Environment -- How does this engine work? What does the user see on the screen?
- Introduction - Given a background in educational psychology, how does the system present itself -- what does the user see, and where does this data come from? (Links to thoughts from part I.)
- Task Domain Browsing - Suppose you're just idly browsing through the "raw" content. How would it look when it's not wrapped around a learning-context or lesson or tutorial or anything? A cross between browsing a raw task domain ontology and browsing a learning object repository.
- Cleaning up the data -- Visualizing the data for humans to pick through the task domain and work on it. Suppose the "Subject Expert" discovers an advancement in science and needs to update the "world's" domain knowledge. (I used the "Subject Expert" terminology from Ontologies to Support Learning Design Context - Thanks Chris) How would they make corrections to ontologies and learning objects, or at least point the users of "old" objects towards adopting the newer ones.
- "Modes" - Learning & Lessons / Checklist - Homework, Assignments, Courses being taken / Collaborative mode / Teaching mode / Calendar- email -adminisrative mode -- See also the different kinds of scenarios in the ActiveMath system
- Evolution of this engine
- target some key implementation hooks discussed in part I - design an experiment/demo
- scrape a page - (Note, scraping can only give objective data, not in-context data)
- LO repository - related to browsing the task domain?
- a learners "To Do" list - where does it come from? Assignments, courses.
- sample group scenario
- sample teacher lesson planning
- sample data "left behind"
- sample use of that data
- Data mining (for what? lol )
- discovery / generation of ontologies - when do you need to hunt for them, and when do you have to have a solidly-known & predictable ontology?
- I/O - where it happens, which languages, protocols, which agents perform I/O and when, percepts, actuators
- Role Assignments
- My Environment Adapts to me
- Displaying feedback from the server on JSP pages (Software engineering considerations)
- Sketching out a design (Content planning vs. Delivery planning)
- agent negotiations / social structures / ummm... Web 2.0 ?
- garbage collection of meta data
- Artificial Intelligence & Evolution
- Memory Culling: Necessary part of intelligence? (artificial or human)
- Applications for the Genetic/Evolutionary algorithm
- open learning environments
- Agents, pets, grouping, Community modelling
- Protocols - finding groups, cyber dollars, state diagrams (?)
- "Community Studies" - graphs & communication hubs, types of communities (free-for-all, hierarchy of authority, etc.)
- implications of joining a community - what do you share, which parts of your student model are relevant
- Walls & sand traps -- deliberate restrictions as problem-solving for learning
- Communication channels - individual-to-individual, individual-to-community, chat channels, agent-only "administrative" communications, ex. requests for related learning objects in a particular community, etc.
- Educational/Pedagogical focus (this part probably shouldn't be its own section but rather incorporated into the whole picture, but it's separate for me right now because I'm still only just starting to learn about it.)
- Semantics - what there is to talk about in Education
- ex. Merrill's First Principles of Instruction, linking educational terms to AI terms
- Pedagogical skills for tutors -- supporting human *and* artificial tutors
- Student modelling - what the machine needs to know about the student, pedagogically-speaking, about learning history/preferences
- Roles - Simulated students, Coaches, Tutors, Teachers