Index - Semantic Web

June 02, 2009

Twitter lurker?

So, I've done something unconventional. I've added the Twitter feeds of people I follow (well, not all of them, but the most interesting ones, anyway) into my Google Reader. I find that I simply don't have time to go onto Twitter. And when I do, I can only see the 200 most recent updates. If I only go onto Twitter once every couple of weeks, then I miss a LOT, and this is hard for me - like it really bothers me - because I'm such a detail-oriented person. I don't like missing huge gaps in my timeline!

With Google Reader, I can mark as "read" the updates I've seen, and, it will queue up for me all of the unread updates until it's convenient for me to look at them. I found with TweetDeck it would NOT keep track of my unread tweets if I had more than 200 unread tweets. If I did have >200 unread, then too bad for me: I would have to manually go to each person's twitter page and scroll back through their updates. There's no way I have time for that.

So, with Google Reader I can keep reading through all of the tweets of the people I follow. I very rarely send out twitter updates myself because I use Twitter to send out updates on what I've been researching lately, but, unfortunately that hasn't been very dynamic as of late. We'll see -- I think I might get better at sending out more messages if I'm no longer stressed out about being so behind on reading everyone else's updates.

I feel like such a lurker: I'm making all these efforts to keep reading (but I am monstrously behind, ugh!), but I haven't been putting much out there myself!

(UPDATE: Unfortunately, this doesn't work for people who have the "private" type of twitter where they have to approve you before you can follow them. Google Reader can't get those feeds because it needs your twitter UID and password.)

Posted by Frozone June 02, 2009 03:24 PM | Comments (0)
categorized under Semantic Web

December 15, 2007

Thoughts about the 'True Knowledge' system

A colleague introduced me to True Knowledge, a new web search tool. Well, it's not so much a web *search* as it is a *query*: this engine will give you a straight answer, then provide you with the supporting evidence. This is in contrast with traditional web searches which do not answer your question directly; they only provide the evidence where hopefully you will be able to identify the answer for yourself.

Watching the demos, the True Knowledge system is very satisfying: you get answers like "the Eiffel Tower is 118 years, 6 months old", or "No, Jennifer Lopez is not single." In addition to answering your question, the system shows you all the facts it used to build the answer, and leaves you to peruse these facts for yourself and to either "endorse" or "contradict" each one. I like it. I have lots of questions about it, which you will discover if you read on, but, in short, I like it. :)

I offered to my colleague (Erin, may I blog about you? :-D ) to explore a little deeper into this new system, and to do a comparison with some of the basic concepts in artificial intelligence. I just took a third-year computer science course, CMPT 317 - Introduction to Artificial Intelligence, so, while some of the ideas are still fresh in my mind, I want to try to apply them critically for the first time in an outside-of-classroom type of setting. :) I will try to share a little bit about what I have learned, and to develop some further questions of my own.

The True Knowledge system looks very much like a logical database. Basically, this is a set of facts that can be used to model a world. Then, you can take another fact out of the blue (the query) and check it against the knowledge base to see if it's consistent. Each fact is composed of a set of variables connected with logical operators: "and", "or", "not", "implies" and "if and only if".
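
To make the "check a fact against the knowledge base" idea concrete for myself, here is a minimal sketch in Python. It is only a toy under my own assumptions: the variable names ("married", "single") and the two facts are made up, and brute-forcing every true/false assignment is nothing like what a real system at web scale would do. It just shows the difference between a query being consistent with the facts and a query being forced by them.

```python
from itertools import product

# Hypothetical toy world. Each fact is a function from a model
# (a dict of variable name -> bool) to True/False.
VARIABLES = ["married", "single"]

KNOWLEDGE_BASE = [
    lambda m: m["married"],                            # "married" holds
    lambda m: (not m["married"]) or (not m["single"]),  # married implies not single
]


def models():
    """Yield every possible true/false assignment to the variables."""
    for values in product([True, False], repeat=len(VARIABLES)):
        yield dict(zip(VARIABLES, values))


def kb_holds(model):
    """True if every fact in the knowledge base is true in this model."""
    return all(fact(model) for fact in KNOWLEDGE_BASE)


def is_consistent(query):
    """True if at least one model satisfies both the knowledge base and the query."""
    return any(kb_holds(m) and query(m) for m in models())


def is_entailed(query):
    """True if the query is true in every model that satisfies the knowledge base."""
    return all(query(m) for m in models() if kb_holds(m))
```

With these two facts, the query "single" is not consistent with the knowledge base, while "not single" is entailed by it.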

Somehow, the folks at True Knowledge have built a knowledge base using the data on the WWW, and it seems they have developed a means to translate a user's search string into a logical query, which is then matched up against this knowledge base. I expect that the system replaces the keywords in the user's query with logical variables, then applies the query to the facts in the knowledge base.

I don't know how the set of facts in their knowledge base could have been created, though. That's a lot of work. Much of my own blog here was about my quest to automate the process; I wonder how they approached the problem. I have been seeking a distributed solution, where there is no single "master" database as is the case with True Knowledge. (software engineering/scalability)

Assuming that somehow the True Knowledge engineers have been able to
- "scrape" the web, (see Procedurally building a scraper)
- translate atomic task-domain knowledge into logical variables,
- create relationships between these variables,
- build a logical inference engine, then
- design an algorithm to unify through this KB fast enough to be used in a live search engine, then:
I think that the True Knowledge system might be using a resolution algorithm to uncover the supporting evidence-items that are presented to the user as a supplement to the system's answer to their query. It is an impressive feat.

For Yes/No questions, I expect that the system acts like a Prolog engine, where "No" means "I couldn't find any supporting evidence", and a "Yes" answer means that there was supporting evidence in the knowledge base that matched the query. Of course, it is possible to find supporting evidence in proof that something is NOT true, giving you a strong "No". The softer no, "I couldn't find any answer", results in an invitation to the user to add new facts to the knowledge base.
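
Here is a small sketch of the three kinds of answers described above. This is my own guess at the behaviour, not anything I know about how True Knowledge is built: the `FACTS` table and the `answer` function are hypothetical, and a real engine would infer answers rather than look them up directly.

```python
# Hypothetical fact store: (subject, relation, object) -> truth value.
# True means there is evidence for the statement,
# False means there is evidence against it.
FACTS = {
    ("Jennifer Lopez", "is", "single"): False,
    ("Eiffel Tower", "is in", "Paris"): True,
}


def answer(subject, relation, obj):
    """Return 'yes', a strong 'no', or a soft 'unknown' for a yes/no query."""
    key = (subject, relation, obj)
    if key not in FACTS:
        # Soft no: nothing in the knowledge base either way, so the user
        # could be invited to add a new fact here.
        return "unknown"
    return "yes" if FACTS[key] else "no"
```

The point of keeping "unknown" separate is that a plain Prolog-style "No" would lump the last two cases together.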

Keywords from the user's query would match up with atoms (ex. people, places, things) in the knowledge base. I think that True Knowledge must be using a semantic network, where the connections between nodes ("edges") can also have keywords attached to them (ex. "applies to", "is married to"). This is in contrast to an AND/OR graph, where edges represent logical implications. In an AND/OR graph, I think that references to task-domain knowledge are restricted to the nodes. I'd like to learn more about graph theory and more about how to "attach" items from a world ontology to a graph. I've seen relationships from the domain-level represented as nodes themselves. (see Jackpot! A pedagogical ontology) I don't know which forms of knowledge representation work best in which circumstances. Anyway. :) If you would like to keep following this train of thought, check out my entry from about a year ago (Nov. 2006) about philosophical and computational ontologies. (ontologies & the semantic web)
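
For what it's worth, a semantic network with keyword-labelled edges can be sketched in a few lines of Python. The class name, method names and the people in the usage example are all hypothetical; this only illustrates the idea that the edge itself carries a label such as "is married to".

```python
from collections import defaultdict


class SemanticNetwork:
    """A tiny directed graph whose edges carry keyword labels."""

    def __init__(self):
        # node -> list of (edge label, target node)
        self._edges = defaultdict(list)

    def add(self, subject, label, obj):
        """Record a labelled edge from subject to obj."""
        self._edges[subject].append((label, obj))

    def related(self, subject, label):
        """Return every node reachable from subject over an edge with this label."""
        return [obj for (lbl, obj) in self._edges.get(subject, []) if lbl == label]
```

For example, after `net = SemanticNetwork()` and `net.add("Alice", "is married to", "Bob")`, calling `net.related("Alice", "is married to")` gives back a list containing "Bob", and asking about a node that was never added gives back an empty list.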

Remaining questions:

Does the True Knowledge system rely on human data-entry workers to populate the knowledge base?

I think that it solves the multiple-purpose problem... the True Knowledge system can refer to the same content within different contexts.

What if we wanted to prepare content so that it's contextualized to an individual user? Would this affect the structure or type of knowledge base being used? (representation of epistemological vs. ontological data)

Many remaining questions.... But all in all, I was really pleased to learn about this new system. It's a step in the right direction, 'seeing logic applied in a very practical way! :)

Posted by Frozone December 15, 2007 03:35 PM | Comments (0)
categorized under Semantic Web

August 12, 2007

"Smart" nodes in a search tree? Perception in the human brain vs. data mining on the semantic web

Okay, this is where a Computer Scientist tries to remember the nitty gritty of computability theory, some three-and-a-half years after the end of her education.

I was listening to a psychology lecture about perception and about how the human brain takes in so much information - visual data, for example - and about how the image-gathering part of the brain that's attached to our eyes is capable of filtering out a lot of the extra information so that our brain need only worry about the particular "thing" we are trying to focus on. (For the lecture, check out MIT OpenCourseWare - http://ocw.mit.edu/ - it's been FABULOUS for someone like me wanting to learn a subject, but not necessarily having the time to take the subject myself for credit. Plus, it's an MIT course so you get a little bit of extra "prestige" knowing you're "taking" a course from them. :-/ )

Part of this lecture discussed how the human brain "makes sense" of data, and, in particular, how it executes search routines. For example, suppose you are looking at a diagram of messy shapes, patterns and colours. Then, suppose you were asked a Y/N question whether or not you saw any vertical lines anywhere in the image. It would take a certain amount of time for you to come up with the answer, but you do it quickly enough. Next, you are given a similar image, again with a variety of shapes, patterns and colours. Now, you are given the same question, only you are provided with the hint that if there are any vertical lines, they will be "red". You will be able to answer the question more quickly, because your brain is able to discount all of the non-red things in the image, thus making the search-space smaller.

How does this work, brain-wise? Well, you have sensors in your brain that can "see red" and you have other sensors that can "see vertical". Somewhere in your brain, you also have an intersection-mechanism or conjunction-mechanism that enables you to pick out "red, vertical" things.

(That's the gist of the experiment, anyway. Psychologists forgive me if I missed the point. lol)

Anyway, I found myself scratching my head and trying to remember waaaay waaaay back to 2002-2003 when I took CMPT 360 - Machines and Algorithms and CMPT 361 (now 461) - Intractable Problems and Models of Computation. How would this problem be modelled computationally? How could you speed up your search by having an extra bit of information about what you are looking for? How can you search for things in the world of messy data (ex. the Web) where you don't know everything about the items you're wading through?

First, I was like, "Okay, this has got to fit into the deterministic-finite-automata framework somehow. What are my states? Ummm... the state of having found the 'red-thing' and the state of having found the 'vertical-thing'? No, no! I've got that all wrong. I should be thinking of a search in terms of a graph traversal."

So, secondly, I was like, "Okay, so suppose each node in the graph is a 'thing' that can have an orientation (horizontal, vertical, diagonal) and it can also have a colour. So, as I traverse the graph, I can succeed when one of the nodes matches either of those desired attributes!" But, wouldn't having more things to search for actually INCREASE the amount of time that it takes for your search? Because, at each node, you really have to execute another sub-search to make sure that the node is "red" and that it is "vertical". Our point here is to DECREASE the number of items we have to search through by narrowing the search-space even before the graph traversal.

Is there such a thing as a graph with "smart nodes" that could prune themselves for you, before the search begins? For example, what if a node knew ahead of time that the goal was to find red-things? Then, it could "drop" its connections to all of its non-red neighbours in an attempt to speed up the search when the traversal crosses over it. I have no idea how a node would "know" to adjust its neighbour-set ahead of time. Parallel processing? Help! I'm stabbing in the dark! Or, maybe this would have to be done at traversal-time anyway.
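
One ordinary way to get this kind of pruning, without the nodes being "smart" at all, is to build an index by colour ahead of time, so that the hint "it will be red" shrinks the set of things you look at. Here is a rough sketch. The `SHAPES` data and the function names are made up for illustration, and I'm using a flat list instead of a graph to keep it short.

```python
from collections import defaultdict

# Hypothetical "messy image": each thing has a colour and an orientation.
SHAPES = [
    {"id": 1, "colour": "red", "orientation": "vertical"},
    {"id": 2, "colour": "blue", "orientation": "vertical"},
    {"id": 3, "colour": "red", "orientation": "horizontal"},
    {"id": 4, "colour": "green", "orientation": "diagonal"},
]


def build_colour_index(shapes):
    """Group the shapes by colour once, before any search happens."""
    index = defaultdict(list)
    for shape in shapes:
        index[shape["colour"]].append(shape)
    return index


def find_vertical(shapes):
    """Scan every shape given; return (ids of vertical shapes, number examined)."""
    matches = []
    examined = 0
    for shape in shapes:
        examined += 1
        if shape["orientation"] == "vertical":
            matches.append(shape["id"])
    return matches, examined


def find_vertical_with_hint(index, colour):
    """Use the colour hint to look only at shapes of that colour."""
    return find_vertical(index.get(colour, []))
```

Without the hint, the search looks at all four shapes. With the hint "red", it only looks at the two red ones. The catch is that somebody had to pay for building the index first, which is maybe what the colour-detecting part of the brain is doing all the time anyway.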

After combing my brain through graph-traversals and deterministic-finite-state-machines, something else popped into my head: Constraint satisfaction problems! Don't problems of this nature allow you to optimize your search given a constraint?

Constraint satisfaction problems have a set of variables and a set of constraints. An assignment of these variables that does not violate the constraints would mean that the data set (the variables) "matches" the state-of-affairs we wanted when we defined the constraints. (I think.)

So, for this finding-red and finding-vertical problem, how would I assign my variables? Would I have two sets of variables, maybe? One indicating a boolean whether or not the thing is red and another set indicating whether or not the things are vertical? This would be consistent with the human brain organically having a "see red" function and a "see vertical" function. The next question here is: how do you form a conjunction of these, computationally-speaking? Am I actually working with TWO constraint satisfaction problems? Then do I merge the results? Or do CSPs by nature have a way for me to define constraints over multiple domains, where a domain is the set of possible values that my variables can take and over which the constraints are defined? I'm thinking that "Colour" and "Orientation" have to be in 2 different domains, don't they? Would I have to have 2 graphs, one from each sort of detector in my head, and then generate a hybrid graph somehow that accounts for all of my search criteria?
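
As far as I can tell, the standard CSP setup does give each variable its own domain, and a solution has to satisfy all of the constraints at once, so the conjunction comes for free. Here is a brute-force sketch under that assumption. The names `DOMAINS`, `CONSTRAINTS` and `solve` are mine, and trying every combination is the naive approach, not what a real CSP solver does.

```python
from itertools import product

# Each variable gets its own domain of possible values.
DOMAINS = {
    "colour": ["red", "blue", "green"],
    "orientation": ["horizontal", "vertical", "diagonal"],
}

# Each constraint looks at a full assignment and says whether it is acceptable.
CONSTRAINTS = [
    lambda a: a["colour"] == "red",
    lambda a: a["orientation"] == "vertical",
]


def solve(domains, constraints):
    """Try every combination of values; keep those that satisfy all constraints."""
    names = list(domains)
    solutions = []
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(constraint(assignment) for constraint in constraints):
            solutions.append(assignment)
    return solutions
```

Here there is exactly one solution, the "red, vertical" assignment, out of the nine possible combinations.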

And, does a CSP account for the fact that we do not know ahead of time what the value of a variable actually IS? Suppose my "red-detector" is going around my data set, but it falls upon a node that does not identify itself as red or not. I have to have some sort of flexible red-detector within the red-detector itself. Like, the evaluation of redness should actually come from the red-detector and not from within the item being analysed. I'd have to define some sort of generic red-evaluator that could mark a thing as "red" or not, no matter what its shape or size. Yowzers! How do I do that?

I guess that's why Ontologies were invented... so that you could check a thing's classification in an ontology somehow in order to evaluate its red-ness. So, really, the red-detector shouldn't have to process each and every object in existence, but, rather is faced with the smaller problem of annotating an ontology and finding which items THERE could fit as "red". The organization and coordination of ontologies, though, is another matter!!!!!

Well that was a fun exercise. In my own little head, I think I've stepped through a connection between Ontologies and the semantic web and speeding up searches, and, possibly-maybe some parallels between the structure of the human brain and web architectures. Hmmm. Time to go have breakfast. Bye!

Posted by Frozone August 12, 2007 08:36 AM | Comments (0)
categorized under Semantic Web

December 31, 2006

'Have just discovered Ajax

I was reading an article somewhere (gah, forgot where) and it referenced something called Ajax, short for "Asynchronous JavaScript and XML".

I was thinking about my tabbed item selection service, and about how the user's choices in one tab will affect the adapted "advice" under the other two tabs. In my existing implementation, the content that is shown/hidden depending on the current tab selection is given to the web browser in this format:

(err... copied to this external text file. As always, I struggle with showing markup on a web page. =) )

... You can just go View--> Source under my tabbed item selection service and see for yourself. For a rant about CSS positioning and browser compatibilities, enjoy this.

Basically, each "tab" is its own < div > on the web page. I just have a little DHTML menu to show/hide whichever < div > the user picks. The middle tab is the main work area and is the default tab. The user can complete all necessary tasks using the middle tab only; the other two are just supplemental information if they choose to seek help.

Any time the user clicks on one of the items in the middle tab, the advice in the other two tabs adapts. The user can choose when they want to look at it by going into one of the other tabs. So, if they wanted to keep working uninterrupted by advice from the computer (well, at least, the unsubtle advice) then they can opt not to check out any of the advice under the tabs.

Basically, this means that the code fragment I showed you earlier is all that needs to get re-sent to the browser each time the user clicks a checkbox. My existing system is actually sending back the entire HTML document (derived from a JSP page): the header at the top, the "special instructions" from the teacher, the bulleted list of current selections, the links to Handy Resources, etc. etc.

In the actual implementation, I do have all of the JavaScript and CSS referenced in external files, but we still do have a lot of code that gets sent back-and-forth each time, unnecessarily. In fact, it looks like there is more static stuff than there is dynamic stuff!

Which means, if I am reading this properly, that the usage of Ajax could cut the size of my client-server messages by more than half. It would not change the number of client-server interactions, but, each message would be much shorter. And, I wouldn't necessarily be sending HTML markup back to the client; instead, I'd just send the raw data and the browser would do all of the calculation required to put the web page together.

I think I've got that right.

It looks like Ajax is attempting to move the "brains" for page formatting further into the client side; the web application would still dictate exactly how the page looks, but, it's the web browser that would render the data into HTML rather than having the web server both compute the data and render it into a user-viewable form.
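
To convince myself about the message sizes, here is a back-of-the-envelope sketch of the server side, written in Python rather than JSP just to keep it short. Everything in it is made up for illustration: the page text, the `advice` element id and both function names. It compares re-sending a whole page with sending only the advice as JSON for the browser to render.

```python
import html
import json


def render_full_page(advice):
    """Old approach: rebuild the whole page, static parts and all, on every click."""
    items = "".join(f"<li>{html.escape(item)}</li>" for item in advice)
    return (
        "<html><head><title>Item selection</title></head><body>"
        "<h1>Header at the top</h1>"
        "<p>Special instructions from the teacher</p>"
        "<ul><li>Link to a handy resource</li></ul>"
        f"<div id='advice'><ul>{items}</ul></div>"
        "</body></html>"
    )


def render_data_only(advice):
    """Ajax-style approach: send just the raw data and let the browser build the markup."""
    return json.dumps({"advice": advice})
```

For a short list of advice, the JSON message comes out a good deal smaller than the full page. How much smaller in my real service depends on how much static markup the page actually carries, so I wouldn't trust this toy for the exact ratio.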

Woops, it's almost 10:00 - time for me to go! Bye!

Posted by Frozone December 31, 2006 09:20 AM | Comments (2)
categorized under Semantic Web

Index to Steph's Notes

Feb. 24th 2007 - Weee! This new part of my website is not an entry, but rather a permanent fixture whose purpose is to "Look Down on All Those Notes With Some Grand Vision of Organization". Wish me luck. LOL
  1. Representing meta-data (fuel) & the different kinds of "hooks" that intelligent systems can use (how fuel is injected into the motor of the engine)
    1. Motivation: Semantic net / Rationalizable to a machine
      1. Semantic network
      2. Genetic graph
      3. Prerequisite AND/OR graph
      4. Constraint Satisfaction Problems
      5. Bayesian networks / causal graphs
    2. Technology & Philosophy: RDF, modus ponens,
      1. Predicates, Logic & situation calculus
        1. When in doubt, do some math
    3. What kinds of data? - What kinds of meta-data would an AIEd system possibly need, and how is it represented?
      1. task domain knowledge
      2. "is-prerequisite-to"-type knowledge
        1. Jackpot! A pedagogical ontology
      3. interactions with learning objects & other learners - (location, composition is-a/part-of, sequencing by restricting navigation, personalization, ontologies for LO context)
        1. Types of 'Ecological' data
      4. lesson plans, curriculum plans, practicing sessions (What is stored, what is generated on the fly? What is remembered?)
        1. Agent memory
    4. How to organize it - When is it stored in a database? Meta-data? Agent memory banks? Protocols? Repositories? XML files? Home-servers? WSDL services? Frameworks? Portable banks? P2P access?
      1. Database of object-agent interactions
      2. Concept of "Home" on a P2P network -- maybe the bulk of a learning object's usage data is on its home server and can be queried using WSDL or something ? Similar homes for each student's usage history, etc. Baggage problem.
    5. Links to the ontologies
      1. referring to a concept/relationship - ex. AgentOwl?
        1. Using Vocabularies in JENA
        2. Referring to a concept/relationship in an ontology
        3. Improved: Referring to a concept/relationship in an ontology
        4. Using OWL to reference constraints in tutoring systems
    6. Generation of this data
      1. Rationalization: For use by other AIEd systems
      2. What is generated - discuss items under part I.C.
      3. When it's generated - describe procedural model, which parts of the engine generate what (isa-part-of data, XML feeds, web services, meta data about groups and collaboration, protocols, examples Friend of A Friend FOAF project)
        1. Thinking about the system's RDF output
      4. Technical notes of HOW it's generated: JENA, issues of implementation demo, my Hermione & Ron agent examples, lol
      5. Usage of this generated data - see part IV. A.
  2. Given the engine, who uses it?
    1. Students / Learners / "Me"
      1. instructional planning, student model, pre-requisites, tutoring, coaching, collaboration, constructivism
    2. Teachers / Educators / "Me"
      1. putting together lessons
      2. be able to browse through task domain knowledge in an objective / encyclopaedia format, then be able to pick-and-choose what you need for your students
      3. compose examples, design explanations, pull together diagrams, learning objects, etc. Haystack Relo?
    3. Administration / Government / Structure / Crowd Control
      1. as restrictions/obstacles/sand pit to the robot in agent environment
      2. can't just have a swarm of students and teachers out there -- need structure of courses, curriculum, objectives, requirements (at least, we do in this day and age!) - Report cards, evaluation, feedback
      3. government, marks, certificates, requirements, funding, curriculum, attendance, delinquent, non-attending, motivation
      4. school's images, goals, strengths, payroll, HR, security, accounts, permissions, privacy
      5. registration, failed courses
  3. User Environment -- How does this engine work? What does the user see on the screen?
    1. Introduction - Given a background in educational psychology, how does the system present itself -- what does the user see, and where does this data come from? (Links to thoughts from part I.)
    2. Task Domain Browsing - Suppose you're just idly browsing through the "raw" content. How would it look when it's not wrapped around a learning-context or lesson or tutorial or anything. 'Cross between browsing a raw task domain ontology and browsing a learning object repository.
      1. Cleaning up the data -- Visualizing the data for humans to pick through the task domain and work on it. Suppose the "Subject Expert" discovers an advancement in science and needs to update the "world's" domain knowledge. (I used the "Subject Expert" terminology from Ontologies to Support Learning Design Context - Thanks Chris) How would they make corrections to ontologies and learning objects, or at least point the users of "old" objects towards adopting the newer ones.
      2. "Modes" - Learning & Lessons / Checklist - Homework, Assignments, Courses being taken / Collaborative mode / Teaching mode / Calendar - email - administrative mode -- See also the different kinds of scenarios in the ActiveMath system
        1. Educating myself about Education
  4. Evolution of this engine
    1. target some key implementation hooks discussed in part I - design an experiment/demo
      1. scrape a page - (Note, scraping can only give objective data, not in-context data)
      2. LO repository - related to browsing the task domain?
      3. a learner's "To Do" list - where does it come from? Assignments, courses.
      4. sample group scenario
      5. sample teacher lesson planning
      6. sample data "left behind"
      7. sample use of that data
    2. Data mining (for what? lol )
      1. discovery / generation of ontologies - when do you need to hunt for them, and when do you have to have a solidly-known & predictable ontology?
        1. Ontological Engineering: taking a first bite
    3. I/O - where it happens, which languages, protocols, which agents perform i/o and when, precepts, actuators
      1. Role Assignments
        1. Levels of authorization in web applications
      2. My Environment Adapts to me
        1. Displaying feedback from the server on JSP pages (Software engineering considerations)
        2. Sketching out a design (Content planning vs. Delivery planning)
      3. agent negotiations / social structures / ummm... Web 2.0 ?
        1. Towards student modelling
        2. Anatomy of an agent
    4. garbage collection of meta data
      1. Artificial Intelligence & Evolution
        1. Memory Culling: Necessary part of intelligence? (artificial or human)
        2. Applications for the Genetic/Evolutionary algorithm
      2. open learning environments
  5. Agents, pets, grouping, Community modelling
    1. Protocols - finding groups, cyber dollars, state diagrams (?)
    2. "Community Studies" - graphs & communication hubs, types of communities (free-for-all, hierarchy of authority, etc.)
    3. implications of joining a community - what do you share, which parts of your student model are relevant
    4. Walls & sand traps -- deliberate restrictions as problem-solving for learning
    5. Communication channels - individual-to-individual, individual-to-community, chat channels, agent-only "administrative" communications, ex. requests for related learning objects in a particular community, etc.
  6. Educational/Pedagogical focus (this part probably shouldn't be its own section but rather incorporated into the whole picture, but it's separate for me right now because I'm still only just starting to learn about it.)
    1. Semantics - what there is to talk about in Education
      1. ex. Merrill's First Principles of Instruction, linking educational terms to AI terms
        1. Educating myself about education
    2. Pedagogical skills for tutors -- supporting human *and* artificial tutors
      1. Modelling teaching strategies
      2. What is teaching?
      3. Decision theory for teaching strategies
      4. My pedagogical issues
      5. Ontological comparisons as spatial relationships
    3. Student modelling - what the machine needs to know about the student, pedagogically-speaking, about learning history/preferences
    4. Roles - Simulated students, Coaches, Tutors, Teachers,