January 27, 2007
Database connection pooling and clustered Apache Tomcats
As much as I enjoy/prefer theoretcial work, sometimes reality sets in and you've just got to roll up your sleeves and clean it up!
Here's the situation:
I've got 3 Apache Tomcats running on port 8080 (behind our organization's firewall) on separate machines. They are load-balanced using mod_jk/1.2.14 behind an Apache HTTP server. Thus, the only way to interact with the Tomcats (from outside the firewall, anyway) is via HTTP traffic on port 80 on the web server.
The application requires access to 5 databases. Each is hosted by a different machine. (Some of these machines are the the same as the Tomcats.)
Because I don't know any better, I created the 5 ODBC connections under "Control Panel/AdministrativeTools/ODBCAdministrator" on *each* of the Tomcat machines. How else can an application running in a Tomcat Servlet access a remote database? I don't know, and maybe there is a better option out there for me; perhaps in the form of a different protocol. But in the meantime, this is working, at least.
My problem: Lingering database locks. If TomcatA is using a database, and if the application doesn't give up the lock, and if a user whose session is being served by TomcatB tries to read information from the same database, then I get an error (was it java.sql.SQLException or java.lang.NullPointerException? I can't remember; 'should have written it down.)
Obviously, I have to clean up the lock on TomcatA (Thinking, "I explicitly told it to use a non-exclusive lock, why is it doing this?!"). Secondly, I got thinking about the whens and wheres of my database reads: When do I have to open the connections, and when do I close them? Sometimes, if you have to do multiple queries in sequence, it's better to re-use the same connection for those 3 queries. Other times, it's best to swoop in, grab your data, then close it ASAP so that the many users on one of the other Tomcats get a turn.
In a multi-user system, I thought I could be all smart and use static Connection objects, so that each Tomcat would have only 1 connection shared by all its users, thus eliminating the possiblity of 2 connections clashing with each other (at least on the same Tomacat). The Connection would be closed if nobody was using it, and opened so that a handful of users could use the same Connection to execute their Satements. This approach didn't work, because if "you", as the application, are encapsulated within a user's Session, you have absolutely no way of knowing if anybody else is using the Connection, or will ever need that Connection, so who are you to decide whether it should be opened or closed? This doesn't help the lingering-lock problem, either. Further, there were no brains for sharing the database connections between multiple Tomcats. So, I scrapped that approach.
(In my own defense, I came up with the aforementioned approach in 2004, so I'm actually much smarter now. Really! heh heh)
Later, I discovered database connection pooling. This approach takes the decision of when to open/close the "master" connection away from me as a programmer and lets the Tomcat handle it. Yay!
Now that I've been working in a connection-pooled environment for a while (though I'm still suspicious about whether or not I've got it all hooked up properly) I'm starting to develop some questions:
Are there any brains for co-ordinating multiple-Tomcat access to a database?
Does "having an open Connection" always nececessarily mean that you have it locked?
All 5 databases in this system are in Microsoft Access format. (Not my choice.) Are some database-locking functions (ex. per-record vs. per-database, exclusive vs. non-exclusive) dependent on the database implmenetation? Is that why I'm having so many problems trying to obtain multiple reads from multiple servers at the same time?
Is a Connection created on a per-user basis? or a per-Servlet basis? or a per-Tomcat basis? i.e. If I have 2 users requiring access to the same database, are there 2 connections for the same database inserted into the pool? Or, is perhaps the same connection used, and it's just that the 2 users are able to get the same connection by way of that pool? I don't see the big picture yet.
How about per-record locking, as opposed to per-database or per-table locking? Can 2 users have the same database open at the same time if they are operating on different records? (Obviously this is true -- I know this from experience from a different project that used a MySQL database. That system is MUCH more stable.)
When I explicitly do a "Connection.close()", I get connection pool-related errors (shoot, 'don't have the wording with me). Does this mean that I'm doing my "close()" from the wrong place, and is that why I have this lingering-lock problem?
Do I still have to worry about the problem of handling the use of the same Connection over multiple Statements? Or should I always just do the Connection.close() at the end of every single Statement even if I know I'm going to need the same connection again right away, and let the pooling function handle the management?
Why am I doing this?
lol. I have about 4500 users -- maybe 1 Tomcat is enough...? With fewer contestants vying for these valuable Connections/Locks, then maybe if I have only 1 Tomcat then my lingering-lock problem can be swept under the rug because I only have 1 Tomcat that needs the lock anyway.
When I -am- faced with an out-of-control database lock, why doesn't re-starting my Tomcat service release that lock?
And why is the entire database *locked* when I explicitly programmed in a non-exclusive connection?
In Microsoft Access, is there even a difference between "having an Open Connection" and "having the Database Locked" ? According to my expericence, the answer is "No" but I'm still trying to figure all of this out and I'm probably wrong.
I think I am starting to repeat some questions here, so I'll quit. :-)
Well................. time to go do the laundry and make some banana bread. Bye!
Post a comment
Index to Steph's NotesFeb. 24th 2007 - Weee! This new part of my website is not an entry, but rather a permanent fixture whose purpose is to "Look Down on All Those Notes With Some Grand Vision of Organization". Wish me luck. LOL
- Representing meta-data (fuel) & the different kinds of "hooks" that intelligent systems can use (how fuel is injected into the motor of the engine)
- Motivation: Semantic net / Rationalizable to a machine
- Semantic network
- Genetic graph
- Prerequisite AND/OR graph
- Constraint Satisfaction Problems
- Bayesian networks / causal graphs
- Technology & Philosophy: RDF, modus ponens,
- Predicates, Logic & situation calculus
- What kinds of data? - What kinds of meta-data would an AIEd system possibly need, and how is it represented?
- task domain knowledge
- "is-prerequisite-to"-type knowledge
- interactions with learning objects & other learners - (location, composition is-a/part-of, sequencing by restricting navigation, personalization, ontologies for LO context)
- lesson plans, curriculum plans, practicing sessions (What is stored, what is generated on the fly? What is remembered?)
- How to organize it - When is it stored in a database? Meta-data? Agent memory banks? Protocols? Repositories? XML files? Home-servers? WSDL services? Frameworks? Portable banks? P2P access?
- Database of object-agent interactions
- Concept of "Home" on a P2P network -- maybe the bulk of a learning object's usage data is on its home server and can be queried using WSDL or something ? Similar homes for each student's usage history, etc. Baggage problem.
- Links to the ontologies
- referring to a concept/relationship - ex. AgentOwl?
- Generation of this data
- Rationalization: For use by other AIEd systems
- What is generated - discuss items under part I.C.
- When it's generated - describe procedural model, which parts of the engine generate what (isa-part-of data, XML feeds, web services, meta data bout groups and collaboration, protocols, examples Friend of A Friend FOAF project)
- Technical notes of HOW it's generated: JENA, issues of implementation demo, my Hermione & Ron agent examples, lol
- Usage of this generated data - see part IV. A.
- Given the engine, who uses it?
- Students / Learners / "Me"
- instructional planning, student model, pre-requisites, tutoring, coaching, collaboration,constructivism
- Teachers / Educators / "Me"
- putting together lessons
- be able to browse through task domain knowledge in an objective / encyclopaedia format, then be able to pick-and-choose what you need for your students
- compose examples, design explanations, pull together diagrams, learning objects, etc. Haystack Relo?
- Administration / Governement / Structure / Crowd Control
- as restrictions/obstacles/sand pit to the robot in agent environment
- can't just have a swarm of students and teachers out there -- need structure of courses, curriculum, objectives, requirements (at least, we do in this day and age!) - Report cards, evaluation, feedback
- government, marks, certificates, requirements, funding, curriclum, attendance, delinquent, non-attending, motivation
- school''s images, goals, strengths, payroll, HR, security, accounts, permissions, privacy
- registration, failed courses
- User Environment -- How does this engine work? What does the user see on the screen?
- Introduction - Given a background in educational psychology, how does the system present itself -- what does the user see, and were does this data come from? Links to thoughts from part I.)
- Task Domain Browsing - Suppose you're you're just idly browsing through the "raw" content. How would it look when it's not wrapped around a learning-context or lesson or tutorial or anything. 'Cross between browsing a raw task domain ontology and browsing a learning object repository.
- Cleaning up the data -- Visualizing the data for humans to pick through the task domain and work on it. Suppose the "Subject Expert" discovers an advancement in science and needs to update the "world's" domain knowledge. (I used the "Subject Expert" terminology from Ontologies to Support Learning Design Context - Thanks Chris) How would they make corrections to ontologies and learning objects, or at least point the users of "old" objects towards adopting the newer ones.
- "Modes" - Learning & Lessons / Checklist - Homework, Assignments, Courses being taken / Collaborative mode / Teaching mode / Calendar- email -adminisrative mode -- See also the different kinds of scenarios in the ActiveMath system
- Evolution of this engine
- target some key implementation hooks discussed in part I - design an experiment/demo
- scrape a page - (Note, scraping can only give objective data, not in-context dat)
- LO repository - related to browsing the task domain?
- a learners "To Do" list - where does it come from? Assignments, courses.
- sample group scenario
- sample teacher lesson planning
- sample data "left behind"
- sample use of that data
- Data mining (for what? lol )
- discovery / generation of ontologies - when do you need to hunt for them, and when do you have to have a solidly-known & predictable ontology?
- I/O - where it happens, which languages, protocols, which agents perform i/o and when, precepts, actuators
- Role Assignments
- My Environment Adapts to me
- Displaying feedback from the server on JSP pages (Software engineering considerations)
- Sketching out a design (Content planning vs. Delivery planning)
- agent negotiations / social structures / ummm... Web 2.0 ?
- garbage collection of meta data
- Artificial Intelligence & Evolution
- Memory Culling: Necessary part of intelligence? (artificial or human)
- Applications for the Genetic/Evolutionary algorithm
- open learning environments
- Agents, pets, grouping, Community modelling
- Protocols - finding groups, cyber dollars, state diagrams (?)
- "Community Studies" - graphs & communication hubs, types of communities (free-for-all, hierarchy of authority, etc.)
- implications of joining a community - what do you share, which parts of your student model are relevant
- Walls & sand traps -- deliberate restrictions as problem-solving for learning
- Communication channels - individual-to-individual, individual-to-community, chat channels, agent-only "administrative" communications, ex. requests for related learning objects in a particular community, etc.
- Educational/Pedagogical focus (this part probably shouldn't be its own section but rather incorporated into the whole picture, but it's separate for me right now because I'm still only just starting to learn about it.)
- Semantics - what there is to talk about in Education
- ex. Merril's First Principles of Instruction, linking educational terms to AI terms
- Pedagogical skills for tutors -- supporting human *and* artifical tutors
- Student modelling - what the machine needs to know about the student, pedagogically-speaking, about learning history/preferences
- Roles - Simulated students, Coaches, Tutors, Teachers,