Nicola Henze and Wolfgang Nejdl
University of Hannover
Lange Laube 3, 30159 Hannover, Germany
{henze,nejdl}@kbs.uni-hannover.de
Virtual Learning Environments that allow for active (constructivistic) learning lead to different adaptation requirements than learning environments based on more conventional teaching strategies. In this report we discuss our approach of building adaptive hyperbooks (adaptive extendible information resources on the internet). The adaptation techniques used in our hyperbooks are based on a goal-driven approach for selecting projects and for generating and presenting prerequisite knowledge necessary for a student project. The user model underlying the hyperbook is a kind of overlay model using a Bayesian network for inferring/estimating user knowledge. We propose a project selection algorithm based on user goals and previous knowledge and a constructive trail mechanism that generates guided tours through the hyperbook containing all prerequisites needed by a particular user to perform a specific project.
One of the main goals of student modeling in educational hypermedia is student guidance [4]. Students/users have learning goals and previous knowledge which should be reflected by the hyperbook, by adapting the content or the link structure of the hyper document.
In our KBS Virtual Classroom Project we follow a constructivistic pedagogical approach, building heavily on project-based learning, group work and discussions [12]. Such an active learning environment leads to new requirements for adaptation: the project resources presented in a set of hypermedia documents must be adapted to the student's goals (for a specific project) and to the student's knowledge.
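Our KBS hyperbook system therefore aims to support the student by implementing the following adaptation components: adaptive information resources, an adaptive navigational structure, adaptive trail generation, adaptive project selection and adaptive goal selection.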
In this report we will describe our approach for these adaptation components as well as their implementation. We have implemented an adaptive hyperbook for a CS1 course (introduction to programming using Java), and will use it in our examples.
Problem-oriented and inquiry-oriented learning are two main concepts of constructivist learning environments (see also [22, 14, 2, 13]), plus other equally important concepts like active construction of understanding, conceptual restructuring, social interactions, reflection and mentoring. In this report we concentrate on the problem-oriented and inquiry-oriented aspect as well as conceptual structuring, which we reflect in our hyperbook structures. Furthermore, we will not discuss specific pedagogical issues and concepts (for a short discussion of our ideas see e.g. [13]), but will concentrate on the question of how this pedagogical focus changes the structure of the learning materials (the hyperbook) and the requirements for adaptation.

Figure 1: Part of the meta model for modeling hyperbooks
Figure 1 shows part of the high level structure of our hyperbook and simultaneously the different learning strategies in our environment and the resulting link adaptation tasks. The notation we use in this figure is a kind of ER-Modeling notation, which shows concepts as boxes, relations (1:1, 1:n, m:n) as links, and two kinds of adapted relations. The main content of the hyperbook consists of semantic information units and project units. Both of these refer to actual content to be displayed over the WWW as pages of the hyperbook (see [9, 10] for a description of the basic principles and the implementation of the KBS hyperbook system).
All implemented adaptation strategies in our hyperbook are based on
knowledge items. Such a knowledge item (KI) denotes an elementary
knowledge concept of the application domain. The knowledge items are
used for indexing the contents of information units and project units, and
for describing the range of goals. They are similar to the domain
model concepts used in [3].
Information units do not correspond to syntactic parts of a book (such as sections or chapters), but to semantic parts (such as information units about ``Java Objects'', ``Iteration Constructs'', ``Parameters'', etc.). They are semantically related to other information units (e.g. ``object'' and ``object instantiation'' are related information units). These semantic relationships generate the navigational structure between the information units (which is done dynamically by the KBS hyperbook system), so each link between information units corresponds to some kind of semantic relationship between these units. We will not further discuss these semantic relationships in this report but instead refer to [18]. This navigational structure can be annotated (already known, suggested, too difficult) according to the current knowledge of the reader (adaptive navigational structure). For this annotation, we use the well-known traffic light metaphor (see e.g. [23, 3]). A red ball in front of the link indicates that the corresponding page requires some knowledge the user does not currently have and thus is not recommended for the user (too difficult), while a green ball denotes a recommended link (suggested), which should be understandable for the user. Finally, a grey ball (already known) denotes material which (according to the hyperbook's estimate of the user) is already known to the user.
Information units are indexed by knowledge items. As information units are already semantic entities, in many cases there is a one-to-one correspondence between information units and knowledge items. One or more of the knowledge items belonging to a page are the main knowledge items of this page, and for each knowledge item there is exactly one information unit in which it is a main knowledge item. So we have something like a knowledge item index, which gives for each knowledge item one main information unit, and some other information units where it also occurs, but not as a main knowledge item.
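The knowledge item index described above can be pictured as a mapping from each KI to its single main information unit, plus the other units in which it occurs. The following Python sketch is only an illustration: the unit names, KIs and the `build_ki_index` helper are hypothetical and not part of the KBS hyperbook system.

```python
from collections import defaultdict

# Hypothetical indexing of information units by knowledge items (KIs).
UNITS = {
    "Java Objects": {"main": {"object"}, "other": {"class"}},
    "Object Instantiation": {"main": {"instantiation"}, "other": {"object", "class"}},
    "Classes": {"main": {"class"}, "other": set()},
}


def build_ki_index(units):
    """Return (main_unit, occurs_in): KI -> main unit, KI -> other units using it."""
    main_unit = {}
    occurs_in = defaultdict(set)
    for unit, kis in units.items():
        for ki in kis["main"]:
            # Each KI may be the main KI of exactly one information unit.
            if ki in main_unit:
                raise ValueError(f"{ki!r} is main KI of {main_unit[ki]!r} and {unit!r}")
            main_unit[ki] = unit
        for ki in kis["other"]:
            occurs_in[ki].add(unit)
    return main_unit, dict(occurs_in)


main_unit, occurs_in = build_ki_index(UNITS)
print(main_unit["class"])          # Classes
print(sorted(occurs_in["class"]))  # ['Java Objects', 'Object Instantiation']
```

Raising an error on a second main unit enforces the "exactly one main information unit per KI" property stated in the text.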
Project units represent project descriptions and are indexed by those knowledge items which the student needs to know in order to successfully work on these projects. Knowledge items representing prerequisite knowledge for these items need not be included: the dependencies between knowledge items are contained in the Bayesian network of the user model (see section 3).
The relationship between project units and information units can be automatically derived (via the knowledge items) and shows the information units which are relevant for a given project. The links corresponding to this relationship can be adapted as well. This is done by annotating the links according to the user's knowledge (already known, suggested, too difficult), leading to an adaptive information resource for a given project. The annotated links are shown as an annotated index (from the project unit to the corresponding information units).
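Because projects and information units are indexed by the same KIs, deriving the project-to-unit relationship reduces to a set intersection. A minimal sketch, assuming (for illustration only) that a unit is relevant to a project when any of its KIs appears in the project's index; all names are hypothetical:

```python
# Hypothetical KI indexing of information units (main and other KIs together).
UNITS = {
    "Iteration Constructs": {"loop", "condition"},
    "Arrays": {"array", "index"},
    "Graphical User Interfaces": {"gui", "event"},
}


def relevant_units(project_kis, units):
    """Information units sharing at least one KI with the project, sorted by name."""
    return sorted(unit for unit, kis in units.items() if kis & project_kis)


print(relevant_units({"loop", "array"}, UNITS))  # ['Arrays', 'Iteration Constructs']
```

Prerequisite units are deliberately not added here; in the hyperbook they follow from the dependency structure of the BN rather than from the project index.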
The system can also generate a sequential trail (guided tour) through these information units, leaving out already known information units, and ordering the remaining information units, such that difficult information units are suggested at a later stage, when the user knows enough in order to understand them (adaptive trail generation).
The user can select a set of knowledge items (called a goal), and the system can generate (according to the user's knowledge) an index of projects most useful for achieving the user's learning goal (adaptive project selection), a trail for learning these knowledge items (adapted to the user's knowledge) or an annotated index of information units for this goal. Finally, the hyperbook system can propose suitable learning goals for the user based on the user's current knowledge (adaptive goal selection), and then propose corresponding projects, trails or information units.
The indexing of semantic information units by knowledge items,
as described in the last section, can be considered as a kind of
overlay model [11]. Such a knowledge item (KI) denotes an elementary
knowledge concept; the set of knowledge items describes the knowledge
of the application domain. KIs are the basic descriptors for the user
model. Additionally, we need to model learning dependencies between
KIs, represented by a partial order between these KIs, where
$KI_1 < KI_2$ denotes the fact that $KI_1$ has to be learned before
$KI_2$, because understanding $KI_1$ is a prerequisite for understanding
$KI_2$.

Figure 2: Measuring contents of HTML-pages
Therefore our user model contains the knowledge items also used in the
general hyperbook model, and adds a partial order between these
knowledge items to represent learning dependencies. This overlap
between the hyperbook and the user model is shown in
figure 2. The user model also contains
a description of each user's current knowledge in the form of a
vector. This decoupling of hyperbook model and user model has
advantages for authoring the hyperbook: learning dependencies
between knowledge items are described once in the user model, and the
dependencies between information units of the hyperbook can be
inferred automatically from the KI dependencies and the indexing of
the information units by the KIs.
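To illustrate how unit-level dependencies can follow from the KI partial order and the indexing, here is a small Python sketch. It assumes, for illustration only, that only direct KI dependencies and each KI's main information unit are used; the data and the `unit_dependencies` function are hypothetical.

```python
# Hypothetical learning dependencies: (a, b) means KI a must be learned before KI b.
KI_BEFORE = {
    ("variable", "assignment"),
    ("assignment", "loop"),
    ("class", "constructor"),
}
# Hypothetical index: KI -> its main information unit.
MAIN_UNIT = {
    "variable": "Variables",
    "assignment": "Assignments",
    "loop": "Iteration Constructs",
    "class": "Classes",
    "constructor": "Classes",
}


def unit_dependencies(ki_before, main_unit):
    """Lift KI dependencies to (earlier_unit, later_unit) pairs between information units."""
    deps = set()
    for earlier, later in ki_before:
        u_earlier, u_later = main_unit[earlier], main_unit[later]
        if u_earlier != u_later:  # a dependency inside one unit needs no link
            deps.add((u_earlier, u_later))
    return deps


print(sorted(unit_dependencies(KI_BEFORE, MAIN_UNIT)))
# [('Assignments', 'Iteration Constructs'), ('Variables', 'Assignments')]
```

The author only maintains `KI_BEFORE` and the indexing; the unit-level structure is recomputed whenever either changes.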
In order to represent the partial order between the knowledge items,
and to facilitate updating the user's knowledge when
new information arrives, we have chosen to implement the user model of the KBS
hyperbook system as a Bayesian network
(BN). This
BN contains the knowledge items as network nodes. The dependencies
between KIs are expressed by conditional probabilities between the
KIs. These conditional probabilities are very simple and express how
strong a dependency is. We assume, for example, that a user familiar
with ``sorting algorithms'' will know the ``quicksort
algorithm''. The conditional probability table of the node
``quicksort'' reflects this kind of dependency by assuming that an expert
(advanced, beginner, newcomer) in the parent node ``sorting'' is an
expert (advanced, beginner, newcomer, respectively) in the child node
``quicksort'', with a fault tolerance. To find such a conditional
probability table we investigated several distributions; the result
is shown in figure 3. In a similar way we
found conditional probability tables for weaker dependencies and for
root nodes.
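The actual table from figure 3 is not reproduced here. As a hypothetical illustration of a strong dependency "with a fault tolerance", the sketch below builds a table in which the child copies the parent's level with probability 1 - tolerance, and the remaining mass goes to the neighbouring levels. The tolerance of 0.1 and the even split are our assumptions, not the distributions the authors found.

```python
LEVELS = ("expert", "advanced", "beginner", "newcomer")


def strong_dependency_cpt(tolerance=0.1):
    """P(child level | parent level) for a strong dependency (hypothetical shape)."""
    cpt = {}
    for i, parent in enumerate(LEVELS):
        # Neighbouring levels absorb the fault tolerance, split evenly between them.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(LEVELS)]
        row = [0.0] * len(LEVELS)
        row[i] = 1.0 - tolerance
        for j in neighbours:
            row[j] = tolerance / len(neighbours)
        cpt[parent] = dict(zip(LEVELS, row))
    return cpt


for parent, row in strong_dependency_cpt().items():
    print(parent, row)
```

A weaker dependency could be modelled in the same way with a larger tolerance.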

Figure 3: conditional probability table for the
node ``quicksort'', which is dependent on the node ``sorting''
The probabilities assigned to each node in the BN can be seen in figure 4. This figure shows the system's estimate of the knowledge of a new user who has no a priori knowledge about the topics of CS1.

Figure 4: Part of the Bayesian network for a
CS1-hyperbook
BNs are useful in user modeling, since they allow one to describe the
application domain in a single dependency graph. This graph contains
all necessary prerequisites for a particular knowledge item, models
dependencies among knowledge items and makes it possible to infer, for example,
that the prerequisite knowledge of a KI has already been acquired by a
user if the KI itself is understood by the user.
By using a BN, it is possible to use observations about the user's work with the hyperbook and hyperbook projects to update the system's estimate of the user's knowledge. For example, if the system's estimate of the user's knowledge is too pessimistic, and the user solves an advanced project which the hyperbook had thought to be too difficult for him, the system can use this observation to update its estimate, based on the successful completion of the project unit and the indexing of project units by knowledge items (representing the necessary knowledge to successfully complete this project unit). On the other hand, if we observe an advanced user failing to understand some simple concepts, then the BN can selectively change its estimate of this user with respect to these concepts, without classifying him as a complete beginner, and can suggest specific project units for learning these concepts.
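Both directions of reasoning can be shown on a two-node network with the parent ``sorting'' and the child ``quicksort''. The sketch uses a hypothetical conditional probability table and plain enumeration with Bayes' rule, which is exact for this tiny case; it is not the inference algorithm of [19] used in the hyperbook.

```python
LEVELS = ("expert", "advanced", "beginner", "newcomer")

# Hypothetical CPT: CPT[parent_level][child_level] = P(child | parent).
CPT = {
    "expert":   {"expert": 0.90, "advanced": 0.10, "beginner": 0.00, "newcomer": 0.00},
    "advanced": {"expert": 0.05, "advanced": 0.90, "beginner": 0.05, "newcomer": 0.00},
    "beginner": {"expert": 0.00, "advanced": 0.05, "beginner": 0.90, "newcomer": 0.05},
    "newcomer": {"expert": 0.00, "advanced": 0.00, "beginner": 0.10, "newcomer": 0.90},
}


def predict_child(parent_dist, cpt):
    """P(child) = sum over parent levels of P(parent) * P(child | parent)."""
    return {c: sum(parent_dist[p] * cpt[p][c] for p in LEVELS) for c in LEVELS}


def posterior_parent(prior, cpt, observed_child):
    """P(parent | child = observed_child) by Bayes' rule."""
    joint = {p: prior[p] * cpt[p][observed_child] for p in LEVELS}
    total = sum(joint.values())
    return {p: joint[p] / total for p in LEVELS}


uniform = dict.fromkeys(LEVELS, 0.25)
# The user is observed to be an expert in "sorting": predict "quicksort".
print(predict_child({"expert": 1.0, "advanced": 0.0, "beginner": 0.0, "newcomer": 0.0}, CPT))
# The user is observed to be an expert in "quicksort": revise the belief about "sorting".
print(posterior_parent(uniform, CPT, "expert"))
```

Only the nodes connected to the observation change, which is why a single failed concept does not turn an advanced user into a complete beginner.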
Another advantage of using BNs is the handling of uncertainty in our
observations. We can use any degree of information about the user's
knowledge, not only failed / not failed. Currently we use a
vector of four probability values (summing up to 1) describing our
estimate that a user understands a specific knowledge item to the
degrees excellent (expert user), some difficulties (advanced
user), many difficulties (beginner) and not ready
(newcomer). This corresponds to using a random variable with four
discrete values. In order to simplify the construction of the
dependency graph, we stratify the KIs into levels as shown in
figure 5, where the nodes in each level have a
dependency structure expressed by a tree. We currently use three
levels. The lowest one is the simple concept level, containing
the concepts which need no prerequisite knowledge. The second
level contains more advanced concepts which need only first-level
concepts as prerequisite knowledge. The third level is the level
of compound concepts, which can only be understood by knowing
second- or third-level concepts. The second level is further divided
into two parts: one that is necessary for understanding some of
the third-level concepts (level 2 in figure 5) and
another that is not required by any third-level concept (level 2'
in figure 5).
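The level assignment described above can be sketched as follows, assuming (hypothetically) that each KI's direct prerequisites are given as a set and that every KI appears as a key. The sketch only assigns levels; it does not build the per-level trees or the clustering formalism of the next paragraph. Names and data are illustrative.

```python
def stratify(prereqs):
    """Map each KI to level 1, 2, 2' or 3; prereqs: KI -> set of direct prerequisite KIs."""
    level = {}
    for ki, pre in prereqs.items():
        if not pre:
            level[ki] = "1"  # simple concepts: no prerequisites
    for ki, pre in prereqs.items():
        if pre and all(level.get(p) == "1" for p in pre):
            level[ki] = "2"  # advanced concepts: only first-level prerequisites
    for ki in prereqs:
        level.setdefault(ki, "3")  # compound concepts: need second- or third-level concepts
    needed_by_level3 = {p for ki, pre in prereqs.items() if level[ki] == "3" for p in pre}
    # Split level 2 into concepts required by some level-3 concept (2) and the rest (2').
    return {ki: ("2'" if lv == "2" and ki not in needed_by_level3 else lv)
            for ki, lv in level.items()}


PREREQS = {
    "variable": set(), "expression": set(),
    "assignment": {"variable", "expression"},
    "loop": {"expression"},
    "array": {"variable"},
    "string output": {"expression"},
    "sorting": {"loop", "array", "assignment"},
}
print(stratify(PREREQS))
```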
We developed a special clustering formalism for this stratified
modeling approach which enables us to generate a directed acyclic
graph from the dependency graph describing the KIs and the
dependencies among them (which benefits the performance of
the inference algorithms for the BN). The algorithm we use for BN
inference in this generated graph is an exact inference algorithm for
directed acyclic graphs [19]. The current user model for our
CS1 hyperbook ``Introduction into Computer Programming'' contains
about 300 nodes.

Figure 5: Schematic model of a Bayesian
network underlying the user model
There are several systems which use the fact that the user ``reads'' some information to update the estimate of the user's knowledge (e.g. [3]), and also include reading time and/or the sequence of read pages to refine this estimate. While this is a viable approach, it has the disadvantage that it is difficult to measure the knowledge gained by a user who ``reads'' an HTML page [4]. The time spent looking at a page may give more details, but it remains uncertain whether a particular user has really understood the contents of the page or has just drunk a cup of coffee and afterwards continued reading. We decided to take into account neither information about visited pages nor the user's path through the hypertext. Instead, in the current state of our development we ask the users for direct feedback after they have completed a project. As discussed in the next section, the user is free to choose different kinds of answers, including ``topic was easy - I mastered it effortless'', ``topic was okay - but some problems were arising'', ``topic was hard - I had a few ideas but could not get the thing right'' and ``no idea about this topic at all''.
Often a user needs information about specific topics but lacks prerequisite knowledge for these topics (e.g. a user wants to work on a project about algorithms but does not understand simple control structures or methods). In such circumstances it does not help to start reading the information unit about algorithms. To support the user in this situation, we compare the user's actual knowledge with the knowledge required to understand the requested topic. If the user lacks some of these requirements, we generate a sequence of information units (trail/guided tour) that guides him step by step towards his selected topic.
Generation of such a trail is implemented by a depth-first traversal
algorithm which checks the system's estimate of the user's knowledge
of those KIs that are prerequisites for the actual goal. The
algorithm checks whether all prerequisite knowledge is sufficiently known
by the user; if not, the corresponding information units of the
hyperbook are internally marked. Afterwards a sequence of all
marked units is generated which leads from the simple to the
complicated topics towards the selected topic.
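A minimal Python sketch of this trail generation, assuming (hypothetically) that the user model has already been reduced to a yes/no "sufficiently known" verdict per KI, and that each KI maps to its main information unit. A post-order depth-first traversal emits prerequisites before the topics that depend on them, which yields the simple-to-complicated ordering described above. All names and data are illustrative.

```python
def generate_trail(goal, prereqs, known, main_unit):
    """Information units the user still has to read, prerequisites first.

    goal:      the KI the user wants to reach
    prereqs:   KI -> list of direct prerequisite KIs
    known:     KI -> True if the user model deems it sufficiently known
    main_unit: KI -> its main information unit
    """
    visited, trail = set(), []

    def visit(ki):
        if ki in visited:
            return
        visited.add(ki)
        for pre in prereqs.get(ki, []):
            visit(pre)  # depth-first: handle prerequisites before ki itself
        unit = main_unit[ki]
        if not known.get(ki, False) and unit not in trail:
            trail.append(unit)  # mark the unit; post-order keeps simple topics first

    visit(goal)
    return trail


PREREQS = {"sorting": ["loop", "array"], "loop": ["expression"], "array": ["variable"]}
KNOWN = {"expression": True}
MAIN_UNIT = {"sorting": "Sorting", "loop": "Iteration Constructs", "array": "Arrays",
             "expression": "Expressions", "variable": "Variables"}
print(generate_trail("sorting", PREREQS, KNOWN, MAIN_UNIT))
# ['Iteration Constructs', 'Variables', 'Arrays', 'Sorting']
```

Already known units (here ``Expressions'') are visited but left out of the trail, matching the description of guided tours that skip known material.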
Furthermore, the hyperbook provides direct access to the information resources needed for the actual task (information goal or project). This information resource is generated by the same depth-first traversal algorithm as mentioned above, but contains all information units found, and is displayed as a sorted index, with each link annotated according to the user's knowledge using the traffic light metaphor (see section 4.4).
To be able to select suitable projects for a user, the hyperbook
contains a project library. Each project in this library is indexed
by the KIs that have to be understood in order to successfully
complete the project. These KIs are weighted according to their importance
for the project. As we use a Bayesian network for modeling the user's
knowledge, we do not have to include prerequisite knowledge items,
because they are already taken care of by the dependency structure
expressed by the BN.
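A project is useful for a user in his current knowledge state and situation if it helps him to achieve his learning goal and if his current knowledge is sufficient for working on it without too many difficulties.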
These requirements determine the selection criteria for finding an appropriate project that helps the user to achieve his learning goal and reflects his current knowledge state. They are implemented by two algorithms. The first one calculates how well a project matches the goal of a user (project-goal-distance). The second one determines whether the actual knowledge of a user is sufficient for performing the suggested project without too many difficulties (fitness). For example, a user interested in learning simple control structures in Java will have difficulties with a project that uses control structures to build a graphical user interface if she/he has no or only beginner's knowledge about graphical user interfaces.
The hyperbook selects the best project(s) by comparing the weighted sums of these two measures. The weights make it possible to emphasize either matching or fitness; currently we use a factor of 0.5 for each aspect.
We implemented a matching algorithm that calculates the
project-goal-distance between a project $P$ and the actual goal $G$ based on
the KIs contained in the goal and their relevance in the project.
Each KI contained in the goal is assumed to have a relevance of 100.
The relevance $r_P(KI)$ of a KI for a project is defined by its percentage in
relation to the whole project. The matching algorithm uses the
Euclidean metric to calculate the distance between the relevance of a KI that belongs
to the user's goal and its relevance for the project. A short distance
means that this KI is very important for performing the project, while
a large value indicates that the KI is not very relevant
for the project. For every KI of the goal that is not contained in
the project, this distance is set to the maximum value of 100. Thus, for
all $KI \in G$, we have

$$d(KI) = \begin{cases} \sqrt{\left(100 - r_P(KI)\right)^2} & \text{if } KI \in P \\ 100 & \text{if } KI \notin P \end{cases}$$

The project-goal-distance for a project and a given goal is then calculated as the mean value of all these distances:

$$\mathrm{dist}(P, G) = \frac{1}{|G|} \sum_{KI \in G} d(KI)$$
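The project-goal-distance translates directly into code. This is a sketch with hypothetical KI names and relevance values; in one dimension the Euclidean metric reduces to the absolute difference, so `math.sqrt((100 - r) ** 2)` equals `100 - r` for relevances between 0 and 100.

```python
import math


def project_goal_distance(goal, relevance):
    """Mean distance over the goal's KIs.

    goal:      set of KIs the user wants to learn (each with relevance 100)
    relevance: KI -> percentage (0-100) of that KI in the project
    """
    distances = []
    for ki in goal:
        if ki in relevance:
            distances.append(math.sqrt((100 - relevance[ki]) ** 2))
        else:
            distances.append(100.0)  # KI not covered by the project at all
    return sum(distances) / len(distances)


project = {"loop": 60, "array": 30, "io": 10}
print(project_goal_distance({"loop", "array"}, project))      # 55.0
print(project_goal_distance({"loop", "recursion"}, project))  # 70.0
```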
The second algorithm determines the fitness of a user for a project. To determine this fitness we evaluate the knowledge of the user concerning those parts of the project that do not belong to the user's goal. This enables us to select projects that are based on prerequisites already known by the user, and thus lead him as fast as possible to his goal.
![]()
where
index the project and ID is the
identity function that returns 1, if
, 0
otherwise.
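The exact fitness formula is not reproduced here. Purely as a hypothetical stand-in, the sketch below takes fitness to be the relevance-weighted share of the project's non-goal KIs that the user already knows sufficiently, and combines it with the project-goal-distance using the weights of 0.5 mentioned above. Mapping the distance to a match score via 1 - distance/100, so that both terms point in the same direction, is also our assumption.

```python
def project_goal_distance(goal, relevance):
    """Mean of 100 - r_P(KI) over the goal (100 for KIs outside the project)."""
    return sum(100 - relevance[ki] if ki in relevance else 100 for ki in goal) / len(goal)


def fitness(goal, relevance, sufficiently_known):
    """Hypothetical stand-in: share of non-goal relevance the user already knows."""
    rest = {ki: w for ki, w in relevance.items() if ki not in goal}
    if not rest:
        return 1.0  # nothing outside the goal is needed
    known_weight = sum(w for ki, w in rest.items() if sufficiently_known.get(ki, False))
    return known_weight / sum(rest.values())


def select_projects(goal, projects, sufficiently_known, w_match=0.5, w_fit=0.5):
    """Rank projects by a weighted sum of match (1 = perfect) and fitness."""
    scored = []
    for name, relevance in projects.items():
        match = 1 - project_goal_distance(goal, relevance) / 100
        fit = fitness(goal, relevance, sufficiently_known)
        scored.append((w_match * match + w_fit * fit, name))
    return [name for _, name in sorted(scored, reverse=True)]


PROJECTS = {
    "Loop drills": {"loop": 80, "io": 20},
    "Loop GUI": {"loop": 50, "gui": 50},
}
print(select_projects({"loop"}, PROJECTS, {"io": True, "gui": False}))
# ['Loop drills', 'Loop GUI']
```

This mirrors the example given above: a user learning control structures without GUI knowledge is steered away from the GUI project.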
If a user wants more guidance during his learning process, he can ask the hyperbook for the next learning step. This request is resolved by determining a suitable learning goal for this user based on his current knowledge. Based on this goal, the hyperbook can propose a suitable project, a set of information units or a trail leading to that goal.
To determine the next suitable learning goal, a sequential trail covering the whole hyperbook is calculated. For each item of this trail, the system's estimate of the user's knowledge is checked; if the user does not yet know a knowledge item, this item is proposed as the next suitable goal.
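A minimal sketch of this step, assuming (hypothetically) that the whole-hyperbook trail is any topological order of the KI prerequisite graph, here computed with Python's standard `graphlib.TopologicalSorter` (Python 3.9+), and that the first unknown KI on the trail is proposed. The data are illustrative.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisites: KI -> set of KIs that must come earlier on the trail.
PREREQS = {
    "expression": {"variable"},
    "loop": {"expression"},
    "array": {"loop"},
}


def whole_book_trail(prereqs):
    """One possible sequential trail: a topological order of all KIs (prerequisites first)."""
    return list(TopologicalSorter(prereqs).static_order())


def next_learning_goal(trail, sufficiently_known):
    """Propose the first KI on the trail that the user does not yet know (None if all known)."""
    for ki in trail:
        if not sufficiently_known.get(ki, False):
            return ki
    return None


trail = whole_book_trail(PREREQS)
print(trail)  # ['variable', 'expression', 'loop', 'array']
print(next_learning_goal(trail, {"variable": True, "expression": True}))  # loop
```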
As discussed previously, the information units are linked based on their semantic relationships. Annotation of these links is very useful if a user just wants to browse through the hyperbook. Links are marked as ready_for_reading (green ball in front of a link), not ready_for_reading (red ball) or already_known (grey ball) to help the user select appropriate information units.
An information unit is ready_for_reading if all its prerequisites are
known by the user. In terms of our BN this means that an information
unit indexed by a set of KIs can be read by a user if all children
of these KIs are sufficiently known. A child of a knowledge
item is sufficiently known if it is known, well_known or
excellently_known. A KI is excellently_known if the probability
that the user has expert knowledge about it is greater than the sum
of the probabilities for advanced, beginner's and newcomer
knowledge. It is well_known if the sum of the probabilities for expert
and advanced knowledge is greater than the sum for beginner's and newcomer
knowledge. A KI is known if the sum of the probabilities for
beginner's and advanced knowledge is greater than the sum of the
probabilities for expert and newcomer knowledge. These definitions are
motivated by the distribution of the probability mass over the four
values used for estimating the user's knowledge of a specific
knowledge item (expert, advanced, beginner, newcomer), see
figure 6.

Figure 6: Interpretation of the
probability distributions given by the Bayesian Network
An information unit is not ready_for_reading if the user has a gap
in some of the required knowledge of this page (i.e. at least one of
the KIs expressing required knowledge for this information unit,
that is, at least one of the child nodes of a KI belonging to this
information unit, is not sufficiently known).
If the Bayesian network of the user model shows that all KIs
belonging to a page are well_known or excellently_known by a user,
this page is marked as already_known for him.
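The three knowledge labels and the resulting link annotation can be sketched in Python as follows. The probability vectors and prerequisite children are hypothetical, and checking already_known before ready_for_reading is our assumption, since the text does not say which annotation wins when both apply.

```python
def knowledge_labels(p):
    """Labels implied by a distribution over expert/advanced/beginner/newcomer."""
    e, a, b, n = p["expert"], p["advanced"], p["beginner"], p["newcomer"]
    labels = set()
    if e > a + b + n:
        labels.add("excellently_known")
    if e + a > b + n:
        labels.add("well_known")
    if b + a > e + n:
        labels.add("known")
    return labels


def sufficiently_known(p):
    """A KI is sufficiently known if it is known, well_known or excellently_known."""
    return bool(knowledge_labels(p))


def annotate(page_kis, children, user_model):
    """Traffic-light annotation for a link to a page indexed by page_kis."""
    if all(knowledge_labels(user_model[ki]) & {"well_known", "excellently_known"}
           for ki in page_kis):
        return "already_known"          # grey ball
    if all(sufficiently_known(user_model[c])
           for ki in page_kis for c in children.get(ki, [])):
        return "ready_for_reading"      # green ball
    return "not_ready_for_reading"      # red ball


EXPERT = {"expert": 0.7, "advanced": 0.2, "beginner": 0.05, "newcomer": 0.05}
NEWCOMER = {"expert": 0.05, "advanced": 0.1, "beginner": 0.25, "newcomer": 0.6}
USER = {"expression": EXPERT, "loop": EXPERT, "variable": NEWCOMER,
        "array": NEWCOMER, "sorting": NEWCOMER}
CHILDREN = {"loop": ["expression"], "array": ["variable"], "sorting": ["loop"]}

for page in (["loop"], ["sorting"], ["array"]):
    print(page, annotate(page, CHILDREN, USER))
```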
Figure 7 shows an example page of the hyperbook.

Figure 7: Example page of the CS1-hyperbook
The proposed adaptation component for our hyperbooks differs from other approaches in student and user modeling in that it uses a Bayesian network for modeling all knowledge relevant for adaptation purposes. In addition, it is centered around active learning and thus defines and implements several different adaptation requirements and tasks for generating customized learning units with projects, information resources and sequential, individual learning paths through the hyperbook.
In this section we will specifically compare our system to other hyperbook-like approaches, to other systems which use similar techniques for indexing and describing relevant information, and to systems that use Bayesian probabilities for maintaining a user's knowledge.
ELM-ART [23] and its successors implement episodic user modeling based on a hierarchically organized conceptual network for knowledge representation. Each unit of the network contains the text of the page, information relating this page to other units, and a description of incoming, outgoing and related concepts. Thus the conceptual network contains both information about the application domain and the reading sequence. We use two different models for describing the user and the application domain. Therefore the author does not need to explicitly model incoming and outgoing pages, but stores dependency information in a separate user model (the Bayesian network). Observations about the learning progress of a particular user can be used to update the user model independently of particular pages; the system is able to infer, for example, that prerequisite knowledge of a knowledge item has already been acquired by a user if the knowledge item itself is understood by the user. The way ELM-ART uses learning units is very different from our system, as we generate individualized learning units (projects plus trails and information units) for the users.
The technique of indexing every page with the concepts learned by reading it, its prerequisite knowledge and, in addition, the prerequisite that makes this page superfluous, is also used by [7]. In addition, our links include a short abstract describing the page they point to. In contrast to this approach, which hides superfluous links, we display all links continuously and simply mark them as superfluous (which, after all, is just an estimate computed by the hyperbook).
The authoring tool provided in Interbook [5, 3], which evolved from the ELM-ART tutoring system, uses a hierarchically organized domain model based on texts structured with sections, subsections, etc. Based on this domain model, pages of the electronic textbook are generated. Pop [15] uses a software object hierarchy for knowledge representation. These approaches of using an explicit domain model are similar to our approach. In addition, we allow the user to structure the domain in any way; in particular, cross-relations not contained in a section-based or otherwise hierarchical structure can be used.
PT [17] uses three levels for a customized hypertext: a domain representation level, stereotypes and an individual model. A meta hypertext is used to cover examples of different sorts (mathematical, text-based, tiny-focussed, larger-complete); thus this meta model plays a different role than the meta model used in our hyperbooks, as it handles pages with different attributes. The implementation technique used (e.g. preprocessor commands) is very different from our approach and handles page adaptation. The use of knowledge components for structuring the domain is very similar to the way we model the conditional dependencies for our BN: more general concepts are split into refined concepts, which themselves may be split into refined concepts, etc.
A comprehensive review of current work in using uncertainty management techniques in user modeling is given in [16]. Examples of systems using Bayesian networks are [1, 6], which employ BNs for plan recognition and coached problem solving. EPI-UMOD [20] uses separate BNs for each of a number of concrete user categories, in which special conditional dependencies between knowledge items for each stereotype are implemented. POKS [8] constructs a network of implication relations among knowledge units from a small sample of user data sets.
The use of BNs in our hyperbook differs from these approaches. We use a single, overall dependency graph for modeling the knowledge of the application domain. Clearly, this graph is not as fine-grained as a graph suited for e.g. plan recognition, as it serves different purposes. The BN used for our user model has to model dependencies among knowledge units which describe the application domain. Both user model and domain model are required to implement a hyperbook. In addition, as we use a three-level model for finding dependencies among the knowledge units and a clustering mechanism for generating an acyclic graph, we implemented an exact inference algorithm as proposed in [19].
Dynamic Bayesian networks are used in [21]. As we implement goal-driven learning, we have no time-critical tasks (time-critical in the sense of dynamic BNs). However, we do see a need to learn our BN from data. This will be important because we use one overall Bayesian network for modeling the different users of our hyperbook, which could be improved by learning strategies for BNs (see UAI).
The Bayesian student modeling framework presented in this report contributes to approaches in student modeling in a number of ways. First, it identifies new requirements for learning environments that are based on active, project-based learning. Second, it proposes adaptation tasks such as adaptive information resources, an adaptive navigational structure, adaptive trail generation and adaptive project and goal selection. Third, it proposes an indexing strategy for the WWW pages of a hyperdocument and the construction of a user model built on these indices.