Bayesian Modeling for Adaptive Hypermedia Systems

Nicola Henze and Wolfgang Nejdl
Knowledge Based Systems Group
University of Hannover, Lange Laube 3, 30159 Hannover, Germany
Phone: +49 511 762 9711, Fax: +49 511 762 9712
e-mail: {henze | nejdl}@kbs.uni-hannover.de

Abstract: In this paper we show how Bayesian models can be used advantageously for various adaptation tasks in open WWW-based hypermedia systems. We discuss how to model user knowledge about different topics and learning dependencies between these topics, how to make inferences for calculating the system's belief on a user's knowledge based on observations from exercises/observations. We illustrate this discussion by examples from an educational hypermedia system we developed for our course "Introduction to Java Programming".
Keywords: User and discourse modeling and user adapted interaction.

Introduction

The KBS Hyperbook System aims to model, organize and maintain open hypermedia systems on the World Wide Web. Open in this context expresses that these hypermedia systems are able to integrate (possibly distributed) information resources based on World Wide Web standards using a hyperbook metaphor, displaying learning units plus their connections to other units (see for example [4,9]). In this paper we will discuss in detail, how Bayesian models can be used advantageously for various adaptation tasks in open WWW-based hypermedia systems, adapting to a user's needs, knowledge and goals.

As hyperbooks are web-applications, they are typically used in distance learning scenarios where a learner / user uses the information from the hyperbook on its own. Thus, it is really important to think about useful teaching strategies for enabling a learner / user to actively learn and not only to passively read or "consume" the information. For this purpose we emphasize constructivist learning strategies ([5,7]), for example by integrating problems or "real world tasks" (see [11,8]) in the curriculum of the hyperbooks, and by structuring the hyperbook based on problems/projects and their relationship to information units. Learners can reach learning goals or can receive answers to information requests by working on these problems, which introduce, explain and show the use of the learning items.

In section 2 we discuss the general adaptation requirements and tasks for such an adaptive, open hypermedia systems which enables active learning and the general approach for indexing learning materials within such a system. In section 3 we describe how to specify user knowledge vectors and topic dependencies and how to use a Bayesian network for inferences based on observations about user projects and exercises. In section 4 we discuss our Bayesian model in more detail, and use examples from our hyperbook "Introduction to Java Programming", which we use with undergraduate students in computer science.


Adaptation Tasks and General Framework

Adaptation Tasks

The user modeling component has to fulfill various adaptation tasks. First it has to adapt the hyperbook to a particular user by giving hints for navigation of the hyperbook, showing the educational state of links, guiding a user by showing the next steps to take depending on the chosen goal and project, etc. In the following we will describe the adaptation requirements we identified for our hyperbooks.

Suggest hyperbook units according to a user's knowledge state.

To fulfill this task the user modeling component has to infer an estimation of the user's knowledge and has to compare this estimation with the required knowledge for understanding the contents of this unit. Furthermore, it has to check whether the user might be interesting in reading this unit or not.

Support user's goals.

Different users have different learning interests. The hyperbook allows a user to define learning goals on her own or it proposes next, appropriate learning goals. After a learning goal has been chosen the hyperbook suggests relevant information, selects appropriate projects where this learning goal is illustrated by a project or example and generates trails (see requirement "generate reading sequences") of the hyperbook, which contain - in a teaching order - all information that is needed by this particular user to reach the learning goal.

Generate reading sequences.

Individual trails or guided tours through the hyperbook contain useful information for a specific goal or project, including all needed prerequisite knowledge. These trails must be ordered in a way a teacher would present this information. Furthermore it is important to show a user the generated trails as a whole to give her an impression of the trail and the units contained within this trail, and the user can also select some units besides the order if she likes to do so.

Glossary and navigation.

For browsing the hyperbook contents, it is useful to annotate the link structure of the hyperbook. This annotation gives a short content description of the hypertext pages the links leads to, and an additional graphical annotation according to the user's current knowledge (denoting difficult and suggested units for the particular user, for example). A glossary can - similar to a table of contents - give quick access to keywords and short descriptions. This glossary must be fully integrated into the hyperbook so that the user can go from the glossary directly to more detailed information.

Next best page.

In addition, browsing the hyperbook can be supported by showing which page is suggested as a next useful page to visit.

How to select projects.

As the hyperbook integrates projects in the curriculum of the hyperbook, it is important to find a selection strategy for useful projects. To determine suggested projects, we use two parameters: one for calculating how good this project fits to the users actual task and goal, and another one for taking only those projects into account which do not contain too much prerequisite knowledge that the user does not yet have. For example, if a project requires much knowledge that is new to the user but is not part of her actual learning goal this project is inappropriate and should not be selected.


Indexing Approach

The connection between the hyperbook and the user modeling component is based on indexing the information resources in the hyperbook. The index concepts are called knowledge items. Such a knowledge item (KI) denotes a knowledge concept of the application domain. These concepts could be elementary, for example the "if"- or "while"-concepts in a programming language, or compound, like "knowledge about flow control statements". All KIs are connected in a single dependency graph as described in section 3.2. The knowledge items are used for indexing the contents of information units, project units, and for describing the range of a user's goals. They are similar to the domain model concepts used in [1].

Information units do not correspond to syntactical parts of a book (such as sections or chapters), but semantical parts (such as information units about "Java objects", "iteration constructs", "parameters", etc.). At least one of the knowledge items indexing a page is the main knowledge item of this page, and for each knowledge item there is exactly one information unit, where it is a main knowledge item. This leads to a knowledge item index, which relates each knowledge item to one main information unit, and some other information unit where it occurs too, but not as main knowledge item. Therefore the set of all KIs can also be used for navigating the hyperbook contents. This navigation is comparable to glossary navigation [2]. For indexing projects contained in the hyperbook, each project is indexed by those KIs that have to be understood in order to successfully complete the project. In order to emphasize a subset of the index set of a project it is also possible to weight the KIs in the index (see [6]).

Information units do not have to reside on the same WWW server, but can be distributed (and are simply referenced by an URL). For example, the integration of pages of the Java tutorial [3] can be done by indexing the pages of the java tutorial by the appropriate KIs. Obviously, changes in units authored by different people have to be recognized and dealt with. Right now, most of our information units in our Java hyperbook reside locally on our server, though we are currently integrating more and more units from the SUN Java Tutorial and the Java API documentation.

In the following, we describe in more detail how a user model can be built which meets the above stated requirements for adaptation tasks and, on the other hand, supports a flexible and open hypermedia system.


Modeling a user's learning progress in a hyperbook

As discussed in the previous sections, the user modeling component in the KBS hyperbook system has to facilitate project-oriented learning by integrating problems with the appropriate learning units, it has to adapt the hyperbook to a user's particular needs and goals, and it has to give a user personalized access to information resources. Therefore a model of a user's knowledge of the application domain of the hyperbook has to be generated. This student model consists of

a knowledge vector:
used for modeling a user's knowledge of the topics of the application domain.
a model of the learning dependencies:
contains all (learning) dependencies between these topics.
an inferring mechanism:
for calculating the system's belief about a user's knowledge of these topics (Bayesian network).

A user's knowledge vector

The knowledge of a user is modeled in a user's knowledge vector ( KV). This vector consists of the topics (KIs) of the application domain. Every component of the vector is a conditional probability, describing the system's estimation that a user U has knowledge about topic KIi, i, $\in$ [1:n], on the base of all observations the system has about U :


\begin{displaymath}KV({\cal U}) = \left( P(KI_1 \vert E), P(KI_2 \vert E), \dots P(KI_n \vert E)
\right), \end{displaymath}

where KI1,...,KIn are the knowledge items of the application domain and E denotes the evidence the system monitors about a user's work with the hyperbook. Observations about the student's work with the hyperbook are stored in terms of KIs, too: Each observation expresses the grade of knowledge the user has on a KI. We use four grades: A student can have "expert's knowledge" on a KI, "advanced knowledge", "beginner's knowledge" or "novice's knowledge". Since we represent the user's knowledge on a KI as a probability distribution, finer grades are possible as well.


Modeling topic dependencies

The student modeling component used in the KBS hyperbook system is based on a knowledge model of the domain of a hyperbook. This model contains the knowledge items discussed in section 2.2 and adds a partial order between these KIs to represent learning dependencies, where KI1 < KI2 denotes the fact that KI1 has to be learned before KI2, because understanding KI1 is a prerequisite for understanding KI2.

For example, to understand the KI "control structures in Java" (con), it is necessary to know about the KIs "branching" (bran) and "looping" (loop), thus

loop < con     and     bran < con.

The separation of hyperbook model and knowledge model has advantages for authoring the hyperbook, as learning dependencies between knowledge items are described once in the knowledge model, and the dependencies between information units of the hyperbook can be inferred automatically from the KI-dependencies and the indexing of the information units by the KIs. This is especially important as the hyperbook allows different models of the application domain (see [7]), as several authors or teachers will structure the domain of one hyperbook in different ways. Furthermore, this distinction between the two models makes the hyperbook system robust against changes: If we add additional information units or change the content of them, we only have to (re-)index these pages accordingly (see section 2.2). No further work has to be spent on changing the user modeling component.


Using a Bayesian network for inferring the system's belief of a user's knowledge

In order to represent the partial order between the knowledge items as well as to facilitate the updating of user's knowledge depending on new information, we have chosen to implement the user model of the KBS hyperbook system as a Bayesian network[*] (BN). This BN contains the knowledge items as network nodes. The dependencies between KIs are expressed by conditional probabilities between the KIs (see section 4.3).

BNs are useful in user modeling, since they allow to describe the application domain in a single dependency graph. This graph contains all necessary prerequisites of a particular knowledge item, models dependencies among knowledge items and is able to infer, for example, that prerequisite knowledge of a KI has already been acquired by a user if the KI itself is understood by the user.

By using a BN, it is possible to use observations about the user's work with the hyperbook and hyperbook projects to update the system's estimation of the user's knowledge. For example, if the system's estimation of the user's knowledge is too pessimistic, and the user solves an advanced project which the hyperbook had thought to be too difficult for her, the system can use this observation to update its estimation, based on the successful completion of the project unit and the indexing of project units by knowledge items (representing the necessary knowledge to successfully complete this project unit). On the other hand, if we observe an advanced user failing to understand some simple concepts, then the BN can selectively change its estimation of this user with respect to these concepts, without classifying her as a complete beginner, and can suggest specific project units for learning these concepts.

Another advantage of using BNs is the handling of uncertainty in our observations. We can use every degree of information about the user's knowledge, not only failed / not failed. Currently we use a vector of four probability values (summing up to 1) describing our estimation that a user understands a specific knowledge item to the degrees excellent (expert user), some difficulties (advanced user), many difficulties (beginner), not ready (novice). This corresponds to using a random variable with four discrete values (see section 3.1).


Construction of the Bayesian network

For developing the Bayesian network for a hyperbook, we first specify the main topics of the book. The hyperbook for our course "Introduction to Java Programmming" contains the following main level topics, which we have related to the ACM Computing Classification System (1998 version)[*]:

Each of these topics is accompanied with a set of knowledge items describing the aspects of a topic more precisely. All KIs of a topic are connected in a polytree due to the learning dependencies between them. Thus leaf topics of this polytree are basic items while higher nodes of the tree are dependend on their child nodes, e.g. require knowledge about the contents of their child nodes.

Constructing a polytree based on learning dependencies

After having found the main topics of the hyperbook we categorized them based on the following aspects:

simple topics:
Topics, that need no further prerequisites to learn. Thus a novice user can start reading the hyperbook with each of these topics.
advanced topics:
A topic is advanced, if it requires knowledge about some of the simple topics.
compound topics:
To read about these topics a user needs knowledge about some simple and advanced topics.
optional topics:
These are topics that might be left out by some users. For example, the introduction to software engineering can be skipped by users of the "Introduction to Java Programmming" hyperbook.

Figure 1: Schematic model of a Bayesian Network underlying the user model
\resizebox {8cm}{!}{\includegraphics{schematicmBNMitKreisen.eps}}

This categorization is done in order to improve performance of the BN and to obtain a polytree for whom we can use an exact inference algorithm.

According to the categorization, we stratify the KIs into levels as shown in figure 1, where the nodes in each level have a dependency structure expressed by a tree. We currently use three levels.

The first level contains the simple topics, which need no prerequisite knowledge. The second level contains advanced topics and the third level is the level of the compound topics which can only be understood by knowing some first- or second-level topics. The second level is further divided into two parts: one that is necessary for the understanding of some of the third-level concepts (level 2 in figure 1) and another part that is not required by any third-level concept (level 2' in figure 1). Optional topics can only occur in level 2' and level 3. These topics are not contained in a user's knowledge vector as the user can also read the hyperbook by skipping these topics.

Figure 2: Stratified model of the Bayesian Network
\resizebox {8cm}{!}{\includegraphics{schematicmodelofBN.eps}}

We developed a special clustering formalism (see section 4.2) for this stratified modeling approach which enables us to generate a directed acyclic graph (DAG, see figure 2) out of the dependency graph. The algorithm we use for BN inference in this generated graph is an exact inferring algorithm for directed acyclic graphs [10].


Generating a directed, acyclic graph (DAG)

For generating the final graph for the bayesian network, we introduce additional cluster nodes. These cluster nodes only serve as distributors: Each child of this cluster node listens to exactly those parents of the cluster node, which are the parents of the child in the original graph. For example, in figure 3, C1 listens to the parent nodes P1, P2 and P3.

To distribute the information from parent nodes to a child node, we have to determine which values of a parent node belong to which values of a child node, and which value of parent nodes should be carried through the child node if the parent nodes this child listens to have different values. In our current approach, each node in the dependency graph is a random variable with four discrete values (see section 3.3) describing the grade of knowledge a user has on the topic of the corresponding node. We therefore distribute the "best" grade of knowledge from the parent nodes to their child node: For example, if we observe a user knowing P1 with grade "expert" and P3 with grade "novice" (figure 3), we then distribute the information "expert" to the child node C1 .

Figure 3: The role of cluster nodes in the stratified graph
\resizebox {11cm}{!}{\includegraphics{clusternode.eps}}

The algorithm to compute the conditional probability table (CPT) assigned to the cluster node is the following: Let $P_1, \dots , P_m$ be the parent nodes and $C_1, \dots ,C_n$ the child nodes. The conditional probability is a $\prod_{l=1}^m R(P_l) \times \sum_{i=1}^n
R(C_i)$ matrix, where $R(X)$ gives the number of values of a discrete random variable $X$. In this CPT, for all children $C_i$, $ i
\in [1:n]$, of the cluster node,


\begin{displaymath}P\left( C_i = V_{C_i} \left\vert P_1 = V_{P_1}, \dots , P_m =...
...eft(0, \dots , 0 , \frac{1}{n},0,
\dots , 0 \right) , \right. \end{displaymath}

where $V_X$ is a value of a random variable X, and exactly that value of the random variable $C_i$ is not equal to 0, which corresponds to the best grade of knowledge a user has on all parent nodes of $C_i$.

Defining the probability tables for the network nodes

Figure 4: Conditional probability table for the node "quick-sort", which is dependent on the node "sorting"
\resizebox {12cm}{!}{\includegraphics{probabilitytable.eps}}

After generating the DAG, we have to add for each node of the graph a probability table containing the conditional probabilities a child node has with respect to his parent node.

Figure 5: "weak" conditional probability table for the node "GridBag layout-manager", which is dependent on the node "layout-manager"
\resizebox {12.5cm}{!}{\includegraphics{weakprobabilitytable.eps}}

Currently we use a simple mechanism for assigning probability tables to a node. We provide some "blueprint" tables that express how strong a dependency between a parent node and a child node is. We assume for example, that a user familiar with "sorting algorithms" will know the "quick-sort algorithm". Thus the conditional probability table of the node "quick-sort" reflects this kind of dependency by assuming an expert (advanced, beginner, novice) in the parent node "sorting" to be - with high probability - an expert (advanced, beginner, novice respectively) in the child node "quick-sort". To find such a conditional probability table, we investigated several distributions. Our result can be seen in figure 4. In a similar way we found conditional probability tables for weaker dependencies, for example between the child node "GridBag layout-manager" (the most complicated layout-manager in Java 1.1x) and the parent node "layout-manager" (see figure 5) and for root nodes (see figure 6).

Figure 6: Conditional probability table for root nodes
\resizebox {12cm}{!}{\includegraphics{roottable.eps}}

The probability table assigned to a child node of a cluster node consists of two parts: a "blueprint" table for those values of the random variable "cluster node" which correspond to this particular child node, and an equal distribution for all other values, see figure 7. (Remark: the random variable of the cluster node has as many values as the sum of all values of the children of the cluster.).

Figure 7: Conditional probability table for child nodes of a cluster node
\resizebox {12cm}{!}{\includegraphics{childcluster.eps}}

A part of the BN of the hyperbook "Introduction to Java Programming" can be seen in figure 8. This BN contains about 270 nodes. The figure shows the system's estimation about the knowledge of a new user who has no a priori knowledge about the topics of Java. Probabilities assigned to the topics / nodes "algorithms", "searching", etc. can also be seen in this snapshot.

Figure 8: Part of the Bayesian Network for the hyperbook "Introduction to Java Programming"
\resizebox {12cm}{!}{\includegraphics{BNausschnitt3.ps}}

A typical page of a hyperbook is shown in figure 9. Links and information units are separated in our KBS hyperbook. All relations of an information unit are displayed on the left hand side, while the information itself is displayed on the right. The list of items on the upper left side is the choice list which allows a user to define a learning goal. The user has access to several areas grouping the hyperbook contents, to projects, and to several information units of the hyperbook. The system can also suggest a next best learning goal. The user can also read the whole hyperbook in a sequential order by clicking on the "dog's ear" on the bottom of the right side of the hyperbook.

Figure 9: Screenshot of the hyperbook "Introduction to Java Programming
\resizebox {12cm}{!}{\includegraphics{Einstiegsseite.ps}} "

Conclusion and Further Work

We have proposed a student modeling component for an open, adaptive hypermedia system. This student model enables several adaption features like proposing suitable information units, generating reading sequences, annotating the navigational structure and showing which learning steps to take next. By enabling these features, often based on the dependencies between project and information units, it supports project-oriented learning, adapted to the user's learning progress. We discussed a user modeling component for observing a user's learning progress (knowledge vector, section 3.1), for modeling learning dependencies (section 3.2) and showed how to build a bayesian model for making inferences about a user's learning (sections 3.3 and 4).

Further work will concentrate on an evaluation of our current prototype, and the integration of more external information units into our hyperbook (e.g. from the SUN Java tutorial), which will then be fully integrated into our adaptive hypermedia system.

Bibliography

1
P. Brusilovsky and E. Schwarz.
User as Student: Towards an Adaptive Interface for Advanced Web-Based Applications.
In Proceedings of the Sixth International Conference on User Modeling, UM97, Sardinia, Italy, 1997.

2
P. Brusilovsky, E. Schwarz, and G. Weber.
A Tool for Developing Adaptive Electronic Textbooks on WWW.
In Proceedings of WebNet'96 - World Conference of the Web Society, Boston, MA, USA, June 1996.

3
Mary Campione and Kathy Walrath.
The Java Tutorial, Second Ediion: Object-Oriented Programming for the Internet.
Addision-Wesley, 1998.

4
Peter Fröhlich and Wolfgang Nejdl.
Webrc: Configuration management for a cooperation tool.
In Proceedings of the Seventh Workshop on Software Configuration Management (SCM'7) (Springer LNCS 1235), Boston, 1997.

5
Nicola Henze and Wolfgang Nejdl.
Constructivism in computer science education: Evaluating a teleteaching environment for project oriented learning.
In Workshop on Interactive Computer Aided Learning - Concepts and Applications, Villach, Österreich, October 1998.

6
Nicola Henze and Wolfgang Nejdl.
Adaptivity in the KBS Hyperbook System.
In 2nd Workshop on Adaptive Systems and User Modeling on the WWW, Toronto, Canada, May 1999.

7
Nicola Henze, Wolfgang Nejdl, and Martin Wolpers.
Modeling Constructivist Teaching Functionality and Structure in the KBS Hyperbook System.
In AIED99 Workshop on Ontologies for Intelligent Educational Systems, Le Mans, France, July 1999.

8
Peter C. Honebein, Thomas M. Duffy, and Barry J. Fishman.
Constructivism and the design of learning enivronments: Context and authentic activities for learning.
In NATO Advanced Workshop on the Design of Constructivist Learning Environments, 1991.

9
Wolfgang Nejdl and Martin Wolpers.
KBS Hyperbook - a data-driven information system on the web.
In WWW 8, May 1999.
http://www.kbs.uni-hannover.de/paper/99/www8.

10
Stuart Russell and Peter Norvig.
Artificial Intelligence: A Modern Approach.
Prentice Hall, 1995.

11
Ernst von Glasersfeld.
A constructivist approach to teaching.
In Leslie P. Steffe and Jerry Gale, editors, Constructivism in Education. Lawrence Erlbaum Associates, 1995.



Footnotes

... network[*]
A BN is defined by as set of random variables $X = \{X_1, \dots X_n\}$ with associated probabilities and a labeled directed acyclic graph $G$ encoding conditional probabilities between these random variables. The vertices of $G$ correspond to the random variables $X_1, \dots X_n$ and the edges represent conditional dependencies between them. A conditional dependency links a child variable to a set of parent variables and is defined by the conditional distributions of the child variable given the configuration of its parent variables.
... version)[*]
http://www.acm.org/class/1998/ccs98.html
... recovery)[*]
would better be classified in D.3.3
... subroutines)[*]
thematically related to Methods, Classes and objects, and to Data types and operators


Nicola Henze
1999-07-06