Design

This section introduces the design of the solution and an online tool for acquiring, managing, and sharing the content of our site. Specifically we will do the following:

Provide a full list of the features we want to implement
Design the database tables for this module
Create a list and a description of the stored procedures needed to provide access to the database
Design the object models of the data and business layers
Describe the user interface services needed for content management, such as the site pages and reusable user controls
Explain how we will ensure security for the administration section and for other accessrestricted pages

Features to Implement

Let's start our discussion by writing down a partial list of the features that the article manager module should provide in order to be flexible and powerful, but still easy to use:

An article can be added to the database at any time, with an option to delay publication until a specified release date. Additionally, the person submitting the article must be able to specify an expiration date, after which the article will be automatically retired. If these dates are not specified, then the article should be immediately published and remain active indefinitely.
Articles can have an approved status. If an administrator or editor submits the article, it should be approved immediately. If you allow other people, such as staff or users of the site (we will call them contributors), to post their own news and articles, then this content should be added to the database in a "pending" state. The site administrators or editors will then be able to control this content, apply any required modifications, and finally approve the articles for publishing once they are ready.
The system must also track who originally submitted an article or news item. This is important because it provides information regarding whether a contributor is active, who is responsible for incorrect content, who to contact for further details if the article is particularly interesting, and so on.
The administrator/editor must be able to decide whether an article can be read by all readers or only by registered users.
There can be multiple categories, enabling articles to be organized in different virtual folders. Each category should have a description and an image that graphically represents it.
There should be a page with the available categories as a menu. Each category should be linked to a page that shows a short abstract for each published article. By clicking on the article's title the user can read the whole text.
Articles can be targeted to users from a specified location, e.g., country, state/province, or city. Consider the case where you might have stories about concerts, parties, and special events that will happen in a particular location. In Chapter 4, you implemented a registration and profiling system that includes the user's address. That will be used here to highlight events that are going to happen close to the user's location. This is a feature that can entice readers to provide that personal information, which you could use later for marketing purposes (ads can be geographically targeted also).
Users can leave comments or ask questions about articles, and this feedback should be published at the end of the article itself, so that other readers can read it and create discussions around it (this greatly helps to increase traffic). You might recognize this approach as being common with blogs, which are web logs in which an individual publishes personal thoughts and opinions and other people add comments. As another form of feedback, users can rate articles to express how much they liked them.
The module must count how many times an article is read. This information will also be shown to the reader, together with the abstract, the author name, the publication date, and other information. But it will be most important for the editors/administrators because it greatly helps them understand which topics the readers find most interesting, enabling administrators to direct energy, money, and time to adding new content on those topics.
The new content must be available as an RSS feed to which a reader can subscribe, and read through his or her favorite RSS aggregator.
Above all, the article manager and the viewer must be integrated with the existing site. In our case this means that the pages must tie in with the current layout, and that we must take advantage of the current authentication/authorization system to protect each section and to identify the author of the submitted content.

It's essential to have this list of features when designing the database tables, as we now know what information we need to store, and the information that we should retrieve from existing tables and modules (such as the user account data).

Designing the Database Tables

As described in Chapter 3 (where we looked at building the foundations for our site), we're going to use the tbh_ prefix for all our tables, so that we avoid the risk of naming a table such that it clashes with another table used by another part of the site (this may well be the case when you have multiple applications on the site that store their data on the same shared DB). We need three tables for this module: one for the categories, another one for the articles, and the last one for the user feedback. The diagram shown in Figure 5-1 illustrates how they are linked to each other.

Figure 5-1

Let's start by looking at these tables and their relationship in more detail.

The tbh_Categories Table

Unsurprisingly, the tbh_Categories table stores some information about the article categories.

Column Name	Type	Size	Allow Null	Description
CategoryID	int - PK	4	No	Unique ID for the category
AddedDate	datetime	8	No	Category creation date/time
AddedBy	nvarchar	256	No	Name of the user who created the category
Title	nvarchar	256	No	Category's title
Importance	int	4	No	Category's importance. Used to sort the categories with a custom order, other than by name or by date.
Description	nvarchar	4000	Yes	Category's description
ImageUrl	nvarchar	256	Yes	URL of an image that represents the category graphically

This system supports a single-level category, meaning that we cannot have subcategories. This is plenty for small to mid-size sites that don't have huge numbers of new articles on a wide variety of topics. Having too many categories in sites of this size can even hinder the user's experience, because it makes it more difficult to locate desired content. Enhancing the system to support subcategories is left as an exercise if you really need it, but as a suggestion, the DB would only require an additional ParentCategoryID column containing the ID of the parent category.

AddedDate and AddedBy are two columns that you will find in all our tables — they record when a category/article/comment/product/message/newsletter was created, and by whom, to provide an audit trail. You may have thought that instead of having an nvarchar column for storing the username we could use an integer column that would contain a foreign key pointing to records of the aspnet_Users table introduced in Chapter 4. However, that would be a bad choice for a couple of reasons:

The membership data may be stored in a separate database, and possibly on a different server.
The membership module might use a provider other than the default one that targets SQL Server. In some cases the user account data will be stored in Active Directory or maybe an Oracle database, and thus there would be no SQL Server table to link to.

The tbh_Articles Table

This table contains the content and all further information for all the articles in all categories. It is structured as follows.

Column Name	Type	Size	Allow Null	Description
ArticleID	int - PK	4	No	Unique ID for the article
AddedDate	Datetime	8	No	Date/time the article was added
AddedBy	Nvarchar	256	No	Name of the user who created the article
CategoryID belongs	int — FK	4	No	ID of the category to which the news item
Title	Nvarchar	256	No	Article's title
Abstract	Nvarchar	4000	Yes	Article's abstract (short summary) to be shown in the page that lists the article, and in the RSS feed
Body	Ntext		No	Article's content (full version)
Country	Nvarchar	256	Yes	Country to which the article (concert/event) refers
State	nvarchar	256	Yes	State/province to which the article refers
City	nvarchar	256	Yes	City to which the article refers
ReleaseDate	datetime	8	Yes	Date/time the article will be publicly readable
ExpireDate	datetime	8	Yes	Date/time the article will be retired and no longer readable by the public
Approved	bit	1	No	Approved status of the article. If false, an administrator/editor has to approve the article before it is actually published and available to readers.
Listed	bit	1	No	Whether the article is listed in the articles page (indexed). If false, the article will not be listed, but will be still accessible if the user types the right URL, or if there is a direct link to it.
Comments Enabled	bit	1	No	Whether the user can leave public comments on the article
OnlyFor Members	bit	1	No	Whether the article is available to registered and authenticated users only, or to everyone
ViewCount	int	4	No	Number of times the article has been viewed
Votes	int	4	No	Number of votes the article has received
TotalRating	int	4	No	Total rating score the article has received. This is the sum of all the ratings posted by users.

The ReleaseDate and ExpireDate columns are useful because the site's staff can prepare content in advance and postpone its publication, and then let the site update itself at the specified date/time. In addition to the obvious benefit of spreading out the workload, this is also great during vacation periods, when the staff would not be in the office to write new articles but you still want the site to publish fresh content regularly.

The Listed column is also very important, because it enables you to add articles that will be hidden from the main article list page, and from the RSS feeds. Why would you want to do this? Suppose that you have a category called Photo Galleries (we'll actually create it later in the chapter) in which you publish the photos of a past event or meeting. In such photo gallery articles you would insert thumbnails of the photos with links to their full-size version. It would be nice if the reader could comment and rate each and every photo, not just the article listing them all, right? You can do that if instead of linking the big photo directly you link a secondary article that includes the photo. However, if you have many photos, and thus many short articles that contain each of them, you certainly don't want to fill the category's article listing with a myriad of links to the single photos. Instead, you will want to list only the parent gallery. To do this, you set the Listed property of all the photo articles to false, and leave it true only on the article with the thumbnails.

The Country, State, and City fields enable you to specify an accurate location for those articles that refer to an event (such as parties, concerts, beer contests, etc.). You may recall that we created the same properties in Chapter 2 for the user's profile. If the location for the article matches a specific user's location, even partially, then you could highlight the article with a particular color when it's listed on the web page. You may be wondering why it was necessary to define the Country and State fields as varchar fields, instead of an int foreign key pointing to corresponding records of the tbh_Countries and tbh_States lookup tables. The answer is that I want to use the City field to support not only U.S. states, but states and provinces for any other country, so I defined this as free text field. It's also good for performance if we de-normalize these fields. Using a lookup table is particularly useful when there is the possibility that some values may change; storing the information in one location minimizes the effort to update the data and makes it easier to ensure that we don't get out-of-sync. However, the list of countries will not realistically change, so this isn't much of a problem. In the remote case that this might happen, you will simply execute a manual update for all those records that have Country="USA" instead of "United States", for example. This design decision can greatly improve the performance of the application.

You may be wondering why I decided to put the Votes and TotalRating columns into this table, instead of using a separate table to store all the single votes for all articles. That alternative has its advantages, surely: You could track the name and IP address of the user who submits the vote, and produce interesting statistics such as the number of votes for every level of rating (from one to 5 stars). However, retrieving the total number of votes, the total rating, and the number of votes for each rating level would require several SUM operations, in addition to the SELECT to the tbh_Articles table. I don't think the additional features are worth the additional processing time and traffic over the network, and thus I opted for this much lighter solution instead.

The tbh_Comments Table

This table contains the feedback (comments, questions, answers, etc.) for the published articles. The structure is very simple.

Column Name	Type	Size	Allow Null	Description
CommentID	int - PK	4	No	Unique ID for the comment
AddedDate	datetime	8	No	Date/time the comment was added
AddedBy	nvarchar	256	No	Name of the user who wrote the comment
AddedByEmail	nvarchar	256	No	User's e-mail address
AddedByIP	nchar	15	No	User's IP address
ArticleID	int	4	No	Article to which this comment refers
Body	ntext		No	Text of the comment

We will track the name of the user posting the comment, but she could even be an anonymous user, so this value will not necessarily be one of the registered usernames. We also store the user's e-mail address, so that the reader can be contacted with a private answer to her questions. Storing the IP address might be legally necessary in some cases, especially when you allow anonymous users to post content on a public site. In case of offensive or illegal content, it may be possible to geographically locate the user if you know her IP address and the time when the content was posted. In simpler cases, you may just block posts from that IP (not a useful option if it were a dynamically assigned IP, though).

Stored Procedures That Manage the Database

To manage the database, you will build a set of stored procedures that you'll use later in the data layer classes to do everything from the addition of categories, articles, and comments to the updating of single fields (such as the Approved status, or the ViewCount number). Procedures related to article management have the tbh_Articles_ prefix, so that their name won't clash with procedures of other modules of this site.

Important

A common practice (and error) is to prefix stored procedure names with "sp". You should not do that, however, because "sp" is the prefix for system procedures, and if you use it for your custom user procedures, then SQL Server will first try to find them among the system procedures, it will fail, and then it will fall back and find it among custom procedures. This will slow down performance a bit.

The stored procedures you need, and their parameters, are listed in the following table (you'll be writing the code later in the chapter in the "Solution" section):

Procedure	Description
tbh_Articles_ApproveArticle	Sets the Approved field of the specified article to true
tbh_Articles_DeleteArticle	Deletes the article identified by the specified ID
tbh_Articles_DeleteCategory	Deletes the category identified by the specified ID
tbh_Articles_DeleteComment	Deletes the comment identified by the specified ID
tbh_Articles_GetArticleByID	Retrieves all details (the complete row) of the article identified by the specified ID
tbh_Articles_GetArticleCount	Returns the total number of articles in any categories
tbh_Articles_GetArticleCountByCategory	Returns the total number of articles in the specified category
tbh_Articles_GetArticles	Retrieves a partial list of articles located in any category. The list is partial because there are two parameters: to specify the index of the page of articles to retrieve, and the number of articles per page. This is used to implement a custom pagination system (because we can't fit all the articles on one page).
tbh_Articles_GetArticlesByCategory	Returns all details for a specified category
tbh_Articles_GetCategories	Returns all details about all categories
tbh_Articles_GetCategoryByID	Retrieves all details of the category identified by the specified ID
tbh_Articles_GetCommentByID	Retrieves all details of the comment identified by the specified ID
tbh_Articles_GetCommentCount	Returns the total number of comments for any article
tbh_Articles_GetCommentCountByArticle	Returns the total number of comments for the specified article
tbh_Articles_GetComments	Retrieves a partial list of comments for any article. The list is partial because there are two parameters: to specify the index of the page of comments to retrieve, and the number of comments per page. This is used to implement a custom pagination system.
tbh_Articles_GetCommentsByArticle	Retrieves comments for the specified article
tbh_Articles_GetPublishedArticleCount	Returns the total number of published articles in any category. A published article is an article that is approved, listed, and whose ReleaseDate-ExpireDate interval includes the specified current date.
tbh_Articles_GetPublishedArticleCountByCategory	Returns the total number of published articles for the specified category
tbh_Articles_GetPublishedArticles	Retrieves a partial list of published articles located in any category. Similar to tbh_GetArticles, but gets only published articles, i.e., articles that are approved, listed, and are not expired (whose ReleaseDate-ExpireDate interval includes the specified current date).
tbh_Articles_GetPublishedArticlesByCategory	Retrieves a partial list of published articles located in the specified category, and only the ones that did not expire
tbh_Articles_IncrementViewCount	Increments the ViewCount field of the specified article by one
tbh_Articles_InsertArticle	Creates a new article record, and returns its ID as an output parameter
tbh_Articles_InsertCategory	Creates a new category record, and returns its ID as an output parameter
tbh_Articles_InsertComment	Creates a new comment record, and returns its ID as an output parameter
tbh_Articles_InsertVote	Increases for the specified article the Votes field by one, and the TotalRating field by the specified value
tbh_Articles_UpdateArticle	Updates some fields of the specified article
tbh_Articles_UpdateCategory	Updates some fields of the specified category
tbh_Articles_UpdateComment	Updates some fields of the specified comment

Many of these stored procedures are pretty standard — procedures to insert, update, return, and delete rows. However, it's worth noting some design decisions that could have an impact on the performance of the site:

All stored procedures that retrieve the list of articles and comments accept two input parameters: one that indicate the index of the page of records to retrieve, and one that indicates how many records there are per page. This is done to support a custom pagination mechanism in the administration and user pages where you will show the list of items. Potentially, there can be thousands of articles and comments, so implementing pagination is necessary both for performance and aesthetic reasons. On those occasions when you really want to retrieve all items (for example, when you want to show all comments for a specific article, below the article itself), you can just pass zero as the page index, and a large integer value as the page size.
The tbh_Articles_GetArticleByID procedure returns all the details (fields) of the specified article. This includes the whole Body text, so the procedure is used when you need to display the entire article's content in its own dynamically filled page. If you only want summary data for use on an administration page, you can use the tbh_Articles_GetArticles, tbh_Articles_GetArticlesByCategory, tbh_Articles_GetPublishedArticles, and tbh_Articles_GetPublishedArticlesByCategory procedures. You should not return the Body field if it's not needed in order to keep performance high and network traffic low.
The procedures that retrieve articles will be joined with the tbh_Categories table to retrieve the parent category's title, in addition to the ID stored in the tbh_Articles table. Similarly, the procedures that retrieve comments will be joined with the tbh_Comments table to retrieve the parent article's title. Returning this information together with the other data avoids running many separate queries when you have to list articles or comments on the page (an additional query would be needed for each article/comment otherwise).
The tbh_Articles_UpdateXXX stored procedures do not update all the fields. Fields such as AddedDate, AddedBy, AddedByEmail, and AddedByIP cannot be changed, as they are there to track who originally created the record and when it was created. If it were possible to change that historical data, the tracking would be useless. Other fields such as ViewCount, Votes and TotalRating are also not directly updateable, as there are specific stored procedures to update them in the proper way.

Designing the Configuration Module

Chapter 3 introduced a custom configuration section named <theBeerHouse> that you must define in the root folder's web.config file, to specify some settings required in order for the site's modules to work. In that chapter we also developed a configuration class that would handle the <contact> subelement of <theBeerHouse>, with settings for the Contact form in the Contact.aspx page. For the articles module of this chapter you'll need some new settings that will be grouped into a new configuration sub-element under <theBeerHouse>, called <articles>. This will be read by a class called ArticlesElement that will inherit from System.Configuration.ConfigurationElement, and that will have the public properties shown in the following table.

Property	Description
ProviderType	Full name (namespace plus class name) of the concrete provider class that implements the data access code for a specific data store
ConnectionStringName	Name of the entry in web.config's new <connectionStrings> section that contains the connection string to the module's database
PageSize	Default number of articles listed per page. The user will be able to change the page size from the user interface.
RssItems	Number of items returned by the module's RSS feeds
EnableCaching	Boolean value indicating whether the caching of data is enabled
CacheDuration	Number of seconds for which the data is cached

The settings in the web.config file will have the same name, but will follow the camelCase naming convention; therefore, you will use providerType, connectionStringName, pageSize, and so on, as shown in the following example:

<theBeerHouse>
      <contactForm mailTo="webmaster@effectivedotnet.com"/>
      <articles providerType="MB.TheBeerHouse.DAL.SqlClient.SqlArticlesProvider"
         connectionStringName="LocalSqlServer" pageSize="10" />
</theBeerHouse>

An instance of ArticlesElement is returned by the Articles property of the TheBeerHouseSection class, described in Chapter 3, that represents the <theBeerHouse> parent section. This class will also have a couple of other new properties, DefaultConnectionStringName and DefaultCacheDuration, to provide default values for the module-specific ConnectionStringName and CacheDuration settings. These settings will be available for each module, but you want to be able to set them differently for each module. For example, you may want to use one database for storing articles data, and a second database to store forums data, so that you can easily back them up independently and with a different frequency according to how critical the data is and how often it changes. The same goes for the cache duration. However, in case you want to assign the same settings to all modules (which is probably what you will do for small to mid-size sites), you can just assign the default values at the root section level, instead of copying and pasting them for every configuration sub-element.

In addition to the properties listed above, the ArticlesElement will have another property, ConnectionString. This is a calculated property, though, not one that is read from web.config. It uses the <articles>'s connectionStringName or the <theBeerHouse>'s defaultConnectionStringName and then looks up the corresponding connection string in the web.config file's <connectionStrings> section, so the caller will get the final connection string and not just the name of its entry.

Designing the Data Access Layer

Now that you have a clear picture in mind of the database tables and how to retrieve data through the stored procedures, you can now design the data services. As explained in Chapter 3, we'll be using a simplified version of the provider model design pattern for each module's data access layer (DAL). In practice, you will have an abstract class that implements some of the utility functions, and then a list of abstract methods having a signature, but no implementation. Then, you will have one or more provider classes that inherit from the abstract class and provide the concrete implementation of its abstract methods. The base class has also an Instance static property that uses reflection to create, and return, an instance of the provider class whose name is indicated in the web.config file. There is also a DataAccess class implemented in Chapter 3, which is the root of the inheritance tree and provides properties and methods useful to all data access classes, and a series of entity classes that encapsulate the data retrieved from the database. Generally, there is a one-to-one correspondence between the database tables and the DAL's entity classes. Figure 5-2 provides a graphical representation of these classes, with their inheritance relationships.

Figure 5-2

Chapter 3 showed how to implement the provider for SQL Server (having the name MB.TheBeer House.DAL.SqlClient.SqlArtcilesProvider), which wraps the calls to the stored procedures listed above. Some methods have a number of overloaded versions: for example, the GetArticles method has a version that takes the parent category's ID, the page index, and the page size, and another method that takes only the page index and size, and returns articles from any category. It's the same for GetPublishedArticles, and it's similar for GetArticleCount and GetPublishedArticleCount (two versions for each: one with the category ID and the other without it).

You won't use these classes directly from the presentation layer. Instead, we'll build a business layer that exposes a better object-oriented representation of the data, and the relationships between different entities (categories and articles, articles and comments). As I mentioned in Chapter 3, having a complete three-tier design improves maintainability and readability of code. These classes will be located under the App_Code folder of the main web project, and will be compiled together for use in the rest of the site. This is done for simplicity — not only does it enable us to avoid separately compiling and deploying multiple projects, but it also lets us take advantage of the new "edit and continue" compilation model that enables us to edit a class while the application is running, and have our changes be automatically compiled and utilized simply by refreshing the page in the browser! Alternatively, you could instead put it in a separate assembly if you prefer to have a physical separation between the layers to mirror the logical separation.

Designing the Business Layer

The data layer has a provider class with methods that call the stored procedures, but it doesn't represent an entity (or domain object) in a real object-oriented way. The entities are represented by the classes in the business layer, which use the data layer classes to access the database, representing the relationships between data elements. All the business classes indirectly descend from the BizObject class designed and implemented in Chapter 3, which provides information such as the name and IP of the current user and a reference to the current HttpContext's cache; it also exposes utility functions to encode HTML text, calculate the index of the page for a record with the given index, and clear from the cache items that start with a given prefix. The business classes of the articles module directly descend from a BaseArticle class, which has some instance properties that wrap the common DB fields that all entities (category, article, and comment) have: ID, AddedDate, and AddedBy. This class also has a method for caching data: It is put here instead of in the BizObject class because it actually caches the data only if the articles module is configured to do so by a custom setting in the web.config file. Each module (articles, forums, newsletters, etc.) will have a separate configuration, and thus a separate CacheData method. The configuration is read by the ArticlesElement class described earlier and is used as the return type for the BaseArticle's static and protected Settings property. Figure 5-3 is the UML diagram that describes the classes and their relationships.

Figure 5-3

The Article, Category, and Comment classes have a series of instance public properties that fully describe a single element. They also have some instance methods, such as Delete and Update, which work with the instance properties representing the current object. Additionally, there are a number of static methods to retrieve a list of instances, create new records, update or delete existing records, etc. Since these are static methods, they cannot access any instance properties, and therefore all their data must be passed to them as parameters. One important aspect of the Getxxx methods is that they use List<T> as the return type, where T is article, category, or comment. This list type belongs to the System.Collections.Generic.List class, which is a new generic collection provided in C# 2.0. This provides a strongly typed and specialized version of a collection class. Therefore, when you declare

List<Article> articles = new List<Article>();

you're creating a collection that can only return and accept objects of the Article type — no casting, boxing, and unboxing are required because the internal type is being set as Article by this code, which instantiates the collection. The built-in collections in previous versions of C# had to store everything as type System.Object, which required casting because they didn't hold objects of a known type. Generics are one of the best new features in version 2.0 and I strongly recommend that you study this subject by reading articles on the MSDN web site or Wrox's Professional C# 2005 (ISBN 0-7645-7534-1). The structure of the three main classes — Article, Category, and Comment — is very similar. Therefore, I'll only present the Article class in detail here, and just highlight the unique aspects of the others.

The Article Class

Some of the public properties of the Article class are just wrappers for the underlying DB fields, but others return calculated values. The following table lists all the properties.

Properties	Description
ID	Article's ID
AddedDate	Date/time the article was created
AddedBy	Name of the author
CategoryID	Parent category ID
CategoryTitle	Parent category title
Category	Reference to the article's parent Category object
Title	Title
Abstract	Abstract (short description)
Body	Body
Country	Country where the event described in the article will take place
State	State, region, or province where the event will take place
City	City where the event will take place
Location	Calculated read-only property that returns the full location of the event in the form: country, state, city
ReleaseDate	Date/time the article will be publicly readable by users
ExpireDate	Date/time the article will be retired and no longer readable by users
Approved	Whether the article is approved, or is waiting for approval
Listed	Whether the article is listed in the pages that list public articles
CommentsEnabled	Whether users can comment on this article
OnlyForMembers	Whether the article can only be read by registered and authenticated users, or by everyone
ViewCount	Number of times the article has been read
Votes	Number of votes received by the article
TotalRating	Total rating received by the article, i.e., the sum of all votes
AverageRating	Average rating (as a double value), calculated as TotalRating/Votes
Published	Calculated value indicating whether the article is published (meaning the article is approved, and the current date is between the ReleaseDate and the ExpireDate)
Comments	List of comments submitted by users

The methods listed in the following table are instance methods, and use the object's instance property values to update, delete, or approve an article, and other operations with it.

Instance Method	Description
Update	Updates the current article
Delete	Deletes the current article
Approve	Approves the current article
IncrementViewCount	Increment the ViewCount of the current article by one
Rate	Rate the current article; the rating value is passed as a parameter

Besides these instance members, there are several static methods that allow the caller to retrieve a list of articles or a single article, create a new article, delete, approve, update, or rate an existing article, and more. As they are static, they don't use any instance properties, and they get all the data they need as input parameters.

Static Method	Description
GetArticles	Returns a list of Article instances, and has eight overloads to wrap all stored procedures that retrieve the list of articles described above (to retrieve all articles, only published articles, articles from a specific category, etc.) Note: the Article instances returned by these methods do not have the Body property filled with the real value, because it was not retrieved by the stored procedures called by the DAL class. As soon as the Body parameter is read, the missing value for the specific Article is retrieved and stored into the instance.
GetArticleCount	There are four overloads of this method that return the number of articles given no constraints (all articles), the parent category, the published status (but not the category), or the parent category plus the published status
GetArticleByID	Returns an Article instance that fully describes the article identified by the input ID
InsertArticle	Takes all the data for creating a new article, and returns its ID
UpdateArticle	Updates data for an existing article, and returns a Boolean value indicating whether the operation was successful
DeleteArticle	Deletes the article identified by an ID and returns a Boolean value indicating whether the operation was successful
ApproveArticle	Approves the article identified by an ID
IncrementArticleViewCount	Increments the view count of the article identified by an ID
RateArticle	Rates the article identified by the ID, with a value from 1 to 5
GetArticleFromArticleDetails	Private method that takes an ArticleDetails object (from the DAL) and returns a new Article instance
GetArticleListFromArticleDetailsList	Private method that takes a list of ArticleDetails objects and returns a list of Article instances

This class uses the lazy load pattern, which means data is loaded only when requested and not when the object instance is created. There are quite a few variations on this pattern, and you should refer to a patterns book for complete coverage of the subject. For the Article class, you don't need the list of comments or the article's body unless you're inside the specific article page that shows all details of the article. In the page that lists the available articles, you don't need those details, so we won't waste time retrieving them from the DB. The Article class has the CategoryID and CategoryTitle instance properties, which are often all you need to know about the parent category. However, it also has a Category property that returns a full Category object with all the parent category's details. That object isn't retrieved when the Article object is created, but rather when that property is actually read. Also, once requested, we'll fetch the data and store it locally in case it is requested again from the same object instance. The implementation presented later in this chapter is very simple, but it can dramatically improve the performance of the application.

The GetArticles overloads that take parameters to specify the page of results do not expect the index of the page to retrieve. Rather, they take the number of the first row to retrieve, and the page size. I don't care for this personally, but it's a requirement of the ObjectDataSource (which will be used later in the user interface) to work with pagination. Because the DAL's Select methods expect the page index instead of the number of the first row to retrieve, we'll have to calculate the page index.

The Category Class

This class has instance properties that fully describe a category of articles, instance methods to delete and update an existing category, and static methods to create, update, or delete one category. I won't describe all of them here as they are pretty similar to the corresponding methods of the Article class. They're actually a little simpler because you don't need multiple overloads to support pagination and other filters. There are two properties, Articles and PublishedArticles, that use a couple of overloads of the Article.GetArticles static methods to return a list of Article objects. Like the Article.Comments property, these two properties also use the lazy load pattern, so articles are retrieved only once when the property is read for the first time.

The Comment Class

The Comment class has various overloads for the GetComments static method that, like Article.Get Articles, can take in input different parameters for the parent article ID and the pagination support. In addition to the data returned by the DAL it also exposes an Article property that uses the lazy load pattern to load all details of a comment's parent article as needed. Another property it exposes is EncodedBody, which returns the same text returned by the Body property, but first performs HTML encoding on it. This protects us against the so-called script-injection and cross-site scripting attacks. As a very simple example, consider a page on which you allow users to anonymously post a comment. If you don't validate the input, they may write something like the following:

<script>document.location = 'http://www.usersite.com';</script>

This text is sent to the server and you save it into the DB. Later, when you have to show the comments, you would retrieve the original comment text and send to the browser as is. However, when you output the preceding text, it won't be considered as text by the browser, but rather as a JavaScript routine that redirects the user to another web site, hijacking the user away from your web site! And this was just a basic attack — more complex scripts could be used to steal users' cookies, which could include authentication tickets and personal data, with potentially grave consequences. For our protection, ASP.NET automatically validates the user input sent to the server during a postback, and checks whether it matches a pattern of suspicious text. However, in that case it raises an exception and shows an error page. You should consider the case where a legitimate user tries to insert some simple HTML just to format the text, or maybe hasn't really typed HTML but only a < character. In that case, you don't want to show an error page, you only need to ensure that the HTML code isn't displayed in a browser (because you don't want users to put links or images on your site, or text with a font so big that it creates a mess with your layout). To do so you can disable ASP.NET's input validation (only for those pages on which the user is actually expected to insert text, not for all pages!), and save the text into the DB, but only show it on the page after HTML encoding, as follows:

&lt;script&gt; document.location = 'http://www.usersite.com'; &lt;/script&gt;

This way, text inserted by the user is actually shown on the page, instead of being considered HTML. The link will show as a link but it will not be a clickable link, and no JavaScript can be run this way. The EncodedBody property returns the HTML encoded text, but it can't completely replace the Body property, because the original comment text is still required in certain situations — for example, in the administration pages where you show the text into a textbox, and allow the administrator to edit it.

Note

Scripting-based attacks must not be taken lightly, and you should ensure that your site is not vulnerable. One good reference on the web is www.technicalinfo.net/gunter/index.html, but you can easily find many others. Try searching for XSS using your favorite search engine.

Sorting Comments

We will not implement sorting features for the categories and the articles. This is because categories will always be sorted by importance (the Importance field) and then by name, whereas articles will always be sorted by release date, from the newest to the oldest, which is the right kind of sorting for these features. However, comments should be sorted in two different ways according to the situation:

From the oldest to the newest when they are listed on the user page, under the article itself, so that users will read them in chronological order so they can follow a discussion made up of questions and answers between the readers and the article's author, or between different readers.
From the newest to the oldest in the administration page, so that the administration finds the new comments at the top of the list, and in the first page (remember that comments support pagination) so they can be immediately read, edited, and, if necessary, deleted if found offensive.

If you were to make a stored procedure that supports pagination of comments, it would become more complex than it needs to be. The alternative is to dynamically build the SQL SELECT statements, but you lose the advantages of stored procedures. I came to the following compromise: We can use the stored procedure to retrieve all the article's comments (instead of a stored procedure that uses pagination), and it can be sorted from the newest to the oldest; and we can invert the order on the client by reordering the collection programmatically. I came to this conclusion by considering that the pagination would only be used on the administration pages, a single article will not have more than a few dozen short comments, and it's acceptable to retrieve all of them together when the article must be displayed. You'll be using caching quite aggressively to minimize the overhead in collecting all the comments at once. You could have sorted the comments from the oldest to the newest directly from the stored procedure, but I prefer to do it from the client to make it consistent with the other stored procedure that uses pagination.

The List<T> generic collection class has a Sort instance method that takes an object implementing System.Collections.Generic.IComparer<T> as an input, and returns its collection of items with a different sort order. The logic to compare two objects (two Comment objects in this case) is put into the object that implements IComparer, and that, as you might expect, has to implement a method named Compare. This method takes as input two objects and returns -1, 0 or 1 according to whether the first parameter is less than the second, the two are equal, or the first is greater than the second. You can use any logic you want to compare the two objects, but in a simple case like ours it is sufficient to delegate the logic to the Compare method of the DataTime class, with the AddedDate value of the two comments as parameters. In the "Solution" section you will see how simple it is to implement this technique to obtain a flexible and dynamic sorting mechanism. You should always take advantage of functionality built into the Framework!

Designing the User Interface

The design of the ASP.NET pages in this module is not particularly special, so there's not much to discuss. We have a set of pages, some for the administrators and some for the end users, which allow us to manage articles, and navigate through categories and read articles, respectively. In the first edition of this book, the most important consideration for the UI section of the first chapters was the approach used to integrate the module-specific pages into the rest of the site. However, you're already seen from previous chapters that this is very straightforward in ASP.NET 2.0, thanks to master pages. Following is a list of pages we will code later:

~/Admin/ManageCategories.aspx: This lists the current categories, and allows administrators to create new ones and delete and update existing ones.
~/Admin/ManageArticles.aspx: This lists the current articles (with pagination support) and allows administrators to delete them. The creation of new articles and the editing of existing articles will be delegated to a secondary page.
~/Admin/AddEditArticle.aspx: This allows administrators to create new articles and update existing articles.
~/Admin/ManageComments.aspx: This lists the current comments for any article, has pagination support, and supports deletion and updates of existing comments.
~/ShowCategories.aspx: This is an end-user page that lists all categories, with their title, description, and image.
~/BrowseArticles.aspx: This end-user page allows users to browse published articles for a specific category, or for all categories. The page shows the title, abstract, author, release date, average rating, and location of the articles.
~/ShowArticle.aspx: This end-user page shows the complete article, along with the current comments at the bottom, and a box to let users post new comments and rate the article.
~/GetArticlesRss.aspx: This page does not return HTML, but rather the XML of the RSS feed for the "n" most recent articles of a specific category, or for any category. The "n" number is configurable with a setting in the web.config file. You don't want "n" to be too big because that would overwhelm users and could slow down your site because a lot of news aggregators download this feed list at regular intervals to determine whether you've published any new articles.

Writing Articles with a WYSIWYG Text Editor

The first and most important challenge you face is that the site must be easily updateable by the client herself, without requiring help from any technical support people. Some regular employees working in the pub must be able to write and publish new articles, and make them look good by applying various formatting, colors, pictures, tables, etc. All this must be possible without knowing any HTML, of course! This problem can be solved by using a WYSIWYG (the acronym for "what you see is what you get") text editor: These editors enable users to write and format text, and to insert graphical elements, much like a typical word processor (which most people are familiar with), and the content is saved in HTML format that can be later shown on the end-user page "as is." There are various editors available, some commercial and some free. Among the different options I picked up FCK Editor (www.fckeditor.net), mainly because it is open source and because it is compatible with most Internet browsers, including IE 5.5+, Firefox 1.0+, Mozilla 1.3+, and Netscape 7+. Figure 5-4 shows a screenshot of an online demo from the editor's web site.

Figure 5-4

The editor is even localizable (language packs for many languages are already provided), and its user interface can be greatly customized, so you can easily decide what toolbars and what command buttons (and thus formatting and functions) you want to make available to users.

Uploading Files

The editor must be able to upload files, typically images for an article, or publish in a photo gallery, and maybe upload documents, screen savers, or other goodies that they want to distribute to their end users. An administrator of a site would be able to use an FTP program to upload files, but an editor typically does not have the expertise, or the credentials, needed to access the remote server and its file system. An online file manager might be very helpful in this situation. In the first edition of this book, an entire chapter was devoted to showing you how to build a full-featured online file manager that would enable users to browse and remove folders and files, upload new files, download, rename, copy and delete existing files, and even edit the content of text files. However, this would be overkill in most situations, as the administrator is the only one who needs to have full control over the files and folders and structure of the site, and the administrator will presumably use an FTP client for this purpose. Editors and contributors only need the capability to upload new files. To implement this functionality, we will develop a small user control that allows users to upload one file at a time, and when done, displays the full URL of the file saved on the server, so the user can easily link to it using the WYSIWYG editor. The control will be used in various pages: in the page to add and edit an article, and in the page to manage categories (as each category can have an image representing it); and later in book we'll use this in the pages that send newsletters and submit forum posts.

This user control, named FileUploader.ascx, will utilize the new ASP.NET 2.0 FileUpload control to select the file, submit it, and save it on the server. This control simply translates to an <input type="file" /> control, with server-side methods to save the image. Under ASP.NET 1.x there was no such control; you had to add the runat="server" attribute to a plain HTML control declaration.

One important design decision we need to consider is how to avoid the possibility that different editors might upload files with the same name, overwriting previous files uploaded by someone else. A simple, but effective, solution is to save the file under ~/Uploads/{UserName}, where the {UserName} placeholder is replaced by the actual user's name. This works because only registered and authenticated users will have access to pages where they can upload files. We do want to let users overwrite a file that they uploaded themselves, as they might want to change the file.

Important

Remember that you will need to add NTFS write permission to the remote Uploads folder at deployment time, for the ASP.NET (Windows 2000 and XP) or Network Service user account (Windows Server 2003). It's easy to overlook this kind of thing, and you don't want to leave a bad impression with users when you set up a new site for them.

Article List User Control

You will need a way to quickly add the list of articles (with title, author, abstract, and a few more details) to any page. It's not enough to have entirely new articles; you also need to show them on existing pages so users will know about them! You'll need to show the list on the BrowseArticles.aspx page for end users and on the ManageArticles.aspx page for administrators. You may also want to show the article list on the home page. If you've got a good understanding of user controls, you may have already guessed that a user control is the best solution for this list because it enables us to encapsulate this functionality into a single code unit (the .ascx file plus the cs code-behind file), which enables us to write the code once and then place that user control on any page using one line of code.

This user control will be named ArticleListing.ascx. It produces different output according to whether the user is a regular user, an administrator, or an editor. If they belong to one of the special roles, each article item will have buttons to delete, edit, or approve them. This way, we can have a single control that will behave differently according to its context. Besides this, when the control is placed into an administration page, it must show all articles, including those that are not yet published (approved), or those that have already been retired (based on the date). When the control is on an end-user page, it must show only the active and published articles. The control will expose the following public properties (all Boolean), so that it's content and its behavior can be changed in different pages:

Property	Description
EnableHighlighter	Indicates whether articles referring to events in the user's country, state/province, or city are highlighted with different colors
PublishedOnly	Indicates whether the control lists only articles that are approved, and whose ReleaseDate-ExpireDate interval includes the current date
RatingLockInterval	The number of days that must pass before a user can again rate the same article
ShowCategoryPicker	Indicates whether the control shows a drop-down list filled with all article categories, which lets the user filter articles by category. If the property is false the drop-down list will be hidden, and the control will filter the articles by category according to the CategoryID parameter passed on the querystring.
ShowPageSizePicker	Indicates whether the control shows a drop-down list representing the number of articles to be listed per page. If the property is true, the user will be able to change the page size to a value that best meets his desires and his connection speed (users with a slow connection may prefer to have fewer items per page so that it loads faster).
EnablePaging	Indicates whether the control will paginate the collection of articles resulting from the current filters (category and published status). When false, the control will have no paging bar and will only show the first "n" articles, where "n" is the page size. This allows us to use the control on the home page, for example, to list the "n" most recent additions. When true, it will show only the first "n" articles but will also show an indication of which page is displayed, and the user can switch between Pages of articles.

Producing and Consuming RSS Feeds

You've already learned from the Introduction that we're going to implement a mechanism to provide the headlines of the site's new content as an RSS feed, so that external (online or desktop-based) aggregator programs can easily consume them, adding new content to their own site, but also driving new traffic to our site. This process of providing a list of articles via RSS is called syndication. The XML format used to contain RSS content is simple in nature (it's not an accident that the RSS acronym stands for "Really Simple Syndication"), and here's an example of one RSS feed that contains an entry for two different articles:

<rss version="2.0">
 <channel>
  <title>My RSS feed</title>
  <link>http://www.contoso.com</link>
  <description>A sample site with a sample RSS</description>
  <copyright>Copyright 2005 by myself</copyright>

  <item>
   <title>First article</title>
   <author>Marco</author>
   <description>Some abstract text here...</description>
   <link>http://www.contoso.com/article1.aspx</link>
   <pubDate>Sat, 03 Sep 2005 12:00:34 GMT</pubDate>
  </item>
  <item>
   <title>Second article</title>
   <author>Mary</author>
   <description>Some other abstract text here...</description>
   <link>http://www.contoso.com/article2.aspx</link>
   <pubDate>Mon, 05 Sep 2005 10:30:22 GMT</pubDate>
  </item>
 </channel>
</rss>

As you see, the root node indicates the version of RSS used in this file, and just below that is a <channel> section, which represents the feed. It contains several required sub-elements, <title>, <link>, and <description>, whose names are self-descriptive. There can also be a number of optional sub-elements, including <copyright>, <webMaster>, <pubDate>, <image>, and others. After all thosefeed-level elements is the list of actual posts/articles/stories, represented by <item> subsections. An item can have a number of optional elements, a few of which (title, author, description, link, pubDate) are shown in the preceding example. For details on the full list of elements supported by RSS you can check this link, http://blogs.law.harvard.edu/tech/rss, or just search for "RSS 2.0 Specification.

One important thing to remember is that this must be a valid XML format, and therefore you cannot insert HTML into the <description> element to provide a visual "look and feel" unless you ensure that it meets XML standards (XHTML is the name for tighter HTML that meets XML requirement). You must ensure that the HTML is well formed, so that all tags have their closing part (<p> has its </p>) or are self-closing (as in <img.../>), among other rules. If you don't want the hassle of making sure the HTML is XML-compliant, you can just wrap the text into a CDATA section, which can include any kind of data. Another small detail to observe is that the value for the pubDate elements must be in the exact format "ddd, dd MMM yyyy HH:mm:ss GMT", as in "Thu, 03 Jan 2002 10:20:30 GMT". If you aren't careful to meet these RSS requirements, your users may get errors when they try to view your RSS feed. Some feed readers are more tolerant than others so it's not sufficient to make sure it works in your own feed reader — you need to meet the RSS specifications.

The RSS feed will be returned by the GetArticlesRss.aspx page, and according to the querystring settings, it will return the "n" most recent entries for a specific category, or for any category ("n" is a value specified in web.config, as explained in the configuration module design). Per standard convention, we'll use the orange "RSS" image icon as a link to the GetArticlesRss.aspx page.

Once you have an RSS feed for your site, you can also consume the feed on this site itself, on the home page, to provide a list of articles! The ArticleListing.ascx user control we already discussed is good for a page whose only purpose is to list articles, but it's too heavy for the home page. On the home page you don't need details such as the location, the rating, and other information. The new article's title and the abstract is enough — when users click on the title, they will be redirected to the page with the whole article, according to the link entry for that article in the RSS feed. We'll build our own RSS reader user control to consume our own RSS feed. The control will be generic, so that it will be able to consume RSS feeds from other sources as well, such as the site forum's RSS, the products RSS, or some other external RSS feeds. Its public properties are listed in the following table:

Property	Description
RssUrl	Full URL of the RSS feed
Title	Title to be displayed
MoreUrl	URL of a page with the full listing of items (versus the last "n" items returned by the RSS items). In the case of the articles module, this will be the URL for the BrowseArticles.aspx page.
MoreText	Text used for the link pointing to MoreUrl

The only question left is how you can take the XML of the RSS feed and transform it into HTML to be shown on the page. Here are two possible solutions:

Use an XSL stylesheet to apply to the XML content, and use an XSLT transform to write the templates that define how to extract the content from the XML and represent it with HTML.
Dynamically build a DataTable with columns for the title, author, description, and the other <item>'s elements. Then fill the DataTable with the data read from the RSS feed, and use it as the data source for a Repeater, DataList, or other template-based control.

Personally, I strongly prefer the latter option, mostly because I find it much faster to change the template of a Repeater rather than to change the template in an XSL file. Additionally, with a DataTable I can easily add calculated columns, apply filters and sorting, and merge feeds coming from different sources (consider the case where you have multiple blogs or sources of articles and want to show their RSS feeds in a single box, with all items merged together and sorted by date). The DataTable approach is more flexible and easier to work with.

The Need for Security

The articles manager module is basically divided into two parts:

The administration section allows the webmaster, or another designated individual, to add, delete, or edit the categories, publish articles, and moderate comments.
The end-user section has pages to navigate through the categories, read the articles, rate an article or post feedback, and display the headlines on the home page.

Obviously, different pages may have different security constraints: An administration page should not be accessible by end users, and an article with the OnlyForMembers flag set should not be accessible by the anonymous users (users who aren't logged in). In the previous chapter, we developed a very flexible module that allows us to administer the registered users, read or edit their profile, and dynamically assign them to certain roles. For the articles manager module we will need the following roles:

Administrators and Editors: These users have full control over the articles system. They can add, edit, or delete categories, approve and publish articles, and moderate comments. Only a very few people should belong to this role. (Note that Administrators also have full rights over the user management system, and all the other modules of the site, so it might be wise if only a single individual has this role.)
Contributors: These users can submit their own articles, but they won't be published until an administrator or editor approves them. You could give this permission to many users if you want to gather as much content as possible, or just to a selected group of people otherwise.

Enforcing these security rules is a simple task, as you've learned in the previous chapter. In many cases it suffices to protect an entire page against unauthorized users by writing some settings in that page's folder's web.config file. Settings done in a configuration file are called declarative coding, and settings made with C# source code are called imperative coding. I favor declarative coding because it's easier to modify without recompiling source code, but in some more complex cases you have to perform some security checks directly from C# code. An example is the AddEditArticle.aspx page, which is used to post a new article or edit an existing one. The first action (post) is available to Contributors and upper roles, while the second (edit) is available only to Editors and Administrators. When the page loads you must understand in which mode the page is being loaded, according to some querystring settings, and check the user's roles accordingly.