1. Mixing JDOM 1 and 2 in a Maven project

    This article describes a question I asked on Stack Overflow. I'm including it here to keep all my technical articles in a single spot.

    In a Maven-based project some of the JUnit tests failed during a Maven site build. When executing mvn clean package all tests passed, but when executing mvn clean site some tests produced this error:

    Could not initialize class org.jdom2.input.sax.XMLReaders

    These errors occurred in a class that uses ROME to parse RSS data. They did not happen before the ROME dependency was added to the project, so I suspected that was the cause. ROME has a dependency on JDOM 2.0.2.

    My project includes the cobertura-maven-plugin to generate a test coverage report, and this plugin also depends on JDOM: it uses Jaxen, which needs JDOM 1.0.

    My guess is that while executing the site goal, the Cobertura plugin is active and the JDOM version 1.0 is used by the class under test. This causes the errors in the ROME library because of the incorrect JDOM version.

    The problem was solved by setting a system property during program startup:

    System.setProperty("javax.xml.parsers.SAXParserFactory",
        "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");

    I found this solution in this post: JDOM2: This parser does not support specification "null" version "null".

  2. Java's URL.equals() uses an internet connection

    Running FindBugs on some Java code warned me that the java.net.URL.equals() method performs domain name resolution. That means the running code does a lookup to find the IP addresses of the hosts in the URLs!

    It's documented, just like the same behavior of hashCode(), but that doesn't mean it's something I expected.

    To prevent the lookup FindBugs advises using the URI class instead. Its equals() and hashCode() methods don't use the internet.

    Something to keep in mind when using the URL class.
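
    As an illustration of the FindBugs advice, here is a minimal sketch that compares two addresses with java.net.URI. The class and method names are hypothetical and not taken from the code FindBugs checked. URI.equals() compares the components of the two URIs, ignoring case in the scheme and host, and never resolves a host name.

```java
import java.net.URI;

// Hypothetical helper, not part of the original code: compares two addresses
// with java.net.URI, whose equals() only looks at the URI components.
final class UriComparison {

    // Returns true when both strings denote the same URI after removing
    // "." and ".." path segments. No host name is resolved.
    static boolean sameAddress(String first, String second) {
        URI a = URI.create(first).normalize();
        URI b = URI.create(second).normalize();
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Scheme and host are compared without regard to case.
        System.out.println(sameAddress("http://example.com/blog", "HTTP://EXAMPLE.com/blog")); // true

        // Different host names are simply different; no DNS lookup is done
        // to check whether they share an IP address.
        System.out.println(sameAddress("http://example.com/blog", "http://example.org/blog")); // false
    }
}
```

    If a URL object is still needed to open a connection, it can be created from the URI at that point with toURL().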

  3. Detecting Apache ErrorDocument redirection in PHP

    OK, this took me some time to figure out.

    Search engine spiders keep requesting resources that were removed from my site long ago. Since they keep coming back, they don't seem to process the repeated 404's they've been receiving. So to let them know those resources will not return I want to send out HTTP response code 410 (Gone) instead of 404 (Not Found).

    The Apache documentation describes how this can be done using the RewriteRule directive combined with a [G] flag, like this:

    RewriteRule ^news/politics.* - [G]

    Together with the 410 response code I also configured Apache to send an error page explaining the error, using an ErrorDocument directive:

    ErrorDocument 410 /error410

    404 instead of 410?

    Surprisingly, these changes in configuration result in 404's, together with the normal 404 error page, when requesting one of the removed resources.

    Checking the error page /error410 by directly requesting it in the browser returned the 410 page, so that seems to be OK.

    Rewriting requests

    One thing you need to know is that my website uses a PHP framework I wrote myself. This has a single entry point for nearly all requests. This script examines the incoming request and executes the corresponding script to render the page.

    The Apache configuration to send the requests to this script is also a RewriteRule:

    RewriteRule ^(.*)$ framework.php

    This rule is the last rewrite rule in the configuration, so it catches all requests not handled by any other specific rule. The main script uses the REQUEST_URI server variable to determine which page to render.

    To examine what happens I looked at the value of the $_SERVER['REQUEST_URI'] parameter during script execution of a normal request like http://kwebble.com/blog and a gone URL like http://kwebble.com/news/politics/no-longer-here.

    For the first one the value is /blog, as expected. For the other URL I expected /error410, the URI of the configured ErrorDocument. To my surprise it was /news/politics/no-longer-here, the original URI.

    I expected the error page because I thought the configured URI would be executed as a separate request by Apache. Here I was wrong: Apache internally redirects to the configured error document instead of making a separate request, and REQUEST_URI keeps the original value. This explains the 404, because the framework tries to render the page for the original URL, which no longer points to a valid resource. But how to detect the error?

    Detecting redirects

    Looking at the server variables I noticed some differences between the 2 requests:

    • The values of REDIRECT_STATUS differ, with 200 for the normal URL and 410 for the incorrect URL.
    • A parameter called REDIRECT_REDIRECT_STATUS is only present for the incorrect URL. The value is 410.
    • The values of REDIRECT_URL differ, with /kwebble-site/blog for the normal URL and /kwebble-site/error410 for the incorrect URL.

    These REDIRECT_STATUS and REDIRECT_REDIRECT_STATUS server variables are created by Apache when doing an internal redirect. On an error this occurs twice: first when the error is detected and then again to create the error document.

    The solution

    To render the 410 page I changed the code to look at REDIRECT_STATUS. If it's 200 the rendered page is based on the value of REQUEST_URI, otherwise it is based on the value of REDIRECT_URL.

    Instead of always using the value from REDIRECT_URL an additional check on REDIRECT_STATUS is done. I added this because in my search for information I found several pages suggesting the REDIRECT_URL is not always present.

    This extra check makes sure an error page is always generated and, if possible, reported as 410. Otherwise it is reported as a 404, like it always has been, as the next best alternative.
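
    The decision described above can be sketched as follows. This is only an illustration: the site's real code is PHP and is not shown here, the class and method names are hypothetical, and the map stands in for the server variables that PHP exposes in $_SERVER. Falling back to REQUEST_URI when REDIRECT_URL is missing is my reading of the extra check.

```java
import java.util.Map;

// Illustration only: the site's real code is PHP and is not shown in this article.
final class ErrorDocumentDetection {

    // Picks the path the framework should render, given Apache's server
    // variables. Class and method names are hypothetical.
    static String pathToRender(Map<String, String> server) {
        String status = server.get("REDIRECT_STATUS");
        String redirectUrl = server.get("REDIRECT_URL");

        // A status of 200 is a normal request. Without REDIRECT_URL there is
        // nothing better to use, so the original URI is rendered, which ends
        // in the usual 404 page for a removed resource.
        if ("200".equals(status) || redirectUrl == null) {
            return server.get("REQUEST_URI");
        }

        // Apache redirected internally to an error document: render that one.
        return redirectUrl;
    }

    public static void main(String[] args) {
        // Values taken from the two requests examined above.
        System.out.println(pathToRender(Map.of(
                "REDIRECT_STATUS", "200",
                "REDIRECT_URL", "/kwebble-site/blog",
                "REQUEST_URI", "/blog"))); // /blog
        System.out.println(pathToRender(Map.of(
                "REDIRECT_STATUS", "410",
                "REDIRECT_URL", "/kwebble-site/error410",
                "REQUEST_URI", "/news/politics/no-longer-here"))); // /kwebble-site/error410
    }
}
```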

  4. Solving a maven-checkstyle-plugin ClassCastException for DetailAST

    When the maven-checkstyle-plugin generates errors like:

    antlr.CommonAST cannot be cast to com.puppycrawl.tools.checkstyle.api.DetailAST

    it may help to remove these artifacts from your local Maven repository:

    • antlr
    • commons-beanutils
    • maven-checkstyle-plugin

    At least it worked for me. Found this suggestion on the issue tracker of the plugin.

  5. Passing 'this' in jQuery event handlers

    Found this technique to pass this to an event handler on Stack Overflow. First define the handler:

    $('#the-element').on('click', {self:this}, this.onClick);

    Then inside the handler access this using the event data:

    var self = event.data.self;

  6. Random Walk in Color

    Here's a computer-generated image of a random walk with changing colors. Starting from a random position, each step moves the current point in a randomly chosen direction. With every step the current drawing color is also adjusted. Combine a number of such walks and this is the result:

    I made a page where you can generate your own images and adjust the configuration settings of the image generator.

    The script to generate such images is open source and available on GitHub.

  7. Icon fonts don't always work

    On more and more web pages my browser shows indicators of unknown characters. Here I've marked some examples:

    The page header on Twitter:

    [Image: partial screenshot of the Twitter header bar with circled indicators for missing font characters]

    Part of GitHub source menu:

    [Image: partial screenshot of a menu on GitHub with circled indicators for missing font characters]

    Homepage of the Mozilla Developer Network:

    [Image: partial screenshot of a Mozilla Developer Network page with circled indicators for missing font characters in several places]

    These boxes with character codes are caused by a combination of:

    • using icon fonts to create the images
    • my browser preference that prohibits the use of custom fonts, which I set to make text on web pages easier to read.

    The problem is that to show the images a browser must apply the specified font. Because custom fonts are not allowed, the browser falls back to its default font, can't find the character there and shows a character code box instead.

    Now the problem would be limited if the characters were part of the Unicode standard. Then the font my browser uses might include a representation of the character. But these characters are so-called private-use characters, which means there's no standardized meaning of what they represent. So only the font mentioned in the web page includes the desired rendering of the character.

    The end result is that when I see the page I see these boxes with codes inside.

    Not pretty, not useful and not user-friendly.

  8. Use GitHub to host your own Maven repo

    A Maven-based Java project I'm working on depends on a JAR file that is not hosted on Maven Central or another public repo. Using GitHub, I created a personal Maven repository that allows the dependency to be declared in the POM file like any other dependency.

    Of course, if you can, you should add the JAR to the Central Repository.

    Note: before you begin, make sure the license of the software allows you to publish it, because you are going to make the software publicly available.

    Create a GitHub repository

    Start by creating a GitHub repository that will become your personal Maven repo. Its name will be visible in the URL of the Maven repository. I use the name maven-repo.

    Clone

    Clone the GitHub repository to your computer.

    Install JAR

    Place the JAR file in the local Git repository by letting Maven perform an install:

    mvn install:install-file \
     -DgroupId=[group-id] \
     -DartifactId=[artifact-id] \
     -Dversion=[version] \
     -Dpackaging=[packaging-format] \
     -Dfile=[path-to-file] \
     -DlocalRepositoryPath=[path-to-git-repo]

    • [group-id], [artifact-id], [version] and [packaging-format] define the Maven properties of the file to install.
    • [path-to-file] is the path to the JAR file to install.
    • [path-to-git-repo] is the path to the local Git repository on your computer.

    After successful execution of the command a folder structure is created in the local Git repository. This structure and the files in it make it usable as a Maven repo.

    Commit & Publish

    Commit the changes made by the Maven install command to the local Git repository.

    Publish the updated repository to GitHub. The JAR file is now ready to be used in a Maven POM file.

    Use

    Add the GitHub repository to the POM file of the project:

    <repository>
        <id>git-[username]</id>
        <name>[username]'s Git based repo</name>
        <url>https://github.com/[username]/[repo-name]/raw/master/</url>
    </repository>

    • [username] is your GitHub user name
    • [repo-name] is the name of the GitHub repository you created.

    If the POM file does not include a definition of repositories, put the XML above inside a <repositories> element.

    The id and name of the repository are not important, so use different values if you want. Just make sure the id is unique.

    Declare the dependency in the POM as you do for every dependency.

    That's it! Maven should now be able to use the JAR file.

  9. Eclipse tip: add Import Group "*" to sort imports of unconfigured classes last

    When sorting Java import statements Eclipse puts the imports from packages not specifically configured between the others. To move those statements after the ones you did configure, open Window | Preferences | Java - Code Style - Organize Imports, create a New Import Group called * and make sure it's the last entry.

    This fixes situations like this, where the import from the package de.l3s.boilerpipe.document is placed between imports from java.io and org.xml.sax:

    import java.io.StringReader;
    
    import de.l3s.boilerpipe.document.TextDocument;
    
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    

    After adding the * import group and sorting again, the import from the package de.l3s.boilerpipe.document is placed at the end:

    import java.io.StringReader;
    
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    
    import de.l3s.boilerpipe.document.TextDocument;
    
  10. Android developer options: only for the owner

    Only the owner account can access the developer options on a device running Android 4.2 or newer. Enabling them on other accounts, by tapping the build number seven times, will only display the message that you've become a developer. It will not make the options appear in the Settings menu.

  11. Web development tip: auto-refresh a page to show code changes

    This is a tip to automatically show effects of changes to the source code of a web page. Normally I have to perform these steps to see the results of code changes:

    1. save the source code
    2. switch to the browser
    3. do a page refresh
    4. switch back to the editor

    When I find myself doing this a lot while working on the HTML and CSS of a page, I add an auto-refresh meta tag to the head of the page:

    <meta http-equiv="refresh" content="5">
    

    Now the page refreshes itself every 5 seconds, so changes are picked up automatically. To adjust the time between refreshes change the number.

  12. Android Java-JavaScript bridge

    As a proof of concept I made a small Android app to exchange data between the Java code and JavaScript code running in a WebView. The project source code is available on GitHub.

  13. Browsers should display <time> content in local time

    Here is a suggestion for browser makers: browsers should display a timestamp marked up with the HTML5 <time> element according to the timezone of the user and locale settings of the computer.

    This is the text from a post that triggered this idea:

    NASA is making final preparations to launch a probe at 11:27 p.m. EDT Friday, Sept. 6th.

    EDT and AM/PM suffixes are not things I commonly use so I need to interpret their meaning and calculate the time of launch in my local time. This turns out to be 5:27 Saturday, Sept 7th for me in The Netherlands.

    Wouldn't it be great if, after the author marked up the time of launch with the <time> element like this:

    <p>NASA is making final preparations to launch a probe
    at <time>11:27 p.m. EDT Friday, Sept. 6th</time>.</p>
    

    the browser rendered the text as:

    NASA is making final preparations to launch a probe at 5:27 Saturday, Sept 7th.

    The timestamp is adjusted to my local time and formatted as I'm used to, making it easier for me to understand.
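
    The conversion itself is simple once the moment in time is known exactly. Here is a minimal sketch of the calculation I did by hand, using java.time. The class and method names are hypothetical, and the year 2013 is an assumption, because the quoted text does not mention a year. EDT is represented by the America/New_York zone and my own zone by Europe/Amsterdam. A browser would also need a machine-readable timestamp to start from, for example in the datetime attribute of the <time> element.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch of the conversion a browser could do for a <time> element.
final class LocalTimeRendering {

    // Interprets a wall-clock time in the author's zone and returns the same
    // instant expressed in the viewer's zone.
    static ZonedDateTime toViewerTime(LocalDateTime published, ZoneId authorZone, ZoneId viewerZone) {
        return published.atZone(authorZone).withZoneSameInstant(viewerZone);
    }

    public static void main(String[] args) {
        // Assumption: the year is 2013; the quoted post only gives day and month.
        LocalDateTime launch = LocalDateTime.of(2013, 9, 6, 23, 27);

        ZonedDateTime local = toViewerTime(
                launch, ZoneId.of("America/New_York"), ZoneId.of("Europe/Amsterdam"));

        // 24-hour clock, as is common in The Netherlands.
        DateTimeFormatter format = DateTimeFormatter.ofPattern("H:mm EEEE, MMMM d", Locale.ENGLISH);
        System.out.println(local.format(format)); // 5:27 Saturday, September 7
    }
}
```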

  14. PHP array filtering with anonymous functions

    Here's a technique from the blog software I wrote for this site. This software is structured in 3 separate layers, implemented in different classes:

    1. The user interface (UI) that displays blog posts in different pages.
    2. A service that exposes business methods to get the posts.
    3. A data access object (DAO) that interacts with the database to retrieve, and store, post data.

    The blog has 4 pages: my articles, text links, image thumbnails and videos. And there are RSS feeds exposing the same information. The UI classes use specific service methods to get the required posts for each page or feed.

    The service has methods like getLatestArticles(), getLatestLinks() etc.

    These methods all have the same structure:

    1. get the posts from the database
    2. filter the posts
    3. return the posts that match

    The code I wrote has the criteria declared in variables in the form of anonymous functions:

    $this->latestArticlesFilter = function(BlogPost $post) {
        return $post->published > 0
            && !$post->isLink()
            && !$post->hasVideo();
    };
    

    The public service methods pass these variables as a parameter to the filter() method:

        public function getLatestArticles() {
            return $this->filter($this->getPosts(), $this->latestArticlesFilter);
        }
    

    The filter method applies the filter using the PHP array_filter() function:

        private function filter($posts, $filter) {
            return array_filter($posts, $filter);
        }
    

    Here's a simplified version of the service class, combining the parts:

    // The service class to access blog posts.
    class BlogPostService {
    
        // The DAO for blog posts.
        private $dao;
    
        // Filter that recognizes the latest articles.
        private $latestArticlesFilter;
    
        // Filter that recognizes the latest text link posts.
        private $latestLinksFilter;
    
        // Filter that recognizes the latest image link posts.
        private $latestImagesFilter;
    
        // Filter that recognizes the latest video link posts.
        private $latestVideosFilter;
    
        // Creates the blog post service.
        public function __construct($dao) {
            $this->dao = $dao;
    
            // Create the filters in the constructor; PHP does not allow closures in property declarations.
            $this->latestArticlesFilter = function(BlogPost $post) {
                return $post->published > 0
                    && !$post->isLink()
                    && !$post->hasVideo();
            };
    
            $this->latestLinksFilter = function(BlogPost $post) {
                return $post->published > 0
                    && $post->isLink()
                    && !$post->hasImages()
                    && !$post->hasVideo();
            };
    
            $this->latestImagesFilter = function(BlogPost $post) {
                return $post->published > 0
                    && $post->isLink()
                    && $post->hasImages();
            };
    
            $this->latestVideosFilter = function(BlogPost $post) {
                return $post->published > 0 && $post->hasVideo();
            };
        }
    
        // Returns the latest published articles.
        public function getLatestArticles() {
            return $this->filter($this->getPosts(), $this->latestArticlesFilter);
        }
    
        // Returns the latest published text link posts.
        public function getLatestLinks() {
            return $this->filter($this->getPosts(), $this->latestLinksFilter);
        }
    
        // Returns the latest published image link posts.
        public function getLatestImages() {
            return $this->filter($this->getPosts(), $this->latestImagesFilter);
        }
    
        // Returns the latest published video link posts.
        public function getLatestVideos() {
            return $this->filter($this->getPosts(), $this->latestVideosFilter);
        }
    
        // Filters posts.
        private function filter($posts, $filter) {
            return array_filter($posts, $filter);
        }
    
        // Returns the latest blog posts.
        private function getPosts() {
            return $this->dao->getStoredPosts();
        }
    
    }
    

    The advantage of this technique is that the selection criteria are defined and contained in variables that can be passed around and used where necessary. I find it a clean solution that is easy to read.

    I understand there are other ways to filter data. The most obvious one is to put the selection in SQL queries on the database and directly select the specific posts.

    But at the moment this blog software uses a storage mechanism that stores PHP objects in the database. There are no specific tables describing the posts, just serialized objects. This is easy to use since it doesn't require database changes when the classes change, but performance may not be optimal for all applications. For now it's good enough for this one.

  15. kwebble.com: the next generation

    Welcome to the new version of kwebble.com, with a completely new look. But the biggest change is the software running this site.

    The blog is no longer powered by WordPress. I wrote a framework that makes it easier to keep things like navigation and page layout consistent across the site, and based on this framework I wrote my own blog software.