ogs22's blog
Random technical stuff, mostly relevant to http://maths.org/ and its subdomains
Wednesday, February 01, 2012
Fun in HTML5 and UTF-8 land
- Change the XHTML doctype to an HTML5 doctype; there seem to be no major compatibility issues between XHTML and HTML5 (see http://coding.smashingmagazine.com/2009/07/29/misunderstanding-markup-xhtml-2-comic-strip/ !)
- Make sure all the XML parsing done internally uses the right character set.
- Remove all the bits of code designed to stop UTF-8 characters getting into the DB!
- Change the php::tidy config to output UTF-8.
- Dump the DB out of MySQL and change every DEFAULT CHARSET=latin1 to DEFAULT CHARSET=utf8.
- Run "ALTER DATABASE nrichdb charset = 'utf8';" on the main database.
- Reimport the DB.
- Pull all the content out of the DB, put it through utf8_encode(), add the UTF-8 encoding to the XML processing instructions, and then resave it.
- ...
- $PROFIT$ ... well, OK, maybe not.
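The last step of the UTF-8 list (re-encode each stored document and fix its XML declaration) can be sketched in a few lines of PHP. This is a minimal sketch, assuming the stored content is Latin-1 and may or may not start with an XML declaration; latin1_to_utf8() and set_xml_encoding() are hypothetical helper names. latin1_to_utf8() does the same byte mapping as utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: convert a Latin-1 (ISO-8859-1) string to UTF-8.
// Every Latin-1 byte is the Unicode code point with the same value, so
// bytes below 0x80 are copied and the rest become two-byte UTF-8 sequences.
// This is the same mapping utf8_encode() performs.
function latin1_to_utf8(string $s): string {
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];
        } else {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Hypothetical helper: make the XML declaration say encoding="UTF-8",
// replacing an existing encoding, adding one, or prepending a declaration.
function set_xml_encoding(string $xml): string {
    $encodingAttr = '/encoding\s*=\s*["\'][^"\']*["\']/';
    if (preg_match('/^<\?xml[^>]*\?>/', $xml, $m)) {
        $decl = $m[0];
        if (preg_match($encodingAttr, $decl)) {
            $newDecl = preg_replace($encodingAttr, 'encoding="UTF-8"', $decl);
        } else {
            $newDecl = preg_replace('/\s*\?>$/', ' encoding="UTF-8"?>', $decl);
        }
        return $newDecl . substr($xml, strlen($decl));
    }
    // No declaration at all: prepend one.
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n" . $xml;
}
```

Each row would then get something like $body = set_xml_encoding(latin1_to_utf8($body)); before being saved. Run it exactly once per row: applying it to text that is already UTF-8 double-encodes it.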
Friday, August 20, 2010
Importing old HTML content into Drupal 6
Importing old content into Drupal is a fairly painless process once you know:
- How to create nodes
- How to pull the old pages into PHP
After spending far too much of my own time trying to get to grips with the node creation process, I found http://acquia.com/blog/migrating-drupal-way-part-i-creating-node, which answers most of the basic node creation questions.
So I wrote a function to create the nodes from my old pages.
It's probably worth describing the format of the old pages. I had 200 HTML 'index' pages in a database, each linking to between 0 and 20 HTML, image or other documents grouped as Talk, Project or Additional Material files: a total of just under 1000 HTML pages.
I wanted to create the index pages with links to the files, and backlinks from the files to the index.
<?php
require 'includes/bootstrap.inc';
require 'modules/node/node.pages.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Create and save a node from one old page.
// $tlink, $plink and $alink are arrays of link title => URL for the talk,
// project and additional material CCK link fields; $created is an optional
// Unix timestamp for the original page.
function mcreate($title, $body, $path, $tagmap = false, $tlink = false, $plink = false, $alink = false, $type = "resources", $created = false) {
  $node = new stdClass();
  $node->type = $type;
  // For a new node, node_object_prepare() fills in defaults (status, promote,
  // sticky, uid and created), so call it before setting our own values.
  node_object_prepare($node);
  $node->title = $title;
  $node->body = $body;
  $node->path = $path;
  $node->changed = time();
  $node->status = 1;
  $node->promote = 1;
  $node->sticky = 0;
  $node->format = 3; // Unfiltered HTML
  $node->uid = 1;    // UID of content owner
  if ($created) {
    $node->created = $created;
  }
  // Topic (talk), project and additional material links.
  $linkfields = array('field_tlink' => $tlink, 'field_plink' => $plink, 'field_alink' => $alink);
  foreach ($linkfields as $field => $links) {
    if ($links) {
      $items = array();
      foreach ($links as $linktitle => $url) {
        $items[] = array('title' => $linktitle, 'url' => $url, 'attributes' => array());
      }
      $node->$field = $items;
    }
  }
  if ($tagmap) {
    $node->taxonomy = $tagmap;
  }
  node_save($node);
  return $node;
}
The field_tlink, field_plink and field_alink fields are Link (http://drupal.org/project/link) CCK fields that link to the talk, project and additional material files.
Taxonomy gotcha: unless you add each term's tid and a parent value to the term_hierarchy table, the taxonomy terms will not show up.
mysql> INSERT INTO term_hierarchy (tid, parent) SELECT tid, 0 FROM term_data;
did the job for me (a parent of 0 makes each term a top-level term).
The main body of the script that processes the files is fairly normal PHP stuff; I'll publish it here for reference, and the include file with the functions here.
Here is how the script basically works:
- Get the index pages' links and taxonomy terms from the DB
- Fetch the HTML using cURL
- Clean up the HTML and change the img links
- Create a new node for each link (if it's HTML)
- Create the index node with links to the files
- Set each linked file's 'linked_from' field to the nid of the index node, and repeat...
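The "change the img links" step isn't shown in the post. Below is a minimal sketch of one way to do it with PHP's DOM extension, assuming the old images have been copied into a single directory whose URL prefix is passed in; rewrite_img_links() and $newprefix are hypothetical names, and flattening paths with basename() is an assumption about how the images were stored.

```php
<?php
// Hypothetical sketch: point relative <img src> values at $newprefix.
// Absolute URLs and site-absolute paths are left untouched.
function rewrite_img_links(string $html, string $newprefix): string {
    $doc = new DOMDocument();
    // Old pages are often sloppy HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells libxml the fragment is UTF-8.
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Leave absolute URLs (http://, https://, //host) and site-absolute paths alone.
        if ($src === '' || preg_match('#^([a-z]+:|/)#i', $src)) {
            continue;
        }
        $img->setAttribute('src', rtrim($newprefix, '/') . '/' . basename($src));
    }

    // Serialise only the children of <body>, i.e. the original fragment.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Using a real parser rather than regular expressions copes better with the odd attribute order and quoting found in hand-written legacy pages.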
The functions explained:
function setlinkfrom($subnode,$linkfrom)
Takes an array of nodes (the linked-to pages) and the nid of the index node, updates each node with the backlink and saves it.
function fetchpluspage($domain,$url)
Uses PHP cURL to fetch http:// pages.
function cleanuppage($page)
Uses PHP tidy to clean up dodgy HTML.
function tagmap($tags)
Takes an array of taxonomy terms, inserts any that aren't already there into taxonomy vocabulary 1, and returns their IDs for insertion into the node.
function getguts($data)
Horrible function to pull the relevant bits out of the HTML pages.
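tagmap() itself talks to the Drupal database and isn't reproduced here. The sketch below shows only its find-or-insert logic, with a plain PHP array standing in for vocabulary 1; tagmap_sketch() and $vocab are hypothetical names, and the real function would also need the term_hierarchy row described in the taxonomy gotcha above.

```php
<?php
// Hypothetical, database-free sketch of tagmap(): $vocab stands in for the
// terms of vocabulary 1 (term name => tid). Missing terms are added with the
// next free tid, and the tids of all requested terms are returned.
function tagmap_sketch(array $tags, array &$vocab): array {
    $tids = array();
    foreach ($tags as $tag) {
        $name = trim($tag);
        if ($name === '') {
            continue; // ignore empty tags
        }
        if (!isset($vocab[$name])) {
            $vocab[$name] = $vocab ? max($vocab) + 1 : 1; // "insert" the new term
        }
        $tid = $vocab[$name];
        $tids[$tid] = $tid; // keyed by tid so repeated tags collapse
    }
    return $tids;
}
```

The returned array of IDs is what mcreate() receives as $tagmap and assigns to $node->taxonomy.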
Thursday, April 29, 2010
Thursday, March 11, 2010
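Finding additional line returns at the end of PHP files
The Unix command
grep -rlzP '\?>\n\n+\z' *
run in the same directory as the PHP files should list the offending files, i.e. those where the closing ?> tag is followed by more than one newline. The -z option makes grep read each file as a single record so that \n in the pattern can match (grep normally matches line by line), and -l prints just the file names.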
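The same check can be done from PHP itself, which is handy where GNU grep isn't available. This is a minimal sketch, assuming Unix line endings and that only files ending in .php matter; has_extra_trailing_newlines() and find_offending_files() are hypothetical names. PHP swallows a single newline straight after the closing tag, so only two or more are a problem: the extra ones are sent as output and can trigger "headers already sent" errors.

```php
<?php
// True when the closing tag at the very end of the file is followed by
// two or more newlines (PHP only swallows the first one).
function has_extra_trailing_newlines(string $src): bool {
    return preg_match('/\?>\n\n+\z/', $src) === 1;
}

// Walk a directory tree and list the .php files that fail the check.
function find_offending_files(string $dir): array {
    $found = array();
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isFile()
            && strtolower($file->getExtension()) === 'php'
            && has_extra_trailing_newlines(file_get_contents($file->getPathname()))) {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}
```

For example, print_r(find_offending_files('.')); run from the site root lists the files to fix.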
Wednesday, November 25, 2009
Slow X11 startup on new Snow Leopard install
In /var/log/system.log there were entries like:
Nov 25 14:01:29 lapc-br1-2 org.x.startx[366]: xauth: timeout in locking authority file /Users/ogs22/.serverauth.366
Nov 25 14:01:49 lapc-br1-2 org.x.startx[366]: xauth: timeout in locking authority file /Users/ogs22/.Xauthority
Checking the file permissions on my home directory showed:
drwxr-xr-x 210 temp 502 7140 25 Nov 14:01 ogs22
temp being the user I created to transfer my files and settings with Migration Assistant (it had been left owning my home directory), so a quick
sudo chown ogs22:ogs22 ogs22
and X11 starts in about 2 seconds now.
Thursday, November 12, 2009
Shakespeare Letter Frequency
9.0881% : e
6.7498% : t
6.3808% : o
5.9213% : a
5.1092% : i
5.0632% : s
4.9656% : n
4.8777% : h
4.7921% : r
3.4463% : l
3.0274% : d
2.6422% : u
2.2643% : m
1.9180% : y
1.8387% : w
1.7362% : c
1.7181% : ,
1.6475% : f
1.6063% : .
1.3916% : g
1.2293% : b
1.1645% : p
0.7681% : v
0.7281% : k
0.6476% : '
0.3584% : ;
0.2184% : ?
0.1840% : !
0.1635% : -
0.1042% : x
0.0945% : j
0.0746% : q
0.0432% : [
0.0430% : ]
0.0377% : :
0.0339% : z
0.0094% : "
0.0050% : 1
0.0034% : )
0.0034% : (
0.0026% : 2
0.0021% : 3
0.0018% : 4
0.0016% : 5
0.0014% : _
0.0012% : 6
0.0011% : 9
0.0009% : 0
0.0008% : 7
0.0007% : |
0.0007% : 8
0.0006% : <
0.0004% : &
0.0000% : }
0.0000% : `
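The post doesn't say how this table was produced. One way to build a table in the same layout is sketched below, assuming the text was lower-cased and all whitespace ignored (which would explain why no capitals or spaces appear above); char_frequencies() and print_frequencies() are hypothetical names, and the sketch counts bytes, so it assumes plain ASCII input.

```php
<?php
// Lower-case the text, drop all whitespace, count each remaining byte with
// count_chars(), and express each count as a percentage of the total.
function char_frequencies(string $text): array {
    $clean = strtolower(preg_replace('/\s+/', '', $text));
    $counts = count_chars($clean, 1); // byte value => count, non-zero only
    $total = array_sum($counts);
    if ($total === 0) {
        return array();
    }
    $freq = array();
    foreach ($counts as $byte => $n) {
        $freq[chr($byte)] = 100 * $n / $total;
    }
    arsort($freq); // highest percentage first
    return $freq;
}

// Print in the same "percentage : character" layout as the table above.
function print_frequencies(array $freq): void {
    foreach ($freq as $char => $pct) {
        printf("%.4f%% : %s\n", $pct, $char);
    }
}
```

Usage would be along the lines of print_frequencies(char_frequencies(file_get_contents('complete-works.txt'))); where the file name is a placeholder for whatever plain-text copy of the works is used.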
Friday, June 26, 2009
Entities, ext/xml and libxml 2.7 - CDATA Zone
This just cost me an afternoon; portupgrade libxml2 followed by portupgrade -f php5 saved me, though :)
So if you find ampersands and other entities disappearing in your PHP scripts, it's one to look out for.
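A quick way to tell whether a given PHP/libxml2 build is affected is to push a string containing entities through ext/xml and see whether they survive. This is a minimal sketch; xml_character_data() is a hypothetical helper name, and it only checks the symptom described above rather than identifying the exact library versions involved.

```php
<?php
// Feed $xml through PHP's ext/xml parser and return all character data.
// On an affected build, decoded entities such as "&" go missing.
function xml_character_data(string $xml): string {
    $text = '';
    $parser = xml_parser_create('UTF-8');
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data; // may be called several times for one text node
    });
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $text;
}
```

Running var_dump(xml_character_data('<a>fish &amp; chips</a>') === 'fish & chips'); should print bool(true); bool(false) points at the entity problem, and echo LIBXML_DOTTED_VERSION; shows which libxml2 PHP was built against.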