Thursday, 18 August 2016

Weird Matlab behaviour for empty matrices.

I've noticed (at least in Matlab 2015b) some weird behaviour with Matlab matrices.

Here are two ways of making different empty matrices:
>> a=[]
>> size(a)
ans=
    0  0
>> isempty(a)
ans=
    1


>> b=find(1==[0 0])
>> size(b)
ans=
    1  0
>> isempty(b)
ans=
    1


They both look empty, but have different sizes. Critically for some code, though, they behave very differently in a for loop:
>> for ind=a
>>    % this loop body never gets executed!
>> end;

>> for ind=b
>>    % this gets executed once, with ind=[]
>> end;

This is a pernicious problem. Since Matlab for loops iterate over the columns of their right-hand side, and both a and b have zero columns, you might expect neither loop body to run - yet they differ. One defensive workaround is to loop over indices instead, e.g. for ind=1:numel(b). Is there any reason for it to be like this?

Monday, 21 April 2014

Illustrator revelation

I often use Adobe Illustrator to draw illustrations or finesse graphics. While I'd prefer a cheaper package, there doesn't currently seem to be anything that comes close to the sophistication of this tool. Like Photoshop (or Flash), it isn't the most intuitive thing, though. You really need expertise. But first, you need to know the name of the feature you need. As soon as you know this, Google is your friend.

So, today, I thought "Wouldn't it be cool if there were a way to align multiple objects not just to each other, but to the fixed position of a single object?"

It turns out this is dead simple - you shift-click to select all the objects you wish to align, and then do an extra click on one of these selected items. This becomes the reference to which everything else is aligned.

I was so excited I had to share.


Sunday, 27 October 2013

meld

Both for maintaining GitHub repositories and for assorted code merging, I've been enjoying using the tool "meld". For ad hoc merging, if you have two different versions of a file, do

meld [filename1] [filename2]

You'll then see the two files side by side, with meld highlighting new, changed or deleted code in each file. You can then click the arrows to transfer a change in one direction or the other.


You may also compare more than two versions at once, e.g. meld file1 file2 file3 for a three-way comparison.

If you use git, you can set meld as the default merge tool:

git config --global merge.tool meld

Then typing

git mergetool

will step through each conflict that has arisen during a merge, opening meld to resolve it.
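As a sketch of the workflow (the repo, file names, and branch names here are made up for the demo), this sets up a conflicted merge of the kind git mergetool would then hand to meld:

```shell
# Hypothetical demo: create a merge conflict that git mergetool would resolve.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo
echo "hello" > f.txt
git add f.txt
git commit -qm "base"
main=$(git symbolic-ref --short HEAD)   # default branch name varies by git version
git checkout -qb feature
echo "hello from feature" > f.txt
git commit -qam "feature edit"
git checkout -q "$main"
echo "hello from $main" > f.txt
git commit -qam "mainline edit"
git merge feature || true               # conflict: f.txt now contains <<<<<<< markers
# git config --global merge.tool meld
# git mergetool                         # would open meld on each conflicted file
```

At this point f.txt holds the conflict markers, and git mergetool (configured as above) would launch meld on it.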


Saturday, 5 October 2013

Benchmarking ZFS against LVM

ZFS is a file system that offers deduplication (i.e., two copies of the same file in different parts of the file system are only stored once) and compression with a choice of two algorithms (lzjb or gzip). A trial dataset of typical neuroimaging data from our system (59 GB) gave a deduplication ratio of 1.5, and compression ratios of 1.25 with lzjb and 1.32 with gzip. These are tests of ZFS backed by EBS on an Amazon EC2 cc2.8xlarge server.
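For context, enabling these features on a pool looks roughly like this (a sketch only; the pool name "tank" and the device are placeholders, not our actual setup):

```shell
# Hypothetical pool setup; names and devices are placeholders.
zpool create tank /dev/xvdf
zfs set dedup=on tank              # block-level deduplication
zfs set compression=lzjb tank      # or compression=gzip
zfs get compressratio tank         # report achieved compression ratio
zpool get dedupratio tank          # report achieved deduplication ratio
```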

After clearing the memory cache with:
sync; echo 3 > /proc/sys/vm/drop_caches

I then benchmarked a directory listing (~1000 NIfTI files) and reads/writes (10 random EPI files per subject). Access times (ms), with statistics over 10 subjects, each with 4 sessions:

       zfs-nocompress    zfs-lzjb          zfs-gzip          lvm
dir    58.8 +/- 42.0     38.5 +/- 16.2     53.9 +/- 47.6     146.8 +/- 480.6
read   576.3 +/- 277.0   629.7 +/- 301.7   523.5 +/- 82.1    499.8 +/- 377.1
write  207.1 +/- 13.1    207.0 +/- 12.6    205.5 +/- 6.5     188.3 +/- 1.3
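For reference, a single directory-listing measurement might be taken roughly like this (a minimal sketch; the post doesn't give the actual benchmark script, and the path here is a placeholder):

```shell
# Hypothetical sketch of one timing measurement.
d=/tmp                               # placeholder for a subject's data directory
start=$(date +%s%N)                  # GNU date: nanoseconds since the epoch
ls "$d" > /dev/null
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "dir listing of $d took ${elapsed_ms} ms"
```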

On a second run immediately after the one above (to maximize caching):

       zfs-nocompress    zfs-lzjb          zfs-gzip          lvm
dir    6.0 +/- 0.3       5.8 +/- 0.4       5.9 +/- 0.3       4.8 +/- 0.2
read   549.8 +/- 150.6   601.4 +/- 292.7   544.2 +/- 210.8   461.2 +/- 200.0
write  207.8 +/- 17.3    206.9 +/- 16.4    203.7 +/- 1.9     189.8 +/- 3.1

Directory listing has become much faster, and reads a little faster.

ZFS seems slower for reading (by 1-2.5 times, depending on conditions), partly due to decompression latencies. It also uses more CPU, but from watching the system monitor this was barely noticeable here, as the disk access to EBS is still the bottleneck.

ZFS has other benefits - it supports caching (on SSD drives, for the cc2.8xlarge's ephemeral disks), and snapshots for backup, which take up space only as files change.
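As a sketch of what those two features look like in practice (pool and dataset names are placeholders):

```shell
# Hypothetical examples; pool/dataset names and devices are placeholders.
zpool add tank cache /dev/xvdb          # use an ephemeral SSD as an L2ARC read cache
zfs snapshot tank/imaging@2013-10-05    # cheap point-in-time snapshot
zfs list -t snapshot                    # snapshots grow only as files change
```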

In summary, deduplication and compression look valuable for our data, but they come with a performance cost - one that could be offset by running more jobs in parallel. gzip looks the better option for our data.


Update: After copying in ~3TB of data, in lots of small files, ZFS appears to be slowing down for both directory listing and reading. The files had probably fallen out of the cache (unless ZFS had kept them in because of their frequency of access).
       zfs-nocompress    zfs-lzjb          zfs-gzip          lvm
dir    35.8 +/- 33.6     86.2 +/- 51.6     127.7 +/- 179.2   5.1 +/- 0.2
read   792.1 +/- 415.7   822.9 +/- 372.7   887.2 +/- 475.5   390.0 +/- 36.3
write  209.5 +/- 2.6     208.5 +/- 2.1     208.1 +/- 3.7     192.5 +/- 1.4

And a repeat, where the files will have been in the cache:

       zfs-nocompress    zfs-lzjb          zfs-gzip          lvm
dir    19.2 +/- 8.2      23.3 +/- 14.2     22.0 +/- 19.5     5.0 +/- 0.2
read   781.9 +/- 429.4   698.2 +/- 179.7   788.0 +/- 320.0   409.9 +/- 135.7
write  207.8 +/- 3.3     206.4 +/- 2.2     207.5 +/- 5.0     193.3 +/- 2.5

ZFS is doing even worse. This seems worrying... does anyone have a solution?

Update: Under reasonable load (here 5 simultaneous copies from lvm to zfs) things really seem to fall apart with ZFS:
       zfs-nocompress      zfs-lzjb            zfs-gzip            lvm
dir    4438.2 +/- 20030.3  453.5 +/- 554.9     732.9 +/- 530.3     14.3 +/- 10.5
read   8622.1 +/- 5483.3   5494.3 +/- 4014.2   1485.2 +/- 824.7    445.9 +/- 198.3
write  4400.4 +/- 4677.5   1946.9 +/- 3082.8   2445.3 +/- 2301.6   201.5 +/- 13.5
I've checked the L2ARC cache and it is nowhere near full - 8 GB out of 32 GB max. What's the bottleneck, and why does it degrade so dramatically?

Update: Part of the bottleneck is deduplication, as described in this post:
http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSDedupMemoryProblem?showcomments#comments
Switching it off speeds things up, but not enough under reasonable load - ZFS is still unacceptably slow.
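If you want to try the same thing, dedup can be switched off per pool or dataset (the name "tank" is a placeholder); note this only affects newly written blocks:

```shell
zfs set dedup=off tank        # existing deduped blocks stay deduped until rewritten
zpool get dedupratio tank     # ratio drifts back toward 1.00 as data is rewritten
```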

Conclusion: Abandoned as a solution for now, until I can identify the cause of the slowing.


Monday, 5 August 2013

For UWO employees... accessing journal or book subscriptions from outside the office

If you've gone through PubMed or Google Scholar and arrived at a paper you'd like to obtain, but the site has rejected you because it doesn't recognize your subscription, just insert ".proxy2.lib.uwo.ca" after the site's hostname, before the path. For example...
http://www.sciencedirect.com/science/article/pii/S1053811910010062
becomes...
http://www.sciencedirect.com.proxy2.lib.uwo.ca/science/article/pii/S1053811910010062
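If you do this often, the rewrite is easy to script. Here's a hypothetical shell helper (the function name is made up) that splices the proxy onto a URL's hostname:

```shell
# Hypothetical helper: insert ".proxy2.lib.uwo.ca" after the hostname of a URL.
proxify() {
  echo "$1" | sed -E 's#^(https?://[^/]+)(/|$)#\1.proxy2.lib.uwo.ca\2#'
}

proxify "http://www.sciencedirect.com/science/article/pii/S1053811910010062"
```

Running it on the ScienceDirect URL above produces the proxied form shown.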

You'll then need your Western ID and password, of course.

Magic, huh?

Tuesday, 30 July 2013

Setting up your .bashrc for Mturk on cercimage

Add these two lines at the end:

export PATH=/home/rcusack/software/aws-mturk-clt-1.3.0/bin:$PATH
export MTURK_CMD_HOME=/home/rcusack/software/aws-mturk-clt-1.3.0
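A quick way to sanity-check the result (a sketch; it just re-creates the two exports and verifies them, so the path comes straight from the lines above):

```shell
# Sketch: verify that the two exports took effect.
export PATH=/home/rcusack/software/aws-mturk-clt-1.3.0/bin:$PATH
export MTURK_CMD_HOME=/home/rcusack/software/aws-mturk-clt-1.3.0
case ":$PATH:" in
  *":$MTURK_CMD_HOME/bin:"*) echo "mturk tools on PATH" ;;
  *)                         echo "PATH not set up" ;;
esac
```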