need to always keep the caption in UTF-8 because it might contain