action #743

find a tool to convert mediawiki to pdf

Added by lnussel over 6 years ago. Updated over 6 years ago.

Status: Closed
Start date: 02/09/2013
Priority: Normal
Due date: 25/10/2013
Assignee: alarrosa
% Done: 0%
Category: Development
Target version: 13.1 RC2
Duration: 40

Description

Jos needs an easy way to get from the features page in the opensuse wiki to a nice pdf document for the press.
It should be possible to retrieve the raw mediawiki source and convert to a markup language that is understood by office programs.

https://en.opensuse.org/index.php?title=openSUSE:Major_features&action=raw
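
The retrieval and conversion step described above could be sketched roughly as follows. This is only an illustration, not the script used in this ticket: it assumes pandoc is installed and uses its `mediawiki` reader, and the helper names (`raw_url`, `pandoc_command`, `convert`) and the scratch file `page.wiki` are made up for the example.

```python
import subprocess
import urllib.request
from urllib.parse import urlencode

BASE = "https://en.opensuse.org/index.php"  # wiki endpoint from the ticket


def raw_url(title, base=BASE):
    """Build the action=raw URL for a wiki page title."""
    # safe=":" keeps the namespace colon unescaped, as in the ticket's URL
    return base + "?" + urlencode({"title": title, "action": "raw"}, safe=":")


def pandoc_command(src, dst, to_format):
    """Return the pandoc argument list for a MediaWiki source file."""
    return ["pandoc", "-f", "mediawiki", "-t", to_format, "-o", dst, src]


def convert(title, dst="features.odt", to_format="odt"):
    """Download the raw wikitext and hand it to pandoc.

    Needs network access and a pandoc binary on PATH.
    """
    src = "page.wiki"  # hypothetical scratch file name
    with urllib.request.urlopen(raw_url(title)) as resp, open(src, "wb") as out:
        out.write(resp.read())
    subprocess.run(pandoc_command(src, dst, to_format), check=True)
```

Calling `convert("openSUSE:Major_features")` would write `features.odt`; passing `to_format="asciidoc"` with a matching `dst` gives the AsciiDoc variant instead. Images referenced by the page are not fetched by this sketch.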

test.html - asciidoc output (91.1 KB) -miska-, 03/09/2013 06:57 am

test.odt - Libre Office output (37.7 KB) -miska-, 03/09/2013 06:57 am

Feature Guide.pdf - That's what it should be... (501 KB) Anonymous, 06/09/2013 01:46 pm

copy-paste.odt - plain copy-paste into LO. Beats the script output... (1.13 MB) Anonymous, 06/09/2013 01:46 pm

index.pdf - deplate generated file (after manual tweaking) (210 KB) alarrosa, 22/10/2013 04:33 pm

History

#1 Updated by -miska- over 6 years ago

Results of my first tests with pandoc:

  • pictures are screwed up
  • I prefer the asciidoc output over the LibreOffice one
      • with a little bit of sed, we should be able to fix that one...

#2 Updated by toscalix over 6 years ago

  • Status changed from New to In Progress

Michal, when you finish this task, assign it to Jos so he tries it.

#3 Updated by -miska- over 6 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from -miska- to Anonymous

Does either of them look at least a little bit helpful? Or should I continue trying? AFAIK, pictures will have to be redone manually anyway, but it looks like the LO stuff uses styles :-/

#4 Updated by Anonymous over 6 years ago

If I have to do the pictures by hand, I gain little over copy-pasting it all by hand into LibreOffice... At least then I get the images, even though I still have to re-arrange many of them and clean up lots of the text/formatting. See the plain copy-paste in attached odt, imho better than what you created (but still a lot of work to get to attachment two, the final pdf).

If this takes more than an hour now it's not worth it, however - within 1-2 hours I can turn the wiki into this document, so I'll do it by hand. Unless you think the script can handle images too I think we should just close this.

#5 Updated by -miska- over 6 years ago

  • Status changed from Feedback to Rejected

Image formatting/alignment is difficult to handle automatically.

#6 Updated by lnussel over 6 years ago

  • Due date set to 07/10/2013
  • Status changed from Rejected to New
  • Target version changed from 13.1 Beta 1 to 13.1 RC1

#7 Updated by -miska- over 6 years ago

No update, haven't tried anything new. I still think that if we need it to look pretty with picture formatting, manual interaction will be needed.

#8 Updated by lnussel over 6 years ago

  • Due date changed from 07/10/2013 to 16/10/2013
  • Target version changed from 13.1 RC1 to 13.1 RC2

Maybe those tools can do it. Worth a try IMO.

#9 Updated by lnussel over 6 years ago

Note this is not just useful for this release, it might be useful for documentation in general. Some things make sense to have in the wiki as well as in PDF.

#10 Updated by lnussel over 6 years ago

  • Due date changed from 16/10/2013 to 25/10/2013
  • Assignee changed from -miska- to alarrosa

#11 Updated by alarrosa over 6 years ago

After installing Haskell and a few other packages to try building wb2pdf from source, I found that some dependencies are not available in SUSE. The author provides only Windows and Ubuntu binaries, and recommends that if you want to try wb2pdf on another distro, you should install VirtualBox with an Ubuntu guest ... so I don't think it's worth the time installing it just to try it.
I'll try now javalatex and deplate ( http://deplate.sourceforge.net/ )

#12 Updated by alarrosa over 6 years ago

The latex output of deplate gave errors when compiling it to a pdf, but after some manual tweaking adding the following lines to the header,

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\DeclareUnicodeCharacter{00A0}{~}

I managed to make it generate a pdf. The awful attached pdf, to be more concrete.
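
For repeated runs, the manual header tweak above could be automated with something like the sketch below. It assumes the generated file has its `\documentclass` declaration on a single line and that the extra lines belong directly after it; the ticket only says they were added "to the header", so the exact position is a guess, and `patch_preamble` is a made-up helper name.

```python
HEADER_LINES = [
    r"\usepackage[utf8]{inputenc}",
    r"\usepackage[T1]{fontenc}",
    r"\DeclareUnicodeCharacter{00A0}{~}",
]


def patch_preamble(tex):
    """Insert the extra header lines right after the \\documentclass line."""
    lines = tex.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith(r"\documentclass"):
            # Skip any header line that is already present in the document
            extra = [h for h in HEADER_LINES if h not in lines]
            return "\n".join(lines[: i + 1] + extra + lines[i + 1 :]) + "\n"
    raise ValueError("no \\documentclass line found")
```

Read the deplate output into a string, pass it through `patch_preamble`, and write the result back before running LaTeX. Running it twice on the same file leaves the file unchanged.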

#13 Updated by alarrosa over 6 years ago

I couldn't manage to get a correct latex document generated with javaLatex even after spending some time trying to manually fix it, so I think I'll close this issue. If someone thinks it's worth spending more time to find or create a tool to convert mediawiki format to pdf, just reopen it or create a new one.

#14 Updated by toscalix over 6 years ago

  • Status changed from New to Rejected

Enough time spent already.

#15 Updated by toscalix over 6 years ago

  • Status changed from Rejected to Closed
