nemozone

Websites for data visualization

February 1, 2021

https://infogram.com/

https://knoema.com/

https://carto.com/

https://public.tableau.com/s/download/

https://coda.io/welcome


OpenRefine

February 1, 2021

A free, open source, powerful tool for working with messy data

Website: https://openrefine.org/download.html
Source: https://github.com/OpenRefine/OpenRefine

How To Spot Fake News

February 1, 2021

Critical thinking is a key skill in media and information literacy, and part of the mission of libraries is to teach it and advocate for its importance.

Discussions about fake news have led to a new focus on media literacy more broadly, and on the role of libraries and other educational institutions in providing it.

When Oxford Dictionaries announced “post-truth” as its Word of the Year 2016, we as librarians realised that action was needed to educate people and advocate for critical thinking, a crucial skill when navigating the information society.

IFLA has made this infographic with eight simple steps (based on FactCheck.org’s 2016 article How to Spot Fake News) to help you judge whether the news item in front of you can be verified. Download, print, translate, and share it: at home, at your library, in your local community, and on social media networks. The more we crowdsource our wisdom, the wiser the world becomes.

https://www.ifla.org/publications/node/11174

How to spot fake news!

Here is another good reference, although it is in German, sorry.

Here are some additional sources:

https://www.hoax-slayer.net/
https://www.snopes.com/
https://www.klartext-nahrungsergaenzung.de/
https://kit.exposingtheinvisible.org/en/
https://www.politifact.com/
https://www.newsguardtech.com/
https://datadetoxkit.org/en/misinformation/steerclear/
https://toolsforreporters.com/2020/11/11/the-media-manipulation-casebook/
https://expertisefinder.com/ (find experts and ask them directly)
https://misinfocon.com/
https://datajournalism.com/read/handbook/verification-3
https://www.allsides.com/media-bias/media-bias-chart
https://mediabiasfactcheck.com/
https://adfontesmedia.com/
https://www.lobbyregister.bundestag.de/startseite
https://www.lobbycontrol.de/
https://www.abgeordnetenwatch.de/
https://fragdenstaat.de/
https://www.countrycode.org/
https://der-newstest.de/
https://correctiv.org/en/
https://uebermedien.de/
https://dpa-factchecking.com/

https://addons.mozilla.org/en-US/firefox/addon/newsguard/
https://chrome.google.com/webstore/detail/official-media-bias-fact/hdcpibgmmcnpjmmenengjgkkfohahegk?hl=en-US

Tabula

February 1, 2021

How can Tabula help me?

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is: there's no easy way to copy and paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux.

Who uses Tabula?

Tabula is used to power investigative reporting at news organizations of all sizes, including ProPublica, The Times of London, Foreign Policy, La Nación (Argentina), The New York Times and the St. Paul (MN) Pioneer Press. Grassroots organizations like SchoolCuts.org rely on Tabula to turn clunky documents into human-friendly public resources. And researchers of all kinds use Tabula to turn PDF reports into Excel spreadsheets, CSVs, and JSON files for use in analysis and database applications…

https://tabula.technology/

https://github.com/tabulapdf/tabula-java/

https://cran.r-project.org/web/packages/tabulizer/vignettes/tabulizer.html
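Besides the desktop app, tabula-java (linked above) can be run from the command line, which is handy for batch jobs. The following is a minimal Python sketch, assuming Java is installed and the tabula-java jar has been downloaded to a hypothetical path `tabula-java.jar`. The options used (`--pages`, `--format`, `--outfile`, `--lattice`) are taken from the tabula-java README as I understand it, so check `java -jar <jar> --help` for your version; the function names are my own.

```python
import shlex
import subprocess

# Hypothetical path: download the "jar-with-dependencies" build of tabula-java
# from its GitHub releases and adjust this name.
TABULA_JAR = "tabula-java.jar"


def tabula_command(pdf: str, out_csv: str, pages: str = "all",
                   lattice: bool = False) -> list[str]:
    """Build a tabula-java command that writes the tables found in `pdf` to a CSV file."""
    cmd = ["java", "-jar", TABULA_JAR,
           "--pages", pages,      # "all" or a list of ranges such as "1-3,5"
           "--format", "CSV",     # tabula-java also supports TSV and JSON
           "--outfile", out_csv]
    if lattice:
        # Lattice mode uses the ruling lines drawn between cells to find the table.
        cmd.append("--lattice")
    cmd.append(pdf)
    return cmd


def extract_tables(pdf: str, out_csv: str, **options) -> None:
    """Run tabula-java; raises CalledProcessError if it exits with an error."""
    subprocess.run(tabula_command(pdf, out_csv, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, e.g. to paste into a terminal.
    print(shlex.join(tabula_command("report.pdf", "report.csv")))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before pointing it at a large batch of PDFs.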

SecureDrop

February 1, 2021

Share and accept documents securely.

SecureDrop is an open source whistleblower submission system that media organizations and NGOs can install to securely accept documents from anonymous sources. It was originally created by the late Aaron Swartz and is now managed by Freedom of the Press Foundation. SecureDrop is available in 20 languages.

https://securedrop.org/

Another option besides SecureDrop

Thanks to GlobaLeaks, everybody can easily set up a secure and anonymous whistleblowing initiative[…]

https://www.globaleaks.org/about/

https://www.globaleaks.org/

What is metadata?

February 1, 2021

Everything you wanted to know about media metadata, but were afraid to ask

Metadata comes in handy sometimes, like when you’re flipping through old pictures by date, or by location. But in the wrong hands, this same information could be damaging.

https://freedom.press/training/everything-you-wanted-know-about-media-metadata-were-afraid-ask/

https://freedom.press/training/
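To see this in practice, here is a minimal, stdlib-only Python sketch (not taken from the linked article) that checks whether a JPEG still carries an Exif block, the part of the file where cameras and phones typically store the capture date, device model and GPS position. It only detects the Exif block in the APP1 segment; it does not decode it and ignores other metadata formats such as XMP, so for reading or removing metadata a dedicated tool such as ExifTool is the safer choice. The function name `has_exif` is my own.

```python
import struct


def has_exif(data: bytes) -> bool:
    """Return True if the JPEG bytes contain an APP1 segment with Exif metadata."""
    if data[:2] != b"\xff\xd8":              # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break                            # not a marker: damaged file, stop scanning
        marker = data[pos + 1]
        if marker == 0xFF:                   # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xD9, 0xDA):           # end of image / start of scan: headers are over
            break
        # The segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "contains Exif metadata" if has_exif(f.read()) else "has no Exif block")
```

Run it as `python check_exif.py photo.jpg` (the script name is arbitrary) before publishing or sending a picture.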

Image “Cloaking” for Personal Privacy

February 1, 2021

Fawkes is a privacy protection system developed by researchers at SANDLab, University of Chicago. It adds small, pixel-level changes ("cloaks") to your photos before you share them, so that facial recognition models trained on those photos fail to recognize you. macOS and Windows versions are available on the project's official webpage, which also has more information about the project; the team can be contacted at fawkes-team@googlegroups.com. The work is summarized in the academic paper “Fawkes: Protecting Personal Privacy against Unauthorized Deep Learning Models”, published at USENIX Security 2020.

https://sandlab.cs.uchicago.edu/fawkes/

https://github.com/Shawn-Shan/fawkes

Tesseract: optical character recognition (OCR) software

February 1, 2021

https://en.wikipedia.org/wiki/Tesseract_(software)

https://github.com/tesseract-ocr/tesseract

This package contains an OCR engine, libtesseract, and a command-line program, tesseract. Tesseract 4 adds a new neural net (LSTM) based OCR engine focused on line recognition, but it still supports the legacy OCR engine of Tesseract 3, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the legacy OCR Engine mode (--oem 0). This mode also needs traineddata files that support the legacy engine, for example those from the tessdata repository. The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors, see AUTHORS and GitHub's log of contributors.
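A small Python sketch of how the engine mode is passed on the command line, assuming Tesseract 4 or later, where `--oem` accepts the values 0 to 3. The helper only builds the argument list (the names `tesseract_command` and `OEM_MODES` are my own); pass it to `subprocess.run` to actually run OCR.

```python
import shlex
from typing import Optional

# OCR engine modes accepted by `tesseract --oem` (Tesseract 4 and later).
OEM_MODES = {
    0: "legacy engine only",
    1: "neural net (LSTM) engine only",
    2: "legacy and LSTM engines combined",
    3: "default, based on what is available",
}


def tesseract_command(image: str, output_base: str, lang: str = "eng",
                      oem: Optional[int] = None) -> list[str]:
    """Build a tesseract command line.

    Tesseract appends the file extension itself, so output_base "scan"
    produces scan.txt.
    """
    if oem is not None and oem not in OEM_MODES:
        raise ValueError(f"unknown OCR engine mode: {oem}")
    cmd = ["tesseract", image, output_base, "-l", lang]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    return cmd


if __name__ == "__main__":
    # Legacy (Tesseract 3 compatible) mode; needs traineddata files with legacy support.
    print(shlex.join(tesseract_command("scan.png", "scan", oem=0)))
```

Leaving `oem` unset lets Tesseract pick its default, which is usually what you want unless you specifically need the legacy engine.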

Toolbox: Text Recognition with Tesseract OCR

February 1, 2021

Guide: Installing Tesseract OCR on Ubuntu 22.04

Tesseract OCR is open source optical character recognition (OCR) software that can be installed on Ubuntu 22.04. With Tesseract OCR you can convert images into text, including German-language text; PDF files can be processed too once their pages have been converted into images.

1. Install Tesseract OCR: Open a terminal and enter sudo apt-get install tesseract-ocr. The development packages libtesseract-dev and libleptonica-dev are only needed if you want to compile programs against the Tesseract library.
2. Install the German language pack: To support German, install the German language data with sudo apt-get install tesseract-ocr-deu.
3. Optional, create your own tessdata folder: If you want to keep language data in your home directory, create a folder named “tessdata” there with mkdir ~/tessdata.
4. Optional, copy the German language pack into it: cp /usr/share/tesseract-ocr/4.00/tessdata/deu.traineddata ~/tessdata/ (this is where the Tesseract 4 package on Ubuntu 22.04 installs its language data). Tesseract only uses this folder if you pass --tessdata-dir ~/tessdata.
5. Run Tesseract OCR: Apply Tesseract to an image with tesseract image.png image -l deu, replacing “image.png” with the name of your image. Tesseract appends .txt to the output name itself, so this writes image.txt. Tesseract does not read PDF files directly; convert the pages to images first, for example with pdftoppm -r 300 -png document.pdf page from the poppler-utils package.
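Tesseract itself works on images, so a multi-page PDF needs to be rendered page by page first. The Python script below is a minimal sketch of that pipeline, assuming the poppler-utils package (for `pdftoppm`) and the German language data (`tesseract-ocr-deu`) are installed; the function names are my own, not part of Tesseract.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_command(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    """Render every PDF page to PNG (pdftoppm is part of poppler-utils)."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def ocr_page_command(image: str, lang: str = "deu") -> list[str]:
    """OCR one image; the output base "stdout" prints the text instead of writing a file."""
    return ["tesseract", image, "stdout", "-l", lang]


def pdf_to_text(pdf: str, lang: str = "deu", dpi: int = 300) -> str:
    """Render the PDF to images in a temporary folder, then OCR the pages in order."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(pdftoppm_command(pdf, str(Path(tmp) / "page"), dpi), check=True)
        texts = []
        # pdftoppm pads page numbers to the same width (page-01.png, ...),
        # so a plain sort keeps the pages in order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(ocr_page_command(str(image), lang),
                                    check=True, capture_output=True, text=True)
            texts.append(result.stdout)
        return "\n".join(texts)


if __name__ == "__main__":
    import sys
    print(pdf_to_text(sys.argv[1]))
```

Rendering at 300 dpi is a common starting point for scanned documents; lower resolutions tend to cost accuracy, higher ones mostly cost time.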

Note that Tesseract OCR is not perfect and can make mistakes when recognizing text, especially with poor image quality or unusual fonts. However, there are many tools and methods for improving its accuracy.

Ways to improve the accuracy of Tesseract OCR

There are several ways to improve the accuracy of Tesseract OCR:

Image preparation: Make sure the image you want to use is bright enough and has good contrast. Scaling or cropping the image can also help improve OCR accuracy.

Training data for fonts and font sizes: Tesseract OCR can be trained to recognize specific fonts and font sizes better. Adding training data for these fonts and font sizes can improve accuracy.

Custom dictionaries: Tesseract OCR can also be trained with a custom dictionary so that it recognizes specific words or abbreviations better.

Configuration options: Tesseract OCR has various configuration options that can improve accuracy. These include choosing a specific OCR engine, adjusting thresholds for text recognition, and using specific dictionaries.

OCR optimizers: There are also tools that improve recognition accuracy, such as image-cleaning tools that help remove interference like noise and shadows.

Post-processing: After OCR has run, text-processing tools such as regular expressions can be used to correct typical recognition errors in the output.
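To illustrate the post-processing idea, here is a small Python sketch using the standard `re` module. The rules are hypothetical examples of common OCR clean-up, not something Tesseract provides, and the function name `clean_ocr_text` is my own; adapt the rules to your documents.

```python
import re

# Hypothetical clean-up rules for typical OCR artefacts; adapt them to your documents.
RULES = [
    (re.compile(r"(?<=\w)-\n(?=\w)"), ""),     # re-join words hyphenated at line breaks
    (re.compile(r"[ \t]+"), " "),              # collapse runs of spaces and tabs
    (re.compile(r" +\n"), "\n"),               # drop trailing spaces at line ends
    (re.compile(r"\n{3,}"), "\n\n"),           # keep at most one blank line
    (re.compile(r"(?<=\d)[Oo](?=\d)"), "0"),   # letter O misread inside a number
]


def clean_ocr_text(text: str) -> str:
    """Apply each rule in order and trim leading/trailing whitespace."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    raw = "Recht-\nschreibung   und  Grammatik  \n\n\n\nStand: 2O21"
    print(clean_ocr_text(raw))
```

The first rule also joins genuine hyphenated compounds that happen to break at a line end (common in German, e.g. "E-\nMail" becomes "EMail"), so check its effect on your own texts before applying it in bulk.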

Keep in mind that the best way to improve the accuracy of Tesseract OCR depends on the specific requirements and properties of the image or PDF file. It is advisable to try several methods and compare the results to find the best solution.

https://www.heise.de/ct/artikel/Toolbox-Texterkennung-mit-Tesseract-OCR-1674881.html

https://wiki.ubuntuusers.de/tesseract-ocr/

How To Run Tails Linux Inside VirtualBox

January 28, 2021

Tails Linux, or just Tails, is a live operating system based on Debian, designed to protect your privacy and anonymity. You can boot it from a DVD or USB thumb drive, or run it inside a virtual machine. It routes all your internet traffic through the Tor network. In this tutorial, I’m going to show you how to run the Tails live image inside VirtualBox. Tails is an acronym for The Amnesic Incognito Live System…

The benefit of using a live operating system such as Tails is that it is preconfigured to leave no trace on your device.

– Linux Babe

source: https://www.linuxbabe.com/desktop-linux/how-to-run-tails-linux-inside-virtualbox
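As a scriptable sketch (not taken from the linked tutorial), the same kind of VM can be set up with VirtualBox's `VBoxManage` command-line tool: a VM without a hard disk that boots the Tails ISO as a live DVD. The VM name, the ISO path and the 2048 MB of memory are assumptions; the Python helpers only wrap the `VBoxManage` calls.

```python
import shlex
import subprocess

VM_NAME = "Tails"                  # hypothetical VM name
ISO_PATH = "tails-amd64.iso"       # hypothetical path to the downloaded Tails ISO image


def vbox_commands(vm: str = VM_NAME, iso: str = ISO_PATH,
                  memory_mb: int = 2048) -> list[list[str]]:
    """VBoxManage calls that create a diskless VM which boots the ISO as a live DVD."""
    return [
        ["VBoxManage", "createvm", "--name", vm, "--ostype", "Debian_64", "--register"],
        ["VBoxManage", "modifyvm", vm, "--memory", str(memory_mb)],
        # An IDE controller with the ISO attached as a DVD drive; no virtual disk is
        # added, so nothing from the Tails session is stored in the VM.
        ["VBoxManage", "storagectl", vm, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", vm, "--storagectl", "IDE", "--port", "0",
         "--device", "0", "--type", "dvddrive", "--medium", iso],
        ["VBoxManage", "startvm", vm],
    ]


def create_and_start(**options) -> None:
    """Run the commands in order; stops with CalledProcessError on the first failure."""
    for cmd in vbox_commands(**options):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in vbox_commands():
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to adjust the ISO path or memory size before creating the VM.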