Search Results

156 total results found

Large Scale Multi-Label Learning

AI and ML Tasks

The Keras website has a tutorial on how to do multi-label learning with a large number of labels: https://keras.io/examples/nlp/multi_label_classification/ (mirror)

Online Reading and Feeds

Devices and Tech

RSS: I use FreshRSS to manage my feeds, with the associated Android client for on the go. On the desktop I use the NewsFlash reader, which can also subscribe to FreshRSS. Read it Later App: I use Wallabag to store articles I want to read later and the accompanyin...

Data Wrangling

Data Engineering and MLOps

DuckDB: DuckDB is a lightweight OLAP-type database system written in C++ and designed for EDA-style activities. From their website: advice on when to use and not to use DuckDB. Polars: Polars is a Rust-based data frames library with Python bindings. H...

ML Best Practices

AI and ML

Machine learning is a complex and multifaceted activity that requires the combination of a number of success factors in order to work. In order to execute machine learning well, it is important to have a good understanding of the processes and variables that f...

Hugo Static Site Generation

Devices and Tech

I use Hugo to maintain most of my websites. Extended Edition: Hugo has an extended version which includes hooks for building SASS. Hugo recommends using snaps to install and manage versions of the tool rather than relying on Debian packages since these can oft...

Core Scientific Concepts (CoreSC)

AI and ML

Core Scientific Concepts (CoreSC) is an annotation scheme used to delineate different parts of scientific discourse in a scientific paper. There are 11 categories: Background, Conclusion, Experiment, Goal, Hypothesis, Method, Model, Motivation, Object, Observation, Res...

Gaming

🌱 Seed Propagator

One of my hobbies is video gaming. I recently got my hands on a Steam Deck, which I would describe as a "Nintendo Switch Pro". I've been very impressed with the capabilities of the system. Watch List: The Store is Closed: Infinite Furniture Store Survival Game...

Times and Dates in Python

Python

The built-in datetime library in Python can be a bit rubbish/difficult to use. Pendulum provides an API kind of similar to moment.js, although its parsing of text dates is not quite as flexible/powerful.

Webmentions

🌱 Seed Propagator

Webmentions are a way for IndieWeb folks to notify each other that something has happened. They use microformats internally. WebMention.App provides an API for sending webmentions automatically but you have to know which page you want to send them from. I wil...

Learning In Public

PKM

Learning Exhaust: This blog post by swyx highlights the benefits of learning in public: "You already know that you will never be done learning. But most people “learn in private”, and lurk. They consume content without creating any themselves. Again, that’s fin...

Batch Iterating in Pandas

Python

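import numpy as np

BATCH_SIZE = 32

# df is an existing DataFrame; row position // BATCH_SIZE gives each row's batch number
for k, grp in df.groupby(np.arange(len(df)) // BATCH_SIZE):
    # grp is a small DataFrame of up to BATCH_SIZE rows (the last batch may be shorter)
    print(k, grp)

References: python - How to iterate over consecutive chunks of Pandas dataframe efficiently - Stack Overflow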

Logging and Winston

Node and Typescript

Winston is a fancy logging library for Node. Using Common Loggers Between Packages: As per this Stack Overflow post (mirror): declare and export your winston logger object once, then import it from different locations within your app.

Stratified Sampling in Pandas

AI and ML

Use groupby on the label column to create sub-frames for each label and then use the sample() function. Passing an integer gives an exact sample (e.g. sample(5) gives 5 rows). Passing frac=0.1 gives a fraction of the rows (i.e. 10%). Remember to set random_state for rep...
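A minimal sketch of the approach described above, under some assumptions: the frame, its contents, and the column name `label` are made up for illustration, and it uses `DataFrameGroupBy.sample` (available in pandas 1.1 and later) as one way to combine groupby with sample(). The original note may do this differently, e.g. with `groupby(...).apply(lambda g: g.sample(5))`.

```python
import pandas as pd

# Hypothetical data: 100 rows with an imbalanced 'label' column (70 / 30).
df = pd.DataFrame({
    "text": [f"doc {i}" for i in range(100)],
    "label": ["a"] * 70 + ["b"] * 30,
})

# Exact sample: 5 rows drawn from each label's sub-frame.
fixed = df.groupby("label").sample(n=5, random_state=42)

# Fractional sample: 10% of each label's rows, so class proportions are kept.
fraction = df.groupby("label").sample(frac=0.1, random_state=42)

# random_state makes both draws repeatable across runs.
print(fixed["label"].value_counts())
print(fraction["label"].value_counts())
```

With the fractional form the sample keeps the 70/30 split of the source frame, which is usually what is wanted for a stratified train/test split; the fixed-size form instead gives every label the same number of rows.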

From Crowd Ratings to Predictive Models of Newsworthiness to Support Science Journalism

AI and ML

Paper Link. Authors: Sachita Nishal, Nicholas Diakopoulos. Notes: Their work comes at the problem from the scientific paper - essentially they are trying to predict whether or not a scientific article might make an interesting news article (as opposed to ...

Steam Deck

🌱 Seed Propagator

Proton: Use ProtonUp to install custom versions of Proton on the deck. You can find this in the Software store on the deck's KDE desktop (open up Discover and search protonup-qt). Heroic Game Launcher: Heroic is a GUI that wraps both Epic and GOG allowing install...

CRON No MTA installed discarding output

Software Engineering Misc

Answer from here: Linux uses mail for sending notifications to the user. Most Linux distributions have a mail service including an MTA (Mail Transfer Agent) installed. Ubuntu doesn't though. You can install a mail service, postfix for example, to solve this pr...

IndieWeb

🌱 Seed Propagator

I've been interested in IndieWeb, and in the idea of owning your own data, for a long time. My own site Brainsteam uses micropub and microsub and can receive webmentions. I use my own hand-rolled micropub endpoint in combination with the Hugo st...

Hypothes.is

🌱 Seed Propagator

Hypothesis is a web annotation tool - you can annotate any page and your comments are then public for others to see (or you can privately annotate stuff). Data Ownership: Hypothes.is is an open source project run by a non-profit. They consider all annotations ma...

DBT

Data Engineering and MLOps

DBT is a data transformation tool with a SaaS platform and an open-core command line tool. The tool is widely used to put the T in ELT. Robin Moffatt has written a walkthrough/guide on how he used DBT with DuckDB.

Model Quantization

AI and ML

Deploying models that are performant (obviously statistically but in this context I primarily mean computationally) is challenging when you are working with large models such as BERT etc. Quantization involves compressing model weights into smaller, more effi...
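As a rough illustration of the idea of compressing weights into a smaller representation, here is a sketch of symmetric per-tensor int8 quantization in plain Python. This is only one of several quantization schemes and is an assumption for illustration, not necessarily the method the note goes on to describe; the function names and the example weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Hypothetical weights standing in for one layer of a model.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored weight differs from the original by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in 8 bits instead of 32, at the cost of a rounding error bounded by half the scale. Real toolkits typically add refinements such as per-channel scales and calibration data.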