Parse DML to Markdown

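The header comments of a DML file already contain its documentation.
The script below copies those comments into doc.md, so the Markdown content comes directly from the DML file comments.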
#!/bin/bash

directory=$PWD
# read the paths line by line so that paths containing spaces are not split
find "$directory" -type f -name 'als.dml' | sort | while IFS= read -r file
do
  echo "$file" >> doc.md
  # lines 22-47 hold the description and the parameter table;
  # remove the first two characters ("# ") of each line
  sed -n '22,47p' "$file" | cut -c 3- >> doc.md
done
exit $?

# ref https://tldp.org/LDP/abs/html/loops1.html
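The script above hard-codes the line range and removes exactly two characters per line, so a comment line written as `#text` would lose its first letter. A minimal sketch of a more forgiving variant follows. The function name `dml_header_to_md` and its optional start/end arguments are hypothetical and not part of SystemDS; the defaults 22 and 47 are the line numbers used in the script above.

```shell
#!/bin/sh
# dml_header_to_md FILE [START] [END]
# Hypothetical helper: prints one Markdown section for a DML script.
# START and END default to the line range used for als.dml above.
dml_header_to_md() {
  file=$1
  start=${2:-22}
  end=${3:-47}
  # use the file name as a level-2 heading
  printf '## %s\n\n' "$(basename "$file")"
  # print the chosen lines, then strip the leading "#" and one optional space
  sed -n "${start},${end}p" "$file" | sed -e 's/^# \{0,1\}//'
  printf '\n'
}
```

It could be called inside the loop as `dml_header_to_md "$file" >> doc.md`.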

Output

/mnt/f/Repo/P005/sml/systemds/scripts/builtin/als.dml

This script computes an approximate factorization of a low-rank matrix X into two matrices U and V
using different implementations of the Alternating-Least-Squares (ALS) algorithm.
Matrices U and V are computed by minimizing a loss function (with regularization).

INPUT   PARAMETERS:
---------------------------------------------------------------------------------------------
NAME    TYPE     DEFAULT  MEANING
---------------------------------------------------------------------------------------------
X       String   ---      Location to read the input matrix X to be factorized
rank    Int      10       Rank of the factorization
reg     String   "L2"     Regularization:
                          "L2" = L2 regularization;
                             f (U, V) = 0.5 * sum (W * (U %*% V - X) ^ 2)
                                      + 0.5 * lambda * (sum (U ^ 2) + sum (V ^ 2))
                          "wL2" = weighted L2 regularization
                             f (U, V) = 0.5 * sum (W * (U %*% V - X) ^ 2)
                                      + 0.5 * lambda * (sum (U ^ 2 * row_nonzeros) 
                                      + sum (V ^ 2 * col_nonzeros))
lambda  Double   0.000001 Regularization parameter, no regularization if 0.0
maxi    Int      50       Maximum number of iterations
check   Boolean  TRUE     Check for convergence after every iteration, i.e., updating U and V once
thr     Double   0.0001   Assuming check is set to TRUE, the algorithm stops and convergence is declared 
                          if the decrease in loss in any two consecutive iterations falls below this threshold; 
                          if check is FALSE thr is ignored
---------------------------------------------------------------------------------------------
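The output above is still a fixed-width text table, which Markdown renders as a plain paragraph. The following is a rough sketch of how it might be turned into a Markdown table with awk. The function name `params_to_md_table` is hypothetical, and the sketch assumes the NAME/TYPE/DEFAULT/MEANING layout shown above: a row that starts in the first column opens a new parameter, and indented rows continue the meaning of the previous one. Characters such as `*` in the formulas are not escaped.

```shell
#!/bin/sh
# params_to_md_table: reads the stripped header text on stdin and
# writes a Markdown table on stdout. Hypothetical helper.
params_to_md_table() {
  awk '
    function emit_row() {
      if (name != "") printf "| %s | %s | %s | %s |\n", name, type, dflt, meaning
      name = ""
    }
    /^-+$/ { next }                 # skip the dashed separator lines
    /^NAME[ \t]/ {                  # the column header starts the table
      print "| Name | Type | Default | Meaning |"
      print "| --- | --- | --- | --- |"
      in_table = 1
      next
    }
    !in_table { next }              # ignore everything before the table
    /^[^ \t]/ {                     # a new parameter row
      emit_row()
      name = $1; type = $2; dflt = $3
      meaning = ""
      for (i = 4; i <= NF; i++) meaning = meaning (i > 4 ? " " : "") $i
      next
    }
    NF > 0 {                        # indented continuation of the meaning
      line = $0
      sub(/^[ \t]+/, "", line); sub(/[ \t]+$/, "", line)
      meaning = meaning " " line
    }
    END { emit_row() }
  '
}
```

For example, `sed -n '27,47p' als.dml | cut -c 3- | params_to_md_table >> doc.md` would append the table instead of the raw text.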

The SystemDS developers are working on a parser: https://github.com/apache/systemds/commit/0ad0c991904cb6d84d62ce2c2a2d6077b4b3973d

https://github.com/apache/systemds/blob/master/scripts/staging/python_script_generator/Generator-Design%20Document-v2.md
