parse dml to markdown


Last updated 4 years ago


#!/bin/bash

directory=$PWD
# Read find's output line by line so paths containing spaces stay intact
find "$directory" -type f -name 'als.dml' | sort | while IFS= read -r file
do
  echo "$file" >> doc.md
  # Copy header comment lines 22-47, removing the first two characters of each line
  echo "$(sed -n 22,26p "$file")" | cut -c 3- >> doc.md
  echo "$(sed -n 27,47p "$file")" | cut -c 3- >> doc.md
done
exit $?

# ref https://tldp.org/LDP/abs/html/loops1.html

Output

/mnt/f/Repo/P005/sml/systemds/scripts/builtin/als.dml

This script computes an approximate factorization of a low-rank matrix X into two matrices U and V
using different implementations of the Alternating-Least-Squares (ALS) algorithm.
Matrices U and V are computed by minimizing a loss function (with regularization).

INPUT   PARAMETERS:
---------------------------------------------------------------------------------------------
NAME    TYPE     DEFAULT  MEANING
---------------------------------------------------------------------------------------------
X       String   ---      Location to read the input matrix X to be factorized
rank    Int      10       Rank of the factorization
reg     String   "L2"        Regularization: 
                          "L2" = L2 regularization;
                             f (U, V) = 0.5 * sum (W * (U %*% V - X) ^ 2)
                                      + 0.5 * lambda * (sum (U ^ 2) + sum (V ^ 2))
                          "wL2" = weighted L2 regularization
                             f (U, V) = 0.5 * sum (W * (U %*% V - X) ^ 2)
                                      + 0.5 * lambda * (sum (U ^ 2 * row_nonzeros) 
                                      + sum (V ^ 2 * col_nonzeros))
lambda  Double   0.000001 Regularization parameter, no regularization if 0.0
maxi    Int      50       Maximum number of iterations
check   Boolean  TRUE     Check for convergence after every iteration, i.e., updating U and V once
thr     Double   0.0001   Assuming check is set to TRUE, the algorithm stops and convergence is declared 
                          if the decrease in loss in any two consecutive iterations falls below this threshold; 
                          if check is FALSE thr is ignored
---------------------------------------------------------------------------------------------
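
The script above hard-codes the line ranges (22-26 and 27-47) and removes the first two characters of every line with `cut`. A minimal sketch of the same step as a reusable function is shown below. The name `extract_header` and its arguments are hypothetical, not part of SystemDS or the original script. It also assumes the header lines start with `#` followed by an optional space, and it strips only that marker, so lines without a leading `#` pass through unchanged (unlike `cut -c 3-`).

```shell
#!/bin/bash

# extract_header FILE START END   (hypothetical helper for this sketch)
# Print lines START..END of FILE, removing a leading "#" and one optional space.
extract_header() {
  local file=$1 start=$2 end=$3
  # The first sed selects the line range; the second strips the comment marker.
  sed -n "${start},${end}p" "$file" | sed -e 's/^# \{0,1\}//'
}
```

For example, `extract_header als.dml 22 47 >> doc.md` would append the same range the script reads, provided the header sits at those line numbers in the file.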

Related work by the SystemDS developers on a parser:

https://github.com/apache/systemds/commit/0ad0c991904cb6d84d62ce2c2a2d6077b4b3973d
https://github.com/apache/systemds/blob/master/scripts/staging/python_script_generator/Generator-Design%20Document-v2.md
dml file comments
markdown syntax based on dml file comments