Maximum Length of String in Java


When you are creating a String by using StringBuilder to read from a BufferedReader in Java, have you ever wondered if the string will overflow since the length of the file content we are reading may be bigger than String’s maximum length? So what’s the maximum length of String in Java?

In Java, String is stored internally by char[], and it is known that array is indexed by int(you may have noticed that among the methods of String, there only exists charAt(int index), while no charAt(long index)). Therefore, String’s max length is Integer.MAX_VALUE, 2147483647 (231 – 1).

However, you will also need to account for the limitation of heap size on char, which is heap size divided by 2(each char is 2 bytes). And this is not just a theoretical issue. For example, you’ll need 2 GB to store a  String with the length of Integer.MAX_VALUE, while it’s reasonable that some old laptops only has memory of 2 GB. In other words, if a String is very long, you are likely to meet heap overflow exception before the length reachs Integer.MAX_VALUE.

In summary, max length of String in Java is Math.min(Integer.MAX_VALUE, heap size / 2).

 

© 2017 HAOTIAN SHI ALL RIGHTS RESERVED.


Dynamic Programming: top-down vs bottom-up


1. Recap

Dynamic programming is all about ordering your computations in a way that you avoid recalculating duplicate work. You have a main problem (the root of your tree of subproblems), and subproblems (subtrees). The subproblems typically repeat and overlap.

For example, consider your favorite example of Fibonnaci. This is the full tree of subproblems, if we did a naive recursive call:

TOP of the tree
fib(4)
 fib(3)...................... + fib(2)
  fib(2)......... + fib(1)       fib(1)........... + fib(0)
   fib(1) + fib(0)   fib(1)       fib(1)              fib(0)
    fib(1)   fib(0)
BOTTOM of the tree

(In some other rare problems, this tree could be infinite in some branches, representing non-termination, and thus the bottom of the tree may be infinitely large. Furthermore in some problems, you might not know what the full tree looks like ahead of time, and thus you might need a strategy/algorithm to decide which subproblems to reveal.)

2. Memoization(top-down) vs Tabulation(bottom-up)

There are at least two main techniques of dynamic programming, which are not mutually exclusive:

  • Memoization – This is a laissez-faire approach: You assume you have already computed all subproblems. You have no idea the optimal evaluation order. You typically perform a recursive call (or some iterative equivalent) from the root, and either hope you will get close to the optimal evaluation order, or you have a proof that you will get the optimal evaluation order. You ensure that the recursive call never recomputes a subproblem because you cache the results, and thus duplicate sub-trees are not recomputed.
    • example: If you are calculating the Fibonacci sequence fib(100), you would just call this, and it would call fib(100)=fib(99)+fib(98), which would call fib(99)=fib(98)+fib(97), …etc…, which would call fib(2)=fib(1)+fib(0)=1+0=1. Then it would finally resolve fib(3)=fib(2)+fib(1), but it doesn’t need to recalculate fib(2), because we cached it.
    • This starts at the top of the tree, and evaluates the subproblems from the leaves/subtrees back up towards the root.
  • Tabulation – You can also think of dynamic programming as a “table-filling” algorithm (though usually multidimensional, this ‘table’ may have non-Euclidean geometry in very rare cases). This is like memoization but more active, and involves one additional step: You must pick, ahead of time, exactly in which order you will do your computations (this is not to imply the order must be static, but that you have much more flexibility than memoization).
    • example: If doing fibonacci, you choose to calculate the numbers in this order: fib(2),fib(3),fib(4)… caching every value so you can compute the next ones more easily. You can also think of it as filling up a table (another form of caching).
    • I personally do not hear the word ‘tabulation’ a lot, but it’s a very decent term. Some people consider this “dynamic programming”.
    • Before running the algorithm, the programmer considers the whole tree, then writes an algorithm to evaluate the subproblems in a particular order towards the root, generally filling in a table.

(At its most general, in “dynamic programming” I would say the programmer considers the whole tree, then writes an algorithm that implements a strategy for evaluating subproblems (which optimizes whatever properties you want, usually a combination of time-complexity and space-complexity). The strategy must start somewhere, some particular subproblem, and perhaps adapts itself based on the results of those evaluations. In “dynamic programming” in the most general sense, you generally try to cache these subproblems (and more generally, also avoid revisiting subproblems… a subtle distinction perhaps in the case of graphs) in various data structures. Very often these data structures are at their core like arrays or tables. Solutions to subproblems can be thrown away if we don’t need them anymore.)

[Previously this answer made a statement about the top-down vs bottom-up terminology; there are clearly two main approaches called Memoization and Tabulation that may be in bijection with those terms though not entirely. The general term most people use is still “Dynamic Programming” and some people say “Memoization” to refer to that particular subtype of dynamic programming. This answer declines to say which is top-down and bottom-up until the community can find proper references in academic papers. Ultimately it is important to understand the distinction rather than the terminology.]

3. Pros and cons

Ease of coding

Memoization is very easy to code (you can generally* write a “memoizer” annotation or wrapper function that automatically does it for you), and should be your first line of approach. The downside of tabulation is that you have to come up with an ordering.

*(this is actually only easy if you are writing the function yourself, and/or coding in an impure/non-functional programming language… for example if someone already wrote a precompiled fibfunction, it necessarily makes recursive calls to itself, and you can’t magically memoize the function without ensuring those recursive calls call your new memoized function (and not the original unmemoized function))

Recursiveness

Note that both top-down and bottom-up can be implemented with recursion or iterative table-filling, though it may not be natural.

Practical concerns

With memoization, if the tree is very deep (e.g. fib(10^6)), you will run out of stack space, because each delayed computation must be put on the stack, and you will have 10^6 of them.

Optimality

Either approach may not be time-optimal if the order you happen (or try to) visit subproblems is not optimal, specifically if there is more than one way to calculate a subproblem (normally caching would resolve this, but it’s theoretically possible that caching might not in some exotic cases). Memoization will usually add on your time-complexity to your space-complexity (e.g. with tabulation you have more liberty to throw away calculations, like using tabulation with Fib lets you use O(1) space, but memoization with Fib uses O(N) stack space).

Advanced optimizations

If you are also doing a extremely complicated problems, you might have no choice but to do tabulation (or at least take a more active role in steering the memoization where you want it to go). Also if you are in a situation where optimization is absolutely critical and you must optimize, tabulation will allow you to do optimizations which memoization would not otherwise let you do in a sane way. In my humble opinion, in normal software engineering, neither of these two cases ever come up, so I would just just use memoization (“a function which caches its answers”) unless something (such as stack space) makes tabulation necessary.

 

This blog is reprinted from Stack Overflow, 312-605-9488‘s answer!

Original url: (405) 309-5264

Thanks to the original author’s creation of this article!


Fluent Interface


A fluent interface is a method for constructing object oriented APIs that provides more readable code. A fluent interface is normally implemented by using method cascading(method chaining).

On using fluent interface, method names usually represent operations. One method returns an object that executes the next method(operation), thus shaping a chain.

All the operations can be completed in this chain without being splited, and coders can understand the workflow easily by reading this chain.

Example 1


List<Integer> nums = Lists.newArrayList(1,1,null,2,3,4,null,5,6,7,8,9,10);
List<Integer> numsWithoutNull = nums.stream().filter(num -> num != null).
                                collect(() -> new ArrayList<Integer>(),
                                (list, item) -> list.add(item),
                                (list1, list2) -> list1.addAll(list2));

Example 2


Splitter.on('.').trimResults().omitEmptyStrings().split("mark.dual. .line.");

Conversion between ASCII and character in Python


ord()     character -> ASCII

chr()     ASCII -> character

 

Examples:

Idempotent Operation for RESTful service


Idempotence is talked about a lot in the context of “RESTful” web services. From a RESTful service standpoint, for an operation (or service call) to be idempotent, clients can make that same call repeatedly while producing the same result. In other words, making multiple identical requests has the same effect as making a single request. Note that while idempotent operations produce the same result on the server (no side effects), the response itself may not be the same (e.g. a resource’s state may change between requests).

The PUT and DELETE methods are defined to be idempotent. However, there is a caveat on DELETE. The problem with DELETE, which if successful would normally return a 200 (OK) or 204 (No Content), will often return a 404 (Not Found) on subsequent calls, unless the service is configured to “mark” resources for deletion without actually deleting them. However, when the service actually deletes the resource, the next call will not find the resource to delete it and return a 404. However, the state on the server is the same after each DELETE call, but the response is different.

GET, HEAD, OPTIONS and TRACE methods are defined as safe, meaning they are only intended for retrieving data. This makes them idempotent as well since multiple, identical requests will behave the same.

 

This blog is reprinted from REST API Tutorial.

Original url: (253) 347-9636

Thanks to the original author’s creation of this article!


Operations on Binary Search Tree


Binary Search Tree is a sorted tree structure. An inorder traversal of a BST will result in a sorted sequence.

For each node in BST, nodes in its left subtree have lower values, nodes in its right subtree have higher values.

Three operations on BST:

1.search:

if(target < node.val) search left subtree

else if(target > node.val) search right subtree

2.insertion:

search for target, add target to the end of this search

3.deletion:

  • target is leaf -> delete directly
  • target only has one subtree on left / right -> target = target.left / right
  • target has subtrees on both sides -> target = target’s previous node(right most one in left subtree) or target’s next node(left most one in right subtree) then delete the used node  in the corresponding subtree

Merits of doubly linked list compared with singly linked list and array


An example of double linked list issue on leetcode: /leetcode.com/problems/lru-cache/description/

Gavin’s solution about the above issue:

/bitbucket.org/gavin030/leetcode/src/78be975bc0c65f22d0b9528d71070f14f32ea1e5/Java/146.%20LRU%20Cache.java?at=master&fileviewer=file-view-default

array: recording location with index

singly linked list: recording location with a pointer to the next node

doubly linked list: recording location with two pointers, one pre one next

In double linked list, a node can locate itself, that is, knowing about its pre and next without traversing the list. While in array, location is decided by index among all elements(has dependency on all); in singly linked list, a node can only know about its location incompletely since it doesn’t know its previous node.

Thus in conclusion, only double linked list can do delete operation with O(1).

array: get O(1), add / remove O(n)(locating O(1), refreshing indexs O(n))

linked list + map: get / add O(1), remove O(n)(locating O(1), locating previous node for refreshing pointers O(n))

double linked list + map: get / add / remove O(1)

(The time complexity above is only about locating by index or by the node, not locating from the start node.)

Summary of merits:

1.In doubly linked list, we can start searching from a random node in both directions; while in singly linked list, we can only start searching from a random node in the direction from head to tail.

2.A node in doubly linked list can update pointers on itself after delete operation, while a node of singly linked list can not.

3.Doubly linked list can do both pre insertion and next insertion, while singly linked list can only do next insertion.

 

© 2017 HAOTIAN SHI ALL RIGHTS RESERVED.


Useful settings and shortcut keys in Eclipse


1.shortcut keys(for Windows):

  • ctrl + left click on class              open source code of the selected class
  • ctrl + left click on object             two selections: locate the declaration place or open source code of the class of selected object
  • ctrl + ‘/’                                      comment the current selected lines(single line comment)
  • ctrl + shift + ‘/’                            comment the current selected lines(section comment)
  • alt + up/down direction              move current line up/down
  • ctrl + alt  + up/down direction    copy current line up/down
  • syso                                          System.out.println()
  • main                                          main method

2.useful settings:

auto-prompt: Window -> Perferences -> Java -> Editor -> Content Assist -> Auto-Activation, replace content in Auto

activation triggers for Java with “abcdefghijklmnopqrstuvwxyz.”


A comparison between Stack and LinkedList in Java


It is known that Java has a class Stack, which implemented stack data structure. However, during the process of using LinkedList, you would possibly come up with a question: now that stack data structure can be achieved simply using LinkedList, where does Java has Stack class?

Stack is a subclass of Vector. It is an old class, which started from Java 1.0.

According to Java SE7 doc of Stack:

A more complete and consistent set of LIFO stack operations is provided by the Deque interface and its implementations, which should be used in preference to this class.

Deque interface has two common implementing classes:

ArrayDeque, implemented on array; and LinkedList, implemented on link list.

There for, when we need to achieve a stack, Stack is a bad choice. Instead, we should prefer using Deque. To choose ArrayDeque or LinkedList is based on our demand of data structure, which is decided by the main operations on it.

 

© 2017 HAOTIAN SHI ALL RIGHTS RESERVED.

If you want to reprint this blog, please notify the author firstly.


A comparison between Vector and ArrayList in Java


Vector is widely used in C++. While in Java, ArrayList.is the most common choice. But do you know that Java also has Vector class?(This blog doesn’t compare the Vector in C++ and Java)

Vector and ArrayList are roughly equal in view of methods. They both implemented List interface, and they were both established based on array. Below are their differences:

1. Vector is thread safe while ArrayList isn’t. Thus ArrayList has a better time performance.

2. When automatically enlarging size, Vector: newSize = 2 * oldSize, while ArrayList: newSize = 1.5 * oldSize + 1. Thus ArrayList saves memory.

If a thread-safe implementation is not needed, it is recommended to use ArrayList in place of Vector.(cited from JavaSE 8 Vector doc)

In short, normally we use ArrayList instead of Vector in Java.

 

© 2017 HAOTIAN SHI ALL RIGHTS RESERVED.

If you want to reprint this blog, please notify the author firstly.